== What is stoomboot ==

Stoomboot is a batch farm for local use at NIKHEF. It is in principle open to all
NIKHEF users, but a login account does not give automatic access to stoomboot.
Contact helpdesk@nikhef.nl to gain access.

=== Hardware ===

Stoomboot consists of 32 nodes (<tt>stbc-01</tt> through <tt>stbc-32</tt>), each
equipped with dual quad-core Intel Xeon E5335 2.0 GHz processors and
16 GB of memory. The total number of cores is 256.

=== Software & disk access ===

All stoomboot nodes run Scientific Linux 5. All NFS-mountable disks
at NIKHEF are visible (<tt>/project/*</tt> and <tt>/data/*</tt>), as are all GlusterFS disks (<tt>/glusterfs/atlas*</tt>). Stoomboot does not run
AFS, so AFS directories, including <tt>/afs/cern.ch</tt>, are not visible. This
may indirectly impact you, as certain experimental software installations
attempt to access files on <tt>/afs/cern.ch</tt> ([[CVMFS]] is available for software installations). As stoomboot is intended as a local
batch farm, there are no plans to install AFS.

== How to use stoomboot ==

=== Submitting batch jobs ===

Stoomboot is a batch-only facility; jobs can be submitted through
the PBS <tt>qsub</tt> command:

<pre>
unix> qsub test.sh
9714.allier.nikhef.nl
</pre>

The argument passed to <tt>qsub</tt> is a script that will be executed in your home
directory. The returned string is the job identifier and can be used to look
up the status of the job, or to manipulate it later. Jobs can be submitted from any
Linux desktop at NIKHEF as well as from <tt>login.nikhef.nl</tt>. If you cannot submit jobs
from your local desktop, contact <tt>helpdesk@nikhef.nl</tt> to have the batch client software installed.

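For example, the job identifier returned above can be used with the standard PBS commands to inspect or cancel the job (an illustrative sketch; substitute your own job id):

<pre>
unix> qstat 9714.allier.nikhef.nl   # look up the status of this job
unix> qdel 9714.allier.nikhef.nl    # cancel the job
</pre>
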
The output of the job appears in files named <tt><jobname>.o<number></tt>, e.g. <tt>test.sh.o9714</tt> for the example above. The following default settings
apply when you submit a batch job:

* The job runs in your home directory (<tt>$HOME</tt>)
* The job starts with a clean shell (environment variables from the shell from which you submit are not transferred to the batch job). E.g. if you need the ATLAS software setup, it should be done in the submitted script (see the sketch below)
* Job output (stdout) is sent to a file in the directory from which the job was submitted. Job stderr output is sent to a separate file. E.g. for the example above, the file <tt>test.sh.o9714</tt> contains stdout and the file <tt>test.sh.e9714</tt> contains stderr. If there is no stdout or stderr, an empty file is created
* A mail is sent to you if the output files cannot be created

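A minimal sketch of such a self-contained job script (the setup lines are only an illustration; use whatever setup your analysis requires):

<pre>
#!/bin/sh
# test.sh -- runs in $HOME with a clean environment, so do all setup here
source /project/atlas/nikhef/cvmfs/setup.sh   # illustrative environment setup
setupATLAS                                    # illustrative; see the CVMFS page
cd $TMPDIR                                    # use the node-local scratch disk
echo "Running on $(hostname)"
</pre>
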
Here is a list of frequently desired changes to the default behavior and the corresponding
<tt>qsub</tt> options:

* ''Merge stdout and stderr in a single file''. Add option <tt>-j oe</tt> to the <tt>qsub</tt> command (a single file <tt>*.o*</tt> is written)

* ''Choose the batch queue''. Right now there are five queues: <tt>stbcq</tt> (the default queue, 8 hours), <tt>express</tt> (10 min), <tt>short</tt> (4 hours), <tt>qlong</tt> (48h) and <tt>budget</tt> (low priority). Add option <tt>-q <queuename></tt> to the <tt>qsub</tt> command

* ''Choose a different output file for stdout''. Add option <tt>-o <filename></tt> to the <tt>qsub</tt> command

* ''Pass all environment variables of the submitting shell to the batch job (with the exception of <tt>$PATH</tt>)''. Add option <tt>-V</tt> to the <tt>qsub</tt> command

* ''Run the job on a specific stoomboot node''. Add option <tt>-l host=stbc-XX</tt> to the <tt>qsub</tt> command line

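Several of these options can be combined on one command line. For instance (an illustrative example, reusing the script from the submission example above):

<pre>
unix> qsub -q short -j oe -o mylog.txt test.sh
</pre>

This submits <tt>test.sh</tt> to the 4-hour <tt>short</tt> queue, merges stdout and stderr, and writes both to <tt>mylog.txt</tt>.
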
A full list of options can be obtained from <tt>man qsub</tt>.

=== Examining the status of your jobs ===

The <tt>qstat -u <username></tt> command shows the status of all your jobs. Status code 'C' indicates completed, 'R' indicates running and 'Q' indicates queued.

<pre>
unix> qstat
Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
9714.allier         test.sh          verkerke        00:00:00 C test
</pre>

This <tt>qstat</tt> command only shows your own jobs, not those of other users.
Only jobs that completed less than 10 minutes ago are listed with status 'C'.
The output of jobs that completed longer ago is kept; they are simply no longer
listed in the status overview.

To see the activity of other users on the system, you can use the lower-level maui command
<tt>showq</tt>, which shows the jobs of all users. The <tt>showq</tt> command works without
arguments on <tt>login</tt>; on any other host, add <tt>--host=allier</tt> to run it successfully.

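For example, from a desktop or any host other than <tt>login</tt>:

<pre>
unix> showq --host=allier
</pre>
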
The general level of activity on stoomboot is graphically monitored at
http://www.nikhef.nl/grid/stats/stbc/

== Common practical issues in stoomboot usage ==

=== LSF job submission emulator ===

The interface of the PBS batch system is notably different from that of the LSF batch system
run at e.g. CERN and SLAC. One of the convenient features of LSF <tt>bsub</tt> is that the
user does not need to write a script for every batch job; instead, the command line
passed to <tt>bsub</tt> is executed. An emulator is available for the LSF <tt>bsub</tt>
command that submits a job that executes the <tt>bsub</tt> command line in the present
working directory and the complete present environment. For example, one can do

<pre>
bsub ls -l
</pre>

which will submit a batch job that executes <tt>ls -l</tt> in the working directory
from which the <tt>bsub</tt> command was executed. This expressly allows the
user to set up e.g. the complete ATLAS software environment in a shell on the local desktop
and then substitute local desktop running of an ATLAS software job with a batch-run job by prefixing <tt>bsub</tt> to the executed command line. The scope of the LSF <tt>bsub</tt> emulator is limited to
its ability to execute the command line in batch in an identical environment. It does
not emulate the various command line flags of LSF <tt>bsub</tt>. For now, you can find the <tt>bsub</tt>
emulator in <tt>~verkerke/bin/bsub</tt>.

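As an illustration of this workflow (the setup commands and the job command itself are hypothetical; substitute your own):

<pre>
# set up the full software environment in a local shell...
unix> source /project/atlas/nikhef/cvmfs/setup.sh
unix> setupATLAS
# ...then run the same command in batch simply by prefixing the emulator
unix> ~verkerke/bin/bsub athena MyJobOptions.py
</pre>
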
=== Suggestions for debugging and troubleshooting ===

If you want to debug a problem that occurs in a stoomboot batch job, or you want to make a short trial run for a larger series of batch jobs, there are two ways to gain interactive login access to stoomboot.

* You can log in directly to nodes stbc-i1 through stbc-i4 (these nodes ''only'') to test and/or debug your problem. You should try to keep CPU consumption and testing time to a minimum, and run your real jobs through <tt>qsub</tt> on the actual batch nodes.

* You can request an 'interactive' batch job through <tt>qsub -q qlong -X -I</tt>. In this mode you can consume as much CPU as the queue that the interactive job was submitted to allows. The 'look and feel' of interactive batch jobs is nearly identical to that of <tt>ssh</tt>. The main exception is that when no free job slot is available, the <tt>qsub</tt> command will hang until one becomes available.

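For a quick test, a shorter variant of the same command can be used (illustrative; it uses the 10-minute <tt>express</tt> queue listed above and omits X11 forwarding):

<pre>
unix> qsub -q express -I
</pre>
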
=== Scratch disk usage and NFS disk access ===

When running on stoomboot, please be sure to place all local 'scratch' files in the directory pointed to by the environment variable <tt>$TMPDIR</tt> and ''not'' in <tt>/tmp</tt>. The latter is very small (a few GB) and, when filled up, will cause all kinds of problems for you and other users. The disk pointed to by <tt>$TMPDIR</tt> is typically 200 GB. Here too, be sure to clean up when your job ends, to avoid filling up these disks as well.

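A sketch of a job body that follows this advice (the output file and its final destination are hypothetical):

<pre>
WORKDIR=$TMPDIR/myjob_$$                # private scratch area on the large local disk
mkdir -p $WORKDIR && cd $WORKDIR
# ... run the actual work here, writing intermediate files to $WORKDIR ...
cp output.root /data/atlas/myresults/   # hypothetical final destination
cd / && rm -rf $WORKDIR                 # clean up before the job ends
</pre>
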
When accessing NFS-mounted disks (<tt>/project/*</tt>, <tt>/data/*</tt>), please keep in mind that the network bandwidth between the stoomboot nodes and the NFS server is limited, and that the NFS server capacity is also limited. Running e.g. 50 jobs that read from or write to files on NFS disks at a high rate ('ntuple analysis') may result in poor performance of both the NFS server and your jobs.

=== Scheduling policies and CPU quota ===

This section is sensitive to change, as scheduling policies and quota allocation are still evolving.
At the time of writing (December 2008), each group (atlas, bphys, etc.) is allowed to use at most 96 run slots (i.e. 75% of the capacity available at the time; this is the hard limit). When the system is 'busy', as determined by the maui scheduler, a lower soft limit of 64 run slots (50% of the capacity) is enforced. Each individual user is entitled to use all run slots of his group. To see which policy prevents your queued jobs from running, use the <tt>checkjob <jobid></tt> command.

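For example (using the job id from the submission example above):

<pre>
unix> checkjob 9714
</pre>
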
== Questions, communication and announcements on stoomboot ==

To ask questions and to receive announcements on stoomboot operations, subscribe
to the stoomboot users mailing list (stbc-users@nikhef.nl). To subscribe yourself
to this list, go to https://mailman.nikhef.nl/mailman/listinfo/stbc-users.

==Atlas Software Pages==

Here you will find the pages on software issues specific to Atlas. In the past we tried to keep a number of pages up to date, but unfortunately they are outdated by now. Nevertheless, you can have a look at them [http://www.nikhef.nl/pub/experiments/atlas/software/ here].


----
'''External software pages:'''
*[http://www.nikhef.nl/~barison/Production.php Marcello Barisonzi's single top production page]
*[http://www.nikhef.nl/~stanb/Production.php Stan's DC2 production page]

----
'''Tutorials'''
*[[Using Athena at Nikhef]]
*[[CVMFS|Using CVMFS at Nikhef]]
*[[NIKHEF_user_analysis_tutorial_2010|NIKHEF user analysis tutorial 2010]]
*[[Ganga/AMA tutorial September 2008|Ganga/AMA tutorial September 2008]]
*[[wouter_aod_example|Wouter's ttbar AOD analysis example]]
*[[MCatNLO_howto|MCatNLO howto for Rome]]
*[http://www.nikhef.nl/~barison/AOD_Tutorial.ppt Marcello's AOD tutorial (ppt)]
*[[Generating_Higgs_To_4_Muons_at_NIKHEF|Generating_Higgs_To_4_Muons_at_NIKHEF]]
*[[Generating_Higgs_Events_on_the_grid|Generating_Higgs_Events_on_the_grid]]
*[[FullChain_on_the_grid|FullChain_on_the_grid]]
*[[Event display tutorial|Event display tutorial]]
*[[MuonCalib_Tutorial|Muon calibration tutorial]]
*[[Combined Reconstruction Recipes|Cosmic muon Reconstruction Recipes]]
*[[athena_workshop|ATHENA workshop October 2006]]

----
'''Rome Data Sample (NIKHEF):'''
*[[NIKHEF_Rome_MCatNLO_Samples| Location of ttbar files at NIKHEF for the Rome meeting]]
*[https://uimon.cern.ch/twiki/bin/view/Atlas/RomeListOfSamples CERN Wiki for Rome data]
*[http://atlfarm003.mi.infn.it/%7Enegri/rome_dataset.htm List of GRID jobs]
*[http://phyweb.lbl.gov/AOD/10.0.1/ Web repository of reconstructed data]
*[[aod_ntuple| Contents of the AOD-based Root ntuple (Woutuple)]]
*[[ntuple_analysis_skeleton| Analyse the Ntuple (skeleton analysis)]]

----
'''TTbar analysis'''
*[[location_datasample| Rome data sample (MCatNLO-input/Generator/AOD/Ntuples)]]
*[[wouter_aod_example| TTbar AOD analysis & producing the TTbar Ntuple from an AOD]]
*[[ttbar_analysis_skeleton| TTbar Ntuple Analysis Skeleton]]
*[[producing_Woutuple_from_AOD | Producing a TTbar Ntuple from an AOD]]

----
'''CSC Data Analysis (NIKHEF):'''
*[[Setting_up_1206 | Setting up Athena 12.0.7 and TopView ]]
*[[TVModularAnalysis | TVModularAnalysis ]]
*[[12.0.6_TopViewNtuples_and_AOD's_at_NIKHEF | 12.0.6 TopViewNtuples and AOD's at NIKHEF ]]
*[[Location_Ntuples_and_AOD's_at_NIKHEF | Location of Ntuples and AOD's at NIKHEF ]] (outdated)
*[[CSC_PhysicsSamples| Location of CSC files at NIKHEF]] (outdated)
*[[Producing_CSC_Ntuples_From_AOD_11 | Producing CSC Ntuples from AOD (release 11)]] (outdated)
*[[Analysing_CSC_Ntuples_From_AOD_11 | Analyzing CSC Ntuples from AOD (release 11)]] (outdated)
----
'''Athena release 13 (Nikhef):'''
*[[Setting_up_13.0.30 at Nikhef | Setting up Athena 13.0.30 with slc3 and slc4 at Nikhef]]
----
'''Stoomboot (local cluster)'''
* See: [[Stoomboot]]
----
'''Grid'''
<!-- *[[Using_DQ2_at_NIKHEF | Using DQ2 at NIKHEF ]] -->
<!-- *[[Using_ganga_at_NIKHEF | Using ganga at NIKHEF ]] -->
*[[StructNtuple_making |StructNtuple making]]
*[[DQ2Client quick usage | DQ2Client quick usage]]
*[[Ganga_basic_usage | Ganga: basic usage]]
*[[Ganga_with_AMAAthena | Ganga: running AMAAthena]]
*[[Using_GANGA_with_AMAAthena | Using GANGA with AMAAthena]] (outdated)
*[[Access_to_data_at_NIKHEF_tier | Access to data at the NIKHEF Tier ]]
*[[CSC_Ntuple_production | Producing CSC ntuples ]]
*[[Renew_Grid_Certificate | Renewing your Grid Certificate(s) ]]
*[[Using_atlas_nl_group_membership | Using <tt>/atlas/nl</tt> group membership for proxy ]]
*[[Using_RFIO_or_DCAP_to_access_files_on_Nikhef_Grid_disks | Using RFIO or DCAP to access files on Nikhef Grid disks ]]
*[[ADC_Operation_NL | ATLAS Distributed Computing Operations in the NL cloud ]]
*[[NL_Cloud_Monitor_Instructions | Monitoring ATLAS activities at the NL cloud (Shifters' instructions)]]
----
'''SUSY tools:'''
*[[Tools_to_scan_the_mSUGRA_phasespace | Tools to scan the mSUGRA phasespace ]]
*[[Generating_AtlFast_SUSY_Events | Generating AtlFast SUSY Events ]]
----
'''AtlasModularAnalysis'''
*[[ AMA_in_Athena_14_1_0 | Running AMA using Athena release 14.1.0 ]]
*[[AMA_on_Stoomboot_14_2_22 | Running AMAAthena on SARA ESDs release 14.2.22 ]]
----
'''Miscellaneous'''
*[[ AFS | Accessing /afs/cern.ch from Nikhef]]

== CVMFS ==

In December 2011, a new system for software distribution called the CERN Virtual Machine File System, or CVMFS for short, was deployed at Nikhef for users. See the [http://agenda.nikhef.nl/conferenceDisplay.py?confId=1512 presentation] that introduced it.

----

Please note that CVMFS isn't available on all PCs at Nikhef. It should be ready to use on all ATLAS PCs; if not, please contact the [mailto:helpdesk@nikhef.nl helpdesk]. It is also available on all Stoomboot cluster nodes, including the interactive ones.

To use CVMFS, type in a shell:
<pre>
% source /project/atlas/nikhef/cvmfs/setup.sh
% setupATLAS
</pre>

The first line sets up CVMFS, and the second line calls a simple script that sets up the cluster environment for ATLAS and makes [https://twiki.cern.ch/twiki/bin/viewauth/Atlas/AtlasSetup AtlasSetup] available. Use <tt>asetup</tt> to get your Athena release. For more info about the options of <tt>asetup</tt>, see the [https://twiki.cern.ch/twiki/bin/viewauth/Atlas/AtlasSetupReference AtlasSetupReference].

The last command also prints a list of further commands you can use to set up software, but first, let's look at all the software available on CVMFS:
<pre>
% showVersions
</pre>

This list is rather long; if you only want to show the versions of a certain package (for example, ROOT), use:
<pre>
% showVersions --show root
</pre>

And finally, here are some example commands you can use to set up certain software:
<pre>
% localSetupDQ2Client
% localSetupPandaClient
% localSetupGanga
% asetup 17.4.0.1,AtlasProduction
% localSetupROOT
</pre>

<div>== Setting up ganga ==<br />
You need an afs ticket to run ganga. Also, you need a grid certificate, and <br />
you need to setup the grid, as described in the [[Using_DQ2_at_NIKHEF|DQ2 at Nikhef wiki]].<br />
At the same time, assuming you set up the GRID tools according to Martijn’s <br />
Wiki, COMMENT OUT THE LINE: <br />
source /project/atlas/nikhef/dq2/dq2_setup.csh.NIKHEF <br />
If you setup the GRID tools in some other way, make sure the grid tools <br />
environment is not loaded. '''GANGA AND GRID TOOLS ENVIRONMENT CLASH!''' <br />
Apparently, it is a mismatch between the grid tools environment <br />
and the Athena environment. You can add the line to an alias or whatever, <br />
if you wish. Then setup ATHENA at NIKHEF as described in [[Setting_up_1206|athena 12.0.6 Wiki]].<br />
<br />
<br />
<br />
<br />
To setup ganga, add the two following lines to the .cshrc:<br><br><br />
<br />
<tt><br />
setenv GANGA_CONFIG_PATH GangaAtlas/Atlas.ini <br><br />
&#35;for the local installation <br><br />
&#35;set path = (/public/public_linux/Ganga/install/4.3.0/bin/ $path) <br><br />
&#35;for the newest version installed on afs <br><br />
set path = (/afs/cern.ch/sw/ganga/install/4.3.2/bin/ $path) <br><br />
setenv LFC_HOST 'lfc03.nikhef.nl' <br><br />
setenv LCG_CATALOG_TYPE lfc <br><br />
</tt><br><br />
<br />
or if you are working in a sh based shell (such as bash): <br><br><br />
<br />
<tt><br />
export GANGA_CONFIG_PATH=GangaAtlas/Atlas.ini <br><br />
&#35;for the local installation <br><br />
&#35;PATH=/public/public_linux/Ganga/install/4.3.0/bin/:${PATH}<br> <br />
&#35;for the newest version installed on afs <br><br />
PATH=/afs/cern.ch/sw/ganga/install/4.3.2/bin/:${PATH} <br><br />
export LFC_HOST='lfc03.nikhef.nl' <br><br />
export LCG_CATALOG_TYPE=lfc<br><br />
</tt><br><br />
<br />
The first time ganga runs, it will ask to create a configuration file $HOME/.gangarc. <br />
Answer yes, and edit the config file as follows: <br />
<ol><br />
<li><br />
In the section labelled [LCG] uncomment the line:<br><br><br />
<tt>VirtualOrganisation = atlas </tt><br><br><br />
and add the line<br />
<br />
<tt>DefaultSE = tbn18\.nikhef\.nl </tt><br />
<br />
</li><br><br />
<li><br />
In the section labeled [Athena] uncomment the line: <br><br><br />
<br />
<tt><br />
&#35; local path to base paths of dist-kits (lxplus example) <br><br />
ATLAS_SOFTWARE = /data/atlas/offline/<br />
</tt><br />
<br />
</li><br><br />
<li><br />
In the section labelled [ROOT], uncomment and edit the lines: <br><br><br />
<br />
<tt><br />
location = /data/atlas/offline/12.0.6/sw/lcg/external/root/ <br><br />
version = 5.10.00e <br><br />
arch = slc3_ia32_gcc323 <br><br />
</tt><br />
<br />
</li><br />
<li><br />
Until ganga 4.3.2 is released, there is a workaround to get ganga working with large input sandboxes (see also ''Possible problems'' below). In the section [LCG], add the lines:<br />
<br />
<tt><br />
ConfigVO = /user/fkoetsve/rb106_edg_wl_ui.conf <br><br />
BoundSandboxLimit = 52428800<br />
</tt><br />
</li><br />
<br />
</ol><br />
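For reference, after these edits the relevant sections of <tt>$HOME/.gangarc</tt> would look roughly as follows. This is a sketch assembled from the items above; the paths and versions are the examples given there, so adapt them to your installation:<br />
<br />
<pre><br />
[LCG]<br />
VirtualOrganisation = atlas<br />
DefaultSE = tbn18\.nikhef\.nl<br />
# workaround for large input sandboxes (until ganga 4.3.2)<br />
ConfigVO = /user/fkoetsve/rb106_edg_wl_ui.conf<br />
BoundSandboxLimit = 52428800<br />
<br />
[Athena]<br />
# local path to base paths of dist-kits<br />
ATLAS_SOFTWARE = /data/atlas/offline/<br />
<br />
[ROOT]<br />
location = /data/atlas/offline/12.0.6/sw/lcg/external/root/<br />
version = 5.10.00e<br />
arch = slc3_ia32_gcc323<br />
</pre><br />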
<br />
== Running ganga ==<br />
<br />
You can start the ganga CLI by typing <tt>ganga</tt> on the command line. This <br />
starts a Python interface in which you can define your jobs. There are <br />
a few commands you can use to get around in ganga: <br />
<ul><br />
<li><br />
<tt>jobs</tt>: Lists all the jobs that are defined in ganga. You can get to an <br />
individual job by typing: <br />
</li><br />
<li><br />
<tt>jobs[id]</tt>: where the id is listed in the second column of the jobs output. <br />
</li><br />
</ul><br />
One thing you can do with a job is view its status: <br><br><br />
<tt><br />
jobs[1].status() <br />
</tt><br><br><br />
This can be 'new', 'submitted', 'running' or 'completed'. Once the job is <br />
completed, you can view its output (which is stored by default in <br />
$HOME/gangadir/workspace/Local/<jobid>/output) by typing:<br><br><br />
<tt> <br />
In [25]: jobs[0].peek() <br />
</tt><br><br><br />
Or look at a specific output file by typing: <br><br><br />
<tt><br />
In [25]: jobs[0].peek('stderr','less') <br />
</tt><br><br><br />
where <tt>stderr</tt> is the name of the file you want to view, and less the program <br />
to view it with. You can kill a job using the <tt>kill()</tt> method, and remove it <br />
from the jobs list with the <tt>remove()</tt> method. The most important command <br />
by far is <tt>help()</tt>. This starts the interactive help program of ganga. After <br />
typing it, you get a <tt>help></tt> prompt. Typing <tt>index</tt> gives you a list of all possible <br />
help subjects. The explanations are rather brief, but they do help you to find <br />
methods of built-in classes of Ganga and its plugins. For instance, the ATLAS <br />
plugin defines classes like <tt>DQ2Dataset</tt>. For more info on <tt>DQ2Dataset</tt> you <br />
type <tt>DQ2Dataset</tt> at the <tt>help></tt> prompt.<br />
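Putting these commands together, a short inspection session might look like this (the job id 3 is just an example): <br><br><br />
<br />
<tt><br />
In [1] : jobs <br><br />
In [2] : jobs[3].status() <br><br />
In [3] : jobs[3].peek('stdout','less') <br><br />
In [4] : jobs[3].kill() <br><br />
In [5] : jobs[3].remove() <br><br />
</tt><br />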
<br />
<br />
<br />
== Running a simple Job ==<br />
<br />
This little piece of code runs a Hello World Job on the LCG grid: <br><br />
<br />
<tt><br />
In [0] : j=Job()<br> <br />
In [1] : j.application=Executable(exe='/bin/echo',args=['Hello World']) <br><br />
In [2] : j.backend=LCG() <br><br />
In [3] : j.submit() <br><br />
</tt><br />
<br />
The application that is run here is a UNIX executable. LCG() is another <br />
predefined class that takes care of a lot of details of submitting to the grid. <br />
After it is finished, you can type:<br />
<br />
<tt><br />
In [4] : j.peek('stdout','cat')<br />
</tt><br />
<br />
which will output the expected "Hello World". You can also put these lines <br />
in a script <tt>my_script.py</tt> and, at the ganga prompt, type:<br />
<br />
<tt><br />
In [4]: execfile('my_script.py') <br />
</tt><br />
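For completeness, the corresponding <tt>my_script.py</tt> would simply contain the same statements as the interactive session above, without the <tt>In [n]:</tt> prompts:<br />
<br />
<pre><br />
# my_script.py -- the Hello World job from above, as a script<br />
j = Job()<br />
j.application = Executable(exe='/bin/echo', args=['Hello World'])<br />
j.backend = LCG()<br />
j.submit()<br />
</pre><br />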
<br />
== Running an <tt>ATHENA</tt> job ==<br />
<br />
Running an Athena job and storing the output files in a DQ2 dataset requires a <br />
bit more work, but it is still not hard. The following script defines an Athena <br />
job, splits the job so that there is one subjob (and hence one output file) per <br />
input file, runs Athena with the TopView LocalOverride jobOptions, and <br />
stores the output on the grid in a DQ2 dataset called <tt>testing_Ganga_V9</tt>.<br />
<br />
<tt><br />
&#35;Define the ATHENA job<br />
j = Job() <br> <br />
j.name='TopView Standard Job, Ganga 4.3.2' <br><br />
j.application=Athena() <br><br />
j.application.prepare(athena_compile=True) <br><br />
j.application.option_file='/project/atlas/users/fkoetsve/TestArea1206/PhysicsAnalysis/TopPhys/TopView/TopView-00-12-12-02/run/LocalOverride_Nikhef_BASIC.py' <br />
&#35;j.application.max_events='20' <br><br />
j.splitter=AthenaSplitterJob() <br><br />
j.splitter.match_subjobs_files=True <br><br />
&#35;The merger can be used to merge all the output files into one. See the ganga ATLAS Twiki for details<br />
&#35;j.merger=AthenaOutputMerger()<br> <br />
&#35;Define the inputdata<br />
j.inputdata=DQ2Dataset() <br><br />
j.inputdata.dataset="trig1_misal1_mc12.005201.Mcatnlo_jim_top_pt200.recon.AOD.v12000601"<br> <br />
&#35;To send the job to complete and incomplete dataset location sources, uncomment either the next line, or the line after that <br><br />
&#35;j.inputdata.min_num_files=100 <br><br />
&#35;j.inputdata.match_ce_all=True <br><br />
j.inputdata.type='DQ2_LOCAL' <br><br />
&#35;Define the outputdata<br />
&#35;j.outputdata=ATLASOutputDataset() <br><br />
j.outputdata=DQ2OutputDataset() <br><br />
j.outputdata.datasetname='testing_Ganga_V9' <br> <br />
j.outputdata.outputdata=['TopViewAANtuple.root'] <br><br />
&#35;j.outputdata.location='NIKHEF' <br><br />
&#35;j.outputsandbox=['TopViewAANtuple.root'] <br><br />
&#35;Submit<br />
j.backend=LCG() <br><br />
j.backend.CE='ce-fzk.gridka.de:2119/jobmanager-pbspro-atlas'<br> <br />
&#35;j.inputsandbox=['my_extra_file'] <br><br />
j.application.exclude_from_user_area = [] <br><br />
j.submit() <br><br />
</tt><br />
<br />
Explanation of the terms: <br />
<ul><br />
<li><br />
<tt>j.application=Athena()</tt>: Defines the job to be an Athena job. <br />
Packs the local installation of athena packages, and sends them with <br />
the job. The groupArea tag of the athena setup, used e.g. for TopView, <br />
does not work (yet). Instead, all the packages defined in the groupArea <br />
tag must be installed locally and packed with the job <br />
</li><br />
<li><br />
<tt>j.splitter=AthenaSplitterJob()</tt>: To get one output file per input file, as must be done to keep the naming of files consistent when going <br />
from AOD to NTuple, you need the job to be split into as many subjobs <br />
as there are input files. You need this splitter plugin to do that, and <br />
set <tt>j.splitter.match_subjobs_files</tt> to True<br />
</li><br />
<li><br />
<tt>j.merger</tt>: can be used to merge all the outputfiles into one <br />
</li><br />
<li><br />
<tt>j.inputdata=DQ2Dataset()</tt>: tells the job to get the files from the DQ2 <br />
file catalogue <br />
</li><br />
<li><br />
<tt>j.inputdata.match_ce_all=True</tt>: If there is no location with a complete copy of the dataset, this attribute sends the job to a random <br />
location <br />
</li><br />
<li><br />
<tt>j.inputdata.min_num_files=100</tt>: instead of sending the job to a <br />
random location, this first checks that a given minimum of files is <br />
present at that location <br />
</li><br />
<li><br />
<tt>j.outputdata=DQ2OutputDataset()</tt>: tells the job to store the output <br />
data on the grid, and register it to the DQ2 registry. <br />
</li><br />
<li><br />
<tt>j.outputdata.outputdata=['ntuple.root']</tt>: gives a list of filenames that must be stored in the output dataset. Wildcards are not <br />
supported. If the job is split, the output files are numbered automatically. <br />
</li><br />
<li><br />
<tt>j.backend.CE</tt>: allows you to specify which Computing Element the <br />
job should be sent to. The syntax is <server>:<port>/jobmanager-<service>-<queue> <br />
</li><br />
<li><br />
<tt>j.application.exclude_from_user_area = []</tt>: allows you to exclude packages that you have installed locally from inclusion in the <br />
input sandbox (the tar file containing all the files that are sent with <br />
your job to the CE)<br />
</ul><br />
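As a minimal sketch stripped of the TopView specifics (the dataset name, option file and output file below are placeholders you must replace), the essential structure of such a job is:<br />
<br />
<pre><br />
# minimal Athena-on-LCG job skeleton; all quoted values are placeholders<br />
j = Job()<br />
j.application = Athena()<br />
j.application.prepare(athena_compile=True)<br />
j.application.option_file = '/path/to/MyJobOptions.py'<br />
j.splitter = AthenaSplitterJob()<br />
j.splitter.match_subjobs_files = True<br />
j.inputdata = DQ2Dataset()<br />
j.inputdata.dataset = 'some.input.dataset'<br />
j.inputdata.type = 'DQ2_LOCAL'<br />
j.outputdata = DQ2OutputDataset()<br />
j.outputdata.datasetname = 'my_output_dataset'<br />
j.outputdata.outputdata = ['ntuple.root']<br />
j.backend = LCG()<br />
j.submit()<br />
</pre><br />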
After submitting your job you can type <tt>jobs</tt> at the ganga command line, <br />
which will show something like: <br />
<br />
<tt><br />
<table><br />
<tr><br />
<td> &#35; </td><td> id </td><td>status </td><td> name </td><td>subjobs</td><td> application </td><td>backend </td><td>CE </td><br />
</tr><br />
<tr><br />
<td>&#35; </td><td>41</td><td> completed </td><td>TopView Standard Job </td><td> 3 </td><td> Athena </td><td> LCG </td><td> ce106.cern.ch:2119/jobmanager-lcglsf-grid_2nh </td><br />
</tr><br />
<tr><br />
<td>&#35; </td><td>42 </td><td>completed </td><td></td><td> </td><td>Athena </td><td> LCG </td><td> ce106.cern.ch:2119/jobmanager-lcglsf-grid_2nh </td><br />
</tr><br />
</table><br />
</tt><br />
<br />
Here you can see all the jobs, their status, the type of job, its name, and <br />
at which CE it is running. If you want more info, you can type <tt>jobs[41]</tt><br />
at the command line, and you will get the complete configuration of the job, <br />
including those parameters that were set by default, which you may know nothing <br />
about. This is very helpful when debugging ganga.<br />
When the status changes to completed (ganga tells you of the change of <br />
status of any job as soon as you issue a new command), you can see any <br />
text output by typing, just like before: <br />
<br />
<tt><br />
jobs[41].peek('stdout','cat') <br />
</tt><br />
<br />
If the job completed successfully, you can retrieve the output data by typing: <br />
<br />
<tt><br />
jobs[41].outputdata.retrieve()<br />
</tt><br />
<br />
The output data is then stored in the directory <tt>${HOME}/gangadir/workspace/Local/<job id>/out</tt>. <br />
As the output files can be large, it is wise to change the location of this directory by creating a symbolic link called <tt>gangadir</tt> in your home directory, pointing <br />
to somewhere where large amounts of data can be stored (temporarily), for example as follows.<br />
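A possible way to set this up (the target path here is hypothetical; pick any location with enough space):<br />
<br />
<pre><br />
% cd $HOME<br />
% mv gangadir /data/atlas/users/$USER/gangadir    # hypothetical target location<br />
% ln -s /data/atlas/users/$USER/gangadir gangadir<br />
</pre><br />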
<br />
<br />
==Using ganga for running TopView==<br />
The current version of TopView that is used by the Top group and by us is TopView-00-12-13-03. The groupArea tag should work now; only the latest version of TopTools (PhysicsAnalysis/TopPhys/TopTools-00-00-12) has to be checked out. However, the groupArea tag causes the input sandbox to be VERY large (~90 MB). This is difficult for ganga to handle, hence use the EVTags tar file that can be found [http://atlas-computing.web.cern.ch/atlas-computing/links/kitsDirectory/PAT/EventView/ here]. Use the PhysicsAnalysis package from that tar file. The latest package has a slightly older version of TopView, so you still need to check out the TopTools-00-00-12 and TopView-00-12-13-03 packages.<br />
<br />
Do not forget to copy the InstallArea as well.<br />
<br />
In the GangaScripts area there is also a SubmitScript.py, which you run by typing, from the run directory:<br />
<br />
<tt><br />
ganga /user/fkoetsve/GangaScripts/SubmitScript.py --dataset=<datasetname> --number_of_files=<nfiles> --simstyle=<simstyle><br />
</tt><br />
<br />
You can find the number of files by typing <tt>dq2_ls -f <datasetname></tt>, and it is required. The simstyle can be fullsim, fastsim or streamingtest, and is also required. This script has not been tested yet. For testing ganga, I now use a script <tt>GangaScripts/TopViewGangaSubmission_Override.py</tt>, which runs over the dataset <tt>trig1_misal1_mc12.005201.Mcatnlo_jim_top_pt200.recon.AOD.v12000601</tt>.<br />
<br />
==Possible problems (and possible solutions)==<br />
<br />
These are some problems that I encountered, plus their solutions. <br />
<br />
'''(60, 'SSL certificate problem, verify that the CA cert is OK')''' <br />
This means that the certificate used by ganga is wrong. The directory where your certificates are located is stored in the variable X509_CERT_DIR. Send a request to grid.support@nikhef.nl to <br />
update the certificates, or download them yourself and change the value of X509_CERT_DIR.<br />
<br />
'''[Errno 28] No space left on device''' <br />
Ganga writes to different places: /tmp, but also <br />
${HOME}/gangadir/workspace/Local <br />
Cleanup, especially after failed jobs, is not always very tidy. You might need <br />
to clean up some files manually at regular intervals. If you want to be able to <br />
store bigger files, the easiest way to change the gangadir location is to make <br />
a symbolic link in your home directory called gangadir, to whatever location <br />
you want. <br />
<br />
'''File ”/global/ices/lcg/glite3.0.12/edg/bin/UIutils.py”, line 377, in errMsg print message( info.logFile , message )''' <br />
Repeated over many lines. The same solution as for the previous problem, ''No space <br />
left on device''. <br />
<br />
'''<bound method Job.peek of <Ganga.GPIDev.Lib.Job.Job.Job object at 0xb7015f6c>> ''' <br />
You forgot the '()' after the command, in this case <tt>peek</tt>. It also happens e.g. <br />
with <tt>remove()</tt>. <br />
<br />
'''LCMAPS credential mapping NOT successful '''<br />
This means the input sandbox (meaning all the files you are sending with your <br />
job to the remote machine) is too large (> 3 MB). It then tries to store that <br />
sandbox on a Storage Element, which by default is some machine at CERN, and <br />
you don't have permission to use that. Add the following line to the [LCG] <br />
section of $HOME/.gangarc: <br />
<br />
<tt><br />
DefaultSE = tbn18\.nikhef\.nl <br />
</tt><br />
<br />
The . needs to be escaped using the \ because the file is read into Python. <br />
<br />
'''lcg-cp: Transport endpoint is not connected''' <br />
I think this has to do with an overly large sandbox as well, caused by too <br />
many packages checked out in the TestArea. Exclude packages from the <br />
input sandbox using: <br />
<tt>j.application.exclude_from_user_area=["package1","package2"]</tt> <br />
This turns out to be a bug in ganga, which should be solved in version 4.3.2. There is a workaround for now, using a temporarily enlarged buffer size at one of the resource brokers. Add these lines to the [LCG] part of .gangarc:<br />
<br />
<tt><br />
ConfigVO = /user/fkoetsve/rb106_edg_wl_ui.conf <br><br />
BoundSandboxLimit = 52428800<br />
</tt><br />
<br />
'''Dataset empty at Triumf'''<br />
A bug in ganga 4.3.2. To fix, download the [http://ganga.web.cern.ch/ganga/download/ganga-install ganga-install] script, and run, in the directory where you want to install:<br />
<br />
<tt><br />
./ganga-install --extern=GangaAtlas,GangaNG,GangaCronus,GangaGUI,GangaPlotter<br />
</tt><br />
<br />
and change the path variable to point to the new ganga executable. Then, replace the file<br />
<br />
<tt><br />
Ganga/install/4.3.2/python/GangaAtlas/Lib/Athena/ganga-stage-in-out-dq2.py<br />
</tt><br />
<br />
with <br />
<br />
<tt><br />
/afs/cern.ch/user/e/elmsheus/public/ganga-stage-in-out-dq2.py<br />
</tt><br />
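In other words, assuming you installed in the current directory (the paths are exactly those given above):<br />
<br />
<pre><br />
% cp /afs/cern.ch/user/e/elmsheus/public/ganga-stage-in-out-dq2.py \<br />
     Ganga/install/4.3.2/python/GangaAtlas/Lib/Athena/ganga-stage-in-out-dq2.py<br />
</pre><br />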
<br />
==More Info==<br />
<br />
[http://ganga.web.cern.ch/ganga/ The ganga project homepage]<br><br />
[https://twiki.cern.ch/twiki/bin/view/Atlas/DAGangaFAQ Ganga FAQ ]<br><br />
[https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial43 Ganga 4.3.0 tutorial] (no tutorial for higher versions available yet; check the hypernews forum for extra features)<br />
<br />
If you find any problems with this document, please contact me by clicking [mailto:f.koetsveld@science.ru.nl here]</div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=Stoomboot&diff=3704Stoomboot2011-11-09T12:30:43Z<p>Dgeerts@nikhef.nl: Updated for 'recent' extension; some other numbers might not be right anymore</p>
<hr />
<div>== What is stoomboot ==<br />
<br />
Stoomboot is a batch farm for local use at NIKHEF. It is in principle open to all<br />
NIKHEF users, but a login account does not give automatic access to stoomboot.<br />
Contact helpdesk@nikhef.nl to gain access<br />
<br />
=== Hardware ===<br />
<br />
Stoomboot consists of 32 nodes (<tt>stbc-01</tt> through <tt>stbc-32</tt>) that are each<br />
equipped with dual quad-core Intel Xeon E5335 2.0 GHz processors and<br />
16 GB of memory. The total number of cores is 256.<br />
<br />
=== Software & disk access ===<br />
<br />
All stoomboot nodes run Scientific Linux 4.7. All NFS-mountable disks<br />
at NIKHEF are visible (<tt>/project/*</tt> and <tt>/data/*</tt>). Stoomboot does not run<br />
AFS, so AFS directories, including <tt>/afs/cern.ch</tt>, are not visible. This<br />
may indirectly impact you, as certain experimental software installations<br />
attempt to access files on <tt>/afs/cern.ch</tt>. As stoomboot is intended as a local<br />
batch farm, there are no plans to install AFS.<br />
<br />
== How to use stoomboot ==<br />
<br />
=== Submitting batch jobs ===<br />
<br />
Stoomboot is a batch-only facility; jobs can be submitted through<br />
the PBS <tt>qsub</tt> command:<br />
<br />
<pre><br />
unix> qsub test.sh<br />
9714.allier.nikhef.nl<br />
</pre><br />
<br />
The argument passed to <tt>qsub</tt> is a script that will be executed in your home<br />
directory. The returned string is the job identifier and can be used to look<br />
up the status of the job, or to manipulate it later. Jobs can be submitted from any <br />
Linux desktop at Nikhef as well as <tt>login.nikhef.nl</tt>. If you cannot submit jobs<br />
from your local desktop, contact <tt>helpdesk@nikhef.nl</tt> to have the batch client software installed.<br />
<br />
The output of the job appears in files named <tt><jobname>.o<number></tt>, e.g. <tt>test.sh.o9714</tt> in the example above. The following default settings<br />
apply when you submit a batch job:<br />
<br />
<br />
* Job runs in home directory (<tt>$HOME</tt>)<br />
* Job starts with a clean shell (environment variables from the shell from which you submit are not transferred to the batch job). E.g. if you need the ATLAS software setup, it should be done in the submitted script<br />
* Job output (stdout) is sent to a file in the directory in which the job was submitted. Job stderr output is sent to a separate file. E.g. in the example above, file <tt>test.sh.o9714</tt> contains stdout and file <tt>test.sh.e9714</tt> contains stderr. If there is no stdout or stderr, an empty file is created<br />
* A mail is sent to you if the output files cannot be created<br />
<br />
<br />
Here is a list of frequently desired changes to the default behavior and their corresponding<br />
options in <tt>qsub</tt>:<br />
<br />
<br />
*''Merge stdout and stderr in a single file''. Add option <tt>-j oe</tt> to <tt>qsub</tt> command (single file <tt>*.o*</tt> is written)<br />
<br />
* ''Choose batch queue''. Right now there are two queues: <tt>test</tt> (30 min) and <tt>qlong</tt> (48h). Add option <tt>-q <queuename></tt> to the <tt>qsub</tt> command<br />
<br />
* ''Choose different output file for stdout''. Add option <tt>-o <filename></tt> to <tt>qsub</tt> command<br />
<br />
* ''Pass all environment variables of submitting shell to batch job (with exception of <tt>$PATH</tt>)''. Add option <tt>-V</tt> to <tt>qsub</tt> command<br />
<br />
* '' Run the job on a specific stoomboot node ''. Add option <tt> -l host=stbc-XX</tt> to the <tt> qsub</tt> command line.<br />
<br />
<br />
A full list of options can be obtained from <tt>man qsub</tt><br />
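For instance, a sketch combining several of these options (the node name is just an example):<br />
<br />
<pre><br />
unix> qsub -q qlong -j oe -o myjob.log -V -l host=stbc-07 test.sh<br />
</pre><br />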
<br />
=== Examining the status of your jobs ===<br />
<br />
The <tt>qstat</tt> command shows the status of all your jobs. Status code 'C' indicates completed, 'R' indicates running and 'Q' indicates queued.<br />
<br />
<pre><br />
unix> qstat<br />
Job id Name User Time Use S Queue<br />
------------------- ---------------- --------------- -------- - -----<br />
9714.allier test.sh verkerke 00:00:00 C test<br />
</pre><br />
<br />
The <tt>qstat</tt> command only shows your own jobs, not those of other users.<br />
Only jobs that completed less than 10 minutes ago are listed with status 'C'.<br />
Output of jobs that completed longer ago is kept, but they are simply no longer<br />
listed in the status overview.<br />
<br />
To see the activity of other users on the system you can use the lower-level maui command<br />
<tt>showq</tt>, which shows the jobs of all users. The <tt>showq</tt> command works without<br />
arguments on <tt>login</tt>; on any other host, add <tt>--host=allier</tt> to run it successfully.<br />
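For example:<br />
<br />
<pre><br />
unix> showq --host=allier<br />
</pre><br />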
<br />
The general level of activity on stoomboot is graphically monitored at<br />
http://www.nikhef.nl/grid/stats/stbc/<br />
<br />
== Common practical issues in stoomboot usage ==<br />
<br />
=== LSF job submission emulator ===<br />
<br />
The interface of the PBS batch system is notably different from the LSF batch system<br />
that is run at e.g. CERN and SLAC. One of the convenient features of LSF <tt>bsub</tt> is that the<br />
user does not need to write a script for every batch job, but that a command line that<br />
is passed to <tt>bsub</tt> is executed. An emulator is available for the LSF <tt>bsub</tt><br />
command that submits a job that executes the <tt>bsub</tt> command line in the present<br />
working directory and the complete present environment. For example, one can do<br />
<br />
<pre><br />
bsub ls -l <br />
</pre><br />
<br />
which will submit a batch job that executes <tt>ls -l</tt> in the working directory<br />
from which the <tt>bsub</tt> command was executed. This script expressly allows the<br />
user to set up e.g. the complete ATLAS software environment in a shell on the local desktop <br />
and then substitute local desktop running of an ATLAS software job with a batch-run job by prefixing <tt>bsub</tt> to the executed command line (see the sketch below). The scope of the LSF <tt>bsub</tt> emulator is limited to<br />
its ability to execute the command line in batch in an identical environment. It does<br />
not emulate the various command line flags of LSF <tt>bsub</tt>. You can find the <tt>bsub</tt><br />
emulator for now in <tt>~verkerke/bin/bsub</tt><br />
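As a sketch of that workflow (the setup script and job options file are hypothetical placeholders):<br />
<br />
<pre><br />
unix> source my_atlas_setup.sh       # hypothetical: set up the ATLAS environment locally<br />
unix> athena MyJobOptions.py         # first test the command interactively ...<br />
unix> bsub athena MyJobOptions.py    # ... then run the identical command as a batch job<br />
</pre><br />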
<br />
<br />
=== Suggestions for debugging and trouble shooting ===<br />
<br />
If you want to debug a problem that occurs on a stoomboot batch job, or you want to make a short trial run for a larger series of batch jobs there are two ways to gain interactive login access to stoomboot.<br />
<br />
* You can directly log in to node stbc-32 (this node ''only'') to test and/or debug your problem. You should keep CPU consumption and testing time to a minimum, as regularly scheduled batch jobs run on this machine too.<br />
<br />
* You can request an 'interactive' batch job through <tt>qsub -q qlong -X -I</tt>. In this mode you can consume as much CPU as the queue to which the interactive job was submitted allows. The 'look and feel' of interactive batch jobs is nearly identical to that of <tt>ssh</tt>. The main exception is that when no free job slot is available, the <tt>qsub</tt> command will hang until one becomes available.<br />
<br />
=== Scratch disk usages and NFS disk access ===<br />
<br />
When running on stoomboot, please be sure to locate all local 'scratch' files in the directory pointed to by the environment variable <tt>$TMPDIR</tt> and ''not'' in <tt>/tmp</tt>. The latter is very small (a few GB) and, when filled up, will give all kinds of problems for you and other users. The disk pointed to by <tt>$TMPDIR</tt> is typically 200 GB. Also here, be sure to clean up when your job ends to avoid filling up these disks as well.<br />
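A minimal sketch of such a job script (all file and program names are hypothetical):<br />
<br />
<pre><br />
#!/bin/sh<br />
# work in the fast local scratch area, not in /tmp and not directly on NFS<br />
cd $TMPDIR<br />
cp /data/atlas/users/$USER/input.root .        # stage the input in once<br />
run_analysis input.root output.root            # 'run_analysis' is a placeholder<br />
cp output.root /project/atlas/users/$USER/     # copy the result back<br />
rm -f input.root output.root                   # clean up scratch before the job ends<br />
</pre><br />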
<br />
When accessing NFS mounted disks (<tt>/project/*</tt>, <tt>/data/*</tt>) please keep in mind that the network bandwidth between stoomboot nodes and the NFS server is limited and that the NFS server capacity is also limited. Running e.g. 50 jobs that read from or write to files on NFS disks at a high rate ('ntuple analysis') may result in poor performance of both the NFS server and your jobs.<br />
<br />
=== Scheduling policies and CPU quota ===<br />
<br />
This section is sensitive to changes as scheduling policies and quota allocation are still evolving.<br />
At the time of writing (December 2008) each group (atlas, bphys, etc.) is allowed to use at most 96 run slots (i.e. 75% of the available capacity; this is the hard limit). When the system is 'busy', as determined by the maui scheduler, a lower soft limit of 64 run slots is enforced (50% of the capacity). Each individual user is entitled to use all run slots of his group. To see what policy prevents your queued jobs from running, use the <tt>checkjob <jobid></tt> command.<br />
<br />
<br />
== Questions, communication and announcements on stoomboot ==<br />
<br />
To ask questions and to receive announcements on stoomboot operations, subscribe<br />
to the stoomboot users mailing list (stbc-users@nikhef.nl). To subscribe yourself<br />
to this list go to https://mailman.nikhef.nl/cgi-bin/listinfo/stbc-users.</div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=User:Dgeerts/Nuke&diff=4859User:Dgeerts/Nuke2011-07-04T13:40:02Z<p>Dgeerts@nikhef.nl: </p>
<hr />
<div>Bla</div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=User:Dgeerts/DontLookAtMe&diff=4858User:Dgeerts/DontLookAtMe2011-07-04T13:38:44Z<p>Dgeerts@nikhef.nl: </p>
<hr />
<div></div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=User:Dgeerts/DontLookAtMe&diff=3397User:Dgeerts/DontLookAtMe2011-07-04T13:37:51Z<p>Dgeerts@nikhef.nl: </p>
<hr />
<div>=Able to run arbitrary executables on Windows Terminal server=<br />
*<B>Type</B>: Local arbitrary code execution<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Microsoft Windows' program loader by default checks the header of the executable it is given, to determine whether it is an EXE, BAT or COM file, and automatically runs it as the proper type. The current configuration on the Nikhef Windows Terminal Server blocks the loading of arbitrary EXE and COM files, but not arbitrary BAT files. Thus, by renaming the file extension from EXE to BAT, this security feature is circumvented and the executable is executed.<br />
<br />
=PHP scripts on webserver run under 'web' account=<br />
*<B>Type</B>: Local privilege escalation<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
Any PHP script run on the webserver (by, for example, dropping the scriptfile into the user's public_html directory) executes under the 'web' account. This allows users to escalate their privilege (if the 'web' account has more rights than the user's account).<br />
<br />
=PHP on webserver able to reach main filesystem=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The PHP installation running on the webserver is able to reach the main filesystem (at least the *nix side) without any problems. In fact, several directories are exposed to the web (by design). This allows PHP scripts to access the filesystem, and (if rights permit) even write to the filesystem.<br />
<br />
=PHP on webserver able to exec arbitrary executables=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Using PHP's built-in 'exec' command, a PHP script can run arbitrary executables on the webserver.<br />
<br />
=World-writable folder(s) on webserver=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=yellow>Partially fixed</FONT><br />
<br />
There are several world-writable files and folders that are served by the webserver, allowing a local user to put content on the website. Main example: /public/www/pub. Was used to deface the Nikhef Travel Booking system (this one is fixed). Various others remain. (Mostly 'registration participant' files.)<br />
<br />
=Password file of Nikhef travel system externally reachable=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The password file of the Nikhef Travel Booking system is externally reachable by URL: [http://www.nikhef.nl/pub/travel/config.inc http://www.nikhef.nl/pub/travel/config.inc]. The Urenregistratie system has a similar file (geheim.php), but this is protected both by having a PHP extension and, more importantly, by not being accessible to other accounts. However, it is readable by the webserver (obviously), so a simple dump-content-of-file PHP script is able to display the contents of this file anyway.<br />
<br />
=Nikhef travel system has full PHP error reporting enabled=<br />
*<B>Type</B>: Information disclosure<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
All the PHP error reporting functions are enabled in the Nikhef Travel Booking system. Whenever any exception occurred, full error information (which error, the file (with full path) and the line number) was displayed to the external user.<br />
<br />
=Nikhef travel system leaks logged-in status of users=<br />
*<B>Type</B>: Information disclosure<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Using, for example, the URL: http://www.nikhef.nl/pub/travel/reizen.php?cmd=1&als=0&gebruiker=XXX, where 'XXX' is a user ID, the Travel system will respond differently whether a user is logged in (= has a session defined in the database) or not. Logged in users will produce a "Session corrupt" page, while not logged in users will actually produce a PHP error message.<br />
<br />
=Unauthenticated external requests can logout any user from the Nikhef travel system=<br />
*<B>Type</B>: Denial of service<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
By retrieving the http://www.nikhef.nl/pub/travel/reizen.php?cmd=72&als=1&gebruiker=XXX page, the session of user 'XXX' (user ID) is removed from the database. (cmd 72 is the log out command.) This is because the Travel system does not verify the session of the caller before processing this command.<br />
<br />
=Nikhef travel system database reachable from non-webserver computers=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
It is possible to connect to the MySQL server hosting the Travel system database from other computers than the webserver.<br />
<br />
=Copy (unmodified) of Nikhef travel system is fully functional=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
A copy of the Nikhef Travel system hosted from a user's public_html folder is fully functional. Since the source-code is directly copyable, this is trivially done. Also, with minor modifications (such as: disabling the password checks) it is possible to host a version of the system allowing anybody to log in as anybody else.</div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=User:Dgeerts/DontLookAtMe&diff=3396User:Dgeerts/DontLookAtMe2011-07-04T13:37:18Z<p>Dgeerts@nikhef.nl: </p>
<hr />
<div>=Able to run arbitrary executables on Windows Terminal server=<br />
*<B>Type</B>: Local arbitrary code execution<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Microsoft Windows' bootloader by default checks the header of the executable it is given, to determine whether it is an EXE, BAT or COM file, and automatically runs it as the proper type. The current configuration on the Nikhef Windows Terminal Server blocks the loading of arbitrary EXE and COM files, but not arbitrary BAT files. Thus, by renaming the file extension from EXE to BAT, this security feature is circumvented, and the executable executed.<br />
<br />
=PHP scripts on webserver run under 'web' account=<br />
*<B>Type</B>: Local privilege escalation<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
Any PHP script run on the webserver (by, for example, dropping the scriptfile into the user's public_html directory) executes under the 'web' account. This allows users to escalate their privilege (if the 'web' account has more rights than the user's account).<br />
<br />
=PHP on webserver able to reach main filesystem=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The PHP installation running on the webserver is able to reach the main filesystem (at least the *nix side) without any problems. In fact, several directories are exposed to the web (by design). This allows PHP scripts to access the filesystem, and (if rights permit) even write to the filesystem.<br />
<br />
=PHP on webserver able to exec arbitrary executables=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Using PHP's built-in 'exec' command, a PHP script can run arbitrary executables on the webserver.<br />
<br />
=World-writable folder(s) on webserver=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=yellow>Partially fixed</FONT><br />
<br />
There are several world-writable files and folders that are served by the webserver, allowing a local user to put content on the website. Main example: /public/www/pub. Was used to deface the Nikhef Travel Booking system (this one is fixed). Various others remain. (Mostly 'registration participant' files.)<br />
<br />
=Password file of Nikhef travel system externally reachable=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The password file of the Nikhef Travel Booking system is externally reachable by URL: [http://www.nikhef.nl/pub/travel/config.inc http://www.nikhef.nl/pub/travel/config.inc]. The Urenregistratie system has a similar file (geheim.php), but this is both protected by being having a PHP extension, and, more importantly, not being accessible by other accounts. However, it is readable by the webserver (obviously), so a simple dump-content-of-file PHP script is able to display the contents of this file anyway.<br />
<br />
=Nikhef travel system has full PHP error reporting enabled=<br />
*<B>Type</B>: Information disclosure<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
All the PHP error reporting functions are enabled in the Nikhef Travel Booking system. Whenever any exception occured, full error information (what error, file (with fill path) and linenumber) are displayed to the external user.<br />
<br />
=Nikhef travel system leaks logged-in status of users=<br />
*<B>Type</B>: Information disclosure<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Using, for example, the URL: http://www.nikhef.nl/pub/travel/reizen.php?cmd=1&als=0&gebruiker=XXX, where 'XXX' is a user ID, the Travel system will respond differently whether a user is logged in (= has a session defined in the database) or not. Logged in users will produce a "Session corrupt" page, while not logged in users will actually produce a PHP error message.<br />
<br />
=Unauthenticated external requests can logout any user from the Nikhef travel system=<br />
*<B>Type</B>: Denial of service<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
By retrieving the http://www.nikhef.nl/pub/travel/reizen.php?cmd=72&als=1&gebruiker=XXX page, the session of user 'XXX' (user ID) is removed from the database. (cmd 72 is the log out command.) This is because the Travel system does not verify the session of the caller before processing this command.<br />
<br />
=Nikhef travel system database reachable from non-webserver computers=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
It is possible to connect to the MySQL server hosting the Travel system database from other computers than the webserver.</div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=User:Dgeerts/DontLookAtMe&diff=3395User:Dgeerts/DontLookAtMe2011-07-04T13:36:41Z<p>Dgeerts@nikhef.nl: </p>
<hr />
<div>=Able to run arbitrary executables on Windows Terminal server=<br />
*<B>Type</B>: Local arbitrary code execution<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Microsoft Windows' bootloader by default checks the header of the executable it is given, to determine whether it is an EXE, BAT or COM file, and automatically runs it as the proper type. The current configuration on the Nikhef Windows Terminal Server blocks the loading of arbitrary EXE and COM files, but not arbitrary BAT files. Thus, by renaming the file extension from EXE to BAT, this security feature is circumvented, and the executable executed.<br />
<br />
=PHP scripts on webserver run under 'web' account=<br />
*<B>Type</B>: Local privilege escalation<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
Any PHP script run on the webserver (by, for example, dropping the scriptfile into the user's public_html directory) executes under the 'web' account. This allows users to escalate their privilege (if the 'web' account has more rights than the user's account).<br />
<br />
=PHP on webserver able to reach main filesystem=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The PHP installation running on the webserver is able to reach the main filesystem (at least the *nix side) without any problems. In fact, several directories are exposed to the web (by design). This allows PHP scripts to access the filesystem, and (if rights permit) even write to the filesystem.<br />
<br />
=PHP on webserver able to exec arbitrary executables=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Using PHP's built-in 'exec' command, a PHP script can run arbitrary executables on the webserver.<br />
<br />
=World-writable folder(s) on webserver=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=yellow>Partially fixed</FONT><br />
<br />
There are several world-writable files and folders that are served by the webserver, allowing a local user to put content on the website. Main example: /public/www/pub. Was used to deface the Nikhef Travel Booking system (this one is fixed). Various others remain. (Mostly 'registration participant' files.)<br />
<br />
=Password file of Nikhef travel system externally reachable=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The password file of the Nikhef Travel Booking system is externally reachable by URL: [http://www.nikhef.nl/pub/travel/config.inc http://www.nikhef.nl/pub/travel/config.inc]. The Urenregistratie system has a similar file (geheim.php), but this is both protected by being having a PHP extension, and, more importantly, not being accessible by other accounts. However, it is readable by the webserver (obviously), so a simple dump-content-of-file PHP script is able to display the contents of this file anyway.<br />
<br />
=Nikhef travel system has full PHP error reporting enabled=<br />
*<B>Type</B>: Information disclosure<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
All the PHP error reporting functions are enabled in the Nikhef Travel Booking system. Whenever any exception occured, full error information (what error, file (with fill path) and linenumber) are displayed to the external user.<br />
<br />
=Nikhef travel system leaks logged-in status of users=<br />
*<B>Type</B>: Information disclosure<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Using, for example, the URL: http://www.nikhef.nl/pub/travel/reizen.php?cmd=1&als=0&gebruiker=XXX, where 'XXX' is a user ID, the Travel system will respond differently whether a user is logged in (= has a session defined in the database) or not. Logged in users will produce a "Session corrupt" page, while not logged in users will actually produce a PHP error message.<br />
<br />
=Unauthenticated external requests can logout any user from the Nikhef travel system=<br />
*<B>Type</B>: Denial of service<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
By retrieving the http://www.nikhef.nl/pub/travel/reizen.php?cmd=72&als=1&gebruiker=XXX page, the session of user 'XXX' (user ID) is removed from the database. (cmd 72 is the log out command.) This is because the Travel system does not verify the session of the caller before processing this command.</div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=User:Dgeerts/DontLookAtMe&diff=3394User:Dgeerts/DontLookAtMe2011-07-04T12:38:13Z<p>Dgeerts@nikhef.nl: </p>
<hr />
<div>=Able to run arbitrary executables on Windows Terminal server=<br />
*<B>Type</B>: Local arbitrary code execution<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Microsoft Windows' bootloader by default checks the header of the executable it is given, to determine whether it is an EXE, BAT or COM file, and automatically runs it as the proper type. The current configuration on the Nikhef Windows Terminal Server blocks the loading of arbitrary EXE and COM files, but not arbitrary BAT files. Thus, by renaming the file extension from EXE to BAT, this security feature is circumvented, and the executable executed.<br />
<br />
=PHP scripts on webserver run under 'web' account=<br />
*<B>Type</B>: Local privilege escalation<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
Any PHP script run on the webserver (by, for example, dropping the scriptfile into the user's public_html directory) executes under the 'web' account. This allows users to escalate their privilege (if the 'web' account has more rights than the user's account).<br />
<br />
=PHP on webserver able to reach main filesystem=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The PHP installation running on the webserver is able to reach the main filesystem (at least the *nix side) without any problems. In fact, several directories are exposed to the web (by design). This allows PHP scripts to access the filesystem, and (if rights permit) even write to the filesystem.<br />
<br />
=PHP on webserver able to exec arbitrary executables=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Using PHP's built-in 'exec' command, a PHP script can run arbitrary executables on the webserver.<br />
<br />
=World-writable folder(s) on webserver=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=yellow>Partially fixed</FONT><br />
<br />
There are several world-writable files and folders that are served by the webserver, allowing a local user to put content on the website. Main example: /public/www/pub. Was used to deface the Nikhef Travel Booking system (this one is fixed). Various others remain. (Mostly 'registration participant' files.)<br />
<br />
=Password file of Nikhef travel system externally reachable=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The password file of the Nikhef Travel Booking system is externally reachable by URL: [http://www.nikhef.nl/pub/travel/config.inc http://www.nikhef.nl/pub/travel/config.inc]. The Urenregistratie system has a similar file (geheim.php), but this is both protected by being having a PHP extension, and, more importantly, not being accessible by other accounts. However, it is readable by the webserver (obviously), so a simple dump-content-of-file PHP script is able to display the contents of this file anyway.<br />
<br />
=Nikhef travel system has full PHP error reporting enabled=<br />
*<B>Type</B>: Information disclosure<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
All the PHP error reporting functions are enabled in the Nikhef Travel Booking system. Whenever any exception occured, full error information (what error, file (with fill path) and linenumber) are displayed to the external user.<br />
<br />
=Nikhef travel system leaks logged-in status of users=<br />
*<B>Type</B>: Information disclosure<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Using, for example, the URL: http://www.nikhef.nl/pub/travel/reizen.php?cmd=1&als=0&gebruiker=XXX, where 'XXX' is a user ID, the Travel system will respond differently whether a user is logged in (= has a session defined in the database) or not. Logged in users will produce a "Session corrupt" page, while not logged in users will actually produce a PHP error message.</div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=User:Dgeerts/DontLookAtMe&diff=3393User:Dgeerts/DontLookAtMe2011-07-04T12:34:10Z<p>Dgeerts@nikhef.nl: </p>
<hr />
<div>=Able to run arbitrary executables on Windows Terminal server=<br />
*<B>Type</B>: Local arbitrary code execution<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Microsoft Windows' bootloader by default checks the header of the executable it is given, to determine whether it is an EXE, BAT or COM file, and automatically runs it as the proper type. The current configuration on the Nikhef Windows Terminal Server blocks the loading of arbitrary EXE and COM files, but not arbitrary BAT files. Thus, by renaming the file extension from EXE to BAT, this security feature is circumvented, and the executable executed.<br />
<br />
=PHP scripts on webserver run under 'web' account=<br />
*<B>Type</B>: Local privilege escalation<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
Any PHP script run on the webserver (by, for example, dropping the scriptfile into the user's public_html directory) executes under the 'web' account. This allows users to escalate their privilege (if the 'web' account has more rights than the user's account).<br />
<br />
=PHP on webserver able to reach main filesystem=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The PHP installation running on the webserver is able to reach the main filesystem (at least the *nix side) without any problems. In fact, several directories are exposed to the web (by design). This allows PHP scripts to access the filesystem, and (if rights permit) even write to the filesystem.<br />
<br />
=PHP on webserver able to exec arbitrary executables=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Using PHP's built-in 'exec' command, a PHP script can run arbitrary executables on the webserver.<br />
<br />
=World-writable folder(s) on webserver=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=yellow>Partially fixed</FONT><br />
<br />
There are several world-writable files and folders that are served by the webserver, allowing a local user to put content on the website. Main example: /public/www/pub. Was used to deface the Nikhef Travel Booking system (this one is fixed). Various others remain. (Mostly 'registration participant' files.)<br />
<br />
=Password file of Nikhef travel system externally reachable=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The password file of the Nikhef Travel Booking system is externally reachable by URL: [http://www.nikhef.nl/pub/travel/config.inc http://www.nikhef.nl/pub/travel/config.inc]. The Urenregistratie system has a similar file (geheim.php), but this is both protected by being having a PHP extension, and, more importantly, not being accessible by other accounts. However, it is readable by the webserver (obviously), so a simple dump-content-of-file PHP script is able to display the contents of this file anyway.<br />
<br />
=Nikhef travel system has full PHP error reporting enabled=<br />
*<B>Type</B>: Information disclosure<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
All the PHP error reporting functions are enabled in the Nikhef Travel Booking system. Whenever any exception occured, full error information (what error, file (with fill path) and linenumber) are displayed to the external user.</div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=User:Dgeerts/DontLookAtMe&diff=3392User:Dgeerts/DontLookAtMe2011-07-04T12:33:27Z<p>Dgeerts@nikhef.nl: </p>
<hr />
<div>=Able to run arbitrary executables on Windows Terminal server=<br />
*<B>Type</B>: Local arbitrary code execution<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Microsoft Windows' bootloader by default checks the header of the executable it is given, to determine whether it is an EXE, BAT or COM file, and automatically runs it as the proper type. The current configuration on the Nikhef Windows Terminal Server blocks the loading of arbitrary EXE and COM files, but not arbitrary BAT files. Thus, by renaming the file extension from EXE to BAT, this security feature is circumvented, and the executable executed.<br />
<br />
=PHP scripts on webserver run under 'web' account=<br />
*<B>Type</B>: Local privilege escalation<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
Any PHP script run on the webserver (by, for example, dropping the scriptfile into the user's public_html directory) executes under the 'web' account. This allows users to escalate their privilege (if the 'web' account has more rights than the user's account).<br />
<br />
=PHP on webserver able to reach main filesystem=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The PHP installation running on the webserver is able to reach the main filesystem (at least the *nix side) without any problems. In fact, several directories are exposed to the web (by design). This allows PHP scripts to access the filesystem, and (if rights permit) even write to the filesystem.<br />
<br />
=PHP on webserver able to exec arbitrary executables=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Using PHP's built-in 'exec' command, a PHP script can run arbitrary executables on the webserver.<br />
<br />
=World-writable folder(s) on webserver=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=yellow>Partially fixed</FONT><br />
<br />
There are several world-writable files and folders that are served by the webserver, allowing a local user to put content on the website. Main example: /public/www/pub. Was used to deface the Nikhef Travel Booking system (this one is fixed). Various others remain. (Mostly 'registration participant' files.)<br />
<br />
=Password file of Nikhef travel system externally reachable=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The password file of the Nikhef Travel Booking system is externally reachable by URL: [http://www.nikhef.nl/pub/travel/config.inc http://www.nikhef.nl/pub/travel/config.inc]. The Urenregistratie system has a similar file (geheim.php), but this is both protected by being having a PHP extension, and, more importantly, not being accessible by other accounts. However, it is readable by the webserver (obviously), so a simple dump-content-of-file PHP script is able to display the contents of this file anyway.</div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=User:Dgeerts/DontLookAtMe&diff=3391User:Dgeerts/DontLookAtMe2011-07-04T12:32:24Z<p>Dgeerts@nikhef.nl: </p>
<hr />
<div>=Able to run arbitrary executables on Windows Terminal server=<br />
*<B>Type</B>: Local arbitrary code execution<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Microsoft Windows' bootloader by default checks the header of the executable it is given, to determine whether it is an EXE, BAT or COM file, and automatically runs it as the proper type. The current configuration on the Nikhef Windows Terminal Server blocks the loading of arbitrary EXE and COM files, but not arbitrary BAT files. Thus, by renaming the file extension from EXE to BAT, this security feature is circumvented, and the executable executed.<br />
<br />
=PHP scripts on webserver run under 'web' account=<br />
*<B>Type</B>: Local privilege escalation<br />
*<B>Status</B>: <FONT color=blue>Fixed</FONT><br />
<br />
Any PHP script run on the webserver (by, for example, dropping the scriptfile into the user's public_html directory) executes under the 'web' account. This allows users to escalate their privilege (if the 'web' account has more rights than the user's account).<br />
<br />
=PHP on webserver able to reach main filesystem=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
The PHP installation running on the webserver is able to reach the main filesystem (at least the *nix side) without any problems. In fact, several directories are exposed to the web (by design). This allows PHP scripts to access the filesystem, and (if rights permit) even write to the filesystem.<br />
<br />
=PHP on webserver able to exec arbitrary executables=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=red>Not fixed</FONT><br />
<br />
Using PHP's built-in 'exec' command, a PHP script can run arbitrary executables on the webserver.<br />
<br />
=World-writable folder(s) on webserver=<br />
*<B>Type</B>:<br />
*<B>Status</B>: <FONT color=yellow>Partially fixed</FONT><br />
<br />
There are several world-writable files and folders that are served by the webserver, allowing a local user to put content on the website. Main example: /public/www/pub. Was used to deface the Nikhef Travel Booking system (this one is fixed). Various others remain. (Mostly 'registration participant' files.)</div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=User:Dgeerts/DontLookAtMe&diff=3390User:Dgeerts/DontLookAtMe2011-07-04T12:31:41Z<p>Dgeerts@nikhef.nl: </p>
<hr />
<div>== Setting up ganga ==<br />
You need an AFS ticket to run ganga. You also need a grid certificate, and <br />
you need to set up the grid tools, as described in the [[Using_DQ2_at_NIKHEF|DQ2 at Nikhef wiki]].<br />
At the same time, assuming you set up the grid tools according to Martijn's <br />
wiki, COMMENT OUT THE LINE: <br />
source /project/atlas/nikhef/dq2/dq2_setup.csh.NIKHEF <br />
If you set up the grid tools in some other way, make sure the grid tools <br />
environment is not loaded: '''GANGA AND THE GRID TOOLS ENVIRONMENT CLASH!''' <br />
Apparently, there is a mismatch between the grid tools environment <br />
and the Athena environment. You can put the line behind an alias <br />
if you wish. Then set up Athena at NIKHEF as described in the [[Setting_up_1206|athena 12.0.6 Wiki]].<br />
<br />
<br />
<br />
<br />
To set up ganga, add the following lines to your .cshrc:<br><br><br />
<br />
<tt><br />
setenv GANGA_CONFIG_PATH GangaAtlas/Atlas.ini <br><br />
&#35;for the local installation <br><br />
&#35;set path = (/public/public_linux/Ganga/install/4.3.0/bin/ $path) <br><br />
&#35;for the newest version installed on afs <br><br />
set path = (/afs/cern.ch/sw/ganga/install/4.3.2/bin/ $path) <br><br />
setenv LFC_HOST 'lfc03.nikhef.nl' <br><br />
setenv LCG_CATALOG_TYPE lfc <br><br />
</tt><br><br />
<br />
or, if you are working in an sh-based shell (such as bash): <br><br><br />
<br />
<tt><br />
export GANGA_CONFIG_PATH=GangaAtlas/Atlas.ini <br><br />
&#35;for the local installation <br><br />
&#35;PATH=/public/public_linux/Ganga/install/4.3.0/bin/:${PATH}<br> <br />
&#35;for the newest version installed on afs <br><br />
PATH=/afs/cern.ch/sw/ganga/install/4.3.2/bin/:${PATH} <br><br />
export LFC_HOST='lfc03.nikhef.nl' <br><br />
export LCG_CATALOG_TYPE=lfc<br><br />
</tt><br><br />
<br />
The first time ganga runs, it will ask to create a configuration file $HOME/.gangarc. <br />
Answer yes, and edit the config file as follows (a consolidated sketch of the result is given after this list): <br />
<ol><br />
<li><br />
In the section labelled [LCG] uncomment the line:<br><br><br />
<tt>VirtualOrganisation = atlas </tt><br><br><br />
and add the line<br />
<br />
<tt>DefaultSE = tbn18\.nikhef\.nl </tt><br />
<br />
</li><br><br />
<li><br />
In the section labeled [Athena] uncomment the line: <br><br><br />
<br />
<tt><br />
&#35; local path to base paths of dist-kits (lxplus example) <br><br />
ATLAS_SOFTWARE = /data/atlas/offline/<br />
</tt><br />
<br />
</li><br><br />
<li><br />
In the section labelled [ROOT] uncomment and edit the lines: <br><br><br />
<br />
<tt><br />
location = /data/atlas/offline/12.0.6/sw/lcg/external/root/ <br><br />
version = 5.10.00e <br><br />
arch = slc3_ia32_gcc323 <br><br />
</tt><br />
<br />
</li><br />
<li><br />
Until ganga 4.3.2 is released, there is a workaround to get ganga working with large input sandboxes. In the section [LCG], add the lines (the same lines are given under the ''lcg cp'' problem in the Possible problems section below):<br />
<br />
<tt><br />
ConfigVO = /user/fkoetsve/rb106_edg_wl_ui.conf <br><br />
BoundSandboxLimit = 52428800<br />
</tt><br />
</li><br />
<br />
</ol><br />
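<br />
Putting these edits together, the relevant sections of $HOME/.gangarc would look roughly as follows (a sketch assembled from the items above; paths and versions are the examples used there):<br />
<br />
<pre><br />
[LCG]<br />
VirtualOrganisation = atlas<br />
DefaultSE = tbn18\.nikhef\.nl<br />
ConfigVO = /user/fkoetsve/rb106_edg_wl_ui.conf<br />
BoundSandboxLimit = 52428800<br />
<br />
[Athena]<br />
ATLAS_SOFTWARE = /data/atlas/offline/<br />
<br />
[ROOT]<br />
location = /data/atlas/offline/12.0.6/sw/lcg/external/root/<br />
version = 5.10.00e<br />
arch = slc3_ia32_gcc323<br />
</pre><br />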
<br />
== Running ganga ==<br />
<br />
You can start the ganga CLI by typing <tt>ganga</tt> on the command line. This <br />
starts a Python interface, where you can start defining your jobs. There are <br />
a few commands you can use to get around in ganga: <br />
<ul><br />
<li><br />
<tt>jobs</tt>: Lists all the jobs that are defined in ganga. You can get to an <br />
individual job by typing: <br />
</li><br />
<li><br />
<tt>jobs[id]</tt>: where the id is listed in the second column of the jobs output. <br />
</li><br />
</ul><br />
One thing you can do with a job is view its status: <br><br><br />
<tt><br />
jobs[1].status() <br />
</tt><br><br><br />
This can be 'new', 'submitted', 'running' or 'completed'. Once the job is <br />
completed, you can view its output (which is stored by default in <br />
$HOME/gangadir/workspace/Local/<job id>/output) by typing:<br><br><br />
<tt> <br />
In [25]: jobs[0].peek() <br />
</tt><br><br><br />
Or look at a specific output file by typing: <br><br><br />
<tt><br />
In [25]: jobs[0].peek('stderr','less') <br />
</tt><br><br><br />
where <tt>stderr</tt> is the name of the file you want to view, and <tt>less</tt> the program <br />
to view it with. You can kill a job using the <tt>kill()</tt> method, and remove it <br />
from the jobs list with the <tt>remove()</tt> method. The most important command <br />
by far is <tt>help()</tt>. This starts the interactive help program of ganga. After <br />
typing it, you get a <tt>help></tt> prompt. Typing <tt>index</tt> gives you a list of all possible <br />
help subjects. The explanations are rather brief, but it does help you to find <br />
methods of built-in classes of Ganga and its plugins. For instance, the ATLAS <br />
plugin defines classes like <tt>DQ2Dataset</tt>. For more info on <tt>DQ2Dataset</tt> you <br />
type <tt>DQ2Dataset</tt> at the <tt>help></tt> prompt.<br />
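<br />
For example, a short help session might look like this (a sketch; the output is omitted):<br />
<br />
<pre><br />
In [1]: help()<br />
help> index<br />
help> DQ2Dataset<br />
</pre><br />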
<br />
<br />
<br />
== Running a simple Job ==<br />
<br />
This little piece of code runs a Hello World Job on the LCG grid: <br><br />
<br />
<tt><br />
In [0] : j=Job()<br> <br />
In [1] : j.application=Executable(exe='/bin/echo',args=['Hello World']) <br><br />
In [2] : j.backend=LCG() <br><br />
In [3] : j.submit() <br><br />
</tt><br />
<br />
The application that is run here is a UNIX executable. LCG() is another <br />
predefined class that takes care of a lot of details of submitting to the grid. <br />
After it is finished, you can type:<br />
<br />
<tt><br />
In [4] : j.peek('stdout','cat')<br />
</tt><br />
<br />
which will output the expected "Hello World". You can also put these lines <br />
in a script <tt>my_script.py</tt>, and at the ganga prompt type:<br />
<br />
<tt><br />
In [4]: execfile('my_script.py') <br />
</tt><br />
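<br />
For reference, a minimal <tt>my_script.py</tt> could simply contain the commands entered above (a sketch; the executable and arguments are just the example values):<br />
<br />
<pre><br />
# my_script.py: submit a Hello World job to the LCG grid<br />
j = Job()<br />
j.application = Executable(exe='/bin/echo', args=['Hello World'])<br />
j.backend = LCG()<br />
j.submit()<br />
</pre><br />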
<br />
== Running an <tt>ATHENA</tt> job ==<br />
<br />
Running an athena job, storing the output files into a DQ2 dataset, requires a <br />
bit more work, but it is still not hard. The following script defines an Athena <br />
job, splits the job so that there is one job (and hence one output file) per <br />
input file, runs athena with the TopView LocalOverride jobOptions, and <br />
stores the output on the grid in a DQ2 dataset called <tt>testing_Ganga_V9</tt>.<br />
<br />
<tt><br />
&#35;Define the ATHENA job<br />
j = Job() <br> <br />
j.name='TopView Standard Job, Ganga 4.3.2' <br><br />
j.application=Athena() <br><br />
j.application.prepare(athena_compile=True) <br><br />
j.application.option_file='/project/atlas/users/fkoetsve/TestArea1206/PhysicsAnalysis/TopPhys/TopView/TopView-00-12-12-02/run/LocalOverride_Nikhef_BASIC.py' <br />
&#35;j.application.max_events='20' <br><br />
j.splitter=AthenaSplitterJob() <br><br />
j.splitter.match_subjobs_files=True <br><br />
&#35;The merger can be used to merge all the output files into one. See the ganga ATLAS Twiki for details<br />
&#35;j.merger=AthenaOutputMerger()<br> <br />
&#35;Define the inputdata<br />
j.inputdata=DQ2Dataset() <br><br />
j.inputdata.dataset="trig1_misal1_mc12.005201.Mcatnlo_jim_top_pt200.recon.AOD.v12000601"<br> <br />
&#35;To send the job to complete or incomplete dataset locations, uncomment either the next line, or the line after that <br><br />
&#35;j.inputdata.min_num_files=100 <br><br />
&#35;j.inputdata.match_ce_all=True <br><br />
j.inputdata.type='DQ2_LOCAL' <br><br />
&#35;Define the outputdata<br />
&#35;j.outputdata=ATLASOutputDataset() <br><br />
j.outputdata=DQ2OutputDataset() <br><br />
j.outputdata.datasetname='testing_Ganga_V9' <br> <br />
j.outputdata.outputdata=['TopViewAANtuple.root'] <br><br />
&#35;j.outputdata.location='NIKHEF' <br><br />
&#35;j.outputsandbox=['TopViewAANtuple.root'] <br><br />
&#35;Submit<br />
j.backend=LCG() <br><br />
j.backend.CE='ce-fzk.gridka.de:2119/jobmanager-pbspro-atlas'<br> <br />
&#35;j.inputsandbox=['my_extra_file'] <br><br />
j.application.exclude_from_user_area = [] <br><br />
j.submit() <br><br />
</tt><br />
<br />
Explanation of the terms: <br />
<ul><br />
<li><br />
<tt>j.Application=Athena()</tt>: Defines the job to be an Athena job. <br />
Packs the local installation of athena packages, and sends them with <br />
the job. The groupArea tag of the athena setup, used e.g. for TopView, <br />
does not work (yet). Instead, all the packages defined in the groupArea <br />
tag must be installed locally and packed with the job <br />
</li><br />
<li><br />
<tt>j.splitter=AthenaSplitterJob()</tt>: To get one output file per input file, as must be done to keep the naming of files consistent when going <br />
from AOD to NTuple, you need the job to be split into as many subjobs <br />
as there are input files. You need this splitter plugin to do that, and <br />
set <tt>j.splitter.match_subjobs_files</tt> to <tt>True</tt><br />
</li><br />
<li><br />
<tt>j.merger</tt>: can be used to merge all the outputfiles into one <br />
</li><br />
<li><br />
<tt>j.inputdata=DQ2Dataset()</tt>: tells the job to get the files from the DQ2 <br />
file catalogue <br />
</li><br />
<li><br />
<tt>j.inputdata.match_ce_all=True</tt>: If there is no location with a complete copy of the dataset, this attribute sends the job to a random <br />
location <br />
</li><br />
<li><br />
<tt>j.inputdata.min_num_files=100</tt>: instead of sending the job to a <br />
random location, this first checks that a given minimum of files is <br />
present at that location <br />
</li><br />
<li><br />
<tt>j.outputdata=DQ2OutputDataset()</tt>: tells the job to store the output <br />
data on the grid, and register it to the DQ2 registry. <br />
</li><br />
<li><br />
<tt>j.outputdata.outputdata=['ntuple.root']</tt>: gives a list of filenames that must be stored in the output dataset. Wildcards are not <br />
supported. If the job is split, the output files are numbered automatically. <br />
</li><br />
<li><br />
<tt>j.backend.CE</tt>: allows you to specify which Computing Element the <br />
job should be sent to. The syntax is <br />
<server>:<port>/jobmanager-<service>-<queue> <br />
</li><br />
<li><br />
<tt>j.application.exclude_from_user_area = []</tt>: allows you to exclude packages that you have installed locally from inclusion in the <br />
input sandbox (the tar file containing all the files that are sent with <br />
your job to the CE); see the sketch after this list<br />
</ul><br />
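<br />
As a sketch of the exclusion mechanism (the package names here are hypothetical examples):<br />
<br />
<pre><br />
# Keep the input sandbox small by excluding locally checked-out packages<br />
j.application.exclude_from_user_area = ['MyBigPackage', 'AnotherPackage']<br />
</pre><br />
<br />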
After submitting your job you can type <tt>jobs</tt> on the ganga command line, <br />
which will show something like: <br />
<br />
<tt><br />
<table><br />
<tr><br />
<td> &#35; </td><td> id </td><td>status </td><td> name </td><td>subjobs</td><td> application </td><td>backend </td><td>CE </td><br />
</tr><br />
<tr><br />
<td>&#35; </td><td>41</td><td> completed </td><td>TopView Standard Job </td><td> 3 </td><td> Athena </td><td> LCG </td><td> ce106.cern.ch:2119/jobmanager-lcglsf-grid_2nh </td><br />
</tr><br />
<tr><br />
<td>&#35; </td><td>42 </td><td>completed </td><td></td><td> </td><td>Athena </td><td> LCG </td><td> ce106.cern.ch:2119/jobmanager-lcglsf-grid_2nh </td><br />
</tr><br />
</table><br />
</tt><br />
<br />
Here you can see all the jobs, their status, the type of job, its name, and <br />
at which CE it is running. If you want more info, you can type <tt>jobs[41]</tt><br />
at the command line, and you will get the complete configuration of the job, <br />
including parameters that were set by default, which you may know nothing <br />
about. This is very helpful when debugging ganga.<br />
When the status changes to completed (ganga tells you of the change of <br />
status of any job as soon as you issue a new command), you can see any <br />
text output by typing, just like before: <br />
<br />
<tt><br />
jobs[41].peek('stdout','cat') <br />
</tt><br />
<br />
If the job completed successfully, you can retrieve the output data by typing: <br />
<br />
<tt><br />
jobs[41].outputdata.retrieve()<br />
</tt><br />
<br />
The output data is then stored in the directory ${HOME}/gangadir/workspace/Local/<job id>/output. <br />
As the output files can be large, it is wise to change the location of this directory by creating a symbolic link called gangadir in your home directory, pointing <br />
to somewhere where large amounts of data can be stored (temporarily); a sketch of this is given below.<br />
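<br />
As a minimal sketch of relocating gangadir (the target path is just an example; do this while ganga is not running):<br />
<br />
<pre><br />
import os, shutil<br />
home = os.path.expanduser('~')<br />
target = '/data/atlas/users/yourusername/gangadir'  # example location with enough space<br />
shutil.move(os.path.join(home, 'gangadir'), target)  # preserve any existing jobs<br />
os.symlink(target, os.path.join(home, 'gangadir'))<br />
</pre><br />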
<br />
<br />
==Using ganga for running TopView==<br />
The current version of TopView that is used by the Top group and by us is TopView-00-12-13-03. The groupArea tag should work now, only the latest version of TopTools (PhysicsAnalysis/TopPhys/TopTools-00-00-12) has to be checked out. However, the groupArea tag causes the input sandbox to be VERY large (~90M). This is difficult for ganga to handle, hence use the EVTags tar file that can be found [http://atlas-computing.web.cern.ch/atlas-computing/links/kitsDirectory/PAT/EventView/ here]. Use the PhysicsAnalysis package from that tar file. The latest package has a slightly older version of TopView, so you still need to check out the TopTools-00-00-12 and TopView-00-12-13-03 packages.<br />
<br />
Do not forget to copy the InstallArea as well.<br />
<br />
In the GangaScripts area there is also a SubmitScript.py, which you run by typing, from the run directory:<br />
<br />
<tt><br />
ganga /user/fkoetsve/GangaScripts/SubmitScript.py --dataset=<datasetname> --number_of_files=<nfiles> --simstyle=<simstyle><br />
</tt><br />
<br />
The number of files can be found by typing <tt> dq2_ls -f <datasetname> </tt>, and it is required. The simstyle can be fullsim, fastsim or streamingtest, and is also required. This script has not been tested yet. For testing ganga, I now use a script <tt>GangaScripts/TopViewGangaSubmission_Override.py</tt>, which runs over the dataset <tt>trig1_misal1_mc12.005201.Mcatnlo_jim_top_pt200.recon.AOD.v12000601</tt>.<br />
<br />
==Possible problems (and possible solutions)==<br />
<br />
These are some problems that I encountered, plus their solutions. <br />
<br />
'''(60, 'SSL certificate problem, verify that the CA cert is OK')''' <br />
This means that the certificate that is used by ganga is wrong. The directory where your certificates are located is stored in the variable X509_CERT_DIR. Send a request to grid.support@nikhef.nl to <br />
update the certificates, or download them yourself and change the value of X509_CERT_DIR<br />
<br />
'''[Errno 28] No space left on device''' <br />
Ganga writes to different places: /tmp, but also <br />
${HOME}/gangadir/workspace/Local. <br />
Cleanup, especially after failed jobs, is not always very tidy. You might need <br />
to clean up some files manually at regular intervals. If you want to be able to <br />
store bigger files, the easiest way to change the gangadir location is to make <br />
a symbolic link in your home directory called gangadir, to whatever location <br />
you want. <br />
<br />
'''File "/global/ices/lcg/glite3.0.12/edg/bin/UIutils.py", line 377, in errMsg print message( info.logFile , message )''' <br />
Repeated for many lines. The same solution as for the previous problem, No space <br />
left on device. <br />
<br />
'''<bound method Job.peek of <Ganga.GPIDev.Lib.Job.Job.Job object at 0xb7015f6c>> ''' <br />
You forgot the '()' after the command, in this case <tt>peek</tt>. It also happens e.g. <br />
with <tt>remove()</tt>. <br />
<br />
'''LCMAPS credential mapping NOT successful '''<br />
This means the input sandbox (meaning all the files you are sending with your <br />
job to the remote machine) is too large (> 3 MB). Ganga then tries to store that <br />
sandbox on a Storage Element, which by default is some machine at CERN, and <br />
you don't have permission to use that. Add the following line to the [LCG] <br />
section of $HOME/.gangarc: <br />
<br />
<tt><br />
DefaultSE = tbn18\.nikhef\.nl <br />
</tt><br />
<br />
The <tt>.</tt> needs to be escaped using the <tt>\</tt>, because the file is <br />
read into Python. <br />
<br />
'''lcg cp: Transport endpoint is not connected''' <br />
I think this has to do with an overly large sandbox as well, caused by too <br />
many packages checked out in the TestArea. Exclude packages from the <br />
input sandbox using: <br />
j.application.exclude_from_user_area=["package1","package2"] <br />
This turns out to be a bug in ganga, which should be solved in version 4.3.2. There is a workaround now using a temporarily enlarged buffer size at one of the resource brokers. Add these lines to the [LCG] part of .gangarc:<br />
<br />
<tt><br />
ConfigVO = /user/fkoetsve/rb106_edg_wl_ui.conf <br><br />
BoundSandboxLimit = 52428800<br />
</tt><br />
<br />
'''Dataset empty at Triumf'''<br />
A bug in ganga 4.3.2. To fix, download the [http://ganga.web.cern.ch/ganga/download/ganga-install ganga-install] script, and run, in the directory where you want to install:<br />
<br />
<tt><br />
./ganga-install --extern=GangaAtlas,GangaNG,GangaCronus,GangaGUI,GangaPlotter<br />
</tt><br />
<br />
and change the path variable to point to the new ganga executable. Then, replace the file<br />
<br />
<tt><br />
Ganga/install/4.3.2/python/GangaAtlas/Lib/Athena/ganga-stage-in-out-dq2.py<br />
</tt><br />
<br />
with <br />
<br />
<tt><br />
/afs/cern.ch/user/e/elmsheus/public/ganga-stage-in-out-dq2.py<br />
</tt><br />
==More Info==<br />
<br />
[http://ganga.web.cern.ch/ganga/ The ganga project homepage]<br><br />
[https://twiki.cern.ch/twiki/bin/view/Atlas/DAGangaFAQ Ganga FAQ ]<br><br />
[https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial43 Ganga 4.3.0 tutorial] (no tutorial for higher versions available yet; check the hypernews forum for extra features)<br />
<br />
If you find any problems with this document, please contact me by clicking [mailto:f.koetsveld@science.ru.nl here]</div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=Ganga_basic_usage&diff=3607Ganga basic usage2010-02-09T13:39:35Z<p>Dgeerts@nikhef.nl: /* Introduction */ Fixed a typo</p>
<hr />
<div>== Introduction ==<br />
This guide will help beginners to understand how to use Ganga for managing computational jobs running locally or on the Grid. It provides step-by-step instructions for running a simple "HelloWorld" job through GANGA. Users will run GANGA on a NIKHEF desktop (e.g. <tt>elel22.nikhef.nl</tt>) and submit jobs to Stoomboot (a PBS cluster) and to the LCG.<br />
<br />
As Ganga is also a job management tool for end users, this wiki also introduces a few useful commands for managing Ganga jobs.<br />
<br />
== Requirements ==<br />
<ul><br />
<li>You need to have the proper privileges for submitting jobs to a local cluster (e.g. a NIKHEF account for Stoomboot and/or a CERN account for lxbatch)<br />
<li>You need to have a valid grid certificate registered in a Virtual Organization (e.g. ATLAS) for running jobs on the Grid.<br />
</ul><br />
<br />
== Preparation ==<br />
=== Password-less login between desktop and Stoomboot nodes ===<br />
This step is needed for managing jobs on Stoomboot with Ganga. <br />
<br />
== Starting GANGA session ==<br />
<ul><br />
<li>'''For NIKHEF users'''<br />
<pre><br />
% source /project/atlas/nikhef/dq2/dq2_setup.sh.NIKHEF<br />
% export DPNS_HOST=tbn18.nikhef.nl<br />
% export LFC_HOST=lfc-atlas.grid.sara.nl<br />
% source /project/atlas/nikhef/ganga/etc/setup.[c]sh<br />
% ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef<br />
</pre><br />
<br />
Every time you start with a clean shell, you'll need to set up Ganga with the lines given right above. <br />
</li><br />
<br />
<li>'''For CERN lxplus users'''<br />
<pre><br />
% source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh<br />
% ganga<br />
</pre><br />
<br />
More details for CERN users can be found here: http://ganga.web.cern.ch/ganga/user/index.php<br />
<br />
</li><br />
</ul><br />
<br />
The last command loads a system-wide ATLAS-specific configuration for your Ganga session. You can override the system-wide configuration by providing a <tt>~/.gangarc</tt> file. The template of the <tt>~/.gangarc</tt> file can be generated by:<br />
<br />
<pre><br />
% ganga -g<br />
</pre><br />
<br />
If you see the following prompt:<br />
<br />
<pre><br />
*** Welcome to Ganga ***<br />
Version: Ganga-5-4-2<br />
Documentation and support: http://cern.ch/ganga<br />
Type help() or help('index') for online help.<br />
<br />
This is free software (GPL), and you are welcome to redistribute it<br />
under certain conditions; type license() for details.<br />
<br />
In [1]:<br />
</pre><br />
<br />
you are already in a GANGA session. The GANGA session is actually an [http://ipython.scipy.org/moin/ IPython] shell with GANGA specific extensions (modules), meaning that you can do programming (python only, of course) inside the GANGA session.<br />
<br />
== Leaving GANGA session ==<br />
To quit from a GANGA session, just press '''CTRL-D'''.<br />
<br />
== Getting familiar with GANGA ==<br />
=== My first Grid job running a HelloWorld shell script ===<br />
<br />
Now go to your project directory<br />
<pre><br />
cd /project/atlas/Users/yourusernamehere<br />
</pre><br />
and create 'myscript.sh'<br />
<pre><br />
#!/bin/sh<br />
echo 'myscript.sh running...'<br />
echo "----------------------"<br />
/bin/hostname<br />
echo "HELLO PLANET!"<br />
echo "----------------------"<br />
</pre><br />
and the file 'gangaScript.py' containing the commands below. Do not forget to modify the path to match your directory structure:<br />
<pre><br />
In[n]: j = Job()<br />
In[n]: j.application=Executable()<br />
In[n]: j.application.exe=File('/project/atlas/Users/yourusernamehere/myscript.sh')<br />
In[n]: j.backend=LCG()<br />
In[n]: j.submit() <br />
</pre><br />
<br />
This Ganga Job means the following:<br />
* Line 1 defines the job<br />
* Line 2 sets it as an Executable<br />
* Line 3 tells which file to run<br />
* Line 4 tells where the job should run<br />
* Line 5 submits the job<br />
The important point here is that we have chosen LCG() as the backend, i.e. the script will be executed on the grid.<br />
Now start ganga again and submit the job to the LCG grid:<br />
<pre><br />
In[n]: execfile("./gangaScript.py")<br />
</pre><br />
<br />
The status of the job can be monitored with <br />
<pre><br />
In[n]: jobs<br />
</pre><br />
<br />
After the job is submitted, GANGA is responsible for monitoring your job while it is still running, and for downloading output files (e.g. stdout/stderr) to the local machine when the job is finished.<br />
<br />
When your job is <tt>completed</tt>, the job's output is automatically fetched from the Grid and stored in your <tt>gangadir</tt> directory. The exact output location can be found by:<br />
<pre><br />
In[n]: j.outputdir<br />
Out[n]: /project/atlas/Users/yourusernamehere/gangadir/workspace/yourusernamehere/LocalAMGA/0/output<br />
</pre><br />
<br />
if 0 was the job ID. This was our first grid-job submitted via ganga!<br />
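<br />
Once the job has completed, you can inspect the fetched stdout directly from the Ganga session (a sketch using the <tt>peek</tt> method described further below):<br />
<br />
<pre><br />
In[n]: j.peek('stdout','cat')   # should show HELLO PLANET! and the worker node's hostname<br />
</pre><br />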
<br />
=== Working with historical jobs ===<br />
GANGA internally archives your previously submitted jobs (historical jobs) in the local job repository (i.e. <tt>gangadir</tt>) so that you don't have to do the bookkeeping yourself. You can freely quit and restart GANGA and still have your historical jobs ready for future work.<br />
<br />
The first step in working with a historical job is to get the job instance from the repository, as follows:<br />
<br />
<pre><br />
In [n]: jobs<br />
Out[n]: <br />
Job slice: jobs (12 jobs)<br />
--------------<br />
# fqid status name subjobs application backend backend.actualCE <br />
# 17 submitted 1000 Executable LCG <br />
# 18 submitted 2000 Executable LCG <br />
# 20 completed 10 Executable LCG<br />
# 28 submitted Executable LCG<br />
# 29 submitted test_lcg Executable LCG <br />
</pre><br />
<br />
The table above lists the historical jobs in your GANGA repository indexed by <tt>fqid</tt>. For example, if you are interested in the job with id <tt>29</tt>, you can get the job instance by<br />
<br />
<pre><br />
In [n]: j = jobs(29)<br />
</pre><br />
<br />
then you are all set to work with the job.<br />
<br />
Please note that you <span style="color:#800000">'''CANNOT'''</span> change the attributes of a historical job.<br />
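<br />
If you want to resubmit a historical job with different settings, make a modifiable copy first (a sketch; the id 29 refers to the table above):<br />
<br />
<pre><br />
In [n]: j = jobs(29).copy()   # a fresh, editable copy of the historical job<br />
In [n]: j.backend = PBS()     # e.g. switch to the Stoomboot backend<br />
In [n]: j.submit()<br />
</pre><br />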
<br />
=== More GANGA jobs to run on different platforms ===<br />
Now try the following commands in the Ganga shell to get your hands dirty :)<br />
Try to find where the second job runs.<br />
<br />
<pre><br />
In [n]: j = Job()<br />
In [n]: j.backend=Local()<br />
In [n]: j.submit()<br />
In [n]: jobs<br />
<br />
In [n]: j = j.copy()<br />
In [n]: j.backend=PBS()<br />
In [n]: j.backend.queue = 'test'<br />
In [n]: j.submit()<br />
In [n]: jobs<br />
</pre><br />
<br />
If you run Ganga on a NIKHEF desktop, the PBS backend should be configured for submitting jobs to the Stoomboot cluster.<br />
<br />
== Basic job management ==<br />
<br />
=== Checking job status ===<br />
GANGA automatically polls the up-to-date status of your jobs and updates the local repository accordingly. A notification will pop up when the job status changes.<br />
<br />
In addition, you can get a job summary table by:<br />
<br />
<pre><br />
In [n]: jobs<br />
</pre><br />
<br />
or a summary table for subjobs (you won't have subjobs unless you use a '''Splitter''' with the job; for more advanced applications a '''Splitter''' may be used, as sketched below):<br />
<br />
<pre><br />
In [n]: j.subjobs<br />
</pre><br />
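<br />
As a sketch of how subjobs arise (assuming the generic <tt>ArgSplitter</tt> plugin is available in your Ganga installation):<br />
<br />
<pre><br />
In [n]: j = Job()<br />
In [n]: j.application = Executable(exe='/bin/echo')<br />
In [n]: j.splitter = ArgSplitter(args=[['one'],['two'],['three']])  # creates three subjobs<br />
In [n]: j.submit()<br />
In [n]: j.subjobs<br />
</pre><br />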
<br />
=== Killing and removing jobs ===<br />
You can kill a job by calling<br />
<br />
<pre><br />
In [n]: j.kill()<br />
</pre><br />
<br />
Ganga keeps a killed job referable, so its working directory and job registry entry are still kept in Ganga (which can take up your disk space). If you want to really erase everything related to the job from Ganga, you can remove it with<br />
<br />
<pre><br />
In [n]: j.remove()<br />
</pre><br />
<br />
=== Failing jobs manually ===<br />
Some unexpected issues in the job may leave Ganga unable to update the job status to <tt>failed</tt> as it should be. In this case, you can manually force the job status to failed:<br />
<br />
<pre><br />
In [n]: j.force_status("failed", force=True)<br />
</pre><br />
<br />
This stops Ganga from polling the status of a problematic job that may already be gone from the backend system.<br />
<br />
=== Checking stdout/err (basic trouble shooting) ===<br />
GANGA tries to bring the stdout/stderr back to the client side even when the job failed remotely on the Grid. So for failed jobs, you can check them as follows for troubleshooting:<br />
<br />
<pre><br />
In [n]: j.peek('stdout','less')<br />
In [n]: j.peek('stderr','cat')<br />
</pre><br />
<br />
or<br />
<br />
<pre><br />
In [n]: j.peek('stdout.gz','zless')<br />
In [n]: j.peek('stdout.gz','zcat')<br />
</pre><br />
<br />
for the LCG jobs.<br />
<br />
=== More actions on a job ===<br />
Try typing <tt>j.<TAB Key></tt> in your Ganga session; the auto-completion feature of IPython will show you the exported methods of the Ganga job object.<br />
<br />
Or you can get help on the job object, for example:<br />
<br />
<pre><br />
In [n]: j = jobs[-1]<br />
In [n]: help(j)<br />
</pre></div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=Ganga_basic_usage&diff=2920Ganga basic usage2010-02-09T13:39:13Z<p>Dgeerts@nikhef.nl: /* Starting GANGA session */ Fixed a typo</p>
<hr />
<div>== Introduction ==<br />
The [Ganga basic usage] will help beginners to understand how to use this tool for managing computational jobs running locally or on the Grid. This guide provides step-by-step instructions for running simple "HelloWorld" job through GANGA. Users will run GANGA on a NIKHEF desktop (e.g. <tt>elel22.nikhef.nl</tt>) and submit jobs to Stoomboot (a PBS cluster) and to the LCG.<br />
<br />
As Ganga is also a job management tools for end users, this wiki will also introduce few useful commands for managing Ganga jobs.<br />
<br />
== Requirements ==<br />
<ul><br />
<li>You need to have a proper privilege for submitting jobs to a local cluster (e.g. NIKHEF account for Stoomboot and/or CERN account for lxbatch)<br />
<li>You need to have a valid grid certificate registered in a Virtual Organization (e.g. ATLAS) for running jobs on the Grid.<br />
</ul><br />
<br />
== Preparation ==<br />
=== password-less login between desktop and Stoomboot nodes ===<br />
This step is needed for managing jobs on Stoomboot with Ganga. <br />
<br />
== Starting GANGA session ==<br />
<ul><br />
<li>'''For NIKHEF users'''<br />
<pre><br />
% source /project/atlas/nikhef/dq2/dq2_setup.sh.NIKHEF<br />
% export DPNS_HOST=tbn18.nikhef.nl<br />
% export LFC_HOST=lfc-atlas.grid.sara.nl<br />
% source /project/atlas/nikhef/ganga/etc/setup.[c]sh<br />
% ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef<br />
</pre><br />
<br />
Every time you start with a clean shell, and you'll need to setup Ganga with the lines given right above. <br />
</li><br />
<br />
<li>'''For CERN lxplus users'''<br />
<pre><br />
% source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh<br />
% ganga<br />
</pre><br />
<br />
More detail for CERN users can be found here: http://ganga.web.cern.ch/ganga/user/index.php<br />
<br />
</li><br />
</ul><br />
<br />
The last command loads a system-wide ATLAS-specific configuration for your Ganga session. You can override the system-wide configuration by providing a <tt>~/.gangarc</tt> file. The template of the <tt>~/.gangarc</tt> file can be generated by:<br />
<br />
<pre><br />
% ganga -g<br />
</pre><br />
<br />
If you see the following prompt:<br />
<br />
<pre><br />
*** Welcome to Ganga ***<br />
Version: Ganga-5-4-2<br />
Documentation and support: http://cern.ch/ganga<br />
Type help() or help('index') for online help.<br />
<br />
This is free software (GPL), and you are welcome to redistribute it<br />
under certain conditions; type license() for details.<br />
<br />
In [1]:<br />
</pre><br />
<br />
you are already in a GANGA session. The GANGA session is actually an [http://ipython.scipy.org/moin/ IPython] shell with GANGA specific extensions (modules), meaning that you can do programming (python only, of course) inside the GANGA session.<br />
<br />
== Leaving GANGA session ==<br />
To quit from a GANGA session, just press '''CTRL-D'''.<br />
<br />
== Getting familiar with GANGA ==<br />
=== My first Grid job running a HelloWorld shell script ===<br />
<br />
Now go to your project directory<br />
<pre><br />
cd /project/atlas/Users/yourusernamehere<br />
</pre><br />
and create 'myscript.sh'<br />
<pre><br />
#!/bin/sh<br />
echo 'myscript.sh running...'<br />
echo "----------------------"<br />
/bin/hostname<br />
echo "HELLO PLANET!"<br />
echo "----------------------"<br />
</pre><br />
and the file 'gangaScript.py'. Do not forget to modify the following to your directory structure:<br />
<pre><br />
In[n]: j = Job()<br />
In[n]: j.application=Executable()<br />
In[n]: j.application.exe=File('/project/atlas/Users/yourusernamehere/myscript.sh')<br />
In[n]: j.backend=LCG()<br />
In[n]: j.submit() <br />
</pre><br />
<br />
This Ganga Job means the following<br />
* Line 1 defines the job<br />
* Line 2 sets it as an Executable<br />
* Line 3 tell which file to run<br />
* Line 4 Tell where the job should run<br />
* Line 5 submits the job<br />
The imprtant point is here that we have chosen LCG() as backend, i.e. the script will be executed on the grid.<br />
Now start ganga again and submit the job to the LCG-grid<br />
<pre><br />
In[n]: execfile("./gangaScript.py")<br />
</pre><br />
<br />
the status of the job can be monitored with <br />
<pre><br />
In[n]: jobs<br />
</pre><br />
<br />
After the job is submitted, GANGA is now responsible for monitoring your jobs when it's still running; and for downloading output files (e.g. stdout/stderr) to the local machine when the job is finished.<br />
<br />
When your job is <tt>completed</tt>, the job's output is automatically fetched from the Grid and stored in your <tt>gangadir</tt> directory. The exact output location can be found by:<br />
<pre><br />
In[n]: j.outputdir<br />
Out[n]: /project/atlas/Users/yourusernamehere/gangadir/workspace/yourusernamehere/LocalAMGA/0/output<br />
</pre><br />
<br />
if 0 was the job ID. This was our first grid-job submitted via ganga!<br />
<br />
=== Working with historical jobs ===<br />
GANGA internally archive your previously submitted jobs (historical jobs) in the local job repository (i.e. <tt>gangadir</tt>) so that you don't have to do bookkeeping by yourself. You can freely get in/out GANGA and still have your historical jobs ready for your future work.<br />
<br />
The first thing to work with your historical job is to get the job instance from the repository as the following:<br />
<br />
<pre><br />
In [n]: jobs<br />
Out[n]: <br />
Job slice: jobs (12 jobs)<br />
--------------<br />
# fqid status name subjobs application backend backend.actualCE <br />
# 17 submitted 1000 Executable LCG <br />
# 18 submitted 2000 Executable LCG <br />
# 20 completed 10 Executable LCG<br />
# 28 submitted Executable LCG<br />
# 29 submitted test_lcg Executable LCG <br />
</pre><br />
<br />
The table above lists the historical jobs in your GANGA repository indexed by <tt>fqid</tt>. For example, if you are interested in the job with id <tt>29</tt>, you can get the job instance by<br />
<br />
<pre><br />
In [n]: j = jobs(29)<br />
</pre><br />
<br />
then you are all set to work with the job.<br />
<br />
Please note that you <span style="color:#800000">'''CANNOT'''</span> change the attributes of a historical job.<br />
<br />
=== More GANGA jobs to run on different platforms ===<br />
Now try the following commands in the Ganga shell to gets your hands dirty :)<br />
Try to find where the second job runs.<br />
<br />
<pre><br />
In [n]: j = Job()<br />
In [n]: j.backend=Local()<br />
In [n]: j.submit()<br />
In [n]: jobs<br />
<br />
In [n]: j = j.copy()<br />
In [n]: j.backend=PBS()<br />
In [n]: j.backend.queue = 'test'<br />
In [n]: j.submit()<br />
In [n]: jobs<br />
</pre><br />
<br />
If you run Ganga on a NIKHEF desktop, the PBS backend should be configured for submitting jobs to the Stoomboot cluster.<br />
<br />
== Basic job management ==<br />
<br />
=== Checking job status ===<br />
GANGA automatically polls the up-to-date status of your jobs and updates local repository accordingly. A notification will pop up to the user when the job status is changed.<br />
<br />
In addition, you can get a job summary table by:<br />
<br />
<pre><br />
In [n]: jobs<br />
</pre><br />
<br />
or a summary table for subjobs (you won't have subjobs if you don't use '''Splitter''' with the job, for more advanced application, the '''Splitter''' may be used):<br />
<br />
<pre><br />
In [n]: j.subjobs<br />
</pre><br />
<br />
=== Killing and removing jobs ===<br />
You can kill a job by calling<br />
<br />
<pre><br />
In [n]: j.kill()<br />
</pre><br />
<br />
Ganga keeps the killed job still referable so the working directory and job registry of the removed jobs are still kept in Ganga (that can take your disk space). So if you want to really erase everything related to this job from Ganga, you can remove a job by<br />
<br />
<pre><br />
In [n]: j.remove()<br />
</pre><br />
<br />
=== Failing jobs manually ===<br />
Some unexpected issues in the job may cause Ganga unable to update the job status to <tt>failed</tt> as it should be. In this case, you can manually fail the job in force<br />
<br />
<pre><br />
In [n]: j.force_status("failed", force=True)<br />
</pre><br />
<br />
This can avoid Ganga to keep polling the status of the problematic job which may be gone from the backend system.<br />
<br />
=== Checking stdout/err (basic trouble shooting) ===<br />
GANGA tries to bring the stdout/err back to the client side even when the job is failed remotely on the Grid. So for the failed jobs, you can check them as the following for trouble shooting:<br />
<br />
<pre><br />
In [n]: j.peek('stdout','less')<br />
In [n]: j.peek('stderr','cat')<br />
</pre><br />
<br />
or<br />
<br />
<pre><br />
In [n]: j.peek('stdout.gz','zless')<br />
In [n]: j.peek('stdout.gz','zcat')<br />
</pre><br />
<br />
for the LCG jobs.<br />
<br />
=== More actions on a job ===<br />
try to type: <tt>j.<TAB Key></tt> in your Ganga session, the auto-completion feature of IPython will tells you the exported methods of the Ganga job object.<br />
<br />
Or you can get help on the job object, for example:<br />
<br />
<pre><br />
In [n]: j = jobs[-1]<br />
In [n]: help(j)<br />
</pre></div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=Ganga_basic_usage&diff=2919Ganga basic usage2010-02-09T13:36:01Z<p>Dgeerts@nikhef.nl: /* Working with historical jobs */ Fixed a typo</p>
<hr />
<div>== Introduction ==<br />
The [Ganga basic usage] will help beginners to understand how to use this tool for managing computational jobs running locally or on the Grid. This guide provides step-by-step instructions for running simple "HelloWorld" job through GANGA. Users will run GANGA on a NIKHEF desktop (e.g. <tt>elel22.nikhef.nl</tt>) and submit jobs to Stoomboot (a PBS cluster) and to the LCG.<br />
<br />
As Ganga is also a job management tools for end users, this wiki will also introduce few useful commands for managing Ganga jobs.<br />
<br />
== Requirements ==<br />
<ul><br />
<li>You need to have a proper privilege for submitting jobs to a local cluster (e.g. NIKHEF account for Stoomboot and/or CERN account for lxbatch)<br />
<li>You need to have a valid grid certificate registered in a Virtual Organization (e.g. ATLAS) for running jobs on the Grid.<br />
</ul><br />
<br />
== Preparation ==<br />
=== password-less login between desktop and Stoomboot nodes ===<br />
This step is needed for managing jobs on Stoomboot with Ganga. <br />
<br />
== Starting GANGA session ==<br />
<ul><br />
<li>'''For NIKHEF users'''<br />
<pre><br />
% source /project/atlas/nikhef/dq2/dq2_setup.sh.NIKHEF<br />
% export DPNS_HOST=tbn18.nikhef.nl<br />
% export LFC_HOST=lfc-atlas.grid.sara.nl<br />
% source /project/atlas/nikhef/ganga/etc/setup.[c]sh<br />
% ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef<br />
</pre><br />
<br />
Every time you start with a clean shell, and you'll need to setup ganga with the lines given right above. <br />
</li><br />
<br />
<li>'''For CERN lxplus users'''<br />
<pre><br />
% source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh<br />
% ganga<br />
</pre><br />
<br />
More detail for CERN users can be found here: http://ganga.web.cern.ch/ganga/user/index.php<br />
<br />
</li><br />
</ul><br />
<br />
The last command loads a system-wide ATLAS-specific configuration for your Ganga session. You can override the system-wide configuration by providing a <tt>~/.gangarc</tt> file. The template of the <tt>~/.gangarc</tt> file can be generated by:<br />
<br />
<pre><br />
% ganga -g<br />
</pre><br />
<br />
If you see the following prompt:<br />
<br />
<pre><br />
*** Welcome to Ganga ***<br />
Version: Ganga-5-4-2<br />
Documentation and support: http://cern.ch/ganga<br />
Type help() or help('index') for online help.<br />
<br />
This is free software (GPL), and you are welcome to redistribute it<br />
under certain conditions; type license() for details.<br />
<br />
In [1]:<br />
</pre><br />
<br />
you are already in a GANGA session. The GANGA session is actually an [http://ipython.scipy.org/moin/ IPython] shell with GANGA specific extensions (modules), meaning that you can do programming (python only, of course) inside the GANGA session.<br />
<br />
== Leaving GANGA session ==<br />
To quit from a GANGA session, just press '''CTRL-D'''.<br />
<br />
== Getting familiar with GANGA ==<br />
=== My first Grid job running a HelloWorld shell script ===<br />
<br />
Now go to your project directory<br />
<pre><br />
cd /project/atlas/Users/yourusernamehere<br />
</pre><br />
and create 'myscript.sh'<br />
<pre><br />
#!/bin/sh<br />
echo 'myscript.sh running...'<br />
echo "----------------------"<br />
/bin/hostname<br />
echo "HELLO PLANET!"<br />
echo "----------------------"<br />
</pre><br />
and the file 'gangaScript.py'. Do not forget to modify the following to your directory structure:<br />
<pre><br />
In[n]: j = Job()<br />
In[n]: j.application=Executable()<br />
In[n]: j.application.exe=File('/project/atlas/Users/yourusernamehere/myscript.sh')<br />
In[n]: j.backend=LCG()<br />
In[n]: j.submit() <br />
</pre><br />
<br />
This Ganga Job means the following<br />
* Line 1 defines the job<br />
* Line 2 sets it as an Executable<br />
* Line 3 tell which file to run<br />
* Line 4 Tell where the job should run<br />
* Line 5 submits the job<br />
The imprtant point is here that we have chosen LCG() as backend, i.e. the script will be executed on the grid.<br />
Now start ganga again and submit the job to the LCG-grid<br />
<pre><br />
In[n]: execfile("./gangaScript.py")<br />
</pre><br />
<br />
the status of the job can be monitored with <br />
<pre><br />
In[n]: jobs<br />
</pre><br />
<br />
After the job is submitted, GANGA is now responsible for monitoring your jobs when it's still running; and for downloading output files (e.g. stdout/stderr) to the local machine when the job is finished.<br />
<br />
When your job is <tt>completed</tt>, the job's output is automatically fetched from the Grid and stored in your <tt>gangadir</tt> directory. The exact output location can be found by:<br />
<pre><br />
In[n]: j.outputdir<br />
Out[n]: /project/atlas/Users/yourusernamehere/gangadir/workspace/yourusernamehere/LocalAMGA/0/output<br />
</pre><br />
<br />
if 0 was the job ID. This was our first grid-job submitted via ganga!<br />
<br />
=== Working with historical jobs ===<br />
GANGA internally archive your previously submitted jobs (historical jobs) in the local job repository (i.e. <tt>gangadir</tt>) so that you don't have to do bookkeeping by yourself. You can freely get in/out GANGA and still have your historical jobs ready for your future work.<br />
<br />
The first thing to work with your historical job is to get the job instance from the repository as the following:<br />
<br />
<pre><br />
In [n]: jobs<br />
Out[n]: <br />
Job slice: jobs (12 jobs)<br />
--------------<br />
# fqid status name subjobs application backend backend.actualCE <br />
# 17 submitted 1000 Executable LCG <br />
# 18 submitted 2000 Executable LCG <br />
# 20 completed 10 Executable LCG<br />
# 28 submitted Executable LCG<br />
# 29 submitted test_lcg Executable LCG <br />
</pre><br />
<br />
The table above lists the historical jobs in your GANGA repository indexed by <tt>fqid</tt>. For example, if you are interested in the job with id <tt>29</tt>, you can get the job instance by<br />
<br />
<pre><br />
In [n]: j = jobs(29)<br />
</pre><br />
<br />
then you are all set to work with the job.<br />
<br />
Please note that you <span style="color:#800000">'''CANNOT'''</span> change the attributes of a historical job.<br />
<br />
=== More GANGA jobs to run on different platforms ===<br />
Now try the following commands in the Ganga shell to gets your hands dirty :)<br />
Try to find where the second job runs.<br />
<br />
<pre><br />
In [n]: j = Job()<br />
In [n]: j.backend=Local()<br />
In [n]: j.submit()<br />
In [n]: jobs<br />
<br />
In [n]: j = j.copy()<br />
In [n]: j.backend=PBS()<br />
In [n]: j.backend.queue = 'test'<br />
In [n]: j.submit()<br />
In [n]: jobs<br />
</pre><br />
<br />
If you run Ganga on a NIKHEF desktop, the PBS backend should be configured for submitting jobs to the Stoomboot cluster.<br />
<br />
== Basic job management ==<br />
<br />
=== Checking job status ===<br />
GANGA automatically polls the up-to-date status of your jobs and updates local repository accordingly. A notification will pop up to the user when the job status is changed.<br />
<br />
In addition, you can get a job summary table by:<br />
<br />
<pre><br />
In [n]: jobs<br />
</pre><br />
<br />
or a summary table for subjobs (you won't have subjobs if you don't use '''Splitter''' with the job, for more advanced application, the '''Splitter''' may be used):<br />
<br />
<pre><br />
In [n]: j.subjobs<br />
</pre><br />
<br />
=== Killing and removing jobs ===<br />
You can kill a job by calling<br />
<br />
<pre><br />
In [n]: j.kill()<br />
</pre><br />
<br />
Ganga keeps the killed job still referable so the working directory and job registry of the removed jobs are still kept in Ganga (that can take your disk space). So if you want to really erase everything related to this job from Ganga, you can remove a job by<br />
<br />
<pre><br />
In [n]: j.remove()<br />
</pre><br />
<br />
=== Failing jobs manually ===<br />
Some unexpected issues in the job may cause Ganga unable to update the job status to <tt>failed</tt> as it should be. In this case, you can manually fail the job in force<br />
<br />
<pre><br />
In [n]: j.force_status("failed", force=True)<br />
</pre><br />
<br />
This can avoid Ganga to keep polling the status of the problematic job which may be gone from the backend system.<br />
<br />
=== Checking stdout/err (basic trouble shooting) ===<br />
GANGA tries to bring the stdout/err back to the client side even when the job is failed remotely on the Grid. So for the failed jobs, you can check them as the following for trouble shooting:<br />
<br />
<pre><br />
In [n]: j.peek('stdout','less')<br />
In [n]: j.peek('stderr','cat')<br />
</pre><br />
<br />
or<br />
<br />
<pre><br />
In [n]: j.peek('stdout.gz','zless')<br />
In [n]: j.peek('stdout.gz','zcat')<br />
</pre><br />
<br />
for the LCG jobs.<br />
<br />
=== More actions on a job ===<br />
try to type: <tt>j.<TAB Key></tt> in your Ganga session, the auto-completion feature of IPython will tells you the exported methods of the Ganga job object.<br />
<br />
Or you can get help on the job object, for example:<br />
<br />
<pre><br />
In [n]: j = jobs[-1]<br />
In [n]: help(j)<br />
</pre></div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=Ganga_basic_usage&diff=2918Ganga basic usage2010-02-08T11:33:01Z<p>Dgeerts@nikhef.nl: /* My first Grid job running a HelloWorld shell script */ Fixed a typo</p>
<hr />
<div>== Introduction ==<br />
The [Ganga basic usage] will help beginners to understand how to use this tool for managing computational jobs running locally or on the Grid. This guide provides step-by-step instructions for running simple "HelloWorld" job through GANGA. Users will run GANGA on a NIKHEF desktop (e.g. <tt>elel22.nikhef.nl</tt>) and submit jobs to Stoomboot (a PBS cluster) and to the LCG.<br />
<br />
As Ganga is also a job management tools for end users, this wiki will also introduce few useful commands for managing Ganga jobs.<br />
<br />
== Requirements ==<br />
<ul><br />
<li>You need to have a proper privilege for submitting jobs to a local cluster (e.g. NIKHEF account for Stoomboot and/or CERN account for lxbatch)<br />
<li>You need to have a valid grid certificate registered in a Virtual Organization (e.g. ATLAS) for running jobs on the Grid.<br />
</ul><br />
<br />
== Preparation ==<br />
=== password-less login between desktop and Stoomboot nodes ===<br />
This step is needed for managing jobs on Stoomboot with Ganga. <br />
<br />
== Starting GANGA session ==<br />
<ul><br />
<li>'''For NIKHEF users'''<br />
<pre><br />
% source /project/atlas/nikhef/dq2/dq2_setup.sh.NIKHEF<br />
% export DPNS_HOST=tbn18.nikhef.nl<br />
% export LFC_HOST=lfc-atlas.grid.sara.nl<br />
% source /project/atlas/nikhef/ganga/etc/setup.[c]sh<br />
% ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef<br />
</pre><br />
<br />
Every time you start with a clean shell, and you'll need to setup ganga with the lines given right above. <br />
</li><br />
<br />
<li>'''For CERN lxplus users'''<br />
<pre><br />
% source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh<br />
% ganga<br />
</pre><br />
<br />
More detail for CERN users can be found here: http://ganga.web.cern.ch/ganga/user/index.php<br />
<br />
</li><br />
</ul><br />
<br />
The last command loads a system-wide ATLAS-specific configuration for your Ganga session. You can override the system-wide configuration by providing a <tt>~/.gangarc</tt> file. The template of the <tt>~/.gangarc</tt> file can be generated by:<br />
<br />
<pre><br />
% ganga -g<br />
</pre><br />
<br />
If you see the following prompt:<br />
<br />
<pre><br />
*** Welcome to Ganga ***<br />
Version: Ganga-5-4-2<br />
Documentation and support: http://cern.ch/ganga<br />
Type help() or help('index') for online help.<br />
<br />
This is free software (GPL), and you are welcome to redistribute it<br />
under certain conditions; type license() for details.<br />
<br />
In [1]:<br />
</pre><br />
<br />
you are already in a GANGA session. The GANGA session is actually an [http://ipython.scipy.org/moin/ IPython] shell with GANGA specific extensions (modules), meaning that you can do programming (python only, of course) inside the GANGA session.<br />
<br />
== Leaving GANGA session ==<br />
To quit from a GANGA session, just press '''CTRL-D'''.<br />
<br />
== Getting familiar with GANGA ==<br />
=== My first Grid job running a HelloWorld shell script ===<br />
<br />
Now go to your project directory<br />
<pre><br />
cd /project/atlas/Users/yourusernamehere<br />
</pre><br />
and create 'myscript.sh'<br />
<pre><br />
#!/bin/sh<br />
echo 'myscript.sh running...'<br />
echo "----------------------"<br />
/bin/hostname<br />
echo "HELLO PLANET!"<br />
echo "----------------------"<br />
</pre><br />
and the file 'gangaScript.py'. Do not forget to modify the following to your directory structure:<br />
<pre><br />
In[n]: j = Job()<br />
In[n]: j.application=Executable()<br />
In[n]: j.application.exe=File('/project/atlas/Users/yourusernamehere/myscript.sh')<br />
In[n]: j.backend=LCG()<br />
In[n]: j.submit() <br />
</pre><br />
<br />
This Ganga Job means the following<br />
* Line 1 defines the job<br />
* Line 2 sets it as an Executable<br />
* Line 3 tell which file to run<br />
* Line 4 Tell where the job should run<br />
* Line 5 submits the job<br />
The imprtant point is here that we have chosen LCG() as backend, i.e. the script will be executed on the grid.<br />
Now start ganga again and submit the job to the LCG-grid<br />
<pre><br />
In[n]: execfile("./gangaScript.py")<br />
</pre><br />
<br />
the status of the job can be monitored with <br />
<pre><br />
In[n]: jobs<br />
</pre><br />
<br />
After the job is submitted, GANGA is now responsible for monitoring your jobs when it's still running; and for downloading output files (e.g. stdout/stderr) to the local machine when the job is finished.<br />
<br />
When your job is <tt>completed</tt>, the job's output is automatically fetched from the Grid and stored in your <tt>gangadir</tt> directory. The exact output location can be found by:<br />
<pre><br />
In[n]: j.outputdir<br />
Out[n]: /project/atlas/Users/yourusernamehere/gangadir/workspace/yourusernamehere/LocalAMGA/0/output<br />
</pre><br />
<br />
if 0 was the job ID. This was our first grid-job submitted via ganga!<br />
<br />
=== Working with historical jobs ===<br />
GANGA internally archive your previously submitted jobs (historical jobs) in the local job repository (i.e. <tt>gangadir</tt>) so that you don't have to do bookkeeping by yourself. You can freely get in/out GANGA and still have your historical jobs ready for your future work.<br />
<br />
The first thing to work with your historical job is to get the job instance from the repository as the following:<br />
<br />
<pre><br />
In [n]: jobs<br />
Out[1]: <br />
Job slice: jobs (12 jobs)<br />
--------------<br />
# fqid status name subjobs application backend backend.actualCE <br />
# 17 submitted 1000 Executable LCG <br />
# 18 submitted 2000 Executable LCG <br />
# 20 completed 10 Executable LCG<br />
# 28 submitted Executable LCG<br />
# 29 submitted test_lcg Executable LCG <br />
</pre><br />
<br />
The table above lists the historical jobs in your GANGA repository indexed by <tt>fqid</tt>. For example, if you are interested in the job with id <tt>29</tt>, you can get the job instance by<br />
<br />
<pre><br />
In [n]: j = jobs(29)<br />
</pre><br />
<br />
then you are all set to work with the job.<br />
<br />
Please note that you <span style="color:#800000">'''CANNOT'''</span> change the attributes of a historical job.<br />
<br />
=== More GANGA jobs to run on different platforms ===<br />
Now try the following commands in the Ganga shell to gets your hands dirty :)<br />
Try to find where the second job runs.<br />
<br />
<pre><br />
In [n]: j = Job()<br />
In [n]: j.backend=Local()<br />
In [n]: j.submit()<br />
In [n]: jobs<br />
<br />
In [n]: j = j.copy()<br />
In [n]: j.backend=PBS()<br />
In [n]: j.backend.queue = 'test'<br />
In [n]: j.submit()<br />
In [n]: jobs<br />
</pre><br />
<br />
If you run Ganga on a NIKHEF desktop, the PBS backend should be configured for submitting jobs to the Stoomboot cluster.<br />
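<br />
Similarly, you can copy the job once more and send it to the Grid; a small sketch reusing the <tt>LCG()</tt> backend from the HelloWorld example above:<br />
<pre><br />
In [n]: j = j.copy()<br />
In [n]: j.backend=LCG()<br />
In [n]: j.submit()<br />
In [n]: jobs<br />
</pre><br />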
<br />
== Basic job management ==<br />
<br />
=== Checking job status ===<br />
Ganga automatically polls the up-to-date status of your jobs and updates the local repository accordingly. A notification pops up when a job status changes.<br />
<br />
In addition, you can get a job summary table by:<br />
<br />
<pre><br />
In [n]: jobs<br />
</pre><br />
<br />
or a summary table for subjobs (you won't have subjobs unless you use a '''Splitter''' with the job; splitters are typically used for more advanced applications):<br />
<br />
<pre><br />
In [n]: j.subjobs<br />
</pre><br />
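<br />
Job slices are iterable, so you can also loop over the subjobs, e.g. to print their status. A small sketch, using the attribute names shown in the jobs table above:<br />
<pre><br />
In [n]: for sj in j.subjobs:<br />
   ....:     print sj.fqid, sj.status<br />
</pre><br />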
<br />
=== Killing and removing jobs ===<br />
You can kill a job by calling<br />
<br />
<pre><br />
In [n]: j.kill()<br />
</pre><br />
<br />
A killed job remains referable: its working directory and registry entry are still kept by Ganga (which can take up disk space). If you want to really erase everything related to the job from Ganga, remove it with<br />
<br />
<pre><br />
In [n]: j.remove()<br />
</pre><br />
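<br />
To clean up several jobs at once, you can combine this with a slice selection. A sketch, assuming <tt>jobs.select()</tt> (attribute-based selection on the jobs slice) is available in your Ganga version:<br />
<pre><br />
In [n]: for j in jobs.select(status='failed'):<br />
   ....:     j.remove()<br />
</pre><br />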
<br />
=== Failing jobs manually ===<br />
Unexpected issues may leave Ganga unable to update the job status to <tt>failed</tt> as it should. In this case, you can manually force the job into the failed state:<br />
<br />
<pre><br />
In [n]: j.force_status("failed", force=True)<br />
</pre><br />
<br />
This stops Ganga from continuing to poll the status of a problematic job that may already be gone from the backend system.<br />
<br />
=== Checking stdout/err (basic trouble shooting) ===<br />
Ganga tries to bring stdout/stderr back to the client side even when the job fails remotely on the Grid. For failed jobs, you can inspect them as follows for troubleshooting:<br />
<br />
<pre><br />
In [n]: j.peek('stdout','less')<br />
In [n]: j.peek('stderr','cat')<br />
</pre><br />
<br />
or<br />
<br />
<pre><br />
In [n]: j.peek('stdout.gz','zless')<br />
In [n]: j.peek('stdout.gz','zcat')<br />
</pre><br />
<br />
for LCG jobs, whose output files come back gzipped.<br />
<br />
=== More actions on a job ===<br />
Try typing <tt>j.<TAB Key></tt> in your Ganga session; the auto-completion feature of IPython will show you the exported methods of the Ganga job object.<br />
<br />
Or you can get help on the job object, for example:<br />
<br />
<pre><br />
In [n]: j = jobs[-1]<br />
In [n]: help(j)<br />
</pre></div>Dgeerts@nikhef.nlhttps://wiki.nikhef.nl/atlas/index.php?title=Ganga_with_AMAAthena&diff=4834Ganga with AMAAthena2010-02-08T11:31:31Z<p>Dgeerts@nikhef.nl: /* PBS */ Fixed a mistake</p>
<hr />
<div>== Introduction ==<br />
This page describes how to run AMAAthena jobs with Ganga on different computing platforms (local desktop, Stoomboot, lxbatch, Grid).<br />
<br />
== Preparation ==<br />
<ul><br />
<li>Make sure you can run AMAAthena standalone on a local desktop. Here are the instructions for doing so at NIKHEF: [[Using Athena at Nikhef | Using Athena at NIKHEF]]<br />
<li>Make sure you can submit HelloWorld jobs to the different computing platforms. Here are the instructions: [[Ganga_basic_usage | Ganga: basic usage]]<br />
</ul><br />
<br />
== Starting Ganga ==<br />
Before starting Ganga, set up the CMT environment properly. Here are example commands, presuming that you have the setup scripts for CMT in the <tt>$HOME/cmthome</tt> directory.<br />
<br />
<pre><br />
% source $HOME/cmthome/setup.sh -tag=15.6.1,32<br />
% source $TestArea/PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/cmt/setup.sh<br />
</pre><br />
<br />
Then start Ganga in the <tt>$TestArea/PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/run</tt> directory.<br />
<br />
<pre><br />
% cd $TestArea/PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/run<br />
% source /project/atlas/nikhef/ganga/etc/setup.[c]sh<br />
% ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef<br />
</pre><br />
<br />
== Tutorial templates for quick start ==<br />
There are ready-to-go Ganga scripts prepared for this tutorial. Follow the instructions below to copy them into your AMAAthena run directory:<br />
<br />
<pre><br />
% cd $TestArea/PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/run<br />
% cp /project/atlas/nikhef/ganga/tutorial_2010/job_options/* .<br />
% cp /project/atlas/nikhef/ganga/tutorial_2010/ama_config/* .<br />
% cp /project/atlas/nikhef/ganga/tutorial_2010/ganga_scripts/* .<br />
</pre><br />
<br />
<ul><br />
<li><span style="background:#00FF00"><tt>data_D3PD_simple2.conf</tt></span>: simple AMA configuration file for converting AOD into D3PD and dumping NTuples<br />
<li><span style="background:#00FF00"><tt>data_D3PD_filler_v4.conf</tt></span>: part of the AMA configuration file included by <tt>data_D3PD_simple2.conf</tt><br />
<li><span style="background:#00FFFF"><tt>ama_d3pd_maker.local.gpi</tt></span>: Ganga script for creating a ready-to-submit job to run on local desktop<br />
<li><span style="background:#00FFFF"><tt>ama_d3pd_maker.pbs.gpi</tt></span>: Ganga script for creating a ready-to-submit job to run on Stoomboot<br />
<li><span style="background:#00FFFF"><tt>ama_d3pd_maker.lcg.gpi</tt></span>: Ganga script for creating a ready-to-submit job to run on the grid through gLite WMS<br />
<li><span style="background:#00FFFF"><tt>ama_d3pd_maker.panda.gpi</tt></span>: Ganga script for creating a ready-to-submit job to run on the grid through Panda<br />
</ul><br />
<br />
Apart from the files mentioned above, a few more files are prepared so that one can submit jobs right away. They are listed below. In general, these files are prepared by the user as described in [[#Pre-configuration | Application pre-configuration]] below.<br />
<br />
<ul><br />
<li><span style="background:#FFFF00"><tt>AMAAthena_jobOptions_new.py</tt></span>: top-level AMAAthena job option file without AutoConfig/RecExCommon<br />
<li><span style="background:#FFFF00"><tt>AMAAthena_jobOptions_AUTO.py</tt></span>: top-level AMAAthena job option file with AutoConfig/RecExCommon<br />
<li><span style="background:#FFFF00"><tt>data_D3PD_simple2.py</tt></span>: user-level AMAAthena job option file converted from the AMA configuration file<br />
<li><span style="background:#FFFF00"><tt>rundef.py</tt></span>: run definition job option file of AMAAthena<br />
</ul><br />
<br />
With all those files ready in the <tt>run</tt> directory, you can simply load one of the Ganga scripts and submit the analysis jobs right away.<br />
<br />
For example:<br />
<pre><br />
In [n]: execfile('ama_d3pd_maker.lcg.gpi')<br />
In [n]: j.submit()<br />
</pre><br />
<br />
will create and submit an LCG job to generate D3PD files using AMAAthena.<br />
<br />
The rest of this wiki explains in detail what is done within those scripts.<br />
<br />
== Ganga jobs by yourself ==<br />
<br />
=== Ganga job creation ===<br />
The first step is to create a new (empty) job in Ganga:<br />
<br />
<pre><br />
In [n]: j = Job()<br />
</pre><br />
<br />
and you can set the job's name with<br />
<br />
<pre><br />
In [n]: j.name = 'my_ama_job'<br />
</pre><br />
<br />
=== Application configuration ===<br />
<br />
==== Pre-configuration ====<br />
AMAAthena is an Athena "Algorithm", so you can simply use the Athena application plugin in Ganga to run it. However, a few steps need to be done before setting up the Athena application object in Ganga:<br />
<br />
<ol><br />
<li>copy the top-level job option file of AMAAthena to your working directory:<br />
<ul><br />
<li> with AutoConfig/RecExCommon<br />
<pre>% get_files -jo AMAAthena_jobOptions_AUTO.py</pre><br />
<li> without AutoConfig/RecExCommon<br />
<pre>% get_files -jo AMAAthena_jobOptions_new.py</pre> <br />
</ul><br />
<li>convert the user-level AMA configuration file into an Athena job option file. For example, if you have a configuration file called <tt>data_D3PD_simple2.conf</tt>, do:<br />
<pre>% AMAConfigfileConverter data_D3PD_simple2.conf data_D3PD_simple2.py</pre><br />
<li>create an AMA runtime definition job option file called <tt>rundef.py</tt> and edit it as in the following example:<br />
<pre><br />
SampleName = 'data09_900GeV_00140541_MuonswBeam'<br />
ConfigFile = 'data_D3PD_simple2.py'<br />
FlagList = ''<br />
EvtMax = -1<br />
AMAAthenaFlags = ['DATA', 'TRIG']<br />
</pre><br />
<br />
The variables in <tt>rundef.py</tt> are explained below:<br />
<ul><br />
<li>'''SampleName''': the user defined sample name. This name will be used in composing the AMA summary output files.<br />
<li>'''ConfigFile''': the job option file name converted from the user-level configuration file (the output of step 2)<br />
<li>'''FlagList''': legacy AMA flags<br />
<li>'''EvtMax''': the maximum number of events to be processed in the job (-1 means all events)<br />
<li>'''AMAAthenaFlags''': the additional AMA job option files to be included by the top-level AMA job option file. This is ignored if using AutoConfig/RecExCommon.<br />
</ul><br />
<br />
</ol><br />
<br />
==== Configuration ====<br />
Once you have the above steps done, you can proceed in Ganga to set up the Athena application:<br />
<br />
<pre><br />
In [n]: j.application = Athena()<br />
In [n]: j.application.max_events = -1<br />
In [n]: j.application.option_file += [ File('rundef.py'), File('AMAAthena_jobOptions_new.py'), File('data_D3PD_simple2.py') ]<br />
In [n]: j.application.prepare()<br />
</pre><br />
<br />
The <tt>j.application.prepare()</tt> method automatically detects the input/output files by virtually running through the job option files given above. As the outputs are controlled internally by AMA, it is suggested to always add the following two lines after <tt>j.application.prepare()</tt> to avoid possible confusion (e.g. with Panda):<br />
<br />
<pre><br />
In [n]: j.application.atlas_run_config['output']['outHist'] = False<br />
In [n]: j.application.atlas_run_config['output']['alloutputs'] = []<br />
</pre><br />
<br />
==== Optional configurations ====<br />
===== Override default DBRelease =====<br />
By default, the job will pick up the DBRelease shipped together with the Athena release that you are using to run the job. In some cases, you may want to override it, for example, when you encounter the following error:<br />
<br />
<pre><br />
T_AthenaPoolCnv ERROR poolToObject: caught error: <br />
FID "74981861-8AD2-DE11-95BD-001CC466D3D3" is not existing in the catalog <br />
( POOL : "PersistencySvc::UserDatabase::connectForRead" from "PersistencySvc" )<br />
</pre><br />
<br />
For local jobs, you can simply set the following line to force the job to load a proper DBRelease version from a certain area, presuming that the DBRelease area is on a shared file system:<br />
<br />
<pre><br />
In [n]: j.application.atlas_environment = ['ATLAS_DB_AREA=/data/atlas/offline/db', 'DBRELEASE_OVERRIDE=7.8.1']<br />
</pre><br />
<br />
For grid jobs, you cannot do that, as you don't know the path on the remote machine in advance. Instead, one needs to do:<br />
<br />
<pre><br />
In [n]: j.application.atlas_dbrelease = 'ddo.000001.Atlas.Ideal.DBRelease.v070801:DBRelease-7.8.1.tar.gz'<br />
In [n]: j.application.atlas_environment =['DBRELEASE_OVERRIDE=7.8.1']<br />
</pre><br />
<br />
where <tt>j.application.atlas_dbrelease</tt> tells the job to download the DBRelease tarball "<tt>DBRelease-7.8.1.tar.gz</tt>" from the ATLAS dataset "<tt>ddo.000001.Atlas.Ideal.DBRelease.v070801</tt>", while <tt>j.application.atlas_environment</tt> forces the Athena job to use that version of the DBRelease instead of the default one.<br />
<br />
=== InputDataset configuration ===<br />
It is encouraged to enable the [https://twiki.cern.ch/twiki/bin/view/AtlasProtected/FileStager FileStager] with your analysis job, as it has proved to be more efficient in the majority of cases. Two InputDataset plugins in Ganga can be used for this, depending on where the job will run.<br />
<br />
==== <tt>'''StagerDataset'''</tt> for local jobs ====<br />
<br />
Presuming that you want to run over a dataset <tt>"data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268"</tt> located at <tt>"NIKHEF-ELPROD_DATADISK"</tt>, you can set the <tt>inputdata</tt> attribute of the Ganga job object as follows:<br />
<br />
<pre><br />
In [n]: config.DQ2.DQ2_LOCAL_SITE_ID = 'NIKHEF-ELPROD_DATADISK'<br />
In [n]: j.inputdata = StagerDataset()<br />
In [n]: j.inputdata.type = 'DQ2'<br />
In [n]: j.inputdata.dataset += [ 'data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268' ]<br />
</pre><br />
<br />
<span style="color:#800000"><u>'''Remarks'''</u></span><br />
<ul><br />
<li>Use <tt>StagerDataset</tt> only with <tt>Local</tt>, <tt>LSF</tt> and <tt>PBS</tt> backend plugins for local jobs.<br />
<li><tt>StagerDataset</tt> is restricted to copying files from the grid storage close to the compute node. You need to find the local location of the dataset in terms of the DDM site name and set it properly with <tt>'''config.DQ2.DQ2_LOCAL_SITE_ID'''</tt><br />
</ul><br />
<br />
You can also use <tt>'''StagerDataset'''</tt> to access dataset files already existing on a local disk. The following example assumes that the dataset files are already sitting in the directory <tt>/data/atlas3/users/hclee/data/data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268</tt><br />
<br />
<pre><br />
In [n]: j.inputdata = StagerDataset()<br />
In [n]: j.inputdata.type = 'LOCAL'<br />
In [n]: j.inputdata.dataset = ['/data/atlas3/users/hclee/data/data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268']<br />
</pre><br />
<br />
==== <tt>'''DQ2Dataset'''</tt> for grid jobs ====<br />
<br />
Presuming you want to run on a dataset <tt>data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268</tt> on the grid, you can set the <tt>InputDataset</tt> object as follows in Ganga:<br />
<br />
<pre><br />
In [n]: j.inputdata = DQ2Dataset()<br />
In [n]: j.inputdata.dataset += [ 'data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268' ]<br />
In [n]: j.inputdata.type = 'FILE_STAGER'<br />
</pre><br />
<br />
<span style="color:#800000"><u>'''Remarks'''</u></span><br />
<ul><br />
<li>Always use <tt>'''DQ2Dataset'''</tt> with '''Panda''' and '''LCG''' backends.<br />
</ul><br />
<br />
=== Splitter configuration ===<br />
The examples below ask each subjob to process at most 2 files.<br />
<br />
==== <tt>'''StagerJobSplitter'''</tt> for <tt>'''StagerDataset'''</tt> ====<br />
<pre><br />
In [n]: j.splitter = StagerJobSplitter()<br />
In [n]: j.splitter.numfiles = 2<br />
</pre><br />
<br />
==== <tt>'''DQ2JobSplitter'''</tt> for <tt>'''DQ2Dataset'''</tt> ====<br />
<br />
<pre><br />
In [n]: j.splitter = DQ2JobSplitter()<br />
In [n]: j.splitter.numfiles = 2<br />
</pre><br />
<br />
=== Backend (platform) configuration ===<br />
You can switch to a different computing platform ('''Backend''' in Ganga terminology) by simply changing the <tt>'''backend'''</tt> attribute of a job object. The available backends are:<br />
<br />
<ul><br />
<li><span style="background:#00FF00"><tt>'''Local'''</tt></span>: for running jobs locally right on your desktop<br />
<li><span style="background:#00FF00"><tt>'''PBS'''</tt></span>: for running jobs on a PBS-based computer cluster (e.g. the Stoomboot)<br />
<li><span style="background:#00FF00"><tt>'''LSF'''</tt></span>: for running jobs on a LSF-based computer cluster (e.g. lxbatch@CERN)<br />
<li><span style="background:#00FF00"><tt>'''LCG'''</tt></span>: for running jobs on the grid (EGEE sites), jobs are brokered by gLite Workload Management System (WMS)<br />
<li><span style="background:#00FF00"><tt>'''Panda'''</tt></span>: for running jobs on the grid (EGEE, OSG, NorduGrid sites), jobs are brokered by [http://panda.cern.ch Panda]<br />
</ul><br />
<br />
For example, to submit jobs to the grid through Panda:<br />
<br />
<pre><br />
In [n]: j.backend = Panda()<br />
</pre><br />
<br />
==== Local ====<br />
Ask the job to be executed locally right on the desktop. This is the default backend of a newly created Ganga job.<br />
<pre><br />
In [n]: j.backend = Local()<br />
</pre><br />
<br />
==== PBS ====<br />
Ask the job to be submitted to the "qlong" queue of Stoomboot.<br />
<pre><br />
In [n]: j.backend = PBS()<br />
In [n]: j.backend.queue = 'qlong'<br />
</pre><br />
<br />
==== LSF ====<br />
Ask the job to be submitted to the "1nh" (1 hour) queue on lxbatch@CERN. You need to run this from lxplus@CERN. <br />
<pre><br />
In [n]: j.backend = LSF()<br />
In [n]: j.backend.queue = '1nh'<br />
</pre><br />
<br />
==== LCG ====<br />
Ask the job to be submitted to an EGEE site where the dataset given above is available and whose queue supports jobs of 12 hours (the walltime is given in minutes).<br />
<pre><br />
In [n]: j.backend = LCG()<br />
In [n]: j.backend.requirements.cloud = 'ALL'<br />
In [n]: j.backend.requirements.walltime = 720<br />
</pre><br />
<br />
==== Panda ====<br />
Ask the job to be submitted to Panda and then brokered to whatever site in the "US" cloud is able to process it.<br />
<pre><br />
In [n]: j.backend = Panda()<br />
In [n]: j.backend.libds = ''<br />
In [n]: j.backend.requirements.cloud = 'US'<br />
</pre><br />
<br />
=== Job submission ===<br />
This is as simple as you can imagine:<br />
<pre><br />
In [n]: j.submit()<br />
</pre><br />
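<br />
Putting the pieces together, a complete job set-up for Stoomboot could look like the sketch below. This is only an illustrative assembly of the snippets shown above (roughly what a script like <tt>ama_d3pd_maker.pbs.gpi</tt> is expected to contain); the dataset name and queue are the examples used earlier on this page:<br />
<pre><br />
# Ganga script sketch: AMAAthena on Stoomboot (PBS), assembled from the snippets above<br />
j = Job()<br />
j.name = 'my_ama_job'<br />
<br />
j.application = Athena()<br />
j.application.max_events = -1<br />
j.application.option_file += [ File('rundef.py'), File('AMAAthena_jobOptions_new.py'), File('data_D3PD_simple2.py') ]<br />
j.application.prepare()<br />
j.application.atlas_run_config['output']['outHist'] = False<br />
j.application.atlas_run_config['output']['alloutputs'] = []<br />
<br />
# stage input files from the local grid storage (see StagerDataset above)<br />
config.DQ2.DQ2_LOCAL_SITE_ID = 'NIKHEF-ELPROD_DATADISK'<br />
j.inputdata = StagerDataset()<br />
j.inputdata.type = 'DQ2'<br />
j.inputdata.dataset += [ 'data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268' ]<br />
<br />
# at most 2 input files per subjob<br />
j.splitter = StagerJobSplitter()<br />
j.splitter.numfiles = 2<br />
<br />
# run on the Stoomboot "qlong" queue<br />
j.backend = PBS()<br />
j.backend.queue = 'qlong'<br />
<br />
j.submit()<br />
</pre><br />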
<br />
=== Job management ===<br />
Job management in Ganga is application independent; therefore you are referred to [http://www.nikhef.nl/pub/experiments/atlaswiki/index.php/Ganga_basic_usage#Basic_job_management Basic job management] where the basic job management functions are explained.<br />
<br />
== Helper scripts ==<br />
<br />
== Next step ==<br />
For more details about different Athena use cases, refer to the following twikis:<br />
<br />
* [https://twiki.cern.ch/twiki/bin/view/Atlas/FullGangaAtlasTutorial Full GangaAtlas tutorial]: the up-to-date official tutorial wiki for GangaAtlas<br />
* [https://twiki.cern.ch/twiki/bin/view/Atlas/DAGangaFAQ GangaAtlas FAQ]: the Q&A page with questions collected by the global user analysis support team<br />
<br />
You are also encouraged to subscribe to the [https://groups.cern.ch/group/hn-atlas-dist-analysis-help/default.aspx atlas-dist-analysis-help forum], where you can raise GangaAtlas-related issues and ask for support from experts and/or other users. Think of it as the help desk of the global user analysis support.</div>Dgeerts@nikhef.nl
https://wiki.nikhef.nl/atlas/index.php?title=NL_Cloud_Monitor_Instructions&diff=2863NL Cloud Monitor Instructions2010-01-12T15:19:26Z<p>Dgeerts@nikhef.nl: /* Panda Monitor (Production) */ Fixed a broken link</p>
<hr />
<div>== Introduction ==<br />
This page gives step-by-step instructions for the shifters (of the ATLAS NL-cloud regional operation) for checking several key monitoring pages used by ATLAS Distributed Computing (ADC). These pages are also monitored by official ADC shifters (e.g. ADCoS, DAST).<br />
<br />
The general architecture of the ADC operation is shown below. '''The shifters we are concerned with here are part of the "regional operation team". Their contribution will be credited by OTSMU.'''<br />
<br />
[[Image:ADC Operation Architecture.png|thumb|center|640px|General architecture of the ADC operation]]<br />
<br />
== Shifter's duty ==<br />
The shifter-on-duty needs to follow the instructions below to:<br />
# check different monitoring pages regularly (3-4 times per day would be expected)<br />
# notify the NL cloud squad team accordingly via [mailto:adc-nl-cloud-support@nikhef.nl adc-nl-cloud-support@nikhef.nl].<br />
<br />
<!-- Shifters are requested to check those pages as regular as 3-4 times per day (morning, early afternoon, late afternoon/early evening) --><br />
<br />
== Things to monitor ==<br />
=== ADCoS eLog ===<br />
The ADCoS eLog is mainly used by ADC experts and ADCoS shifters to log the actions taken on a site concerning site issues, for example removing a site from, or adding it back into, the ATLAS production system. The eLog entries related to the NL cloud can be found [https://prod-grid-logger.cern.ch/elog/ATLAS+Computer+Operations+Logbook/?Cloud=NL&mode=summary here].<br />
<br />
'''''The shifter has to notify the squad team if there are issues that have not been followed up for a long while (~24 hours).'''''<br />
<br />
=== DDM Dashboard ===<br />
[http://dashb-atlas-data.cern.ch/dashboard/request.py/site DDM Dashboard] is used for monitoring the data transfer activities between sites.<br />
<br />
The main monitoring page is explained below.<br />
<br />
[[Image:Dashboard explained.png|thumb|center|640px|DDM Dashboard Explanation]]<br />
<br />
There are a few things to note on this page:<br />
<br />
# the summary indicates the data transfers "TO" a particular cloud or site. For example, transfers from RAL to SARA are categorized under "SARA", while transfers from SARA to RAL are categorized under "RAL".<br />
# each cloud is labelled with its Tier-1 name; for example, "SARA" represents all transfers "TO" the NL cloud.<br />
# it is handy to remember that the "yellow" bar indicates transfers to the NL cloud.<br />
<br />
To check this page, here are a few simple steps to follow:<br />
<br />
# look at the bottom-right plot (total transfer errors). If the yellow bar persists every hour with a significant number of errors, check the summary table below.<br />
# to check the failed transfers to the NL cloud, click on the "SARA" entry in the summary table. The table will expand to show the detailed transfers to the sites within the NL cloud. From there you can see which site is in trouble.<br />
# when you have identified the destination site of the problematic transfers, click on the "+" sign in front of the site; the table will expand again to show the "source site" of the transfers. By clicking on the number of transfer errors shown in the table (the 4th column from the end), the error message will be presented. A graphic instruction for these steps is shown below.<br />
<br />
[[Image:DDM find error msg.png|thumb|center|640px|Steps to trace down to the transfer error messages]]<br />
<br />
'''''The shifter has to report the problem to the [mailto:adc-nl-cloud-support@nikhef.nl NL squad team] when the number of errors is high.'''''<br />
<br />
The shifter does not need to report a problem when:<br />
# the error message indicates that it is a "SOURCE" error (you can see this in the error message).<br />
# the site is in downtime. The downtime schedule can be found here: http://lxvm0350.cern.ch:12409/agis/calendar/<br />
# the same error has already been reported earlier during your shift.<br />
<br />
=== Panda Monitor (Production) ===<br />
[http://panda.cern.ch:25980/server/pandamon/query?dash=prod&reload=yes The Panda Monitor for ATLAS production] is used for monitoring Monte Carlo simulation and data reprocessing jobs on the grid.<br />
<br />
The graphic explanation of the main page is given below.<br />
<br />
[[Image:Panda Monitor Explained.png|thumb|center|640px|Explanation of the Panda main page]]<br />
<br />
Here are a few simple steps to follow for checking this page:<br />
<br />
# first check the number of active tasks in the NL cloud versus the number of running jobs in the NL cloud. If the number of active tasks for the NL cloud is non-zero but there are no running jobs, something is wrong and the shifter should notify [mailto:adc-nl-cloud-support@nikhef.nl the NL squad team] to have a look.<br />
# then look at the job statistics table below. The statistics are summarized by cloud. The first check is the last column of the NL row, which indicates the overall job failure rate in the past 12 hours. If the number is too high (e.g. > 30%), go through the following instructions to pick out failed jobs.<br />
<br />
'''How to find job failures'''<br />
<br />
Here are the instructions for finding failed jobs:<br />
<br />
# click on the number of failed jobs per site in the summary table. This will take you to the list of failed jobs with error details. <br />
# try to categorize the failed jobs by their error details and pick one example job per failure category.<br />
# report to the NL squad team with the failure category and a link to one example job per category.<br />
<br />
The following picture illustrates these steps.<br />
<br />
[[Image:Panda find job error.png|thumb|center|640px|Finding example job and job error in Panda]]<br />
<br />
=== Panda Monitor (Analysis) ===<br />
The instructions for checking the Panda analysis job monitor are similar to those in [[#Panda Monitor (Production)]].<br />
<br />
=== GangaRobot ===<br />
[http://gangarobot.cern.ch/ GangaRobot] is a site functional test for running analysis jobs. Sites that fail one of the regular tests in the past 12 hours will be blacklisted. User analysis jobs submitted through the gLite Workload Management System (WMS) from [http://cern.ch/ganga Ganga] are steered away from those problematic sites.<br />
<br />
The currently blacklisted sites can be found [http://gangarobot.cern.ch/blacklist.html here].<br />
<br />
'''The shifter has to notify the [mailto:adc-nl-cloud-support@nikhef.nl NL squad team] when any one of the sites in the NL cloud shows up in the list.'''<br />
<br />
== Shifters' calendar ==<br />
<!-- here shows the shifter-on-duty --><br />
<br />
== Quick links ==<br />
<!-- quick links to key monitoring pages --><br />
* [mailto:adc-nl-cloud-support@nikhef.nl Contact NL squad team]<br />
* [https://prod-grid-logger.cern.ch/elog/ATLAS+Computer+Operations+Logbook/?Cloud=NL ADC eLog (NL related entries)]<br />
* [http://dashb-atlas-data.cern.ch/dashboard/request.py/site DDM Dashboard (last 4 hours overview)]<br />
* [http://panda.cern.ch:25980/server/pandamon/query?dash=prod&reload=yes Panda production jobs monitoring]<br />
* [http://panda.cern.ch:25980/server/pandamon/query?dash=analysis&reload=yes Panda analysis jobs monitoring]<br />
* [http://panda.cern.ch:25980/server/pandamon/query?dash=clouds&reload=yes#NL Panda queues for NL-cloud sites]<br />
* [http://gangarobot.cern.ch/blacklist.html GangaRobot site blacklist]<br />
<!-- some useful links to ADCoS tutorial wikis that are used for official ADCoS shifters --></div>Dgeerts@nikhef.nl