VL-e Resource Guide
This document focuses on listing the grid resources that are available to VL-e project members. It does not explain how to use these resources; for that, you should consult the Grid Tutorial documentation. It is recommended that new users attend the Grid Tutorial, which is held every year.
The grid resources for VL-e are provided jointly by Nikhef and Sara, who participate in a larger framework for grid computing worldwide, in particular for the high energy physics experiments of the LHC. The grid middleware is provided by the European EGEE project. As a national project, VL-e has to share the resources with other applications.
Nikhef and Sara play somewhat different roles: Nikhef focuses mainly on computational clusters, while Sara has a tape storage facility for long-term data storage.
Computing
The clusters are accessible through the gLite stack of software developed by the EGEE project, based largely on Globus. You can use the command-line tools, as explained in the Grid Tutorial.
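As a reminder of the basic workflow, here is a minimal sketch that submits a trivial job with the gLite WMS command-line clients; the exact command names differ between gLite releases, and the JDL file name and VO name are only illustrations.

A minimal job description, hello.jdl:

Executable    = "/bin/hostname";
StdOutput     = "stdout.txt";
StdError      = "stderr.txt";
OutputSandbox = {"stdout.txt", "stderr.txt"};

Submitting the job and retrieving its output:

voms-proxy-init --voms pvier
glite-wms-job-submit -a hello.jdl
glite-wms-job-status <jobID>
glite-wms-job-output <jobID>

Here <jobID> stands for the identifier printed by the submit command.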
The following list shows the compute elements that are available to VL-e VOs. This information can be retrieved with

lcg-infosites --vo pvier ce

and will vary according to per-VO configurations at the sites. The information system can also be queried publicly with standard LDAP tools, such as ldapsearch:
ldapsearch -x -H ldap://bdii03.nikhef.nl:2170/ -b 'mds-vo-name=NIKHEF-ELPROD,mds-vo-name=local,o=grid'
An excellent LDAP browser (written in Java) can be found here [1]. To browse the information system, connect to bdii03.nikhef.nl on port 2170 and enter 'o=grid' as the base DN.
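For example, the following query lists the compute element identifiers together with their maximum wall-clock time, assuming the sites publish the standard Glue 1.x CE attributes (a reasonable assumption for EGEE sites, but not guaranteed here):

ldapsearch -x -H ldap://bdii03.nikhef.nl:2170/ -b 'o=grid' \
  '(objectClass=GlueCE)' GlueCEUniqueID GlueCEPolicyMaxWallClockTime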
| Resource name | Maximum wall time | Comments |
|---|---|---|
| tbn20.nikhef.nl:2119/jobmanager-pbs-qlong | 30 hours | The Nikhef cluster has approx. 400 CPUs |
| tbn20.nikhef.nl:2119/jobmanager-pbs-qshort | 5 hours | |
| mu6.matrix.sara.nl:2119/jobmanager-pbs-short | 4 hours | The Matrix cluster will be gone soon |
| mu6.matrix.sara.nl:2119/jobmanager-pbs-medium | 33 hours | |
| ce.gina.sara.nl:2119/jobmanager-pbs-short | 4 hours | GINA (Grid In Almere) has 128 CPUs |
| ce.gina.sara.nl:2119/jobmanager-pbs-medium | 33 hours | |
Your VO affiliation greatly affects your ability to run jobs. While the Dutch sites support the VL-e VOs, they also support many other VOs on the same infrastructure, which leads to competition for cycles. The mechanism to address this is fair-share scheduling: each VO is allowed a fair share of the cycles, and VOs that have used little of their share in the recent past are given higher priority.
The exact calculation of the fair share is somewhat of a black art.
Running tests and debugging
If your grid jobs are not behaving as expected, debugging can be a really frustrating ordeal. You have no way to inspect a running job up close, and the turnaround time for each modification to your job is long.
To shorten the turnaround, you may request (from grid.support@sara.nl) the privilege to use the express queues on the GINA and Matrix clusters. The restriction is that jobs may last no longer than a couple of minutes.
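Once access is granted, you can steer a job to a specific queue by passing the compute element endpoint to the submission command. A sketch, assuming the gLite WMS clients; the express queue endpoint shown here is hypothetical, so ask grid support for the exact name:

glite-wms-job-submit -a -r ce.gina.sara.nl:2119/jobmanager-pbs-express hello.jdl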
If this is still not enough, and an application requires serious testing and debugging, you may request the use of the VL-e P4 Certification Test Bed. Contact vle-pfour-team@lists.vl-e.nl for support.
Storage
Grid Storage Elements can be discovered with the command
lcg-infosites --vo pvier se
(where you should replace pvier by the name of your own VO).
| Available space (kB) | Used space (kB) | SE | Remarks |
|---|---|---|---|
| 482906560 | 1587920944 | tbn15.nikhef.nl | Classic SE; do not use this old one. |
| 1710000000 | 137730 | tbn18.nikhef.nl | Modern DPM system with SRM interface; use this one. |
| 317044396 | 539916244 | mu2.matrix.sara.nl | dCache system with SRM interface |
Note that the numbers may be different for other VOs, and there may actually be fewer SEs showing up. If you need more disk quota, please contact grid support (grid.support@nikhef.nl or grid.support@sara.nl).
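As a sketch of how a file can be stored on and fetched from one of these SEs with the lcg data management tools (the LFN path below is only an illustration; the catalogue namespace for your VO may differ):

lcg-cr --vo pvier -d tbn18.nikhef.nl -l lfn:/grid/pvier/your.name/testfile file:///home/your.name/testfile
lcg-cp --vo pvier lfn:/grid/pvier/your.name/testfile file:///tmp/testfile

The first command copies a local file to the DPM SE at Nikhef and registers it in the file catalogue; the second copies it back to local disk.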
SRB
Besides these SRM-enabled systems, there is an SRB (Storage Resource Broker) system provided by Sara. Unfortunately it cannot be used with the standard grid tools for logical file names, replicas, etc.; however, there is a GridFTP front end. You need to obtain an SRB account by contacting grid.support@sara.nl (see the quickstart guide).
To use SRB, create a directory called .srb in your home directory; inside it, create a dummy file .MdasAuth (its contents are irrelevant) and a file .MdasEnv with the following contents:
mdasCollectionName '/VLENL/home/your.name.vlenl'
mdasCollectionHome '/VLENL/home/your.name.vlenl'
mdasDomainName 'vlenl'
mdasDomainHome 'vlenl'
srbUser 'your.name'
srbHost 'srb.grid.sara.nl'
srbPort '50000'
defaultResource 'vleGridStore'
AUTH_SCHEME 'GSI_AUTH'
SERVER_DN '/O=dutchgrid/O=hosts/OU=sara.nl/CN=srb.grid.sara.nl'
Replace your.name with your SRB account name.
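With this configuration in place, and assuming the SRB client tools (the Scommands) are installed on your machine, a session could look roughly as follows; the VO name passed to voms-proxy-init is an assumption:

voms-proxy-init --voms vlenl
Sinit
Sls
Sput localfile.dat
Sget localfile.dat copy.dat
Sexit

Sinit starts the session using the settings in .MdasEnv, Sls lists your home collection, Sput and Sget transfer files, and Sexit ends the session.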
The web interface to SRB can be reached here [2].
There is a GridFTP front end to SRB on the same server on port 50097. This can be accessed (for example) with uberftp:
(create a proxy as usual, then)
uberftp -P 50097 srb.grid.sara.nl
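A non-interactive transfer through the same front end should also work with globus-url-copy; the SRB path below is a guess based on the collection name in .MdasEnv:

globus-url-copy gsiftp://srb.grid.sara.nl:50097/VLENL/home/your.name.vlenl/somefile file:///tmp/somefile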
Tape storage
There is no default storage to tape. If you need tape storage, you have to explicitly request it. Sara can provide tape storage with various regimes, such as automatic disk-to-tape migration. Contact grid.support@sara.nl for more information.
About data replication
The Grid Tutorial explains how you can replicate your data to multiple storage elements, so that when you submit a job that needs this data, the resource broker may find a compute element 'close' to one of your copies. The current situation for VL-e members is such that all clusters on which a job may land are within the Netherlands. Thus, there is no gain in having more than one replica of your data around to improve the transfer efficiency.
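Should you nevertheless want a replica on a second storage element, the lcg data management tools can create and list replicas; a sketch, using the same hypothetical LFN as in the Storage section:

lcg-rep --vo pvier -d mu2.matrix.sara.nl lfn:/grid/pvier/your.name/testfile
lcg-lr --vo pvier lfn:/grid/pvier/your.name/testfile

The first command replicates the file to the dCache SE at Sara; the second lists all registered replicas.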
Grid Access
Accessing the Grid means using the grid tools; installing these tools yourself can be tricky. Currently, there are a number of choices:
- ui.grid.sara.nl, a centrally provided machine at Sara. Contact grid.support@sara.nl to obtain a local account.
- install the VL-e PoC distribution (See the VL-e PoC distribution page.)
- install the VMware image of the VL-e PoC distribution.
Monitoring
There are various ways to get information about the current state of the Grid and your grid jobs.
- Sara uses Ganglia:
  - Matrix cluster load for the last hour: http://ganglia.sara.nl/?c=Matrix%20Cluster&m=&r=hour&s=descending&hc=4
  - Matrix joblist report: http://ganglia.sara.nl/addons/job_monarch/?c=Matrix%20Cluster
- Sara publishes the system status: http://www.sara.nl/systemstatus/systemstatus_eng.php3
Documentation
Links
- Media:GridTutorial2006.pdf: the Grid Tutorial handouts 2006