Difference between revisions of "Introduction Grid Computing Lab Course Overview"

From PDP/Grid Wiki
Jump to navigationJump to search
m
 
(25 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
= Structure =
 
= Structure =
  
The aim of the lab courses will be to install, deploy and operate a mini-grid, with some applications and services. The entire minigrid will be build and run by the students partipating in the course (of course with some help from the tutors).
+
The aim of the lab courses will be to install, deploy and operate a mini-grid, with some applications and services. The entire minigrid will be build and run by students partipating in the course (of course with some help from the tutors). At the end of the lab course you'll know what a grid is, be able to build one, and what's needed to make it useable by applications.
At the end of the lab course you'll know what a grid is, be able to build one,
 
and what is needed to make it useful for applications.
 
  
A grid contains a few components that you cannot do without:
+
There are two tracks in the lab course:
* a common trust domain (authentication)
+
* build a mini-grid from scratch (for advanced students). The list of projects and literature in section 2 makes - when put together - a basic mini-grid. A team of 2-3 sudents can pick up a topic in the list below (items 1-6) and provide that as a service to the others. All students in this track should look at the authentication part (section 1)
* communities of resources and users (authorization)
+
* paced grid services tutorial with Globus Toolkit 4 "GT4" (for 'new' students). Section 3 has the references to tutorial and background material. GT4 is already pre-installed on some systems for you. Based on these services, you can build a small application using the latest Web Services and Grid protocols.
* an information service
 
and of course some services to make the grid useful, like
 
* a job submission service
 
* data movement or indexing
 
* workload management
 
* database access
 
* your favourice custom services ...
 
  
For each of these, literature and documentation are given below, together with one or two projects (assignments) to be picked up by a team of students (say, 2-3 students per project).
+
=== Important Notes and AUP ===
 +
 
 +
* Before you start, read the [[Introduction_Grid_Computing_Lab_Course_AUP|Lab Course Acceptable Use Policy]]. You need to comply with it in order to get graded for the course. When in doubt, ask any of the tutors.
 +
* Keep a logbook (either electronically, on paper, or whatever). You'll appreciate it when you try to reproduce your results, or when a disk crashes. You will also need it to write your project result paper.
 +
* grading is integrated with the IGC lecture series and will be explained in the first IGC lecture on Monday
 +
 
 +
= Building the mini-grid =
  
 
== Authentication ==
 
== Authentication ==
Line 23: Line 20:
  
 
Commercial providers, like [http://www.verisign.com/products-services/security-services/ Verisign], [http://www.thawte.com/ Thawte], or [http://www.entrust.com/ Entrust], operate a Certification Authority and sell[http://www.ietf.org/html.charters/pkix-charter.html X.509] public key certificates.
 
Commercial providers, like [http://www.verisign.com/products-services/security-services/ Verisign], [http://www.thawte.com/ Thawte], or [http://www.entrust.com/ Entrust], operate a Certification Authority and sell[http://www.ietf.org/html.charters/pkix-charter.html X.509] public key certificates.
 +
 +
CAcert.org provides free certificates based on face-to-face meetings and a web of trust.
  
 
You can also setup an X.509 Certification Authority (CA) yourself. The simplest is to use the [http://www.openssl.org/ OpenSSL] commands, that even come with a shellscript to automate the task. More complete functionality can be found in [http://www.openca.org/ OpenCA].  
 
You can also setup an X.509 Certification Authority (CA) yourself. The simplest is to use the [http://www.openssl.org/ OpenSSL] commands, that even come with a shellscript to automate the task. More complete functionality can be found in [http://www.openca.org/ OpenCA].  
 
Recent version of the [http://www.globus.org/toolkit/ Globus Toolkit] also come with a package called "<tt>globus-simple-ca</tt>".
 
Recent version of the [http://www.globus.org/toolkit/ Globus Toolkit] also come with a package called "<tt>globus-simple-ca</tt>".
  
Establishing a trust domain is non-trivial (see, e.g., the [http://www.eugridpma.org/ EUGridPMA] or [http://www.gridpma.org/ IGTF] web sites), and it raises issues like validity period of the certificates, revocation lists or[http://www.ietf.org/rfc/rfc2459.txt CRLs], and on-line status checking via [http://www.faqs.org/rfcs/rfc2560.html OCSP].
+
But there is more to authentication than just issuing certificates to users and hosts. Keys can be compromised or lost, the data in the certificate may become invalid, etc. These issues must be considered, also for the course's CA service.
  
 +
=== Literature ===
 +
 +
* [http://ca.dutchgrid.nl/info/CA_gymnastics.html CA Gymnastics] - experiments in certificate maniopulation
 +
* <b>on the blackboard site -- Course Documents -- Lab Course docs:</b> ITU-T X.500 Document series, Open Systems Interconnection — The Directory: Overview of Concepts, Models, and Services, Recommendation X.500, ISO/IEC 9594-1
 +
* [https://forge.gridforum.org/projects/caops-wg Global Grid Forum CAOPS-WG]
 +
* [http://www.ietf.org/rfc/rfc2459.txt RFC 2459], Certificate Revocation Lists
 +
* [http://www.ietf.org/rfc/rfc2560.txt RFC 2560], On-line Certificate Status Protocol
 +
* [http://www.ietf.org/rfc/rfc3647.txt RFC 3647], Internet X.509 Public Key Infrastructure Certificate Policy and Certification Practices Framework
 +
* [http://www.eugridpma.org/ EUGridPMA], the European Grid Authentication Policy Management Authority in e-Science
 +
* [http://www.gridpma.org/ IGTF], the International Grid Trust Federation
 +
* [http://www.eugridpma.org/guidelines/EUGridPMA-minreq-classic-20050128-3-2.htm Minimum CA Requirements] for Traditional X.509 Public Key Certification Authorities with secured infrastructure.
 +
* [http://www.e-irg.org/publ/2004-Dublin-eIRG-whitepaper.pdf e-IRG White Paper (Dublin)], e-Infrastructure Reflection Group 2004, (specifically section 5).
 +
* [http://specs.xmlsoap.org/ws/2005/02/trust/WS-Trust.pdf WS-Trust] Web Services Trust Language, (defines extensions that build on WS-Security to provide a framework for requesting and issuing security tokens, and to broker trust relationships).
 +
* [http://www.cert.dfn.de/dfn/berichte/db089/ DFN bericht nr. 89: Aufbau und Betrieb einer Zertifizierungsinstanz]
  
 
=== Project proposals ===
 
=== Project proposals ===
 +
* '''For everyone''': try to setup your own mini-CA, issue a cert to yourself and a friend, and try to setup an authentication connection between the two of you (use openssl s_client and s_server)
 +
 
* Build a simple CA service, e.g. based on OpenSSL, that can be used by your fellow students to obtain certificates.  
 
* Build a simple CA service, e.g. based on OpenSSL, that can be used by your fellow students to obtain certificates.  
  
Line 38: Line 53:
  
 
* Integrate on-line checks in a piece of middleware (optional)
 
* Integrate on-line checks in a piece of middleware (optional)
 
  
 
== Authorization ==
 
== Authorization ==
Line 49: Line 63:
 
This MyProxy service is required for portal operations.
 
This MyProxy service is required for portal operations.
  
==== Literature ====
+
=== Literature ===
 
* [ftp://ftp.globus.org/pub/globus/papers/security.pdf A Security Architecture for Computational Grids], I. Foster, et al. 5th ACM Conference on Computer and Communications Security, 1998.
 
* [ftp://ftp.globus.org/pub/globus/papers/security.pdf A Security Architecture for Computational Grids], I. Foster, et al. 5th ACM Conference on Computer and Communications Security, 1998.
 +
* [https://www.cs.tcd.ie/coghlan/pubs/webcom-grid-security.pdf Bridging Secure WebCom and European DataGrid Security for Multiple VOs over Multiple Grids], David W. O’Callaghan and Brian A. Coghlan, ISPDC'04, Cork (Ireland), 5-7 July, 2004.
 +
* [http://marianne.in2p3.fr/datagrid/documentation/ldap-doc.pdf VO Server Information], J.A. Templon@nikhef.nl and D.Groep, EDG internal note, October 2001.
 +
* [http://www.nikhef.nl/grid/lcaslcmaps/publications/edg-security_paper.pdf Authentication and Authorization Mechanisms for Multi-Domain Grid Environments], L. Cornwall et al., J. Grid Comput. 2(4): 301-311 (2004).
 +
* [http://www.nikhef.nl/grid/lcaslcmaps/publications/chep03.pdf Managing Dynamic User Communities in a Grid of Autonomous Resources], R. Alfieri et al, CHEP 2003, CoRR cs.DC/0306004: (2003).
 +
* [http://www.nikhef.nl/grid/lcaslcmaps/publications/edg-wp4_paper.pdf Autonomic Management of Large Clusters and Their Integration into the Grid], T. Roeblitz et al., J. Grid Comput. 2(3): 247-260 (2004).
 
* [http://www.globus.org/research/papers/CAS_2002_Revised.pdf A Community Authorization Service for Group Collaboration], L. Pearlman et al. IEEE Workshop on Policies for Distributed Systems and Networks, 2002.
 
* [http://www.globus.org/research/papers/CAS_2002_Revised.pdf A Community Authorization Service for Group Collaboration], L. Pearlman et al. IEEE Workshop on Policies for Distributed Systems and Networks, 2002.
 +
* [http://www.globus.org/toolkit/docs/4.0/admin/docbook/ch14.html GT4 Community Authorization Service (CAS) Administrators Guide], The Globus Alliance, 2005.
 
* [http://zuni.cs.vt.edu/publications/PRIMA-2003.pdf The PRIMA System for Privilege Management, Authorization and Enforcement in Grid Environments]. M. Lorch et al. Grid2003.
 
* [http://zuni.cs.vt.edu/publications/PRIMA-2003.pdf The PRIMA System for Privilege Management, Authorization and Enforcement in Grid Environments]. M. Lorch et al. Grid2003.
* [http://www.globus.org/research/papers/myproxy.pdf An Online Credential Repository for the Grid: MyProxy. ]J. Novotny, et al. Proceedings of the 10th IEEE Symposium on High Performance Distributed Computing (HPDC 10), 2001.
+
* [http://isoc.nl/activ/2005-Globus-Siebenlist.pdf Grid and Globus Security], Frank Siebenlist, ISOC/GridForum Nederland Masterclass, Amsterdam, July 2005.
 +
* [http://www.globus.org/research/papers/myproxy.pdf An Online Credential Repository for the Grid: MyProxy], J. Novotny, et al. Proceedings of the 10th IEEE Symposium on High Performance Distributed Computing (HPDC 10), 2001.
 +
* [https://forge.gridforum.org/projects/ogsa-sec-wg/ GGF OGSA Security WG]
 +
* [http://www.e-irg.org/publ/2004-Den-Haag-eIRG-whitepaper.pdf e-IRG White Paper (Den Haag)], e-Infrastructure Reflection Group 2005, (specifically sections 4 and 5).
  
 
=== Project proposals ===
 
=== Project proposals ===
Line 63: Line 86:
 
A grid consists of many autonomous resources, that come and go. A resource information system to find the resources available for you is therefor vitally important. The system must be stable, scalable to several hunderd sites, hunderds of queries per second, and universally understood.
 
A grid consists of many autonomous resources, that come and go. A resource information system to find the resources available for you is therefor vitally important. The system must be stable, scalable to several hunderd sites, hunderds of queries per second, and universally understood.
  
Information systems have evolved significantly over the years. The Globus Toolkit shipped originally with the  "Metacomputing Directory Service" (later renamed to Monitoring and Discovery Service, MDS). The information was presented via an LDAP interface with a proprietary schema.  This system later evolved into the [http://grid-deployment.web.cern.ch/grid-deployment/eis/docs/internal/chep04/LCG-Info-System.pdf BDII] for increased performance.
+
Information systems have evolved significantly over the years. The Globus Toolkit shipped originally with the  "Metacomputing Directory Service" (later renamed to Monitoring and Discovery Service, MDS). The information was presented via an LDAP interface with a proprietary schema.  The EU DataGrid [http://www.edg.org/] and the LHC Computing Grid Project [http://cern.ch/lcg] evolved this system later into the [http://grid-deployment.web.cern.ch/grid-deployment/eis/docs/internal/chep04/LCG-Info-System.pdf Berkeley Database Information Index (BDII)] for increased performance
 +
and stability.
  
R-GMA (a relational implementation of the GGF Grid Monitoring Archirecture [REF]) uses a structure, SQL based virtual database across all sites in the grid to propagae information in a producer-consumer paradigm.
+
R-GMA [http://www.r-gma.org/] (a relational implementation of the GGF Grid Monitoring Archirecture) uses a structured, SQL based 'virtual database' across all sites in the grid to propagate information in a producer-consumer paradigm.  
  
The Web services based GT4 release contains a completely new version of MDS, that's based on a notification/subscription mechanism and the new WS-notification standards.
+
The Web services based GT4 release contains a completely new version of MDS, that's based on a notification/subscription mechanism that are part of the WS-Resource Framework set of specifications.
[ADD REFs]. And with [http://www.cs.wisc.edu/condor/ Condor] you get it's own monitoring system Hawkeye.
 
  
Note also the existence of UDDI [ref], but that is a registry only (not an information system, look and find out why!)
+
And with [http://www.cs.wisc.edu/condor/ Condor] you get it's own monitoring system Hawkeye.
  
Essential for all of them is a common way to express the information in a schema. There are many schema in use. The most popular today is the GLUE schema.
+
Note also the existence of UDDI [http://www.uddi.org/], but that is a registry only (not an information or disvcovery service, the [http://www.w3.org/DesignIssues/WebServices.html W3C Web Services Design Issues] page by Tim BL has some details.
 +
 
 +
Essential for any information system is a common way to express the information in a schema so that others understand the content and meaning of the information contained therein. There are many schema in use. The most popular one today in production grids is the GLUE schema.  
  
 
Besides there are various management presentation tools like GridICE, MapCenter, GOC Monitor &c.
 
Besides there are various management presentation tools like GridICE, MapCenter, GOC Monitor &c.
  
==== Literature ====
+
=== Literature ===
 
* [http://doi.ieeecomputersociety.org/10.1109/HPDC.2003.1210036 A Performance Study of Monitoring and Information Services for Distributed Systems], 12th IEEE International Symposium on High Performance Distributed Computing (HPDC-12 '03)  p. 270
 
* [http://doi.ieeecomputersociety.org/10.1109/HPDC.2003.1210036 A Performance Study of Monitoring and Information Services for Distributed Systems], 12th IEEE International Symposium on High Performance Distributed Computing (HPDC-12 '03)  p. 270
 
* [http://www.gridpp.ac.uk/papers/ah03_148.pdf Relational Grid Monitoring Architecture (R-GMA)] Steve Fisher et al.
 
* [http://www.gridpp.ac.uk/papers/ah03_148.pdf Relational Grid Monitoring Architecture (R-GMA)] Steve Fisher et al.
* ...
+
* [http://springerlink.metapress.com/openurl.asp?genre=article&issn=1570-7873&volume=2&issue=4&spage=323 The Relational Grid Monitoring Architecture: Mediating Information about the Grid], A.W. Coocke et al.; S.M. Fisher, Journal of Grid Computing, Vol 2, 323-339, December 2004.
 +
* [https://www.cs.tcd.ie/coghlan/pubs/EGC2005-rgma-replication.pdf Fault tolerance in the R-GMA Information and Monitoring System], Rob Byrom et al., Proc.EGC'05, Amsterdam, February, 2005.
 +
* [http://www.globus.org/wsrf/ The WS-Resource Framework], Globus Alliance, 2004-2005.
 +
* [http://www.cs.wisc.edu/condor/hawkeye/ Hawkeye], A Monitoring and Management Tool for Distributed Systems.
 +
* [http://infnforge.cnaf.infn.it/glueinfomodel/ GLUE Schema specifications], Sergio Andreozzi, et al. 2002-2005.
 +
* [http://www.globus.org/toolkit/mds/glueschemalink.html Glue Schema and the Globus Toolkit].
 +
* [http://www.cs.northwestern.edu/~urgis/gis012.pdf A Unified Relational Approach to Grid Information Services], P. Dinda, B. Plale. Grid Forum Information Draft.
 +
* [http://nws.cs.ucsb.edu/ Network Weather Service].
  
==== Projects ====
+
"Manager" style monitoring tools:
 +
* [http://ganglia.sourceforge.net/ Ganglia] cluster monitoring
 +
* [http://monalisa.cacr.caltech.edu/ MonAlisa] by CalTech
 +
* [http://infnforge.cnaf.infn.it/gridice/ GridICE], EU DataTAG and EGEE (INFN), 2002-2005.
 +
 
 +
=== Projects ===
 
* extract resource information from a host (or a cluster when available) and express it in one of the information systems listed above.
 
* extract resource information from a host (or a cluster when available) and express it in one of the information systems listed above.
 
* once both clusters are operational to some degree, make sure you get the same information fromn both systems, so that the users can decide which one is the best to use (i.e. make a brokering decision based on the information)
 
* once both clusters are operational to some degree, make sure you get the same information fromn both systems, so that the users can decide which one is the best to use (i.e. make a brokering decision based on the information)
Line 87: Line 124:
 
== Job Management and Clusters ==
 
== Job Management and Clusters ==
  
The first resources to populate a grid were compute clusters (a computational grid). These resources were inintially supercomputers, but since we don't have one handy we stick with clusters.
+
The first resources to populate a grid were compute clusters (a computational grid). These resources were inintially supercomputers, but since we don't have one handy we stick with clusters. A cluster usually consists of a head node (called master, server, scheduler or the like), and a set of worker nodes. Jobs are submitted by users to the head node, and sent to worker node  for execution. When there are no free worker nodes left, jobs are queued on the head node.
 +
 
 +
There are a lot of different batch systems around, both open source and commercial. The references list a few of them. For this course, we will stick with open source or free schedulers.
 +
 
 +
When you have built the batch system, try running a some jobs through it.
 +
 
 +
=== Literature ===
  
 +
Batch system software:
 +
* [http://www.clusterresources.com/products/torque/ Torque], TORQUE (Tera-scale Open-source Resource and QUEue manager), Clusterresources Inc.
 +
* [http://www.cs.wisc.edu/condor/ Condor], Condor High Throughput Computing, UWisc-Madison.
 +
* [http://www.clusterresources.com/products/maui/ MAUI] MAUI Cluster Scheduler, ClusterResources Inc.
 +
* [http://www.platform.com/Products/Platform.LSF.Family/ LSF], Load Share Facility, Platofrm Computing, Inc. (see also the [http://batch.web.cern.ch/ CERN] LSF installation as an example).
 +
* [http://www.openpbs.org/ OpenPBS and PBSPro], by Altair, Inc.
 +
* [http://www.ihep.ac.cn/~chep01/paper/1-001.pdf FBSNG - Batch System for Farm Architecture], J. Fromm, et al. (FNAL) CHEP 2001.
 +
* [http://gridengine.sunsource.net/ Sun Grid Engine] SUN's (now open source) batch system.
  
[NEED DESCRIPTION AND REFS]
+
Building batch farms:
 +
* [http://www.clusterresources.com/products/torque/docs/torquequickstart.shtml Torque Quickstart Guide]
 +
* [http://grid-deployment.web.cern.ch/grid-deployment/documentation/Maui-Cookbook.pdf MAUI Cookbook], Sophie Lemaitre, Steve Traylen.
 +
* [http://www.gridpp.ac.uk/tb-support/faq/maui.cfg MAUI example configuration] from the RAL LCG Tier-1 site.
 +
* [[PBS_Caching_Utilities]] PBS qstat/pbsnodes caching utilities (needed in case of high load on the headnode pbs server)
 +
* [http://www.gridpp.ac.uk/tb-support/faq/torque.html Packaging] for torque and maui (including some nice patches to Torque)
  
 +
MPI related links:
 +
* [http://www-unix.mcs.anl.gov/mpi/ MPI]
 +
* [http://www.globus.org/grid_software/computation/mpich-g2.php MPICH-G2], Grid-enabled implementation of the popular MPI.
 +
* [http://goc.grid.sinica.edu.tw/gocwiki/MPI_Support_with_Torque MPI Support with Torque] LCG/EGEE GoC Wiki, Cal Loomis, 2005.
 +
* [http://gridportal.fzk.de/distribution/crossgrid/releases/allfiles/7.3/cg/external/ CrossGrid MPICH-G2 RPMs]
  
==== Projects ====
+
Accounting:
 +
* [https://forge.gridforum.org/projects/ur-wg/ GGF Usage Record WG] specification for exchanging usage/accounting data.
 +
* [http://goc.grid-support.ac.uk/gridsite/accounting/ LCG/EGEE Accounting] GOC Accounting database and viewer.
 +
* [http://www.gridpp.ac.uk/abstracts/allhands2005/apel.pdf APEL]: An implementation of Grid accounting using R-GMA, Rob Byrom et al, GridPP All Hands meeting 2005.
 +
* [http://icsoc.dit.unitn.it/abstracts/A081.pdf An OGSA-Based Accounting System for Allocation Enforcement across HPC Centers], T. Sandholm et al., Proceedings of the 2nd International Conference on Service Oriented Computing. New York, USA, 15-19 November, 2004. Web site: [http://www.sgas.se/ www.sgas.se]
 +
* [http://doi.ieeecomputersociety.org/10.1109/GRID.2003.1261716 DGAS], An Economy-based Accounting Infrastructure for the DataGrid, R.M. Piro et al, Fourth International Workshop on Grid Computing p. 202.
 +
 
 +
=== Projects ===
 
(for two teams of ~3 students each)
 
(for two teams of ~3 students each)
  
Line 99: Line 167:
 
** can you run multi-node jobs?
 
** can you run multi-node jobs?
 
** what happens if a node fails (try pulling the network plug!)
 
** what happens if a node fails (try pulling the network plug!)
* build a Condor based cluster with three nodes in total, of which one is also used for other tasks?
+
** can you influence scheduling?
 +
** (optional) implement policy-based scheduling with MAUI
 +
* build a Condor based cluster with three nodes in total, of which one is also used for other tasks
 
** can use use idle cycles on the shared node?
 
** can use use idle cycles on the shared node?
 
** what happens in case of failure?
 
** what happens in case of failure?
 
** can you do job migration?
 
** can you do job migration?
 
* add MPI support to both clusters
 
* add MPI support to both clusters
* add a GT2 or GT4 GRAM service to both clusters
+
* add a GT2 or GT4 GRAM service to both clusters (use the same on both initially!)
 +
* build an accounting data collector for each of the clusters. Provide usage data summaries on a per-user and per-VO basis.
 +
 
 +
== Integration ==
 +
Scheduling and brokering in a grid. We now have to separate administrative domains (the two clusters, one Torque and one Condor) that could benefit from
 +
some collaboration. Grid middleware is there is provide collective services across these two systems.
 +
 
 +
For this integration, it is needed that the clusters run Grid middleware (in this case a Globus Toolkit GRAM service).
 +
 
 +
=== Literature ===
 +
* [http://www.cs.wisc.edu/condor/condorg/ Condor-G] from UWisc-Madison.
 +
* [http://www.csse.monash.edu.au/~davida/nimrod/nimrodg.htm NIMROD-G] for parameter sweeping over the grid.
 +
* [http://www.globus.org/grid_software/computation/mpich-g2.php MPICH-G2], Grid-enabled implementation of the popular MPI.
 +
 
 +
Other Grid links and middleware:
 +
* [http://www.hipersoft.rice.edu/grads/ GRADS] GRid Application Development Software project.
 +
* [http://www.cs.virginia.edu/~legion/ LEGION] Worldwide Virtual Computer
 +
* [http://www.unicore.org/ UniCore] Java based grid middleware, see also [http://www.eurogrid.org/ the EuroGrid project pages], and [http://www.deisa.org/ the DEISA project].
 +
* [http://www.globus.org/wsrf/ WS-Resource Framework], thew new direction for many Grid middlewares.
 +
 
 +
=== Projects ===
 +
* build a broker that looks at the info system and find the empty cluster (find a grid scheduler like Condor-G)
 +
* try multi-cluster MPI with MPICH-G2 (and a GT2 GRAM on each cluster)
 +
 
 +
== Portals ==
 +
This is the optional spare assignment: build a web portal that allows laymen to use the grid compute service (see above), or other services such as those provided in the GT4 tutorial.
 +
 
 +
=== Literature ===
 +
A list of portlet engines is forthcoming.
 +
* NPACI GridPort toolkit [https://gridport.npaci.edu/]
 +
* GridPort (by TACC) [http://gridport.net/]
 +
* the use of MyProxy in portal environments [http://grid.ncsa.uiuc.edu/myproxy/portals.html]
 +
* GENIUS, the INFN EnginFrame job submission portal [https://genius.ct.infn.it/]
 +
* GridSpeed [http://grid.is.titech.ac.jp/gridspeed-www/]
 +
 
 +
Some un-maintained portal toolkits (might be useful for a small project, though):
 +
* DoEScienceGrid portal development kit [http://doesciencegrid.org/projects/GPDK/]
 +
 
 +
Additional literature and groups:
 +
* Grid Computing Environments WG [http://www.computingportals.org/]
 +
* PortalLab (papers only, no source) [http://doi.ieeecomputersociety.org/10.1109/CCGRID.2003.1199368] and other refs.
 +
 
 +
=== Projects ===
 +
* Build a web portal with MyProxy and a portal engine.
 +
* Try integrating with a few of the web and grid services provided by the GT4 teams.
 +
 
 +
= Grid Services using GT4 - Paced Tutorial =
 +
Globus Toolkit version 4 provides a hosting environment for grid services. You can build your own grid service and deploy it inside the GT4 container. The container itself is based on an evolution of the Apache AXIS system, but has been enhanced with grid security mechanisms and support for WS-RF.
 +
 
 +
=== Literature ===
 +
* [http://gdp.globus.org/gt4-tutorial/ GT4 Programmers tutorial], Borja Sotomayor, Univ. of Chicago.
 +
* [http://www.globus.org/toolkit/support.html GT4 Support]
 +
 
 +
=== Projects ===
 +
* build a grid service.
 +
* in small groups, aggregate your service with some-one else's to provide additional, higher-level services. For example, a stock quote service that notifies a client watchdog, that will buy more stock based on the rate of change in the share prices, but only if the client still has enough funds in the back. When the amount exceeds a threshold, have the higher-level service notify the person by e-mail.
 +
 
 +
But of course you're strongly encouraged to think of your own set of services!

Latest revision as of 14:20, 21 September 2005

Structure

The aim of the lab courses will be to install, deploy and operate a mini-grid, with some applications and services. The entire minigrid will be build and run by students partipating in the course (of course with some help from the tutors). At the end of the lab course you'll know what a grid is, be able to build one, and what's needed to make it useable by applications.

There are two tracks in the lab course:

  • build a mini-grid from scratch (for advanced students). The list of projects and literature in section 2 makes - when put together - a basic mini-grid. A team of 2-3 sudents can pick up a topic in the list below (items 1-6) and provide that as a service to the others. All students in this track should look at the authentication part (section 1)
  • paced grid services tutorial with Globus Toolkit 4 "GT4" (for 'new' students). Section 3 has the references to tutorial and background material. GT4 is already pre-installed on some systems for you. Based on these services, you can build a small application using the latest Web Services and Grid protocols.

Important Notes and AUP

  • Before you start, read the Lab Course Acceptable Use Policy. You need to comply with it in order to get graded for the course. When in doubt, ask any of the tutors.
  • Keep a logbook (either electronically, on paper, or whatever). You'll appreciate it when you try to reproduce your results, or when a disk crashes. You will also need it to write your project result paper.
  • grading is integrated with the IGC lecture series and will be explained in the first IGC lecture on Monday

Building the mini-grid

Authentication

Trust in the grid today is established via a Public Key Infrastructure (PKI). Every entity in the system is issues with a "certificate" that links an identifier (the persons name, or a DNS name) to a piece of unique cryptographic data (an RSA keypair, for instance). These certificates usually have a limited lifetime when stored in a file, or are carried on hardware tokens like smart-cards and USB keys.

Commercial providers, like Verisign, Thawte, or Entrust, operate a Certification Authority and sellX.509 public key certificates.

CAcert.org provides free certificates based on face-to-face meetings and a web of trust.

You can also setup an X.509 Certification Authority (CA) yourself. The simplest is to use the OpenSSL commands, that even come with a shellscript to automate the task. More complete functionality can be found in OpenCA. Recent version of the Globus Toolkit also come with a package called "globus-simple-ca".

But there is more to authentication than just issuing certificates to users and hosts. Keys can be compromised or lost, the data in the certificate may become invalid, etc. These issues must be considered, also for the course's CA service.

Literature

  • CA Gymnastics - experiments in certificate maniopulation
  • on the blackboard site -- Course Documents -- Lab Course docs: ITU-T X.500 Document series, Open Systems Interconnection — The Directory: Overview of Concepts, Models, and Services, Recommendation X.500, ISO/IEC 9594-1
  • Global Grid Forum CAOPS-WG
  • RFC 2459, Certificate Revocation Lists
  • RFC 2560, On-line Certificate Status Protocol
  • RFC 3647, Internet X.509 Public Key Infrastructure Certificate Policy and Certification Practices Framework
  • EUGridPMA, the European Grid Authentication Policy Management Authority in e-Science
  • IGTF, the International Grid Trust Federation
  • Minimum CA Requirements for Traditional X.509 Public Key Certification Authorities with secured infrastructure.
  • e-IRG White Paper (Dublin), e-Infrastructure Reflection Group 2004, (specifically section 5).
  • WS-Trust Web Services Trust Language, (defines extensions that build on WS-Security to provide a framework for requesting and issuing security tokens, and to broker trust relationships).
  • DFN bericht nr. 89: Aufbau und Betrieb einer Zertifizierungsinstanz

Project proposals

  • For everyone: try to setup your own mini-CA, issue a cert to yourself and a friend, and try to setup an authentication connection between the two of you (use openssl s_client and s_server)
  • Build a simple CA service, e.g. based on OpenSSL, that can be used by your fellow students to obtain certificates.
  • Describe the way in which you would identify entities, and what the level of trust in your certificates should be. Describe what the limitations, vulnerabilities, and possible attack vectors.
  • Build a more scalable system, incorporating Registration Authorities, and on-line checking of the status of your certificates (using an independent client program).
  • Integrate on-line checks in a piece of middleware (optional)

Authorization

Users and resources in a grid are grouped in Virtual Organisations. These can be based on directories of users stored in LDAP directories, on attributes issued to the user by the VO, and embedded in the proxy certificate, like in VOMS, or by having a Community Authorization Service (CAS) issue the proxy to the user.

The proxy certificate is the basis for grid authorization today, and enables single sign-on. To access these proxy certs from web portals (and for proxy renewal for long-running jobs), a MyProxy service has been built. This MyProxy service is required for portal operations.

Literature

Project proposals

  • Provide a VO management service for the two grid clusters that will be built lateron (this can best be done with a VO-LDAP server).
  • Old-style systems required the system administrators of a grid site to maintain a file (grid-mapfile) with a list of the authorized users. With VO-LDAP and VOMS, the membership list can be maintained in a central directory for the VO. What else is needed for smooth operation with a VO-LDAP, i.e. how to prevent the sysadmin from having to type something for each new member? (keywords: gridmapdir, LCMAPS, WorkSpace Service/WSS).
  • Setup a CAS service (with GT4) and CAS-enable an example service.

Information Services

A grid consists of many autonomous resources, that come and go. A resource information system to find the resources available for you is therefor vitally important. The system must be stable, scalable to several hunderd sites, hunderds of queries per second, and universally understood.

Information systems have evolved significantly over the years. The Globus Toolkit shipped originally with the "Metacomputing Directory Service" (later renamed to Monitoring and Discovery Service, MDS). The information was presented via an LDAP interface with a proprietary schema. The EU DataGrid [1] and the LHC Computing Grid Project [2] evolved this system later into the Berkeley Database Information Index (BDII) for increased performance and stability.

R-GMA [3] (a relational implementation of the GGF Grid Monitoring Archirecture) uses a structured, SQL based 'virtual database' across all sites in the grid to propagate information in a producer-consumer paradigm.

The Web services based GT4 release contains a completely new version of MDS, that's based on a notification/subscription mechanism that are part of the WS-Resource Framework set of specifications.

And with Condor you get it's own monitoring system Hawkeye.

Note also the existence of UDDI [4], but that is a registry only (not an information or disvcovery service, the W3C Web Services Design Issues page by Tim BL has some details.

Essential for any information system is a common way to express the information in a schema so that others understand the content and meaning of the information contained therein. There are many schema in use. The most popular one today in production grids is the GLUE schema.

Besides there are various management presentation tools like GridICE, MapCenter, GOC Monitor &c.

Literature

"Manager" style monitoring tools:

Projects

  • extract resource information from a host (or a cluster when available) and express it in one of the information systems listed above.
  • once both clusters are operational to some degree, make sure you get the same information fromn both systems, so that the users can decide which one is the best to use (i.e. make a brokering decision based on the information)

Job Management and Clusters

The first resources to populate a grid were compute clusters (a computational grid). These resources were inintially supercomputers, but since we don't have one handy we stick with clusters. A cluster usually consists of a head node (called master, server, scheduler or the like), and a set of worker nodes. Jobs are submitted by users to the head node, and sent to worker node for execution. When there are no free worker nodes left, jobs are queued on the head node.

There are a lot of different batch systems around, both open source and commercial. The references list a few of them. For this course, we will stick with open source or free schedulers.

When you have built the batch system, try running a some jobs through it.

Literature

Batch system software:

Building batch farms:

MPI related links:

Accounting:

Projects

(for two teams of ~3 students each)

  • build a PBS/Torque based cluster with a single head-node a two worker nodes.
    • can you run multi-node jobs?
    • what happens if a node fails (try pulling the network plug!)
    • can you influence scheduling?
    • (optional) implement policy-based scheduling with MAUI
  • build a Condor based cluster with three nodes in total, of which one is also used for other tasks
    • can use use idle cycles on the shared node?
    • what happens in case of failure?
    • can you do job migration?
  • add MPI support to both clusters
  • add a GT2 or GT4 GRAM service to both clusters (use the same on both initially!)
  • build an accounting data collector for each of the clusters. Provide usage data summaries on a per-user and per-VO basis.

Integration

Scheduling and brokering in a grid. We now have to separate administrative domains (the two clusters, one Torque and one Condor) that could benefit from some collaboration. Grid middleware is there is provide collective services across these two systems.

For this integration, it is needed that the clusters run Grid middleware (in this case a Globus Toolkit GRAM service).

Literature

  • Condor-G from UWisc-Madison.
  • NIMROD-G for parameter sweeping over the grid.
  • MPICH-G2, Grid-enabled implementation of the popular MPI.

Other Grid links and middleware:

Projects

  • build a broker that looks at the info system and find the empty cluster (find a grid scheduler like Condor-G)
  • try multi-cluster MPI with MPICH-G2 (and a GT2 GRAM on each cluster)

Portals

This is the optional spare assignment: build a web portal that allows laymen to use the grid compute service (see above), or other services such as those provided in the GT4 tutorial.

Literature

A list of portlet engines is forthcoming.

  • NPACI GridPort toolkit [5]
  • GridPort (by TACC) [6]
  • the use of MyProxy in portal environments [7]
  • GENIUS, the INFN EnginFrame job submission portal [8]
  • GridSpeed [9]

Some un-maintained portal toolkits (might be useful for a small project, though):

  • DoEScienceGrid portal development kit [10]

Additional literature and groups:

  • Grid Computing Environments WG [11]
  • PortalLab (papers only, no source) [12] and other refs.

Projects

  • Build a web portal with MyProxy and a portal engine.
  • Try integrating with a few of the web and grid services provided by the GT4 teams.

Grid Services using GT4 - Paced Tutorial

Globus Toolkit version 4 provides a hosting environment for grid services. You can build your own grid service and deploy it inside the GT4 container. The container itself is based on an evolution of the Apache AXIS system, but has been enhanced with grid security mechanisms and support for WS-RF.

Literature

Projects

  • build a grid service.
  • in small groups, aggregate your service with some-one else's to provide additional, higher-level services. For example, a stock quote service that notifies a client watchdog, that will buy more stock based on the rate of change in the share prices, but only if the client still has enough funds in the back. When the amount exceeds a threshold, have the higher-level service notify the person by e-mail.

But of course you're strongly encouraged to think of your own set of services!