Virtual Machines working group

Members and assignment

Sander Klous - Nikhef (Chair)
Ronald Starink - Nikhef
Marc van Driel - NBIC
Pieter van Beek - SARA
Ron Trompert - SARA

Charge

Meetings

Kick-off - Monday July 6, 2009: agenda (in Dutch), minutes (in Dutch)

Presentations

Sky computing - Sander Klous (Monday July 6, 2009), a summary of the CERN virtual machines workshop (see other information) and an introduction for the kick-off meeting of the BiG Grid virtual machines working group.

Open Issues

  • Network Address Translation - What is the load?
  • Virtual Machine Isolation - Prohibit internal network connectivity with IPTables (see the sketch after this list).
  • Image repository - Storage Area Network or distributed over worker nodes.
  • Policy document
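
As an illustration of the isolation idea only (nothing has been decided yet), a minimal IPTables sketch, assuming guests are bridged on xenbr0, that bridged traffic is handed to iptables (br_netfilter), and that the internal network is 192.168.0.0/16; all three are placeholders:

# Placeholders: xenbr0 (guest bridge) and 192.168.0.0/16 (internal network).
# Bridged frames only traverse the FORWARD chain when
# net.bridge.bridge-nf-call-iptables=1 is set.
iptables -A FORWARD -i xenbr0 -d 192.168.0.0/16 -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i xenbr0 -d 192.168.0.0/16 -j DROP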

Infrastructure

We are setting up a testbed to investigate technical issues related to virtual machine management.

Hardware and Operating Systems

  • Two Dell 1950 machines, dual CPU, 4 cores per CPU
    • One machine has a CentOS-5 installation
    • One machine has a Debian-squeeze installation

Software

  • CentOS-5 comes with Xen 3.0
  • Debian-squeeze comes with Xen 3.3
    • Debian-squeeze Xen packages have a problem with tap:aio (an example disk specification using tap:aio follows the fix below).
Fix:
# Make the tapdisk helper available where xend expects it
ln -s /usr/lib/xen-3.2-1/bin/tapdisk /usr/sbin
# Load the blktap kernel module at boot
echo xenblktap >> /etc/modules
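
For context, tap:aio is the blktap-backed file driver used in Xen guest disk specifications; a hypothetical example (image path and device name are placeholders):

# Hypothetical domU disk line using the blktap AIO backend
disk = [ 'tap:aio:/var/lib/xen/images/worker01.img,xvda,w' ]
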
  • OpenNebula has been installed (stand-alone) on CentOS-5 following this guide
    • A few additional steps were needed:
      • Install rubygems and rubygem-sqlite3
      • The opennebula user has to be added to the sudoers file for xm and xentop
      • Sudoers should not require a tty
wget ftp://fr.rpmfind.net/linux/EPEL/5/x86_64/rubygem-sqlite3-ruby-1.2.4-1.el5.x86_64.rpm
wget ftp://fr.rpmfind.net/linux/EPEL/5/x86_64/rubygems-1.3.1-1.el5.noarch.rpm
sudo rpm -Uvh rubygems-1.3.1-1.el5.noarch.rpm rubygem-sqlite3-ruby-1.2.4-1.el5.x86_64.rpm 
In /etc/sudoers (on all machines)
opennebula ALL = NOPASSWD: /usr/sbin/xm
opennebula ALL = NOPASSWD: /usr/sbin/xentop
#Defaults    requiretty
  • Installed iSCSI target and client software for the shared image repository
  • The image repository consists of LVM volume groups
    • Performance of LVM is better than that of file-based images
    • Each logical volume contains one image
    • This layout allows easy creation and deletion of images
    • VMs can run from cloned (Copy-On-Write) images; a sketch of this workflow follows the list
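
As an illustration of how such a repository can be put together (not the exact commands used on this testbed; target address, device name, volume group, image names, and sizes are placeholders), assuming open-iscsi on the client side:

# Log in to the image repository target (address discovered via sendtargets)
iscsiadm -m discovery -t sendtargets -p 192.0.2.10
iscsiadm -m node --login
# Build a volume group on the iSCSI disk and give each image its own logical volume
pvcreate /dev/sdb
vgcreate vg_images /dev/sdb
lvcreate -L 10G -n sl5-base vg_images
# A Copy-On-Write clone of that image is an LVM snapshot of its logical volume
lvcreate -s -L 10G -n sl5-clone-01 /dev/vg_images/sl5-base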

Implementation issues

Implemented iSCSI image management for OpenNebula following the storage guide

In /opt/opennebula/etc/oned.conf:
TM_MAD = [
   name       = "tm_iscsi",
   executable = "one_tm",
   arguments  = "tm_iscsi/tm_iscsi.conf",
   default    = "tm_iscsi/tm_iscsi.conf" ]
/opt/opennebula/etc/tm_iscsi/tm_iscsi.conf
/opt/opennebula/etc/tm_iscsi/tm_iscsirc
/opt/opennebula/lib/tm_commands/iscsi/tm_clone.sh
/opt/opennebula/lib/tm_commands/iscsi/tm_delete.sh
/opt/opennebula/lib/tm_commands/iscsi/tm_ln.sh
/opt/opennebula/lib/tm_commands/iscsi/tm_mkimage.sh
/opt/opennebula/lib/tm_commands/iscsi/tm_mkswap.sh
/opt/opennebula/lib/tm_commands/iscsi/tm_mv.sh
.../one-1.2.0/src/vmm/XenDriver.cc
.../one-1.2.0/src/tm/TransferManager.cc
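
The driver scripts themselves are not reproduced here. As a rough sketch of the shape a transfer-manager clone script takes in this setup (illustrative only, not the actual driver code; the argument convention, helper path, and snapshot size are assumptions):

#!/bin/bash
# Illustrative tm_clone.sh sketch -- OpenNebula invokes it as:
#   tm_clone.sh SRC_HOST:SRC_PATH DST_HOST:DST_PATH
SRC=$1
DST=$2

# tm_common.sh provides exec_and_log and the logging helpers
# (path assumes a self-contained install under $ONE_LOCATION)
. $ONE_LOCATION/lib/mads/tm_common.sh

SRC_PATH=$(echo $SRC | cut -d: -f2)
DST_HOST=$(echo $DST | cut -d: -f1)
DST_PATH=$(echo $DST | cut -d: -f2)

SIZE=10G   # placeholder; a real driver derives this from the image metadata

# Clone the image as an LVM Copy-On-Write snapshot on the destination host;
# the snapshot is created in the same volume group as the source logical volume.
exec_and_log "ssh $DST_HOST sudo /usr/sbin/lvcreate -s -L$SIZE -n ${DST_PATH##*/} $SRC_PATH"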

Local caching

Network traffic for Virtual Machine management can be optimized with two caches on the worker node:

1. A read cache of the original Virtual Machine image, to facilitate reuse.
2. A write-back cache for the copy-on-write clone, to allow local writes.

The copy-on-write clone can be moved back to the image repository by synchronizing the write-back cache. After the synchronization, the write-back cache becomes obsolete and can be removed. A sketch of one possible LVM-based realization follows.
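
One way the write-back side could map onto the LVM repository described above (a sketch, not the implemented mechanism; /dev/sda3 as the local partition and /dev/sdb as the iSCSI-backed physical volume are placeholders, and both must belong to the same volume group):

# Local write-back cache: create the Copy-On-Write snapshot on a physical
# volume backed by the worker node's local disk (placeholder: /dev/sda3)
lvcreate -s -L 10G -n clone-01 /dev/vg_images/sl5-base /dev/sda3
# Saving the clone: migrate its extents to the iSCSI-backed physical volume
# (placeholder: /dev/sdb); afterwards no data remains on the local disk
pvmove -n clone-01 /dev/sda3 /dev/sdb

With this arrangement, writes stay on the local disk and iSCSI traffic is only generated when a clone is explicitly saved back to the repository.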

Other information

  • CERN June 2009 workshop on virtual machines: http://indico.cern.ch/conferenceDisplay.py?confId=56353