== Storage ==
The hypervisors of the testbed all connect to the same shared storage backend (a Fujitsu DX200 system called KLAAS) over iSCSI.

The storage backend exports a number of pools to the testbed. These are formatted as LVM volume groups and shared through a clustered LVM setup. In libvirt, the VG is known as a storage 'pool' under the name <code>vmachines</code>.
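For day-to-day use the pool behaves like any other libvirt storage pool: each VM disk is a logical volume in the shared VG. A minimal sketch with <code>virsh</code> (the volume name and size below are made up for illustration):

 # list the storage pools known to libvirt; 'vmachines' should be active
 virsh pool-list --all
 # list the volumes (logical volumes) in the pool
 virsh vol-list vmachines
 # create a new 20 GiB volume, i.e. a new LV in the shared VG
 virsh vol-create-as vmachines example-disk 20G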
=== Clustered LVM setup ===

The clustering of nodes is provided by corosync. Here are the contents of the configuration file <code>/etc/corosync/corosync.conf</code>:
 totem {
     version: 2
     cluster_name: p4ctb
     token: 3000
     token_retransmits_before_loss_const: 10
     clear_node_high_bit: yes
     crypto_cipher: aes256
     crypto_hash: sha256
     interface {
         ringnumber: 0
         bindnetaddr: 10.198.0.0
         mcastport: 5405
         ttl: 1
     }
 }
 
 logging {
     fileline: off
     to_stderr: no
     to_logfile: no
     to_syslog: yes
     syslog_facility: daemon
     debug: off
     timestamp: on
     logger_subsys {
         subsys: QUORUM
         debug: off
     }
 }
 
 quorum {
     provider: corosync_votequorum
     expected_votes: 2
 }
The crypto settings refer to a key file <code>/etc/corosync/authkey</code> which must be present on all systems. There is no predefined definition of the cluster; any node can join, which is why the shared key is a good idea: you don't want any unexpected members joining the cluster. The <code>expected_votes</code> of 2 is, of course, because there are only 3 machines at the moment.

As long as the cluster is quorate, everything should be fine. That means that at any time one of the machines can be maintained, rebooted, etc. without affecting the availability of the storage on the other nodes.

As long as at least one node has the cluster up and running, others should be able to join even if the cluster is not quorate. That means that if only a single node out of three is up, the cluster is no longer quorate and storage operations block; but as soon as another node joins, the cluster is quorate again and should unblock.
− | |||
==== Installation ====

Based on Debian 9. Install the required packages:

 apt-get install corosync clvm

Set up clustered locking in LVM:
 sed -i 's/^ locking_type = 1$/ locking_type = 3/' /etc/lvm/lvm.conf
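A quick sanity check that the edit took effect (just grepping the file; the exact indentation of the line may differ between lvm2 versions):

 # should now report: locking_type = 3
 grep -E '^[[:space:]]*locking_type' /etc/lvm/lvm.conf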
Make sure all nodes have the same corosync.conf file and the same authkey. A key can be generated with <code>corosync-keygen</code>.
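A possible way to generate the key once and push it to the other members, assuming root ssh access between the nodes (the host names <code>node2</code> and <code>node3</code> are placeholders):

 # writes /etc/corosync/authkey on the local node
 corosync-keygen
 # distribute the key and the config to the other cluster members
 for n in node2 node3; do
     scp /etc/corosync/authkey /etc/corosync/corosync.conf root@$n:/etc/corosync/
 done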
==== Running ====

Start corosync:

 systemctl start corosync

Test the cluster status with:

 corosync-quorumtool -s
 dlm_tool -n ls

These should show all nodes.
Start the iSCSI daemon and the multipath daemon:

 systemctl start iscsid
 systemctl start multipathd
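If the node has never logged in to the storage targets before, a discovery and login step may be needed first; a sketch, where the portal address on the storage network is an assumption:

 # discover the targets exported by the storage backend (portal IP is hypothetical)
 iscsiadm -m discovery -t sendtargets -p 10.198.0.1
 # log in to all discovered targets
 iscsiadm -m node --login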
See if the iSCSI paths are visible:

 multipath -ll
 3600000e00d2900000029295000110000 dm-1 FUJITSU,ETERNUS_DXL
 size=2.0T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
 |-+- policy='service-time 0' prio=50 status=active
 | |- 6:0:0:1 sdi 8:128 active ready running
 | `- 3:0:0:1 sdg 8:96  active ready running
 `-+- policy='service-time 0' prio=10 status=enabled
   |- 4:0:0:1 sdh 8:112 active ready running
   `- 5:0:0:1 sdf 8:80  active ready running
 3600000e00d2900000029295000100000 dm-0 FUJITSU,ETERNUS_DXL
 size=2.0T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
 |-+- policy='service-time 0' prio=50 status=active
 | |- 4:0:0:0 sdb 8:16 active ready running
 | `- 5:0:0:0 sdc 8:32 active ready running
 `-+- policy='service-time 0' prio=10 status=enabled
   |- 3:0:0:0 sdd 8:48 active ready running
   `- 6:0:0:0 sde 8:64 active ready running
Only then start the clustered LVM:

 systemctl start lvm2-cluster-activation.service
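At this point the shared VG should be visible and active on the node; a quick check, using the VG name from above:

 # the clustered volume group and its logical volumes should be listed
 vgs vmachines
 lvs vmachines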
− | |||
==== Troubleshooting ====