VC-replication-1

From PRAGMA wiki

An example

In this example,

  • the VM hosting server is fiji, running Rocks 5.3 with the Xen roll
  • the original baseline cluster is rocks-185
    • 8 compute nodes
    • Software: Intel Fortran Compiler, Globus, SGE
    • Setup: see Base-line
  • the new "cloned" cluster is rocks-186
    • 2 compute nodes
    • A different memory size could be specified with "rocks add cluster ..." (not tested at this time)
    • Disk capacity cannot be changed

Get the disk image file of a virtual cluster you want to clone

  • Shut down the original cluster (in this example - rocks-185)
    • Log on to rocks-185.sdsc.edu and run the "poweroff" command
  • Log on to the VM hosting server (in this example - fiji)
  • Find the image file path for rocks-185
# rocks list host vm rocks-185.sdsc.edu showdisks=yes
  • Copy the disk image file
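
A disk image can be large, so it may be worth confirming that the copy is complete before the original cluster is started again. The following is a minimal sketch, not part of the original procedure; copy_image is a hypothetical helper name and both paths are supplied by the caller.

```shell
# copy_image is a hypothetical helper, not a Rocks command.
# $1 = source disk image, $2 = destination path.
copy_image() {
    src=$1
    dst=$2
    # Refuse to continue if the source image does not exist
    [ -f "$src" ] || { echo "no such image: $src" >&2; return 1; }
    cp "$src" "$dst" || return 1
    # cmp -s is silent and exits non-zero if the files differ anywhere
    cmp -s "$src" "$dst"
}
```

The function returns 0 only when the copy exists and is byte-identical to the source.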

Create a new cluster with a disk image file

  • Obtain the disk image file of the virtual cluster you want to clone (see the previous section)
  • Create a new cluster on your VM hosting server (in this example - fiji)
# rocks add cluster rocks-186.sdsc.edu 198.202.88.186 2
  • Find the image file path for the new cluster
# rocks list host vm rocks-186.sdsc.edu showdisks=yes
  • Copy the disk file to the new cluster's image file location
# cp <original-disk-image-file.hda> /state/partition1/xen/disks/frontend-0-0-6.hda

Modify the new disk image before boot

# ssh vm-container-0-0
# cd /state/partition1/xen/disks/
# df
# lomount -diskimage hosted-vm-0-0-6.hda -partition 1 /media/
# cd /media/etc/sysconfig/network-scripts
(edit the ifcfg-eth0 and ifcfg-eth1 files to set ONBOOT=no)
(add your SSH public key to /media/root/.ssh/authorized_keys, i.e. /root/.ssh/authorized_keys inside the image)
# cd
# umount /media
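
The two manual edits above can also be scripted while the image is still mounted. This is only a sketch under assumptions: disable_onboot and add_root_key are hypothetical helper names, the image is assumed to be mounted at the directory given as the first argument (/media in the steps above), and "sed -i" assumes GNU sed.

```shell
# disable_onboot and add_root_key are hypothetical helpers.
# $1 is the mount point of the disk image (/media in the steps above).
disable_onboot() {
    root=$1
    for f in "$root"/etc/sysconfig/network-scripts/ifcfg-eth0 \
             "$root"/etc/sysconfig/network-scripts/ifcfg-eth1; do
        # Rewrite an existing ONBOOT line, whatever its current value
        sed -i 's/^ONBOOT=.*/ONBOOT=no/' "$f"
        # Append the setting if the file had no ONBOOT line at all
        grep -q '^ONBOOT=' "$f" || echo 'ONBOOT=no' >> "$f"
    done
}

# $1 = mount point of the image, $2 = path to your SSH public key file
add_root_key() {
    root=$1
    pubkey=$2
    mkdir -p "$root/root/.ssh"
    chmod 700 "$root/root/.ssh"
    # Append rather than overwrite, so existing keys are kept
    cat "$pubkey" >> "$root/root/.ssh/authorized_keys"
    chmod 600 "$root/root/.ssh/authorized_keys"
}
```

Both helpers would be run between the lomount and umount steps, for example "disable_onboot /media".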

Change hostname and network configurations

  • Change the rocks-186 boot setting to os (from install)
# rocks set host boot rocks-186.sdsc.edu action=os
  • Start rocks-186
# rocks start host vm rocks-186.sdsc.edu
  • Log in to rocks-186.sdsc.edu as root
  • In /etc/sysconfig/network-scripts/, edit the files ifcfg-eth0 and ifcfg-eth1: comment out HWADDR, change IPADDR, and set ONBOOT=yes
  • Reboot
# /sbin/init 6
  • Log in to rocks-186.sdsc.edu as root
  • Update the Rocks database
# rocks remove host compute
# hostname rocks-186
# rocks set host interface name localhost eth1 rocks-186.sdsc.edu
# rocks set host interface name localhost eth0 rocks-186
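
The ifcfg edits described in the list above (comment out HWADDR, change IPADDR, set ONBOOT=yes) can be expressed as a single sed call per file. This is a sketch, not the procedure that was actually used; set_ifcfg_address is a hypothetical helper name and "sed -i" assumes GNU sed.

```shell
# set_ifcfg_address is a hypothetical helper.
# $1 = ifcfg file to edit, $2 = new IP address for that interface.
set_ifcfg_address() {
    file=$1
    ip=$2
    # 1) comment out the cloned MAC address
    # 2) replace the old cluster's IP address
    # 3) re-enable the interface at boot
    sed -i \
        -e 's/^HWADDR=/#HWADDR=/' \
        -e "s/^IPADDR=.*/IPADDR=$ip/" \
        -e 's/^ONBOOT=.*/ONBOOT=yes/' \
        "$file"
}
```

In this example the public interface would be handled with "set_ifcfg_address /etc/sysconfig/network-scripts/ifcfg-eth1 198.202.88.186"; the private address for ifcfg-eth0 depends on your site.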

Get the MAC addresses for the new cluster on fiji:

# rocks list host interface rocks-186.sdsc.edu

Set the MAC addresses on rocks-186

# rocks set host interface mac localhost eth1 xxxx
# rocks set host interface mac localhost eth0 xxxx

Change the hostname in the file /opt/gridengine/default/common/act_qmaster to rocks-186.local

  • Update network information
# rocks report host interface localhost | rocks report script | sh
# rocks report host route localhost > /etc/sysconfig/static-routes
# service network restart
# rocks sync dns
# rocks sync config

Edit /etc/sysconfig/network and change the hostname and gateway IP
Edit /etc/auto.home and change the hostname
Reboot rocks-186

# rocks sync host network localhost
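
The hostname edits to /etc/sysconfig/network and /etc/auto.home described above amount to replacing the old cluster name with the new one. A possible sketch follows; replace_hostname is a hypothetical helper name, "sed -i" assumes GNU sed, and the gateway IP still has to be changed by hand because this helper only touches hostnames.

```shell
# replace_hostname is a hypothetical helper.
# $1 = old hostname, $2 = new hostname, remaining arguments = files to edit.
replace_hostname() {
    old=$1
    new=$2
    shift 2
    # Escape dots so a name like "rocks-185.local" is matched literally
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    for f in "$@"; do
        sed -i "s/$old_re/$new/g" "$f"
    done
}
```

For this example the call would be "replace_hostname rocks-185 rocks-186 /etc/sysconfig/network /etc/auto.home". Check each file afterwards, since the helper replaces every occurrence of the old name.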

Rebuild rocks distribution on rocks-186

# cd /export/home/install
# rocks create distro
  • Add compute nodes to rocks-186
    • On rocks-186.sdsc.edu, run "insert-ethers" and choose "Compute"
    • On fiji, find the compute nodes for rocks-186, then start them
# rocks list cluster rocks-186.sdsc.edu
FRONTEND            CLIENT NODES    TYPE
rocks-186.sdsc.edu: --------------- VM  
:                   hosted-vm-0-0-6 VM  
:                   hosted-vm-0-1-5 VM 
# rocks start host vm hosted-vm-0-0-6
# rocks start host vm hosted-vm-0-1-5
  • Copy rocks-186.sdsc.edu host certificate files to /etc/grid-security directory
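
With more than a few compute nodes, the node names can be pulled out of the "rocks list cluster" output shown above instead of being typed by hand. This is a sketch that assumes the output keeps the three-column layout shown in this example; client_nodes is a hypothetical helper name.

```shell
# client_nodes is a hypothetical helper.
# It reads "rocks list cluster" output on stdin and prints one client node per line.
client_nodes() {
    # Client rows begin with a lone ":" and carry the node name in column 2;
    # the header row and the frontend row do not match.
    awk '$1 == ":" && $2 !~ /^-+$/ { print $2 }'
}

# Possible use on fiji (commands taken from the steps above):
#   rocks list cluster rocks-186.sdsc.edu | client_nodes |
#   while read -r node; do rocks start host vm "$node"; done
```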

Fix SGE configurations on rocks-186

  • Fix SGE submit host list
# qconf -ss
rocks-185.sdsc.edu
# qconf -as rocks-186.sdsc.edu
rocks-186.sdsc.edu added to submit host list
# qconf -ss
rocks-185.sdsc.edu
rocks-186.sdsc.edu
# qconf -ds rocks-185.sdsc.edu
root@rocks-186.sdsc.edu removed "rocks-185.sdsc.edu" from submit host list
# qconf -ss
rocks-186.sdsc.edu
  • Fix SGE administrative host list
# qconf -sh
compute-0-0.local
compute-0-1.local
rocks-185.local
rocks-186.local
# qconf -dh rocks-185.local
root@rocks-186.sdsc.edu removed "rocks-185.local" from administrative host list
# qconf -sh
compute-0-0.local
compute-0-1.local
rocks-186.local
  • Fix SGE configuration list
# qconf -sconfl
compute-0-0.local
compute-0-1.local
compute-0-2.local
compute-0-3.local
compute-0-4.local
compute-0-5.local
compute-0-6.local
compute-0-7.local
rocks-185.sdsc.edu
# qconf -dconf rocks-185.sdsc.edu
# qconf -aconf rocks-186.sdsc.edu
(Add 2 lines
mailer  /bin/mail
xterm   /usr/bin/X11/xterm)
# qconf -dconf compute-0-2.local
# qconf -dconf compute-0-3.local
# qconf -dconf compute-0-4.local
# qconf -dconf compute-0-5.local
# qconf -dconf compute-0-6.local
# qconf -dconf compute-0-7.local
# qconf -sconfl
compute-0-0.local
compute-0-1.local
rocks-186.sdsc.edu
  • Fix compute slots in queue configuration
# qconf -mq all.q
  • Reboot frontend and compute nodes.
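
The configuration-list cleanup above removes every entry that belongs to the old cluster. When the original cluster has many compute nodes, the stale entries can be computed from the "qconf -sconfl" output rather than listed one by one. A minimal sketch, where stale_hosts is a hypothetical helper name:

```shell
# stale_hosts is a hypothetical helper.
# It reads a host list on stdin and prints every host not named in its arguments.
stale_hosts() {
    # One host to keep per line, used as a list of fixed-string patterns
    keep=$(printf '%s\n' "$@")
    # -v invert the match, -x match whole lines, -F fixed strings (dots are literal).
    # grep exits 1 when nothing is stale; treat that case as success.
    grep -vxF -e "$keep" || [ $? -eq 1 ]
}

# Possible use on rocks-186 (qconf options taken from the steps above):
#   qconf -sconfl | stale_hosts compute-0-0.local compute-0-1.local |
#   while read -r h; do qconf -dconf "$h"; done
```

The new frontend entry still has to be added with "qconf -aconf" as shown above, since this helper only identifies entries to delete.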

Testing

See Test_Essential_Services

Issues

  • A compute node was started, but "insert-ethers" on the frontend failed to detect it
    • Workaround: updated the hostname in the MySQL database, then ran "rocks sync config"