VC-replication-2

From PRAGMA wiki

An example

In this example,

  • the VC image was made on a VM hosting server at SDSC, fiji.rocksclusters.org, running Rocks 5.3 with the Xen roll
  • the original cluster is rocks-184.sdsc.edu
    • 8 compute nodes
    • Software: Intel Fortran Compiler, Globus, SGE, MyProxy
    • Setup: see Base-line
    • Jane Ren (NBCR) installed AutoDock and some user-written scripts in the nbcruser home directory
    • The Avian Flu Grid (AFG) application by Jane Ren was tested
    • The image file was gzip'ed for download
  • AIST replicated the AFG VM image on a VM hosting server at AIST, servername.hpcc.jp, running Rocks 5.3 with the Xen roll
    • 2 compute nodes


Create a new cluster with AFG VC disk image on AIST VM hosting server

  • Download the AFG VC disk image file afg-rocks-184.hda.gz
  • Create a new cluster

We created this cluster with 0 compute nodes initially, and then allocated vm-container-0-14 and vm-container-0-15 to it.

# rocks add cluster fqdn="afgvm.hpcc.jp" ip="163.220.26.130" vlan="9" num-computes="0"
# rocks add host vm vm-container-0-14 membership="Hosted VM" vlan="9" cpus="12" mem="30720"
# rocks add host vm vm-container-0-15 membership="Hosted VM" vlan="9" cpus="12" mem="30720"
  • Get the MAC addresses and find the image file path for the new cluster.
# rocks list host vm afgvm.hpcc.jp showdisks=y
SLICE MEM   CPUS MAC               HOST   DISK                                                      DISKSIZE
8     1024  1    7a:1b:dc:40:01:15 istsvc file:/state/partition1/xen/disks/frontend-0-0-8.hda,hda,w 36      
----- ----- ---- 7a:1b:dc:40:01:16 ------ --------------------------------------------------------- --------
  • Copy disk image file to new cluster file location.
# gunzip afg-rocks-184.hda.gz
# mv -i afg-rocks-184.hda /state/partition1/xen/disks/frontend-0-0-8.hda
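If the downloaded archive should be kept (for example, to build another cluster from the same image), the image can be decompressed straight to its destination instead of using gunzip followed by mv. The following is a minimal sketch under that assumption; the function name unpack_image is a hypothetical helper, not a Rocks command.

```shell
# unpack_image SRC_GZ DEST: decompress SRC_GZ to DEST and keep SRC_GZ.
# Refuses to overwrite an existing DEST (similar in intent to "mv -i").
unpack_image() {
    src=$1
    dest=$2
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    gzip -t "$src" || return 1      # check archive integrity first
    gunzip -c "$src" > "$dest"
}
```

With the paths from this example it would be called as "unpack_image afg-rocks-184.hda.gz /state/partition1/xen/disks/frontend-0-0-8.hda". If a file already exists at the destination, move it aside first.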


Modify the new disk image before boot

  • Mount the image file on the VM hosting server.
# lomount -diskimage /state/partition1/xen/disks/frontend-0-0-8.hda -partition 1 /media
  • Modify network interface config.
# cd /media/etc/sysconfig/network-scripts
# vi ifcfg-eth0
DEVICE=eth0
HWADDR=7a:1b:dc:40:01:15        # get from "rocks list host vm hostname"
IPADDR=10.1.9.1
NETMASK=255.255.255.0
BOOTPROTO=static
ONBOOT=yes
MTU=1500

# vi ifcfg-eth1
DEVICE=eth1
HWADDR=7a:1b:dc:40:01:16        # get from "rocks list host vm hostname"
IPADDR=163.220.26.130
NETMASK=255.255.224.0
BOOTPROTO=static
ONBOOT=yes
MTU=1500
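The two interface files differ only in device name, MAC address, IP address and netmask, so they can also be generated rather than edited by hand. A minimal sketch, assuming the same fixed settings as shown above (static addressing, MTU 1500); write_ifcfg is a hypothetical helper name.

```shell
# write_ifcfg DEVICE HWADDR IPADDR NETMASK: print an ifcfg file to stdout.
# BOOTPROTO, ONBOOT and MTU are fixed to the values used in this example.
write_ifcfg() {
    cat <<EOF
DEVICE=$1
HWADDR=$2
IPADDR=$3
NETMASK=$4
BOOTPROTO=static
ONBOOT=yes
MTU=1500
EOF
}
```

For example, "write_ifcfg eth0 7a:1b:dc:40:01:15 10.1.9.1 255.255.255.0 > /media/etc/sysconfig/network-scripts/ifcfg-eth0" would write the private interface file while the image is mounted on /media.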
  • Add the SSH public key to root's authorized_keys.
# cd /media/root/.ssh
# vi authorized_keys
(add ssh public key)
  • Modify sshd config. (as appropriate)
# cd /media/etc/ssh/
# vi sshd_config
(PasswordAuthentication yes -> no)
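The sshd_config change can also be made without an interactive editor. A minimal sketch, assuming the directive appears at most as a single active or commented-out line; disable_password_auth is a hypothetical helper name, and the original file is kept with an .orig suffix.

```shell
# disable_password_auth FILE: set "PasswordAuthentication no" in FILE.
# Rewrites an existing (possibly commented-out) directive, or appends one.
# The unmodified file is preserved as FILE.orig.
disable_password_auth() {
    file=$1
    pattern='^[[:space:]]*#\{0,1\}[[:space:]]*PasswordAuthentication[[:space:]].*'
    cp -p "$file" "$file.orig" || return 1
    if grep -q "$pattern" "$file.orig"; then
        sed "s/$pattern/PasswordAuthentication no/" "$file.orig" > "$file"
    else
        printf 'PasswordAuthentication no\n' >> "$file"
    fi
}
```

While the image is still mounted, this would be run as "disable_password_auth /media/etc/ssh/sshd_config".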
  • Unmount.
# umount /media/
  • Start the new AFG VM
# rocks set host boot afgvm.hpcc.jp action=os
# rocks list host boot afgvm.hpcc.jp
ACTION
os    
# rocks start host vm afgvm.hpcc.jp


Change hostname and network configurations

  • Log in to the AFG VM
ssh root@afgvm.hpcc.jp
  • Change hostname

At this point, specify the short hostname, not the FQDN.
First, change the hostname in the MySQL database.

# echo "update nodes set name='afgvm' where id=1;" | /opt/rocks/bin/mysql --user=apache cluster

Then change the hostname of the running system.

# hostname afgvm
  • Change rocks attributes
# rocks set attr Kickstart_PublicHostname         afgvm.hpcc.jp
# rocks set attr Kickstart_PrivateHostname        afgvm
# rocks set attr Kickstart_PublicDNSDomain        hpcc.jp
# rocks set attr Kickstart_PublicAddress          163.220.26.130
# rocks set attr Kickstart_PublicNetmask          255.255.224.0
# rocks set attr Kickstart_PublicNetwork          163.220.0.0
# rocks set attr Kickstart_PublicBroadcast        163.220.31.255
# rocks set attr Kickstart_PublicNetmaskCIDR      19
# rocks set attr Kickstart_PublicGateway          163.220.0.1
# rocks set attr Kickstart_PublicDNSServers       163.220.2.34,192.50.74.11
# rocks set attr Kickstart_Timezone               Asia/Tokyo
# rocks set attr Kickstart_PublicNTPHost          163.220.2.2
# rocks set attr tripwire_mail                    aist-support@apgrid.org
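The public network, broadcast and CIDR values above follow from the public IP address and netmask. A minimal sketch for cross-checking them with plain shell arithmetic; ip_to_int, int_to_ip and net_info are hypothetical helper names, not Rocks commands.

```shell
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1                       # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# int_to_ip N: print a 32-bit integer as a dotted quad.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# net_info IP NETMASK: print "NETWORK BROADCAST CIDR".
net_info() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    network=$(( $ip & $mask ))
    broadcast=$(( $network | (~$mask & 4294967295) ))
    # Count the set bits of the netmask to get the prefix length.
    cidr=0
    m=$mask
    while [ "$m" -gt 0 ]; do
        cidr=$(( $cidr + ($m & 1) ))
        m=$(( $m >> 1 ))
    done
    echo "$(int_to_ip "$network") $(int_to_ip "$broadcast") $cidr"
}
```

Running "net_info 163.220.26.130 255.255.224.0" prints "163.220.0.0 163.220.31.255 19", which matches the Kickstart_PublicNetwork, Kickstart_PublicBroadcast and Kickstart_PublicNetmaskCIDR values used here.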
  • Remove nonexistent compute nodes.
# rocks list host
HOST         MEMBERSHIP CPUS RACK RANK RUNACTION INSTALLACTION
rocks-184:   Frontend   1    0    0    os        install      
compute-0-0: Compute    1    0    0    os        install      
compute-0-1: Compute    1    0    1    os        install      
compute-0-2: Compute    1    0    2    os        install      
compute-0-3: Compute    1    0    3    os        install      
compute-0-4: Compute    1    0    4    os        install      
compute-0-5: Compute    1    0    5    os        install      
compute-0-6: Compute    1    0    6    os        install      
compute-0-7: Compute    1    0    7    os        install      

# rocks remove host compute
error: sge_gethostbyname failed
error: sge_gethostbyname failed
error: sge_gethostbyname failed
error: sge_gethostbyname failed
error: sge_gethostbyname failed
error: sge_gethostbyname failed
error: sge_gethostbyname failed
error: sge_gethostbyname failed
error: sge_gethostbyname failed
(ignore these errors)
  • Set interface name
# rocks list host interface
SUBNET  IFACE MAC               IP             NETMASK       MODULE NAME               VLAN
private eth0  b6:58:ca:00:00:1f 10.1.1.1       255.255.0.0   xennet rocks-184          ----
public  eth1  b6:58:ca:00:00:20 198.202.88.184 255.255.255.0 xennet rocks-184.sdsc.edu ----

# rocks set host interface name localhost eth1 afgvm.hpcc.jp
# rocks set host interface name localhost eth0 afgvm

# rocks list host interface
SUBNET  IFACE MAC               IP             NETMASK       MODULE NAME          VLAN
private eth0  b6:58:ca:00:00:1f 10.1.1.1       255.255.0.0   xennet afgvm         ----
public  eth1  b6:58:ca:00:00:20 198.202.88.184 255.255.255.0 xennet afgvm.hpcc.jp ----
  • Set appropriate MAC address

Get the MAC addresses of the new VM's interfaces.

# ifconfig eth1 | grep HWaddr
eth1      Link encap:Ethernet  HWaddr 7A:1B:DC:40:01:16
# ifconfig eth0 | grep HWaddr
eth0      Link encap:Ethernet  HWaddr 7A:1B:DC:40:01:15

Then set them.

# rocks set host interface mac localhost eth1 7A:1B:DC:40:01:16
# rocks set host interface mac localhost eth0 7A:1B:DC:40:01:15

# rocks list host interface
SUBNET  IFACE MAC               IP             NETMASK       MODULE NAME          VLAN
private eth0  7A:1B:DC:40:01:15 10.1.1.1       255.255.0.0   xennet afgvm         ----
public  eth1  7A:1B:DC:40:01:16 198.202.88.184 255.255.255.0 xennet afgvm.hpcc.jp ----
  • Set appropriate Netmask
# rocks set network netmask public 255.255.224.0
# rocks list host interface
SUBNET  IFACE MAC               IP             NETMASK       MODULE NAME          VLAN
private eth0  7A:1B:DC:40:01:15 10.1.1.1       255.255.0.0   xennet afgvm         ----
public  eth1  7A:1B:DC:40:01:16 198.202.88.184 255.255.224.0 xennet afgvm.hpcc.jp ----
  • Set appropriate Subnet
# rocks set network subnet public 163.220.0.0
  • Set appropriate IP address
# rocks set host interface ip localhost eth1 163.220.26.130
# rocks list host interface
SUBNET  IFACE MAC               IP             NETMASK       MODULE NAME          VLAN
private eth0  7A:1B:DC:40:01:15 10.1.1.1       255.255.0.0   xennet afgvm         ----
public  eth1  7A:1B:DC:40:01:16 163.220.26.130 255.255.224.0 xennet afgvm.hpcc.jp ----
  • Change hostname to FQDN
# hostname afgvm.hpcc.jp
  • Update routing information
# rocks list route
NETWORK          NETMASK         GATEWAY
224.0.0.0:       255.0.0.0       eth0
255.255.255.255: 255.255.255.255 eth0
0.0.0.0:         0.0.0.0         10.1.1.1
198.202.88.184:  255.255.255.255 10.1.1.1

# rocks remove route 198.202.88.184
# rocks list route
NETWORK          NETMASK         GATEWAY
224.0.0.0:       255.0.0.0       eth0
255.255.255.255: 255.255.255.255 eth0
0.0.0.0:         0.0.0.0         10.1.1.1

# rocks add route 163.220.26.130 10.1.1.1 netmask=255.255.255.255
# rocks list route
NETWORK          NETMASK         GATEWAY
224.0.0.0:       255.0.0.0       eth0
255.255.255.255: 255.255.255.255 eth0
0.0.0.0:         0.0.0.0         10.1.1.1
163.220.26.130:  255.255.255.255 10.1.1.1

# rocks report host route localhost > /etc/sysconfig/static-routes
  • Update network information
# rocks report host interface localhost | rocks report script | sh
  • Edit system network configuration.
# vi /etc/sysconfig/network
-HOSTNAME=rocks-184.sdsc.edu
-GATEWAY=198.202.88.20
+HOSTNAME=afgvm.hpcc.jp
+GATEWAY=163.220.0.1

# vi /etc/sysconfig/iptables
--A INPUT -m state --state NEW -p tcp --dport https --source 198.202.88.0/255.255.255.0 -j ACCEPT
--A INPUT -m state --state NEW -p tcp --dport www --source 198.202.88.0/255.255.255.0 -j ACCEPT
+-A INPUT -m state --state NEW -p tcp --dport https --source 163.220.26.0/255.255.224.0 -j ACCEPT
+-A INPUT -m state --state NEW -p tcp --dport www --source 163.220.26.0/255.255.224.0 -j ACCEPT

# vi /etc/auto.home 
(s/rocks-184.local/afgvm.local/)

# vi /etc/resolv.conf
(change search and nameserver)
  • Change timezone
# cp -p /usr/share/zoneinfo/Japan /etc/localtime
  • Network restart and rocks sync
# service network restart
# rocks sync dns
# rocks sync config
ERROR: unable to send message to qmaster using port 536 on host "rocks-184.local": can't resolve host name
(ignore this error)
  • Reboot
# reboot
  • Rocks sync host
# rocks sync host network localhost
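After these edits it can be useful to check that no configuration file still refers to the original cluster. A minimal sketch, assuming a recursive grep is available; find_old_refs is a hypothetical helper name, and which directories to scan is left to the administrator.

```shell
# find_old_refs OLD_NAME DIR...: list files under the given directories
# that still contain the literal string OLD_NAME.
find_old_refs() {
    old=$1
    shift
    grep -rlF -- "$old" "$@" 2>/dev/null | sort
}
```

For example, "find_old_refs rocks-184 /etc" lists files under /etc that still mention rocks-184, which also catches rocks-184.sdsc.edu and rocks-184.local.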


Rebuild rocks distribution on AFG VC

# cd /export/rocks/install
# rocks create distro
  • Add compute nodes to afgvm
    • On afgvm.hpcc.jp, run "insert-ethers" and choose "Compute"
    • On the physical frontend node, run "rocks start host vm hosted-vm-X-X-X install=yes"


Fix SGE configurations on afgvm.hpcc.jp

  • Fix SGE submit host list
# qconf -ss
rocks-184.sdsc.edu
# qconf -as afgvm.hpcc.jp
afgvm.hpcc.jp added to submit host list

# qconf -ss
afgvm.hpcc.jp
rocks-184.sdsc.edu

# qconf -ds rocks-184.sdsc.edu
root@rocks-201.sdsc.edu removed "rocks-184.sdsc.edu" from submit host list
# qconf -ss
afgvm.hpcc.jp
  • Fix SGE administrative host list
# qconf -sh
afgvm.hpcc.jp
compute-0-0.local
compute-0-1.local
compute-0-2.local
compute-0-3.local
compute-0-4.local
compute-0-5.local
compute-0-6.local
compute-0-7.local
rocks-184.local

# qconf -dh rocks-184.local
# qconf -dh compute-0-2.local
# qconf -dh compute-0-3.local
# qconf -dh compute-0-4.local
# qconf -dh compute-0-5.local
# qconf -dh compute-0-6.local
# qconf -dh compute-0-7.local

# qconf -sh
afgvm.hpcc.jp
compute-0-0.local
compute-0-1.local
  • Fix SGE configuration list
# qconf -sconfl
compute-0-0.local
compute-0-1.local
compute-0-2.local
compute-0-3.local
compute-0-4.local
compute-0-5.local
compute-0-6.local
compute-0-7.local
rocks-184.sdsc.edu

# qconf -aconf afgvm.hpcc.jp
(Add 2 lines
mailer  /bin/mail
xterm   /usr/bin/X11/xterm)

# qconf -dconf rocks-184.sdsc.edu
# qconf -dconf compute-0-2.local
# qconf -dconf compute-0-3.local
# qconf -dconf compute-0-4.local
# qconf -dconf compute-0-5.local
# qconf -dconf compute-0-6.local
# qconf -dconf compute-0-7.local
# qconf -sconfl
afgvm.hpcc.jp
compute-0-0.local
compute-0-1.local
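The repeated "qconf -dh" and "qconf -dconf" commands for the compute nodes that no longer exist can be generated with a loop. A minimal sketch that only prints the commands so they can be reviewed first; stale_node_cmds is a hypothetical helper name, and the compute-0-N.local naming is taken from the listings above.

```shell
# stale_node_cmds FIRST LAST: print the qconf commands that remove
# compute-0-FIRST .. compute-0-LAST from the administrative host list (-dh)
# and from the configuration list (-dconf). Nothing is executed.
stale_node_cmds() {
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "qconf -dh compute-0-$i.local"
        echo "qconf -dconf compute-0-$i.local"
        i=$(( $i + 1 ))
    done
}
```

In this example, "stale_node_cmds 2 7" prints the twelve commands for compute-0-2 through compute-0-7; after checking the output, it can be piped to sh on the frontend.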
  • Fix compute slots in queue configuration
# qconf -mq all.q
< slots                 1,[compute-0-0.local=12],[compute-0-1.local=12], \
<                       [compute-0-2.local=1],[compute-0-3.local=1], \
<                       [compute-0-5.local=1],[compute-0-6.local=1], \
<                       [compute-0-7.local=1],[compute-0-4.local=1]
---
> slots                 1,[compute-0-0.local=12],[compute-0-1.local=12]
  • Reboot frontend and compute nodes.


Replace Globus host certificate

(You must obtain your Globus host certificate in advance.)

  • Back up the original *.pem files
# cd /etc/grid-security
# mkdir original
# cp -ip *.pem original
  • Copy your host*.pem files
# cp -pi /path/to/your/*.pem /etc/grid-security
# cd /etc/grid-security
# cp -ip hostcert.pem containercert.pem
# cp -ip hostkey.pem containerkey.pem
# chown root:root hostcert.pem hostkey.pem
# chown globus:globus containercert.pem containerkey.pem
  • Add your DN to grid-mapfile
# vi /etc/grid-security/grid-mapfile
  • Restart the WS container
# /etc/init.d/globus-ws stop
Stopping Globus container. PID: 4136
Container stopped
# /etc/init.d/globus-ws start
Starting Globus container. PID: 19945


Modify globus_gram_fs_map_config.xml

  • Change the hostname from rocks-184.sdsc.edu to afgvm.hpcc.jp.
# cd $GLOBUS_LOCATION/etc/gram-service
# vi globus_gram_fs_map_config.xml
(change all rocks-184.sdsc.edu to afgvm.hpcc.jp)
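The hostname substitution can also be done non-interactively. A minimal sketch, assuming the hostnames contain only letters, digits, dots and hyphens; replace_hostname is a hypothetical helper name, and the original file is kept with an .orig suffix.

```shell
# replace_hostname OLD NEW FILE: replace every literal OLD with NEW in FILE.
# The unmodified file is preserved as FILE.orig.
replace_hostname() {
    old=$1
    new=$2
    file=$3
    # Escape the dots in OLD so that sed matches them literally.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    cp -p "$file" "$file.orig" || return 1
    sed "s/$old_re/$new/g" "$file.orig" > "$file"
}
```

Here it would be run in $GLOBUS_LOCATION/etc/gram-service as "replace_hostname rocks-184.sdsc.edu afgvm.hpcc.jp globus_gram_fs_map_config.xml".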

Testing

See Test_Essential_Services