Test WRF with cicese

From PRAGMAgridWIKI

Results of testing WRF

Test via gxrun

1. Using the cicese account, log in to eros01, vega01, and orion01 and launch a daemon named gxlrm on each. On orion01, each compute node has a private IP address. On the other two clusters, each compute node has a public IP address.

$ ssh pluto
$ ssh eros01
$ /usr/local/gt3.2.1/bin/gxlrm
$ ssh pluto
$ ssh vega01
$ /usr/local/gt3.2.1/bin/gxlrm
$ ssh pluto
$ ssh orion01
$ /usr/local/gt3.2.1/bin/gxlrm 
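
The three launches above follow the same pattern, so they could be scripted. The following is a minimal sketch, not a tested procedure. It assumes that pluto can reach each front end by the short host names used above, that gxlrm returns control once started (the notes do not say whether it detaches), and that chaining ssh through pluto is acceptable at the site. The names `launch_cmd`, `launch_all`, and `DRY_RUN` are hypothetical helpers introduced here; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: start gxlrm on each cluster front end, hopping through pluto
# as in the manual steps. Nothing here comes from the gxlrm documentation.
GXLRM=/usr/local/gt3.2.1/bin/gxlrm   # path taken from the steps above
HOSTS="eros01 vega01 orion01"

# Print the ssh command that would start gxlrm on host $1 via pluto.
launch_cmd() {
    printf 'ssh -t pluto ssh %s %s\n' "$1" "$GXLRM"
}

# Print (DRY_RUN=1, the default) or execute (DRY_RUN=0) the command per host.
launch_all() {
    for host in $HOSTS; do
        cmd=$(launch_cmd "$host")
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Assumption: gxlrm returns once started; otherwise this blocks.
            $cmd
        fi
    done
}

launch_all
```

Running it with `DRY_RUN=0` would execute the commands instead of printing them. Checking the printed commands first is the safer order.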

2. Using the cicese account, log in to pluto.

3. Compile and stage WRF.

4. Make a JRDL file.

5. Submit the job.

--- 4 CPU orion, 4 CPU eros, 4 CPU vega
$ time gxrun wrf_three.xml 
Request a new job 63
02:57:43 [Thread-1] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 0 is requesting to orion01.gridcenter.or.kr
02:57:43 [Thread-6] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 2 is requesting to vega01.gridcenter.or.kr
02:57:43 [Thread-3] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 1 is requesting to eros01.gridcenter.or.kr
02:57:57 [Thread-8] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 2 status: RUNNING
02:57:58 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: RUNNING
02:57:58 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: RUNNING
Job is running.
checkPointPeriod: 36000000
03:22:02 [Thread-8] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 2 status: DONE
03:22:02 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: DONE
03:22:03 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: DONE
Job is done.

real    24m22.109s
user    0m1.020s
sys     0m0.120s
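
To compare runs, it can help to convert the `real` figure printed by `time` into plain seconds. The helper below is a small sketch. The name `real_to_seconds` is hypothetical, and it assumes the `NmS.SSSs` format shown in these transcripts (no hours field).

```shell
# Convert a `time` value such as 24m22.109s into seconds.
# Splits on the letters "m" and "s": field 1 is minutes, field 2 is seconds.
real_to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

real_to_seconds 24m22.109s   # prints 1462.109
```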

--- 10 CPU orion, 10 CPU eros
$ time gxrun ./wrf_orion-eros.xml 
Request a new job 66
02:15:13 [Thread-1] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 0 is requesting to orion01.gridcenter.or.kr
02:15:13 [Thread-3] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 1 is requesting to eros01.gridcenter.or.kr
02:25:40 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: RUNNING
02:25:40 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: RUNNING
Job is running.
checkPointPeriod: 36000000
02:41:48 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: DONE
02:41:50 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: DONE
Job is done.

real    26m38.663s
user    0m0.860s
sys     0m0.170s
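
In this run the sub-jobs were requested at 02:15:13 but only reported RUNNING at 02:25:40, so part of the wall-clock time is startup wait rather than computation. The sketch below splits the two phases from the log timestamps. The function names are hypothetical, and it assumes all three timestamps fall on the same day.

```shell
# Seconds since midnight for an HH:MM:SS timestamp.
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print startup wait and run time from the request, RUNNING, and DONE
# timestamps. Assumes no midnight wrap between them.
phase_times() {
    req=$(hms_to_seconds "$1")
    run=$(hms_to_seconds "$2")
    done_at=$(hms_to_seconds "$3")
    echo "wait=$((run - req))s run=$((done_at - run))s"
}

phase_times 02:15:13 02:25:40 02:41:50   # prints wait=627s run=970s
```

The two phases add up to 1597 seconds, which agrees with the `real` value of 26m38.663s reported for this run.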

--- 8 CPU orion, 8 CPU eros, 8 CPU vega
$ time gxrun ./wrf_three.xml 
Request a new job 69
03:38:44 [Thread-1] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 0 is requesting to orion01.gridcenter.or.kr
03:38:44 [Thread-3] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 1 is requesting to eros01.gridcenter.or.kr
03:38:44 [Thread-6] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 2 is requesting to vega01.gridcenter.or.kr
03:39:05 [Thread-8] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 2 status: RUNNING
03:39:05 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: RUNNING
03:39:05 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: RUNNING
Job is running.
checkPointPeriod: 36000000
04:11:54 [Thread-8] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 2 status: DONE
04:11:54 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: DONE
04:11:55 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: DONE
Job is done.

real    33m12.448s
user    0m0.890s
sys     0m0.140s

Another Test

This test

--- 10 CPU pegasus, 10 CPU cygnus
$ time gxrun ./wrf_pegasus-cygnus.xml
Request a new job 34
05:10:44 [Thread-1] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 0 is requesting to pegasus01.gridcenter.or.kr
05:10:44 [Thread-3] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 1 is requesting to cygnus01.gridcenter.or.kr
05:10:59 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: RUNNING
05:11:00 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: RUNNING
Job is running.
checkPointPeriod: 36000000
08:04:36 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: DONE
15:11:00 [Thread-6] ERROR org.moredream.mpichgx.client.GXRunThread  - java.net.ConnectException: Connection refused
15:11:00 [Thread-6] ERROR org.moredream.mpichgx.client.GXRunThread  - org.moredream.mpichgx.MPIJobException: Connection refused
cancel is called

The connection to pegasus01 was lost at some point, which also killed the gxlrm process, but the run finished successfully. Connections to pegasus01 are lost intermittently.

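
When a run ends with errors like this, a quick summary of the gxrun output shows which sub-jobs reported DONE before the connection failed. The following is a rough sketch that relies only on the log line shapes visible in these transcripts. The function name `summarize_log` and the file name `gxrun.log` are hypothetical.

```shell
# Summarise a gxrun log read from stdin: the last reported status of each
# sub-job, followed by the number of ERROR lines.
# Usage (gxrun.log is a hypothetical file with the saved output):
#   summarize_log < gxrun.log
summarize_log() {
    awk '
        BEGIN { max = -1 }
        /New sub-job [0-9]+ status:/ {
            # The sub-job id follows the word "sub-job"; the status is the last field.
            for (i = 1; i < NF; i++)
                if ($i == "sub-job") {
                    id = $(i + 1) + 0
                    status[id] = $NF
                    if (id > max) max = id
                }
        }
        / ERROR / { errors++ }
        END {
            for (id = 0; id <= max; id++)
                if (id in status) print "sub-job " id ": " status[id]
            print "errors: " (errors + 0)
        }
    '
}
```

A sub-job whose last status is still RUNNING, as sub-job 0 is in the log above, is one whose completion the client never saw, so its output has to be checked on the cluster itself.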
--- 6 CPU orion, 6 CPU eros
$ time gxrun wrf_orion-eros.xml
Request a new job 62
02:20:12 [Thread-1] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 0 is requesting to orion01.gridcenter.or.kr
02:20:12 [Thread-3] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 1 is requesting to eros01.gridcenter.or.kr
02:20:33 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: RUNNING
02:20:33 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: RUNNING
Job is running.
checkPointPeriod: 36000000
02:40:04 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: DONE
02:40:05 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: DONE
Job is done.

real    19m53.791s
user    0m0.840s
sys     0m0.090s

--- 6 CPU eros, 6 CPU vega
$ time gxrun wrf_eros-vega.xml 
Request a new job 64
03:34:12 [Thread-1] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 0 is requesting to eros01.gridcenter.or.kr
03:34:12 [Thread-3] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 1 is requesting to vega01.gridcenter.or.kr
03:34:23 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: RUNNING
03:34:23 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: RUNNING
Job is running.
checkPointPeriod: 36000000
04:11:41 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: DONE
04:11:41 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: DONE
Job is done.

real    37m30.391s
user    0m0.950s
sys     0m0.170s

--- 8 CPU orion, 8 CPU eros
$ time gxrun wrf_orion-eros.xml
Request a new job 61
01:56:54 [Thread-1] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 0 is requesting to orion01.gridcenter.or.kr
01:56:54 [Thread-3] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 1 is requesting to eros01.gridcenter.or.kr
01:57:18 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: RUNNING
01:57:18 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: RUNNING
Job is running.
checkPointPeriod: 36000000
02:15:03 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: DONE
02:15:04 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: DONE
Job is done.

real    18m11.732s
user    0m0.850s
sys     0m0.070s

--- 8 CPU eros, 8 CPU vega
$ time gxrun wrf_eros-vega.xml 
Request a new job 65
04:12:34 [Thread-1] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 0 is requesting to eros01.gridcenter.or.kr
04:12:34 [Thread-3] INFO  org.moredream.mpichgx.client.GXRunThread  - Subjob 1 is requesting to vega01.gridcenter.or.kr
04:12:48 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: RUNNING
04:12:49 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: RUNNING
Job is running.
checkPointPeriod: 36000000
04:40:14 [Thread-5] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 1 status: DONE
04:40:14 [Thread-2] DEBUG org.moredream.mpichgx.client.GXRunThread  - New sub-job 0 status: DONE
Job is done.

real    27m40.772s
user    0m0.880s
sys     0m0.070s
