GridFMO
Network/Grid
- Unrestricted TCP/IP access among the assigned computation nodes is
necessary. (They need not be reachable from outside the
cluster. [1])
- The client program is currently running on
p06svc000.asc.hpcc.jp (163.220.35.131).
- Both inbound and outbound connections between the client and a
front node should be allowed.
Opening only the gatekeeper port (2119) is not enough.
The Globus session uses HTTPS connections in both directions,
which are opened on arbitrary ports.
- Backend nodes should be allowed to connect to the client.
Ninf-G processes launched on the backends spawn callback connections
to the client process.
- Regular updates of the AIST CRL file are required.
At sites that check CRLs, the AIST CRL file has to be updated
regularly; otherwise Globus access with the AIST certificate is
impossible.
The latest CRL file can be found at:
https://www.apgrid.org/CA/AIST/Production/a317c467.r0.
It is updated every (30 - alpha) days, which makes the
requirement troublesome.
An automatic update mechanism is now under consideration.
- It is desirable for a local queue system to allow exclusive use
of backend nodes.
The solution is mostly case-by-case, tweaking both the queue
and jobmanager configurations.
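As a rough illustration of the CRL requirement above, the following Python sketch reports whether a locally installed copy of the CRL file has become too old. It is not the automatic update mechanism mentioned above. The install path and the 25-day threshold are assumptions chosen for illustration; only the URL comes from this page.

```python
import os
import time

# Assumed values for illustration only; adjust to the local Globus setup.
CRL_URL = "https://www.apgrid.org/CA/AIST/Production/a317c467.r0"
CRL_PATH = "/etc/grid-security/certificates/a317c467.r0"  # assumed location
MAX_AGE_DAYS = 25  # assumed safety margin below the update cycle


def crl_is_stale(path, max_age_days, now=None):
    """Return True if the file is missing or older than max_age_days."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return True  # a missing CRL counts as stale
    return (now - mtime) > max_age_days * 86400


if __name__ == "__main__":
    if crl_is_stale(CRL_PATH, MAX_AGE_DAYS):
        print("CRL is stale; fetch a fresh copy from", CRL_URL)
    else:
        print("CRL is recent enough")
```

A check like this could be run from cron so that a site administrator is warned before the installed CRL expires.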
Hardware
- 0.5 GB of memory per CPU (the more the better)
- 1 GB of local scratch disk
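The two minimums above can be checked on a node with a short Python sketch such as the one below. The scratch path "/tmp" is an assumption, and the os.sysconf names used to read the physical memory are available on Linux but may differ on other systems.

```python
import os
import shutil

MIN_MEM_PER_CPU = 0.5 * 1024**3   # 0.5 GB of memory per CPU
MIN_SCRATCH = 1 * 1024**3         # 1 GB of local scratch disk


def meets_requirements(total_mem_bytes, n_cpus, scratch_free_bytes):
    """Check the two hardware minimums listed above."""
    mem_ok = total_mem_bytes / n_cpus >= MIN_MEM_PER_CPU
    scratch_ok = scratch_free_bytes >= MIN_SCRATCH
    return mem_ok and scratch_ok


if __name__ == "__main__":
    # Linux-specific way to read the physical memory size.
    total_mem = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    n_cpus = os.cpu_count() or 1
    # "/tmp" is an assumed scratch location; use the real scratch area.
    scratch_free = shutil.disk_usage("/tmp").free
    print(meets_requirements(total_mem, n_cpus, scratch_free))
```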
Software
Necessities
- C and Fortran77 compilers (Otherwise, I can transfer binaries built elsewhere.)
- Ninf-G (version 2.4.0 or later) (I will install a patched version myself.)
Desirable
- LAPACK package (MKL, ACML, ...)
- Download user info paks for Tsutomu Ikegami, Toshiya Takami
and Jun Maki
- Please consider using the PRAGMA user account setup script
- Download the script
- To run the script, make it executable, log in as root and run it, for example:
# ./pragma_setuser ikegami.tar.gz
- You may modify the script to fit your system and site policies
- If you are not going to use the PRAGMA user account setup script, please unzip and untar the user info pak and use all the info and files provided there to set up the user account.
- If your site has a firewall, please configure it to allow ssh
access to the front machine from p06svc000.asc.hpcc.jp
(163.220.35.131) and/or 150.29.148.52 (hidden behind our
firewall).
- When the accounts are ready, please email users and Cindy and let us know. Thank you!
Setup
- Log in to the machine via ssh.
- Collect information such as machine architecture, amount of
memory, scratch area, batch queues, etc.
- Transfer the source tarball and run make (or just copy appropriate binaries).
Binaries are placed under the home directory.
- Submit a benchmark test, and check if the binaries work fine.
The computation time is also measured to evaluate the machine
performance, which is used for load balancing.
- Build the patched version of Ninf-G with Globus, and generate a stub
application with it.
- Log out.
- Run a probe job with globus-job-run to check environment variables.
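The benchmark timing collected during setup is used for load balancing. How GridFMO does this internally is not described here; the Python sketch below only shows one simple possibility, splitting a number of tasks among sites in proportion to the inverse of their benchmark times. The function name and the site names are hypothetical.

```python
def split_by_speed(n_tasks, bench_seconds):
    """Split n_tasks among sites in proportion to 1 / benchmark time."""
    speeds = {site: 1.0 / t for site, t in bench_seconds.items()}
    total = sum(speeds.values())
    shares = {site: int(n_tasks * s / total) for site, s in speeds.items()}
    # Hand out the tasks lost to rounding down, fastest site first.
    leftover = n_tasks - sum(shares.values())
    for site in sorted(speeds, key=speeds.get, reverse=True)[:leftover]:
        shares[site] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical sites: "siteA" finished the benchmark twice as fast.
    print(split_by_speed(10, {"siteA": 100.0, "siteB": 200.0}))
```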
Application Run
- Contact GRAM to run a "multiple" job to launch a bunch of Ninf-G stubs.
- Collect information on the assigned nodes, and launch a sentinel.
- Transfer data (200-300 MB), and distribute it among assigned
nodes.
- Allocate a job on a subset of the nodes, where the job is
processed in parallel. When finished, another job is allocated.
- In case of emergency (like a queue timeout), the sentinel will
(hopefully) clean up leftover resources. (Occasionally, shared memory
segments are left over, which should be freed manually...)
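The allocation step above, where a job is placed on a subset of the nodes and the next job is allocated once one finishes, can be pictured with the following Python sketch. It is a simplified simulation under assumed job durations, not the actual GridFMO scheduler; the function name and the fixed number of node groups are hypothetical.

```python
import heapq


def schedule(job_durations, n_groups):
    """Assign jobs to node groups; a group takes the next job when it is free.

    Returns (makespan, assignment), where assignment[i] is the index of
    the node group that ran job i.
    """
    # Each heap entry is (time at which the group becomes free, group index).
    free_at = [(0.0, g) for g in range(n_groups)]
    heapq.heapify(free_at)
    assignment = []
    for duration in job_durations:
        t, g = heapq.heappop(free_at)   # earliest available group
        assignment.append(g)
        heapq.heappush(free_at, (t + duration, g))
    makespan = max(t for t, _ in free_at)
    return makespan, assignment


if __name__ == "__main__":
    print(schedule([3, 1, 2, 2], n_groups=2))
```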
Application
GridFMO is a Grid version of the Fragment Molecular Orbital (FMO)
calculation.
The FMO method enables first-principles calculations of macromolecules
such as proteins.
Thanks to the first-principles nature of the FMO method, the energy of
the molecules is obtained accurately (including polarization effects),
and their chemical reactions can be described with few assumptions.
The FMO method is now used to study the reaction mechanisms of enzymes,
the electronic structure of photosynthetic systems, and so on.
Implementation
In the FMO method, the electronic state of the whole molecule is
calculated by splitting it into small fragments.
The electronic state of each fragment is calculated under the
electrostatic environment posed by the other fragments.
The environment is then reconstructed from the calculated electronic
states, so the calculation must be iterated to
self-consistency.
To improve the accuracy, the electronic states of the fragment pairs
are also calculated after the convergence.
Each electronic state calculation can be performed independently under
the given environment.
This is the source of the "Grid" in the GridFMO calculation.
These calculation processes are spawned on remote machines by using
Ninf-G2 (or SSH), and each of them may itself run in parallel.
For this fine-grained parallel execution, the DDI library of GAMESS is
used, which is why unrestricted TCP/IP access among computation nodes
is necessary.
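The iteration scheme described above can be sketched in Python as follows. This is a toy model only: solve_fragment is a hypothetical stand-in for a real electronic state calculation, and each "state" is reduced to a single number. What it illustrates is the control flow: independent fragment calculations under a fixed environment, repeated until self-consistency, followed by the fragment pairs.

```python
from itertools import combinations


def solve_fragment(i, states):
    """Hypothetical stand-in for one fragment calculation.

    The new state depends on the mean of the other fragments' states,
    which plays the role of the electrostatic environment.
    """
    others = [s for j, s in enumerate(states) if j != i]
    env = sum(others) / len(others)
    return 1.0 + 0.1 * env


def self_consistent_states(n_fragments, tol=1e-10, max_iter=100):
    """Iterate the fragment calculations until the states stop changing."""
    states = [0.0] * n_fragments
    for _ in range(max_iter):
        # Each call is independent given `states`, so these are the
        # units that could be distributed to remote machines.
        new_states = [solve_fragment(i, states) for i in range(n_fragments)]
        change = max(abs(a - b) for a, b in zip(new_states, states))
        states = new_states
        if change < tol:
            return states
    raise RuntimeError("no convergence")


def fragment_pairs(n_fragments):
    """Pairs to be calculated once, after convergence."""
    return list(combinations(range(n_fragments), 2))


if __name__ == "__main__":
    print(self_consistent_states(4))
    print(fragment_pairs(4))
```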
Planned experiment
We are planning to draw potential energy curves of proteins.
A number of GridFMO calculations should be run to draw a curve, which
will be performed on the PRAGMA testbed.
In order to reduce idle time during the serial part of the GridFMO
calculation, we will test sharing the assigned computational
resources among two or more GridFMO processes.
GridFMO is implemented with fault tolerance: faults are detected
automatically, while recovery is done manually.
The feasibility of this model will also be assessed in the experiment.
- In some clusters, each computation node has two network
cards, one public and the other private. The official
hostname of the node may be bound to the public IP address, which
is tightly guarded by a firewall. If this is the case, please
tell us to use the private address.
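To find out which address of a node is actually reachable from another node, a small TCP check such as the Python sketch below may help. The host names and the port in the usage comment are hypothetical examples, not values from this page.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage, run from one computation node against another:
#   can_connect("node02.example.org", 22)   # official (public) hostname
#   can_connect("192.168.0.2", 22)          # private address
```

If the check fails for the official hostname but succeeds for the private address, the node falls under the case described above.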
Grid Technology Research Center / AIST / IKEGAMI, Tsutomu