GridFMO


System Requirements

Network/Grid

Hardware

Software

Necessities

Desirable


Create 3 User Accounts


What we do on the machine

Setup

  1. Log in to the machine via SSH.
  2. Collect information such as the machine architecture, amount of memory, scratch area, and batch queues.
  3. Transfer the source tarball and build it with make (or just copy the appropriate binaries). The binaries are placed under the home directory.
  4. Submit a benchmark test to check that the binaries work. The computation time is also measured to evaluate machine performance, which is used for load balancing.
  5. Build a patched version of Ninf-G with Globus, and generate a stub application with it.
  6. Log out.
  7. Run a probe job with globus-job-run to check the environment variables.
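
Step 4 says only that the measured benchmark time feeds into load balancing; the exact rule is not described here. As a minimal sketch, assuming work is shared in inverse proportion to benchmark time (an assumption, as are the host names and timings), the weights could be derived like this:

```python
def load_weights(benchmark_seconds):
    """Return relative work shares, inversely proportional to benchmark time.

    benchmark_seconds maps a host name to its measured benchmark time.
    The inverse-time rule is an assumption for illustration only.
    """
    speeds = {host: 1.0 / t for host, t in benchmark_seconds.items()}
    total = sum(speeds.values())
    return {host: s / total for host, s in speeds.items()}


# Hypothetical hosts and timings, not measurements from any real site.
times = {"clusterA": 100.0, "clusterB": 200.0}
weights = load_weights(times)
for host, share in sorted(weights.items()):
    print(f"{host}: {share:.3f}")
```

A machine that finishes the benchmark twice as fast receives twice the share of work under this rule.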

Application Run

  1. Contact GRAM to run a "multiple" job that launches a set of Ninf-G stubs.
  2. Collect information on the assigned nodes and launch a sentinel.
  3. Transfer the data (200-300 MB) and distribute it among the assigned nodes.
  4. Allocate a job to a subset of the nodes, where it is processed in parallel. When it finishes, another job is allocated.
  5. In an emergency (such as a queue timeout), the sentinel will (hopefully) clean up leftover garbage. Occasionally, shared memory segments are left over and must be freed manually.
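
The allocation pattern in step 4 can be illustrated with a small simulation. This is only a sketch under assumed rules (jobs are started in order whenever enough nodes are free); the function name, job names, node counts, and durations are hypothetical and do not describe the actual GridFMO scheduler.

```python
import heapq


def schedule(jobs, total_nodes):
    """Simulate allocating jobs to subsets of a fixed pool of nodes.

    jobs is a list of (name, nodes_needed, duration) tuples.
    Returns a dict mapping each job name to its (start, end) time.
    """
    free = total_nodes
    clock = 0.0
    running = []            # min-heap of (end_time, nodes, name)
    pending = list(jobs)
    result = {}
    while pending or running:
        # Start pending jobs, in order, while enough nodes are free.
        while pending and pending[0][1] <= free:
            name, need, duration = pending.pop(0)
            free -= need
            heapq.heappush(running, (clock + duration, need, name))
            result[name] = (clock, clock + duration)
        if not running:
            raise ValueError("a job needs more nodes than are available")
        # Advance time to the next job completion and release its nodes.
        end, need, name = heapq.heappop(running)
        clock = end
        free += need
    return result


# Hypothetical workload: three two-node jobs on four nodes.
plan = schedule([("a", 2, 10.0), ("b", 2, 5.0), ("c", 2, 3.0)], total_nodes=4)
for name, (start, end) in plan.items():
    print(name, start, end)
```

In this example, job c has to wait until job b releases its two nodes.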

About GridFMO

Application

GridFMO is a Grid version of the Fragment Molecular Orbital (FMO) calculation. The FMO method enables first-principles calculations of macromolecules such as proteins. Because the method is first-principles, the energy of a molecule is obtained accurately (including polarization effects), and its chemical reactions can be described with few assumptions. The FMO method is now used to study, for example, the reaction mechanisms of enzymes and the electronic structure of photosynthetic systems.

Implementation

In the FMO method, the electronic state of the whole molecule is calculated by splitting the molecule into small fragments. The electronic state of each fragment is calculated under the electrostatic environment imposed by the other fragments. The environment is then reconstructed from the calculated electronic states, so the calculation is iterated until self-consistency is reached. To improve accuracy, the electronic states of fragment pairs are also calculated after convergence.
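
The iterate-until-self-consistent structure described above can be shown with a toy model. This sketch is not quantum chemistry: each fragment's "state" is a single number, and the hypothetical `solve_fragment` is a linear stand-in for a real fragment calculation. Only the loop structure (solve every fragment in the field of the others, rebuild the field, repeat until nothing changes) mirrors the text.

```python
def solve_fragment(base, field, coupling=0.1):
    """Toy stand-in for one fragment calculation in an external field."""
    return base + coupling * field


def self_consistent(bases, tol=1e-10, max_iter=1000):
    """Iterate all fragments until their states stop changing."""
    states = list(bases)
    for iteration in range(1, max_iter + 1):
        total = sum(states)
        # Each fragment sees the summed states of all *other* fragments.
        new_states = [solve_fragment(b, total - s)
                      for b, s in zip(bases, states)]
        change = max(abs(n - s) for n, s in zip(new_states, states))
        states = new_states
        if change < tol:
            return states, iteration
    raise RuntimeError("self-consistency not reached")


# Three hypothetical fragments with arbitrary base values.
states, iterations = self_consistent([1.0, 2.0, 3.0])
print(states, iterations)
```

Within one iteration, every call to `solve_fragment` uses only the previous iteration's states, so the calls do not depend on each other.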

Each electronic state calculation can be performed independently under a given environment. This is the source of the "Grid" in the GridFMO calculation. The calculation processes are spawned on remote machines using Ninf-G2 (or SSH), and each may itself run in parallel. For this fine-grained parallel execution, the DDI library of GAMESS is used, which requires unrestricted TCP/IP access among the computation nodes.
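
Because the fragment calculations are independent under a fixed environment, they can be dispatched concurrently and collected afterwards. The sketch below uses local threads from the Python standard library purely as a stand-in for remote spawning via Ninf-G2 or SSH; `run_fragment` and its arithmetic are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def run_fragment(fragment_id, environment):
    """Hypothetical stand-in for a remote fragment calculation."""
    return fragment_id, environment * fragment_id


def run_all(fragment_ids, environment, max_workers=4):
    """Dispatch every fragment calculation and gather the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_fragment, f, environment)
                   for f in fragment_ids]
        # Each result is an (id, value) pair; collect them into a dict.
        return dict(f.result() for f in futures)


results = run_all([1, 2, 3], environment=2.0)
print(results)
```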

Planned experiment

We plan to draw potential energy curves of proteins. Drawing one curve requires a number of GridFMO calculations, which will be performed on the PRAGMA testbed. To reduce idle time during the serial part of a GridFMO calculation, we will test sharing the assigned computational resources among two or more GridFMO processes. GridFMO is implemented with fault tolerance: faults are detected automatically, while recovery is done manually. The feasibility of this model will also be assessed in the experiment.


Remarks

  1. In some clusters, each computation node has two network cards, one public and one private. The official hostname of the node may be bound to the public IP address, which is tightly guarded by a firewall. If this is the case, please tell us to use the private address.
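
A site administrator can quickly check which kind of address a node reports. The following is a minimal sketch using the Python standard library; the helper name and the sample addresses are illustrative assumptions.

```python
import ipaddress


def is_private_address(address):
    """Return True if the IP address string lies in a private range."""
    return ipaddress.ip_address(address).is_private


# Hypothetical addresses for illustration only.
for addr in ("192.168.0.10", "8.8.8.8"):
    kind = "private" if is_private_address(addr) else "public"
    print(addr, kind)
```

On a real node, the address bound to the official hostname could be obtained with `socket.gethostbyname(socket.gethostname())` and passed to the same helper.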

----------
Grid Technology Research Center / AIST / IKEGAMI, Tsutomu
