Start Multi-Grid Interoperation Experiment with TDDFT Run

Lessons Learned

2/11/2006

 

Many scientists work across different grid projects, and their applications should not be restricted by grid boundaries. Expanding collaborations among grids is important and beneficial to global scientific communities. But how can different grids interoperate, and how can we make it easier for scientists to use multiple grids? To learn the issues and solutions, PRAGMA and TeraGrid initiated the first step in a multi-grid interoperation experiment, under the umbrella of the Multi-grid Interoperation activity (https://forge.gridforum.org/projects/mgi; charter document by C. Catlett and M. Satsuoka) in the Global Grid Forum.

 

As a principle within PRAGMA, we let applications drive interoperation. After some discussions between PRAGMA and TeraGrid, we selected TDDFT (Time Dependent Density Functional Theory, a quantum chemistry application) as the first application to run across PRAGMA Grid and TeraGrid.

 

Within a week, we were able to start a TDDFT run on 4 heterogeneous sites across both grids, thus achieving interoperation. Our experience shows that a level of interoperability is neither automatic nor unattainable.

 

Through this experiment, we have learned many valuable lessons as grid infrastructure supporters and grid application users. We also gained useful insights as middleware developers, which will help to improve and advance future grid middleware development. In addition, the Multi-Grid Testbed brought different grids together to work with and learn from each other. All grids involved benefited greatly from this experiment.

 

We summarize our experiences and lessons learned in the 3 sections below: People Involved, Process and Time, and Lessons Learned. For more details about the Multi-Grid Interoperation testbed and experiment, please see http://pragma-goc.rocksclusters.org/pragma-doc/multigrid.html.

 

We envision this as the first step to broader interoperation experiences with production grids involved in the Multi-grid Interoperation activity. We believe that the lessons we learned in this first step will help us more easily engage other grids.

 

 

People Involved

 

PRAGMA Grid

UCSD/SDSC, USA: Peter Arzberger, Phil Papadopoulos, Mason Katz, Cindy Zheng

AIST, Japan: Yoshio Tanaka, Yusuke Tanimura

KU, Thailand: Putchong Uthayopas, Somsak Sriprayoonsakul

TeraGrid:

ANL, USA: Charlie Catlett, Dane Skow, JP Navarro

 

 

Process and Time

 

How did we start and conduct this experiment? Here are the steps and time frames:

 

11/17/05 - 01/20/06: High-level discussions and commitment to the Multi-Grid Interoperation experiment

01/20/06 - 01/26/06: Select and agree on the first application

01/26/06 - 02/03/06: Each grid allocates resources and designates technical support/contact and application drivers

01/27/06 - 01/31/06: Application drivers prepare and publish application requirements

02/03/06 - 02/04/06: Application drivers apply for user accounts on each grid

01/31/06 - 02/04/06: Each site sets up user accounts

01/31/06 - 02/08/06: Each site implements the application requirements

02/01/06 - 02/08/06: Application drivers test user account access on each site

02/01/06 - 02/08/06: Application drivers deploy then test the application on each site

02/09/06 - 02/09/06: Application drivers start the application run (Interoperation)

02/09/06 - : Start discussion and deployment of grid monitoring software (SCMSWeb) for cross-grid monitoring

 

 

Lessons Learned

 

Human Communication

Issues: Team members reside in 3 different time zones with up to 17 hours of time difference, so real-time communication is difficult and costly.

Solutions: Change working hours (U.S. team members work at night). Use Skype and email as the main communication tools.

 

Trust and Access

Issues: Trust between different certificate authorities is not readily established, and formally changing the policy may take time.

Solution: As a temporary measure, establish localized trust.

Lessons: IGTF needs to take effect for all the major grid operations. It is not enough to have trust agreements at a high level; there must also be an easy way for all trusting entities to install all IGTF-trusted certificates. A grid that is interested in interoperating with another grid should make sure that its trust policy and practice allow it to trust the other grid.
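The trust gap described above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the grid names and CA names are placeholders, not the certificate authorities actually used by PRAGMA or TeraGrid, and the check is a plain set comparison rather than real certificate validation.

```python
# Hypothetical inventories; every grid name and CA name here is a placeholder.
# TRUSTED_CAS: the CA certificates each grid's sites have installed as trusted.
TRUSTED_CAS = {
    "grid_a": {"CA-A1", "CA-A2", "CA-shared"},
    "grid_b": {"CA-B1", "CA-shared"},
}
# ISSUING_CAS: the CAs that sign credentials for each grid's users.
ISSUING_CAS = {
    "grid_a": {"CA-A1", "CA-A2"},
    "grid_b": {"CA-B1"},
}


def missing_trust(trusted, issuing):
    """For each (host grid, visiting grid) pair, list the visitor's issuing
    CAs that the host grid does not yet trust."""
    gaps = {}
    for host in trusted:
        for visitor in issuing:
            if host == visitor:
                continue
            missing = issuing[visitor] - trusted[host]
            if missing:
                gaps[(host, visitor)] = sorted(missing)
    return gaps


for (host, visitor), cas in missing_trust(TRUSTED_CAS, ISSUING_CAS).items():
    print(f"{host} must install {cas} before users from {visitor} can authenticate")
```

An empty result would mean every grid already trusts every other grid's issuing CAs, which is the state a common, easily installable set of IGTF-trusted certificates is meant to provide.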

 

Application requirements

Issues: Software stacks and versions differ among grids and from application requirements. As the pool of interoperable grids grows, we are likely to run into conflicting requirements between grid applications.

Solution: Applications should minimize assumptions and prerequisites, or bundle their prerequisites as much as possible.

On the other hand, software stack providers need to adapt their stacks to multi-grid environments: they should not assume Linux, RPM, or only one version of a package at a time, and should try to meet as many requirements as possible.
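One way to surface such mismatches early is to compare the published application requirements against each site's software inventory before deployment. The following Python sketch assumes a simple dotted version scheme and uses hypothetical package names; it does not describe the tooling used in this experiment. A site may list several installed versions of a package, reflecting the point above that stacks should not assume only one version at a time.

```python
def parse_version(text):
    """Turn a dotted version such as '2.4.3' into a comparable tuple (2, 4, 3)."""
    return tuple(int(part) for part in text.split("."))


def unmet_requirements(required, installed):
    """Return {package: reason} for every requirement a site does not satisfy.

    required:  {package: minimum version string}
    installed: {package: list of installed version strings}
    """
    problems = {}
    for package, minimum in required.items():
        versions = installed.get(package, [])
        if not versions:
            problems[package] = "not installed"
        elif all(parse_version(v) < parse_version(minimum) for v in versions):
            problems[package] = f"needs >= {minimum}, found {', '.join(versions)}"
    return problems


# Hypothetical requirements and site inventory, for illustration only.
app_requirements = {"middleware_x": "1.4", "compiler_y": "3.2", "library_z": "0.9"}
site_inventory = {"middleware_x": ["1.2", "1.5"], "compiler_y": ["3.0"]}

for package, reason in unmet_requirements(app_requirements, site_inventory).items():
    print(f"{package}: {reason}")
```

Running the same check against every site gives application drivers and system administrators a shared list of what still has to be installed or upgraded.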

 

Issues: For new applications, users may learn new details about what their applications really need during the deployment process. A new grid environment may require modification and clarification of application requirements.

Solution: Provide committed support, ready to install and upgrade software as needed. Keep close communication between the system administrators and users, to understand how the application would work in a different grid environment in order to deploy software correctly.

 

Differences among Grids

Issues: Different system configurations and software stacks/configurations present difficulties for users and middleware developers alike, and may cause problems for some applications.

Sample cases:

"job_type = multiple" behaves differently (solved via testing)

Calculation results are different (under investigation)

Different count attributes are required on different grids (resolved by modifying middleware code; we have the middleware developer on the team)

Solution: Timely help/support to aid user with application deployment and execution.

Lessons Learned: Users need help initially to learn the main differences. Further collaborative research and development is required to extend and to standardize grid interfaces to meet broader requirements and to enable interoperations among different grids.
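The sample cases above suggest keeping one base job description and recording each site's differences explicitly, rather than hand-editing job requests per grid. The Python sketch below illustrates that idea only. The site names, the override values, and the `host_count` attribute are assumptions made for the example; the output is an RSL-style string whose attribute spellings follow the sample cases and are not validated against any particular middleware version.

```python
# Base job description shared by all sites (values are illustrative).
BASE_JOB = {"executable": "/path/to/app", "count": 4, "job_type": "multiple"}

# Hypothetical per-site differences discovered during testing.
SITE_OVERRIDES = {
    "site_a": {},
    "site_b": {"host_count": 2},       # assumed: this site needs an extra count attribute
    "site_c": {"job_type": "single"},  # assumed: 'multiple' behaves differently here
}


def job_attributes(site):
    """Merge the base job description with the overrides recorded for a site."""
    merged = dict(BASE_JOB)
    merged.update(SITE_OVERRIDES.get(site, {}))
    return merged


def to_rsl(attrs):
    """Render attributes as an RSL-style conjunction: &(name=value)(name=value)..."""
    return "&" + "".join(f"({name}={value})" for name, value in attrs.items())


for site in SITE_OVERRIDES:
    print(site, to_rsl(job_attributes(site)))
```

Keeping the overrides in one table documents what users had to learn about each grid, and marks where a standardized interface would remove the need for per-site special cases.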

 

Issues: Different grid monitoring software and the lack of interfaces among them prevent easy cross-grid monitoring.

Lessons Learned: Further collaborative research and development is required to build and standardize a grid monitoring interface, in order to bridge different grid monitoring software and to enable interoperation among different grids.