Condor-G server

Condor-G Servers in PRAGMA Grid

  • rocks-153 - Also a compute cluster. You can submit jobs using Condor-G from rocks-153 to rocks-153 as well as to other clusters in PRAGMA Grid.

  • nucleus.mygridusbio.net.my, MIMOS, Malaysia - Also a compute cluster. You can submit jobs using Condor-G from nucleus to nucleus as well as to other clusters in PRAGMA Grid.

How to submit jobs on a Condor-G server

Here is the relevant chapter of the Condor manual:

5.3.3.1 Running a Globus Universe Job Under Condor

Successful job submission to the Globus universe requires credentials. An X.509 certificate is used to create a proxy, and an account, authorization, or allocation to use a grid resource is required. For more information on proxies and certificates, please consult the Alliance PKI pages at

http://archive.ncsa.uiuc.edu/SCD/Alliance/GridSecurity/

Before submitting a job to Condor under the Globus universe, make sure you have your Grid credentials and have used grid-proxy-init to create a proxy.

A job is submitted for execution to Condor using the condor_submit command. condor_submit takes as an argument the name of a file called a submit description file. The following sample submit description file runs a job on the Origin2000 at NCSA.

executable = test
globusscheduler = modi4.ncsa.uiuc.edu/jobmanager
universe = globus
output = test.out
log = test.log
queue

The executable for this example is transferred from the local machine to the remote machine. By default, Condor transfers the executable, as well as any files specified by the input command. Note that this executable must be compiled for the platform of the remote machine.

The globusscheduler command depends on the scheduling software available on the remote resource. This required command changes with the Grid resource intended for execution of the job. A jobmanager is the Globus service that is spawned at a remote site to submit, keep track of, and manage Grid I/O for jobs running on the local batch system there. There is a specific jobmanager for each type of batch system supported by Globus (examples are Condor, LSF, and PBS).

All Condor-G jobs (intended for execution on Globus-controlled resources) are submitted to the globus universe. The universe = globus command is required in the submit description file.
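The structure of such a submit description file can be sketched as a small generator. This is only an illustration, not part of Condor: the function name render_submit_description and its parameters are hypothetical, and the sketch emits just the commands used in the sample above, always including the required universe = globus line.

```python
def render_submit_description(executable, globusscheduler, output, log, extra=None):
    """Return the text of a globus-universe submit description file.

    Hypothetical helper for illustration; it only formats text and does
    not contact Condor or Globus.
    """
    if not executable or not globusscheduler:
        raise ValueError("executable and globusscheduler are required")
    lines = [
        f"executable = {executable}",
        f"globusscheduler = {globusscheduler}",
        "universe = globus",  # required for every Condor-G job
        f"output = {output}",
        f"log = {log}",
    ]
    # Optional additional commands, written as "name = value" lines.
    for name, value in (extra or {}).items():
        lines.append(f"{name} = {value}")
    lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_submit_description(
            "test", "modi4.ncsa.uiuc.edu/jobmanager", "test.out", "test.log"
        ),
        end="",
    )
```

Writing the returned text to a file such as test.submit gives a file equivalent to the sample above.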

No input file is specified for this example job. Any output (written to the file named by the output command) or error (written to the file named by the error command) is transferred from the remote machine to the local machine as it is produced. This implies that these files may be incomplete if the executable does not finish running on the remote resource. The job log file is maintained on the local machine.

To submit this job to Condor-G for execution on the remote machine, use

condor_submit test.submit

where test.submit is the name of the submit description file. Example output from condor_q for this submission looks like:

% condor_q

-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi

 ID      OWNER         SUBMITTED     RUN_TIME ST PRI SIZE CMD
   7.0   epaulson     3/26 14:08   0+00:00:00 I  0   0.0  test

1 jobs; 1 idle, 0 running, 0 held

After a short time, Globus accepts the job. Running condor_q again now shows:

% condor_q

-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi

 ID      OWNER         SUBMITTED     RUN_TIME ST PRI SIZE CMD
   7.0   epaulson     3/26 14:08   0+00:01:15 R  0   0.0  test

1 jobs; 0 idle, 1 running, 0 held

Then, very shortly after that, the queue will be empty again, because the job has finished:

% condor_q

-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi

 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held
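The idle, running, and finished states shown in the three listings above can be told apart from the summary line alone. The following is a minimal sketch that assumes the summary format shown on this page ("N jobs; N idle, N running, N held"); other Condor versions may print it differently. The function names are hypothetical.

```python
import re

# Matches the summary line printed at the end of the condor_q listings above.
SUMMARY_RE = re.compile(r"(\d+) jobs; (\d+) idle, (\d+) running, (\d+) held")


def parse_queue_summary(condor_q_output):
    """Return the job counts from a condor_q listing, or None if no summary is found."""
    match = SUMMARY_RE.search(condor_q_output)
    if match is None:
        return None
    total, idle, running, held = (int(group) for group in match.groups())
    return {"jobs": total, "idle": idle, "running": running, "held": held}


def queue_is_empty(condor_q_output):
    """True when the summary line reports zero jobs."""
    summary = parse_queue_summary(condor_q_output)
    return summary is not None and summary["jobs"] == 0
```

A script that polls condor_q could pass the captured text to queue_is_empty to decide when the job has left the queue.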

A second example of a submit description file runs the Unix ls program on a different Globus resource.

executable = /bin/ls
Transfer_Executable = false
globusscheduler = vulture.cs.wisc.edu/jobmanager
universe = globus
output = ls-test.out
log = ls-test.log
queue

In this example, the executable (the binary) has been pre-staged. The executable is on the remote machine, and it is not to be transferred before execution. Note that the required globusscheduler and universe commands are present. The command

Transfer_Executable = FALSE

within the submit description file identifies the executable as being pre-staged. In this case, the executable command gives the path to the executable on the remote machine.

A third example submits a Perl script to be run as a Condor job. The Perl script lists the environment variables of the job, including those set in the submit description file. Save the following Perl script with the name env-test.pl, to be used as a Condor job executable.

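#!/usr/bin/env perl
use strict;
use warnings;

# Print every environment variable visible to the job, sorted by name.
foreach my $key (sort keys %ENV)
{
   print "$key = $ENV{$key}\n";
}

exit 0;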

Run the Unix command

chmod 755 env-test.pl

to make the Perl script executable. Now create the following submit description file (replace biron.cs.wisc.edu/jobmanager with a resource you are authorized to use):

executable = env-test.pl
globusscheduler = biron.cs.wisc.edu/jobmanager
universe = globus
environment = foo=bar; zot=qux
output = env-test.out
log = env-test.log
queue

When the job has completed, the output file env-test.out should contain something like this:

GLOBUS_GRAM_JOB_CONTACT = https://biron.cs.wisc.edu:36213/30905/1020633947/
GLOBUS_GRAM_MYJOB_CONTACT = URLx-nexus://biron.cs.wisc.edu:36214
GLOBUS_LOCATION = /usr/local/globus
GLOBUS_REMOTE_IO_URL = /home/epaulson/.globus/.gass_cache/globus_gass_cache_1020633948
HOME = /home/epaulson
LANG = en_US
LOGNAME = epaulson
X509_USER_PROXY = /home/epaulson/.globus/.gass_cache/globus_gass_cache_1020633951
foo = bar
zot = qux
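To check that the variables set with the environment command actually reached the job, the output file can be parsed back into name/value pairs. This is a sketch under the assumption that every line has the "NAME = value" form printed by env-test.pl; the function names are hypothetical.

```python
def parse_env_output(text):
    """Parse lines of the form 'NAME = value' into a dict."""
    env = {}
    for line in text.splitlines():
        name, separator, value = line.partition(" = ")
        if separator:  # skip lines that lack the ' = ' separator
            env[name] = value
    return env


def missing_variables(text, expected):
    """Return the names in `expected` that are absent or have a different value."""
    env = parse_env_output(text)
    return sorted(name for name, value in expected.items() if env.get(name) != value)
```

For the output shown above, missing_variables(contents, {"foo": "bar", "zot": "qux"}) would return an empty list, where contents is the text of env-test.out.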

Of particular interest is the GLOBUS_REMOTE_IO_URL environment variable. Condor-G automatically starts up a GASS remote I/O server on the submitting machine. Because of the potential for either side of the connection to fail, the URL for the server cannot be passed directly to the job. Instead, it is put into a file, and the GLOBUS_REMOTE_IO_URL environment variable points to this file. Remote jobs can read this file and use the URL it contains to access the remote GASS server running inside Condor-G. If the location of the GASS server changes (for example, if Condor-G restarts), Condor-G will contact the Globus gatekeeper and update this file on the machine where the job is running. It is therefore important that all accesses to the remote GASS server check this file for the latest location.
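The advice to check the file on every access can be sketched as follows. The helper names are hypothetical; the only facts taken from the text are that GLOBUS_REMOTE_IO_URL names a file and that the file contains the current URL of the GASS server.

```python
import os


def current_gass_url(environ=None):
    """Re-read the GASS server URL from the file named by GLOBUS_REMOTE_IO_URL.

    The file is read on every call, never cached, so a location updated by
    Condor-G is picked up on the next access.
    """
    environ = os.environ if environ is None else environ
    url_file = environ["GLOBUS_REMOTE_IO_URL"]
    with open(url_file) as handle:
        return handle.read().strip()


def remote_file_url(path, environ=None):
    """Build the URL of `path` (an absolute path) on the GASS server."""
    return current_gass_url(environ) + path
```

A job would call remote_file_url("/etc/hosts") immediately before each transfer rather than storing the URL once at startup.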

The following example is a Perl script that uses the GASS server in Condor-G to copy an input file to the execute machine. In this example, the remote job copies /etc/hosts from the submitting machine and prints its contents.

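#!/usr/bin/env perl
use strict;
use warnings;
use FileHandle;
use Cwd;

STDOUT->autoflush();

# GLOBUS_REMOTE_IO_URL names a file that holds the GASS server URL.
my $gassUrl = `cat $ENV{GLOBUS_REMOTE_IO_URL}`;
chomp $gassUrl;

$ENV{LD_LIBRARY_PATH} = $ENV{GLOBUS_LOCATION} . "/lib";
my $urlCopy = $ENV{GLOBUS_LOCATION} . "/bin/globus-url-copy";

# globus-url-copy needs a full pathname
my $pwd = getcwd();
print "$urlCopy $gassUrl/etc/hosts file://$pwd/temporary.hosts\n\n";
`$urlCopy $gassUrl/etc/hosts file://$pwd/temporary.hosts`;
die "globus-url-copy failed with status $?\n" if $? != 0;

# Print the copied file line by line.
open(my $file, "<", "temporary.hosts")
    or die "Cannot open temporary.hosts: $!\n";
while (<$file>) {
    print $_;
}
close($file);

exit 0;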

The submit description file used to submit the Perl script as a Condor job appears as:

executable = gass-example.pl
globusscheduler = biron.cs.wisc.edu/jobmanager
universe = globus
output = gass.out
log = gass.log
queue

There are two optional submit description file commands of note: x509userproxy and globusrsl. The x509userproxy command specifies the path to an X.509 proxy. The command is of the form:

x509userproxy = /path/to/proxy

If this optional command is not present in the submit description file, then Condor-G checks the value of the environment variable X509_USER_PROXY for the location of the proxy. If this environment variable is not present, then Condor-G looks for the proxy in the file /tmp/x509up_u0000, where the trailing zeros in this file name are replaced with the Unix user id.

The globusrsl command is used to add additional attribute settings to a job's RSL string. The format of the globusrsl command is

globusrsl = (name=value)(name=value)

Here is an example of this command from a submit description file:

 
globusrsl = (project=Test_Project)

This example's attribute name for the additional RSL is project, and the value assigned is Test_Project.
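The proxy lookup order and the globusrsl format described above can be sketched in a few lines. This is an illustration of the rules as stated on this page, not Condor-G's actual code; the function names are hypothetical, and the RSL formatter does no quoting or escaping of values.

```python
import os


def resolve_proxy_path(x509userproxy=None, environ=None, uid=None):
    """Return the proxy location using the lookup order described above.

    1. the x509userproxy command from the submit description file,
    2. the X509_USER_PROXY environment variable,
    3. /tmp/x509up_u<uid>, where <uid> is the Unix user id.
    """
    environ = os.environ if environ is None else environ
    if x509userproxy:
        return x509userproxy
    if environ.get("X509_USER_PROXY"):
        return environ["X509_USER_PROXY"]
    uid = os.getuid() if uid is None else uid
    return f"/tmp/x509up_u{uid}"


def format_globusrsl(attributes):
    """Format a dict as (name=value)(name=value) for the globusrsl command."""
    return "".join(f"({name}={value})" for name, value in attributes.items())
```

For example, format_globusrsl({"project": "Test_Project"}) produces the value used in the globusrsl line above.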


5.3.3.2 Limitations of Condor-G

Submitting jobs to run under the globus universe has not yet been perfected. The following is a list of known limitations:

  • No checkpoints.
  • No matchmaking.
  • File transfer is limited. There are no file transfer mechanisms for files other than the executable, stdin, stdout, and stderr.
  • No job exit codes. Job exit codes are not available.
  • Limited platform availability. Condor-G is only available on Linux, Solaris, Digital UNIX, and IRIX. HP-UX support will hopefully be available later.