Xcrypt – Job Level Parallel Script Language

Computational scientists often run large-scale simulations in research and development, for example in car body design and drug discovery. For parameter sweeps or optimal-parameter searches, such simulations often form Plan-Do-Check-Act (PDCA) cycles, that is, iterations of many sequential or parallel job executions with different parameters, as shown in Figure 1.


Figure 1 PDCA Cycle

PDCA cycles should be automated. However, existing general-purpose scripting languages such as Perl and Ruby are hard for typical computational scientists to use for preparing input files, generating a job script for each job, extracting the necessary parts of output files to analyze results, and managing many asynchronously running jobs. GUI-based workflow tools are an alternative, but some complicated workflows are difficult to describe with them.

Therefore, we are developing Xcrypt, a job-level parallel script language that helps with such automation.

What is Xcrypt?

The goal of Xcrypt is to give computational scientists, who are typically familiar with C or FORTRAN but not with general-purpose scripting languages such as Perl and Ruby, a simple way to automate workflows that consist of many program runs and the dependencies among them.

Unlike existing workflow tools, Xcrypt must be not only simple but also flexible as a programming language: it should be possible to implement anything from simple parameter sweeps to complicated search algorithms in Xcrypt. We met these requirements by starting with Perl, a general-purpose scripting language, and extending it with features that free programmers from tedious tasks such as writing job scripts for batch systems, generating and analyzing a huge number of input and output files, and managing the states of asynchronously running jobs.

In addition, we provide a mechanism that enables “Perl wizards” to add helpful “spells” (e.g., smart search algorithms) as modules in a way that lets end-users use them easily.

Thanks to these features, Xcrypt users can run a wide variety of workflows just by writing simple scripts.


Figure 2 Two Layer Parallelism

As shown in Figure 2, we aim to make peta/exa-scale computing easy by combining Xcrypt with lower-layer parallelization implemented with OpenMP, MPI, and/or XcalableMP.

Example

use base qw(limit core);
limit::initialize (30);
%template = (
    'id' => 'example',
    'RANGE0' => [1..5000],
    'exe' => './a.out',
    'arg0@' => '"input$_[0]"',
    'arg1@' => '"output$_[0]"',
    'copiedfile0' => 'a.out',
    'copiedfile1@' => '"input$_[0]"',
    'queue' => 'medium',
);
@jobs = prepare (%template);
submit (@jobs);
sync (@jobs);

Figure 3 Xcrypt Script for a Parameter Sweep

Figure 3 shows a simple end-user Xcrypt script. It submits 5,000 jobs that run the same program, “a.out”, with different command-line arguments for each job, while limiting the number of simultaneously running jobs to 30.

As in this example, a typical Xcrypt script consists of:

  • a preamble that specifies the modules to be loaded, initializes some global parameters, and so on,
  • definitions of the jobs to be submitted, and
  • function calls that prepare for job submission (e.g., by creating working directories), submit the jobs, and wait for them to finish.

Job Definition

Jobs are defined declaratively as a Perl hash whose members hold the parameter values. With parameters named RANGEn, a single hash can define a whole sequence of jobs. In that case, a different value can be set for each job by using a parameter name ending with @ and a string that the Perl interpreter evaluates in an environment where $_[n] is bound to the corresponding value in RANGEn.
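The expansion described above can be illustrated with a short, self-contained Perl sketch. It does not use Xcrypt: expand_template is a hypothetical helper written only for this illustration, it handles RANGE0 only, and the id naming scheme (id followed by an underscore and the range value) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Xcrypt): expand a template hash into one
# plain hash per value of RANGE0.
sub expand_template {
    my (%template) = @_;
    my @jobs;
    for my $value (@{ $template{RANGE0} || [] }) {
        my %job;
        for my $key (sort keys %template) {
            next if $key =~ /^RANGE\d+\z/;
            if (my ($name) = $key =~ /^(.*)\@\z/) {
                # Evaluate the string as Perl code with $_[0] bound to $value.
                my $result = sub { eval $template{$key} }->($value);
                die "cannot evaluate '$key': $@" if $@;
                $job{$name} = $result;
            } else {
                $job{$key} = $template{$key};
            }
        }
        $job{id} = "$template{id}_$value";   # assumed naming scheme
        push @jobs, \%job;
    }
    return @jobs;
}

my @jobs = expand_template(
    'id'     => 'example',
    'RANGE0' => [1 .. 3],
    'exe'    => './a.out',
    'arg0@'  => '"input$_[0]"',
    'arg1@'  => '"output$_[0]"',
);
print "$_->{id}: $_->{exe} $_->{arg0} $_->{arg1}\n" for @jobs;
# prints:
#   example_1: ./a.out input1 output1
#   example_2: ./a.out input2 output2
#   example_3: ./a.out input3 output3
```

Because the value of a key ending with @ is evaluated as Perl code, the inner double quotes in '"input$_[0]"' are what turn it into an interpolated string.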

Submitting/Synchronizing Jobs

Defined jobs are submitted imperatively with the submit() function. Before submitting, we need to call prepare() to create a working directory and copy the necessary files for each job to be submitted. All submitted jobs are executed asynchronously; we can wait for them to finish with sync().
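The submit-then-synchronize pattern can be pictured with local processes. The sketch below is only an analogy under stated assumptions: demo_submit and demo_sync are hypothetical stand-ins that fork child processes, whereas Xcrypt hands jobs to a batch scheduler.

```perl
use strict;
use warnings;

# Hypothetical stand-in for submit(): start every command without waiting.
sub demo_submit {
    my (@commands) = @_;
    my @pids;
    for my $command (@commands) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child process: replace itself with the command.
            exec @$command or exit 127;
        }
        push @pids, $pid;
    }
    return @pids;
}

# Hypothetical stand-in for sync(): block until every process has finished
# and collect the exit codes.
sub demo_sync {
    my (@pids) = @_;
    my %status;
    for my $pid (@pids) {
        waitpid($pid, 0);
        $status{$pid} = $? >> 8;
    }
    return %status;
}

# Three "jobs" that simply exit with codes 0, 1, and 2.
my @pids   = demo_submit(map { [ $^X, '-e', "exit $_" ] } 0 .. 2);
my %status = demo_sync(@pids);
print "process $_ exited with $status{$_}\n" for @pids;
```

Work placed between the two calls would run while the jobs are still in progress, which is the point of separating submission from synchronization.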

Defining/Using Modules

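package limit;
use NEXT;
use Thread::Semaphore;

# Semaphore that counts the free slots for running jobs
my $smph;

# Set the maximum number of simultaneously running jobs
sub initialize { $smph = Thread::Semaphore->new($_[0]); }

# Invoked before a job is submitted: wait for a free slot
sub before {
    my $self = shift;
    $smph->down;
    $self->NEXT::before();
}

# Invoked after a job has finished: release the slot
sub after {
    my $self = shift;
    $self->NEXT::after();
    $smph->up;
}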

Figure 4 The “limit” module

To limit the number of simultaneously running jobs, the script in Figure 3 uses the “limit” module, which is implemented as shown in Figure 4. An Xcrypt module is defined as an extension to the class for job objects named core. In the core class and its subclasses, methods named before and after have special meanings: they are invoked asynchronously before a job is submitted and after the job has finished, respectively.

Thanks to this mechanism, a wide variety of functionality can be developed as modules, and end-users can use it simply by listing module names for (multiple) class inheritance. For instance, we have also implemented modules for dry execution and for controlling the order in which jobs are submitted through a declarative description of the dependencies among them.
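How before and after hooks from several classes chain together under multiple inheritance can be shown without Xcrypt. In the following sketch, DemoCore, DemoTrace, and DemoJob are hypothetical classes invented for the illustration; only the NEXT module, which ships with Perl, is real. DemoCore stands in for the base job class and calls the hooks synchronously for simplicity.

```perl
use strict;
use warnings;
use NEXT;

my @events;

# Hypothetical stand-in for the base job class.
package DemoCore;
sub new    { my ($class, %args) = @_; return bless {%args}, $class; }
sub before { my $self = shift; push @events, "core before $self->{id}"; }
sub after  { my $self = shift; push @events, "core after $self->{id}"; }
sub run {
    my $self = shift;
    $self->before;
    push @events, "run $self->{id}";
    $self->after;
}

# Hypothetical module that wraps its own work around the next class's hooks.
package DemoTrace;
sub before {
    my $self = shift;
    push @events, 'trace before';
    $self->NEXT::before();    # redispatch to the next class that defines before()
}
sub after {
    my $self = shift;
    $self->NEXT::after();     # let the other classes finish first
    push @events, 'trace after';
}

# The job class lists the module first and the base class last,
# in the same way as "use base qw(limit core)" in Figure 3.
package DemoJob;
our @ISA = ('DemoTrace', 'DemoCore');

package main;
DemoJob->new(id => 'job1')->run;
print "$_\n" for @events;
# prints:
#   trace before
#   core before job1
#   run job1
#   core after job1
#   trace after
```

The module's before runs first and its after runs last, so a module such as “limit” can acquire a resource before the rest of the chain and release it once everything else has finished.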

Job Script Generation

When submit() is invoked, the Xcrypt runtime generates a job script for the batch scheduler (e.g., NQS, Torque, LSF, or SGE) based on the information in a job object. Batch schedulers differ from each other in their command-line interfaces, job script specifications, and so on. To support a wide variety of them, Xcrypt provides a mechanism that enables programmers or system administrators to define a new batch scheduler by writing a Perl-based configuration script.

$jobsched::jobsched_config{"NQS"} = {
    qsub_command => "/usr/local/bin/qsub",
    qdel_command => "/usr/local/bin/qdel -K",
    qstat_command => "/usr/local/bin/qstat",
    jobscript_option_queue => '# @$-q ',
    jobscript_option_stdout => '# @$-o ',
    jobscript_option_stderr => '# @$-e ',
    # Extract the request ID from the first line of the qsub output
    extract_req_id_from_qsub_output => sub {
        my (@lines) = @_;
        if ($lines[0] =~ /([0-9]+)\.nqs/) { return $1; }
        else { return -1; }
    },
};

Figure 5 Configuration Script to Define a Batch Scheduler

Figure 5 shows an example of such a script. Each parameter value can be either a string or a function object, which makes the configuration both easy to write and flexible enough to cover the varying specifications of batch schedulers.
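One way a runtime could consume such a configuration is sketched below. This is an illustration under assumptions, not Xcrypt's implementation: render_option and make_job_script are hypothetical helpers, the function-valued stdout option is invented to show the second kind of value, and the sample qsub output line is made up.

```perl
use strict;
use warnings;

# Hypothetical scheduler description in the style of Figure 5.
my %demo_config = (
    jobscript_option_queue  => '# @$-q ',                     # string value
    jobscript_option_stdout => sub {                          # function value
        my ($file) = @_;
        return "# \@\$-o $file.log";
    },
    extract_req_id_from_qsub_output => sub {
        my (@lines) = @_;
        return $lines[0] =~ /([0-9]+)\.nqs/ ? $1 : -1;
    },
);

# A string entry is used as a prefix; a function entry is called with the value.
sub render_option {
    my ($entry, $value) = @_;
    return ref($entry) eq 'CODE' ? $entry->($value) : $entry . $value;
}

# Build the text of a job script from the configuration and one job.
sub make_job_script {
    my ($config, %job) = @_;
    return join "\n",
        '#!/bin/sh',
        render_option($config->{jobscript_option_queue},  $job{queue}),
        render_option($config->{jobscript_option_stdout}, $job{stdout}),
        "$job{exe} $job{args}",
        '';
}

my $script = make_job_script(
    \%demo_config,
    queue => 'medium', stdout => 'example_1',
    exe   => './a.out', args  => 'input1 output1',
);
print $script;
# prints:
#   #!/bin/sh
#   # @$-q medium
#   # @$-o example_1.log
#   ./a.out input1 output1

# Made-up sample of what a scheduler might print after submission.
my $id = $demo_config{extract_req_id_from_qsub_output}
    ->('Request 1234.nqs submitted to queue: medium.');
print "request id: $id\n";    # prints "request id: 1234"
```

Checking ref($entry) at the point of use is what lets the same configuration key hold a plain prefix for simple schedulers and a function for those that need custom formatting.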

Libraries for Input File Generation and Output File Extraction

Of course, legacy Unix tools such as grep, sed, and awk can be used to generate input files and extract data from output files for a huge number of jobs. However, for users who are unfamiliar with regular expressions, it is not easy to generate a large number of FORTRAN namelists that differ slightly from each other, or to extract certain elements from an output file that represents a matrix.

Therefore, Xcrypt provides higher-level generation and extraction libraries. We improved their usability by specializing them for computational science, for example for modifying a FORTRAN namelist and for extracting data from output files by specifying row and column numbers.
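The kind of task these libraries target can be illustrated in plain Perl. The helpers below, set_namelist_value and matrix_element, are hypothetical and written only for this sketch; they are not the interface of the Xcrypt libraries, and they handle only simple scalar namelist entries and whitespace-separated matrices.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the value of one variable in FORTRAN namelist text.
sub set_namelist_value {
    my ($text, $name, $value) = @_;
    $text =~ s/(\b\Q$name\E\s*=\s*)[^,\/\n]+/$1$value/i
        or die "variable '$name' not found";
    return $text;
}

# Hypothetical helper: return the element at a 1-based row and column of
# whitespace-separated matrix text.
sub matrix_element {
    my ($text, $row, $col) = @_;
    my @lines  = grep { /\S/ } split /\n/, $text;
    my @fields = split ' ', $lines[$row - 1];
    return $fields[$col - 1];
}

my $namelist = "&params\n  nx = 100,\n  dt = 0.01\n/\n";
print set_namelist_value($namelist, 'dt', '0.005');
# prints:
#   &params
#     nx = 100,
#     dt = 0.005
#   /

my $output = "1.0 2.0 3.0\n4.0 5.0 6.0\n";
print matrix_element($output, 2, 3), "\n";    # prints 6.0
```

With helpers of this shape, the end-user states only a variable name or a row and column number, and the regular expressions stay hidden inside the library.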


Download

Latest version

Older version

Latest Development Version

You can get the latest changeset from the Mercurial repository with:
% hg clone http://super.para.media.kyoto-u.ac.jp/xcrypt-hg

or by downloading its tgz archive from here
Note that this development version may be unstable.

Manual (2011/10/06 updated)


Papers

  • Tasuku Hiraishi, Tatsuya Abe, Yohei Miyake, Takeshi Iwashita, Hiroshi Nakashima:
    Xcrypt: Flexible and Intuitive Job-Parallel Script Language, Symposium on Advanced Computing Systems and Infrastructures (SACSIS2010), pp. 183–191, Nara, Japan, 2010 (In Japanese)
  • Tasuku Hiraishi, Takeshi Iwashita, Hiroshi Nakashima: Towards Seamless and Highly-Productive Parallel Script Language, The 119th IPSJ Workshop on High Performance Computing (HOKKE-2009), pp. 175–180, Sapporo, 2009 (In Japanese)  [PDF]
  • Tasuku Hiraishi, Tatsuya Abe, Yohei Miyake, Takeshi Iwashita, Hiroshi Nakashima:
    Development of Xcrypt: Highly Productive Parallel Script Language, IPSJ Summer Programming Symposium 2009, pp. 67–73, Nasushiobara, Tochigi, 2009 (In Japanese)
    [PDF]

Presentations

  • T2K Open Supercomputer Alliance:
    Seamless and Highly-Productive Parallel Programming Environment for High-Performance Computing, SC09 Exhibition, Portland, Oregon, 2009 [PDF]
  • Tasuku HIRAISHI, Hiroshi NAKASHIMA: Xcrypt: Highly-Productive Parallel Script Language, Third French-Japanese Workshop Petascale Applications, Algorithms and Programming (PAAP2009), Kyoto University, 2009 link
  • Hiroshi Nakashima, Tasuku Hiraishi, Tatsuya Abe, Yohei Miyake: Progress Report of Xcrypt: What You Can Do Now with the Parallel Script Language, WPSE2010, Kyoto, 2010 link
  • Tasuku HIRAISHI: Xcrypt Tutorial, PC Cluster Workshop in Kyoto 2010, Kyoto,
    2010 (In Japanese) link