Paraloop

Higher education and research development card
Development from the higher education and research community
  • Creation or important update: 05/04/10
  • Minor correction: 04/04/13
Keywords

Paraloop: distributing parallel jobs

This software was developed (or is under development) within the higher education and research community. Its stability can vary (see fields below) and its working state is not guaranteed.
  • Web site
  • System: UNIX-like
  • Current version: 1.3 (September 2008)
  • License(s): CeCILL
  • Status: stable release
  • Support: maintained, ongoing development
  • Designer(s): Emmanuel COURCELLE
  • Contact designer(s): emmanuel.courcelle at toulouse.inra.fr
  • Laboratory, service: LIPM, Service Bioinformatique


General software features

Paraloop distributes your jobs across several processors, whatever the underlying architecture: a single SMP computer with shared memory, a cluster, or even a network of workstations.

Paraloop is best suited to cases where a large number of independent tasks must be executed, as is often the case in the data processing pipelines found in bioinformatics projects.

Paraloop is a tool for programmers: it lets them distribute their jobs easily while using the same script on whatever machine they run. It is an object-oriented Perl program: data processing is wrapped inside one object (called a "plugin"), and the code responsible for interacting with the machine is wrapped inside another object (called a "scheduler"). Adapting paraloop to a new architecture is therefore relatively easy: it only means writing a new scheduler (in fact, just a few methods). The same is true for plugins, each of which knows how to read and process data in a particular format.
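The plugin/scheduler split described above can be sketched as follows. This is a minimal Python illustration, not paraloop's Perl code: the names `Plugin`, `Scheduler`, `LocalScheduler`, `SerialScheduler`, `UpperCaseLines` and `run_pipeline` are hypothetical, and a thread pool stands in for the processors of an SMP machine to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator, List


class Plugin(ABC):
    """Hypothetical data-side interface: reads and processes one data format."""

    @abstractmethod
    def records(self, text: str) -> Iterator[str]:
        """Split the input into independent records."""

    @abstractmethod
    def process(self, record: str) -> str:
        """Process a single record without depending on the others."""


class Scheduler(ABC):
    """Hypothetical machine-side interface: runs work on one architecture."""

    @abstractmethod
    def run(self, plugin: Plugin, records: Iterable[str]) -> List[str]:
        """Apply plugin.process to every record, returning results in input order."""


class LocalScheduler(Scheduler):
    """Runs records on a local worker pool (stand-in for an SMP machine)."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, plugin: Plugin, records: Iterable[str]) -> List[str]:
        # pool.map keeps results in input order even though work runs concurrently.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(plugin.process, records))


class SerialScheduler(Scheduler):
    """Runs everything in the current thread; handy for debugging."""

    def run(self, plugin: Plugin, records: Iterable[str]) -> List[str]:
        return [plugin.process(r) for r in records]


class UpperCaseLines(Plugin):
    """Toy generic plugin: one record per non-empty line of text."""

    def records(self, text: str) -> Iterator[str]:
        return (line for line in text.splitlines() if line.strip())

    def process(self, record: str) -> str:
        return record.upper()


def run_pipeline(plugin: Plugin, scheduler: Scheduler, text: str) -> List[str]:
    # The driver is identical whatever the machine: only the scheduler changes.
    return scheduler.run(plugin, plugin.records(text))
```

For example, `run_pipeline(UpperCaseLines(), LocalScheduler(2), "a\nb\n\nc")` and the same call with `SerialScheduler()` both return `["A", "B", "C"]`. Under this design, porting to another architecture would mean adding one more `Scheduler` subclass (for instance one that submits records to a batch queue), while `run_pipeline` and the plugins stay unchanged.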

A few plugins are delivered with paraloop: some of them are specific to bioinformatics (one of them, for instance, runs BLAST in parallel), while others are completely generic (reading a text file, ...). Writing plugins dedicated to other fields would nevertheless be very useful.
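A plugin that runs BLAST in parallel first has to cut its input into independent units. Assuming FASTA input (the sequence format BLAST takes as queries), a minimal sketch of that splitting step could look like this. The function names `fasta_records`, `batches` and `to_fasta` are hypothetical, the sketch does not reproduce paraloop's own BLAST plugin, and the actual BLAST invocation is left out.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def fasta_records(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs; each pair is an independent unit of work."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; join them. Lines before the first header are ignored.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into fixed-size batches, e.g. one batch per external program run."""
    batch: List[Record] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_fasta(batch: List[Record]) -> str:
    """Serialize a batch back to FASTA so it can be written to a query file."""
    return "".join(f">{h}\n{s}\n" for h, s in batch)
```

With the input `">s1 desc\nACGT\nTT\n>s2\nGG\n>s3\nC\n"` and a batch size of 2, this produces two batches: `s1` and `s2` together, then `s3` alone.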

When paraloop runs under a batch queuing system with a limited CPU time per job, it can be configured to interrupt the current job before the system kills it; just before the interruption, the job is resubmitted to the queue, so that it resumes as soon as the system allows.
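The interrupt-and-resubmit behaviour can be sketched as a loop that checks the CPU time it has used before each record and stops with a safety margin. This is an illustration under stated assumptions, not paraloop's implementation: the checkpoint is a hypothetical JSON file holding the index of the next record, `run_with_budget` is a hypothetical name, and `resubmit` is a hypothetical callback standing in for whatever submission command your queue uses.

```python
import json
import os
import time
from typing import Callable, List


def run_with_budget(
    items: List[str],
    process: Callable[[str], None],
    state_file: str,
    cpu_limit: float,
    margin: float,
    resubmit: Callable[[], None],
    clock: Callable[[], float] = time.process_time,
) -> bool:
    """Process items, stopping cleanly before cpu_limit; return True when all are done."""
    start = clock()
    next_index = 0
    # Resume from the hypothetical checkpoint {"next_index": N} if a previous run left one.
    if os.path.exists(state_file):
        with open(state_file) as fh:
            next_index = json.load(fh)["next_index"]

    while next_index < len(items):
        if clock() - start > cpu_limit - margin:
            # Save where we stopped, requeue, then exit before the system kills the job.
            with open(state_file, "w") as fh:
                json.dump({"next_index": next_index}, fh)
            resubmit()
            return False
        process(items[next_index])
        next_index += 1

    # Everything is processed: the checkpoint is no longer needed.
    if os.path.exists(state_file):
        os.remove(state_file)
    return True
```

The `clock` parameter defaults to the process's CPU time but can be replaced by a fake clock in tests. The margin has to be larger than the time needed to process one record, otherwise the job may still be killed mid-record.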

Paraloop also includes a command that prints a progress report for each job.

Finally, a "load balancing mode" is available: it can be used to ensure that all jobs take approximately the same time to execute.
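When the cost of each task can be estimated in advance, one common way to give every job roughly the same amount of work is the greedy longest-processing-time-first (LPT) heuristic. The sketch below illustrates that generic technique; it is not necessarily how paraloop's load balancing mode works, and the `balance` function and its cost dictionary are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def balance(costs: Dict[str, float], n_jobs: int) -> List[List[str]]:
    """Greedy LPT: give each task, largest first, to the currently least-loaded job."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    # Min-heap of (current load, job index): the least-loaded job is always on top.
    heap: List[Tuple[float, int]] = [(0.0, j) for j in range(n_jobs)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(n_jobs)]
    # Largest tasks first; ties broken by task name for a deterministic result.
    for task in sorted(costs, key=lambda t: (-costs[t], t)):
        load, j = heapq.heappop(heap)
        assignment[j].append(task)
        heapq.heappush(heap, (load + costs[task], j))
    return assignment
```

For instance, `balance({"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}, 2)` returns `[["a", "d"], ["b", "c", "e"]]`, giving both jobs a total cost of 10. When costs cannot be estimated, a dynamic alternative is to let each worker pull the next task from a shared queue as soon as it finishes the previous one.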

Context in which the software is used

We currently use paraloop for our bioinformatics computations, on both SMP servers and computer clusters. Paraloop is also integrated into our bioinformatics projects (eugene, LeARN, Narcisse, ...).

Publications related to the software

Paraloop was presented in a poster session at J-RES 2005 (http://2005.jres.org/resume/poster/138.pdf) and at JOBIM, also in 2005 (http://pbil.univ-lyon1.fr/events/jobim2005/proceedings/P64Courcelle.pdf).

Paraloop is currently hosted on the SourceSup forge: http://sourcesup.cru.fr/projects/paraloop