Pegasus Workflows

Overview

Pegasus is a scientific workflow management system that enables users to run computational workflows across ACCESS resources. It simplifies the orchestration of jobs and data transfers across multiple providers.

Figure: Example workflow from the SoyKB project

Pegasus is used in production across diverse domains including astronomy, gravitational-wave physics, bioinformatics, earthquake engineering, helioseismology, limnology, machine learning, and molecular dynamics. It provides powerful abstractions to define workflows and ensures reliable and scalable execution across a wide range of computing platforms. For more details, visit the Pegasus website or explore the user guide.

How Pegasus Works

Workflows are defined using Pegasus APIs. A common approach is using the Python API within a Jupyter Notebook, starting from example workflows that can be easily customized. Each workflow is described as a set of compute jobs, specifying executables, input files, and expected outputs. Pegasus automatically infers dependencies between jobs based on file usage. Workflows can also include nested sub-workflows to scale to hundreds, thousands, or even millions of tasks.
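The idea of inferring dependencies from file usage can be illustrated with a minimal sketch in plain Python. This is not the Pegasus API and not how Pegasus implements it internally; the job names, file names, and the infer_dependencies helper are all hypothetical and exist only to show the principle that a job depends on whichever job produces one of its input files.

```python
from collections import defaultdict

# Hypothetical job descriptions: names and files are illustrative only.
jobs = {
    "preprocess": {"inputs": ["raw.txt"], "outputs": ["clean.txt"]},
    "analyze": {"inputs": ["clean.txt"], "outputs": ["stats.csv"]},
    "plot": {"inputs": ["stats.csv", "clean.txt"], "outputs": ["figure.png"]},
}


def infer_dependencies(jobs):
    """Return {job: sorted list of parent jobs} by matching outputs to inputs."""
    # Record which job produces each file.
    producer = {}
    for name, spec in jobs.items():
        for filename in spec["outputs"]:
            producer[filename] = name

    # A job depends on the producer of each of its input files.
    # Files with no producer (e.g. raw.txt) are treated as external inputs.
    parents = defaultdict(set)
    for name, spec in jobs.items():
        for filename in spec["inputs"]:
            if filename in producer and producer[filename] != name:
                parents[name].add(producer[filename])

    return {name: sorted(parents[name]) for name in jobs}


dependencies = infer_dependencies(jobs)
for job, job_parents in dependencies.items():
    print(job, "<-", job_parents)
```

In this sketch no edge is declared by hand: "plot" ends up depending on both "preprocess" and "analyze" only because it lists their output files as inputs.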

After a workflow is defined, Pegasus transforms it into an executable workflow tailored to the target execution environment. This step is called the planning phase. Because workflows are described abstractly, they are portable and can be re-planned for different systems. During planning, Pegasus also applies optimizations to improve efficiency and reliability.
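The portability that comes from abstract workflows can be sketched with a toy example in plain Python. It is an illustration under stated assumptions, not the Pegasus planner: the site names, executable paths, and the plan function are hypothetical, and real planning involves far more than resolving executable locations.

```python
# Abstract workflow: logical names only, no site-specific paths.
# All names here are hypothetical and for illustration only.
abstract_workflow = [
    {"job": "preprocess", "transformation": "clean", "args": ["raw.txt", "clean.txt"]},
    {"job": "analyze", "transformation": "stats", "args": ["clean.txt", "stats.csv"]},
]

# Hypothetical per-site tables mapping logical names to installed executables.
sites = {
    "site_a": {"clean": "/opt/tools/bin/clean", "stats": "/opt/tools/bin/stats"},
    "site_b": {"clean": "/home/user/sw/clean.sh", "stats": "/home/user/sw/stats.sh"},
}


def plan(workflow, site):
    """Bind each abstract job to a concrete command line for the given site."""
    executables = sites[site]
    return [
        " ".join([executables[job["transformation"]], *job["args"]])
        for job in workflow
    ]


# The same abstract workflow is planned twice, once per target site.
plan_a = plan(abstract_workflow, "site_a")
plan_b = plan(abstract_workflow, "site_b")
for command in plan_a + plan_b:
    print(command)
```

The workflow definition is never edited when the target changes; only the site-specific table differs, which is the property that makes re-planning for another system possible.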

Hosted Pegasus Deployments on ACCESS

Three Pegasus environments are currently available on ACCESS:

  • ACCESS Pegasus
    A Jupyter-based interface that submits pilot (glidein) jobs across multiple ACCESS sites. Ideal for high-throughput computing (HTC) workloads. Available to all ACCESS users.

  • PSC Neocortex and Bridges-2
    A Jupyter setup integrated with PSC’s Neocortex and Bridges-2 systems. See PSC documentation for details. There are also examples available.

  • Purdue Anvil
    A JupyterLab environment running on Anvil’s composable partition, allowing direct submission to Slurm and access to Anvil file systems. Available to all Anvil users via Anvil Notebooks.

Need Help?

Several support options are available:

  • ACCESS Support Tickets
    Submit a ticket via the ACCESS help desk and tag it with "workflows" for faster routing.

  • Pegasus Slack Workspace
    Join the Pegasus user community for live support and discussion. Try joining pegasus-users.slack.com in the Slack app or email pegasus-support@isi.edu to request an invite.

  • Pegasus Users Mailing List
    pegasus-users@isi.edu is an open discussion list that you can subscribe to.