Example Workflow Structures
Scientific workflows are widely used in research and other data-intensive applications to manage complex computational processes. Complexity is not a requirement, however: even simple structures can benefit from being expressed as a workflow. The basic examples below illustrate a few common structures.
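All of the examples refer to a transformation called my_process and are written against a workflow API in the style of the Pegasus WMS Python API. As a minimal sketch of the setup they assume (the import, executable path, and site name below are illustrative placeholders, not part of the examples themselves):
from Pegasus.api import *

# hypothetical executable that every job in these examples runs;
# the path and site are placeholders
my_process = Transformation(
    "my_process",
    site="local",
    pfn="/usr/bin/my_process",
    is_stageable=False,
)
tc = TransformationCatalog()
tc.add_transformations(my_process)
# the catalog would then be attached to a workflow,
# e.g. wf.add_transformation_catalog(tc)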
Single Job
This is the simplest type of scientific workflow, where a single job is executed to perform a specific task. For example, running a single simulation or data analysis on a set of input files.
wf = Workflow(name="single-job")
in_file = File("input.data")
out_file = File("output.data")
job = Job(my_process)\
.add_args("example")\
.add_inputs(in_file)\
.add_outputs(out_file)
wf.add_jobs(job)
wf.write()
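If the Pegasus Python API is being used, wf.write() serializes the abstract workflow to a file (typically workflow.yml), which can then be planned and submitted for execution.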
Set of Independent Jobs
This type of workflow consists of multiple jobs, each performing a distinct task. Because the jobs do not depend on one another, they can run simultaneously on different computing resources, so the whole set finishes sooner than if the tasks were run one after another. For example, processing a set of images or running multiple simulations with varying input parameters.
wf = Workflow(name="independent-jobs")
for i in range(1, n + 1):
in_file = File(f"input-{i}.data")
out_file = File(f"output-{i}.data")
job = Job(my_process)\
.add_args(str(i))\
.add_inputs(in_file)\
.add_outputs(out_file)
wf.add_jobs(job)
wf.write()
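Because the jobs share no input or output files, no dependencies are created between them; with an API such as Pegasus's, which infers dependencies from shared files, all n jobs are free to run concurrently.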
Simple Split/Merge Workflow
This workflow splits the work on a single input across multiple independent jobs, each performing a distinct task, and then merges their outputs back together into a single result. It is a common way to parallelize a workflow across multiple computing resources, for example when processing a large dataset or running many simulations at once.
wf = Workflow(name="split-merge")
common_file = File("common.data")
common_job = Job(my_process)\
.add_outputs(common_file)
wf.add_jobs(common_job)
# define merge jobs first, so we can reference
# it easily in the loop
output_file = File("output.data")
merge_job = Job(my_process)\
.add_outputs(output_file)
wf.add_jobs(merge_job)
for i in range(1, n + 1):
proc_out_file = File(f"file-{i}.data")
job = Job(my_process)\
.add_args(str(i))\
.add_inputs(common_file)\
.add_outputs(proc_out_file)
wf.add_jobs(job)
# make sure merge jobs get the correct data
merge_job.add_inputs(proc_out_file)
wf.write()
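The dependencies here form a fan-out/fan-in pattern: every processing job reads common_file and therefore runs after common_job, and merge_job reads every file-{i}.data and therefore runs after all of the processing jobs. With an API such as Pegasus's, which infers dependencies from shared File objects, these edges follow directly from the data flow and do not need to be declared explicitly.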