Delta (NCSA)

Last update: July 29, 2022

Delta is supported by the National Science Foundation under Grant No. OAC-2005572.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Delta is now accepting proposals.

Status Updates and Notices

Delta is tentatively scheduled to enter production in Q3 2022.
A preliminary version of the Delta User Guide is available.

Introduction

Delta is a dedicated, ACCESS-allocated resource designed by HPE and NCSA, delivering a highly capable GPU-focused compute environment for GPU and CPU workloads. Besides offering a mix of standard and reduced-precision GPU resources, Delta offers GPU-dense nodes with both NVIDIA and AMD GPUs, with high-performance node-local SSD scratch filesystems and both standard Lustre and relaxed-POSIX parallel filesystems (docs pending) spanning the entire resource.

Delta's standard CPU nodes are each powered by two 64-core AMD EPYC 7763 ("Milan") processors with 256 GB of DDR4 memory. The Delta GPU resource has four node types:

  • 4-way NVIDIA A100 nodes: four A100 GPUs (40 GB HBM2 RAM each) connected via NVLink, with one 64-core AMD EPYC 7763 ("Milan") processor

  • 4-way NVIDIA A40 nodes: four A40 GPUs (48 GB GDDR6 RAM each) connected via PCIe 4.0, with one 64-core AMD EPYC 7763 ("Milan") processor

  • 8-way NVIDIA A100 nodes: eight A100 GPUs connected via NVLink in a dual-socket AMD EPYC 7763 ("Milan") node (128 cores per node) with 2 TB of DDR4 RAM

  • 8-way AMD MI100 nodes: eight MI100 GPUs (32 GB HBM2 RAM each) connected via PCIe 4.0 in a dual-socket AMD EPYC 7763 ("Milan") node (128 cores per node) with 2 TB of DDR4 RAM

Delta has 124 standard CPU nodes, 100 4-way A100-based GPU nodes, 100 4-way A40-based GPU nodes, 5 8-way A100-based GPU nodes, and 1 8-way MI100-based GPU node. Every Delta node has high-performance node-local SSD storage (740 GB for CPU nodes, 1.5 TB for GPU nodes), and is connected to the 7 PB Lustre parallel filesystem via the high-speed interconnect. The Delta resource uses the SLURM workload manager for job scheduling.

Delta supports the ACCESS core software stack, including remote login, remote computation, data movement, science workflow support, and science gateway support toolkits.

Account Administration

  • For ACCESS projects please use the ACCESS user portal for project and account management.

  • Non-ACCESS Account and Project administration is handled by NCSA Identity and NCSA group management tools. For more information please see the NCSA Allocation and Account Management documentation page.

Configuring Your Account

  • Bash is the default shell; submit a support request to change your default shell

  • Environment variables: ACCESS CUE, SLURM batch

  • Using Modules

System Architecture

Delta is designed to help applications transition from CPU-only to GPU or hybrid CPU-GPU codes. Delta has some important architectural features to facilitate new discovery and insight:

  • A single processor architecture (AMD) across all node types: CPU and GPU

  • Support for NVIDIA A100 MIG GPU partitioning allowing for fractional use of the A100s if your workload isn't able to exploit an entire A100 efficiently

  • Ray-tracing hardware support from the NVIDIA A40 GPUs

  • Nine large memory (2 TB) nodes

  • A low-latency and high-bandwidth HPE/Cray Slingshot interconnect between compute nodes

  • Lustre for home, projects and scratch file systems

  • Support for relaxed and non-POSIX I/O (docs pending)

  • Shared-node jobs, down to a single core or a single MIG GPU slice

  • Resources for persistent services in support of Gateways, Open OnDemand, and Data Transfer nodes

  • Unique AMD MI100 resource

Model Compute Nodes

The Delta compute ecosystem is composed of 5 node types:

  1. Dual-socket CPU-only compute nodes

  2. Single-socket 4-way NVIDIA A100 GPU compute nodes

  3. Single-socket 4-way NVIDIA A40 GPU compute nodes

  4. Dual-socket 8-way NVIDIA A100 GPU compute nodes

  5. Single-socket 8-way AMD MI100 GPU compute nodes

The CPU-only and 4-way GPU nodes have 256 GB of RAM per node while the 8-way GPU nodes have 2 TB of RAM. The CPU-only node has 0.74 TB of local storage while all GPU nodes have 1.5 TB of local storage.

CPU and GPU Node Specifications

Table 1. CPU Compute Node Specifications

NUMBER OF NODES              124
CPU                          AMD Milan (PCIe Gen4)
SOCKETS PER NODE             2
CORES PER SOCKET             64
CORES PER NODE               128
HARDWARE THREADS PER CORE    1
HARDWARE THREADS PER NODE    128
CLOCK RATE (GHZ)             ~2.45
RAM (GB)                     256
CACHE (KB) L1/L2/L3          64/512/32768
LOCAL STORAGE (TB)           0.74

The AMD CPUs are set for 4 NUMA domains per socket (NPS=4).

Table 2. GPU Node Specifications

 

                             4-WAY           4-WAY           8-WAY A100      8-WAY MI100
                             NVIDIA A40      NVIDIA A100     LARGE MEMORY    LARGE MEMORY
NUMBER OF NODES              100             100             5               1
GPU                          NVIDIA A40      NVIDIA A100     NVIDIA A100     AMD MI100
GPUS PER NODE                4               4               8               8
GPU MEMORY (GB)              48 GDDR6 (ECC)  40 HBM2         40 HBM2         32 HBM2
CPU                          AMD Milan       AMD Milan       AMD Milan       AMD Milan
CPU SOCKETS PER NODE         1               1               2               2
CORES PER SOCKET             64              64              64              64
CORES PER NODE               64              64              128             128
HARDWARE THREADS PER CORE    1               1               1               1
HARDWARE THREADS PER NODE    64              64              128             128
CLOCK RATE (GHZ)             ~2.45           ~2.45           ~2.45           ~2.45
RAM (GB)                     256             256             2048            2048
CACHE (KB) L1/L2/L3          64/512/32768    64/512/32768    64/512/32768    64/512/32768
LOCAL STORAGE (TB)           1.5             1.5             1.5             1.5

The AMD CPUs are set for 4 NUMA domains per socket (NPS=4).

The A40 GPUs are connected via PCIe Gen4 and have the following affinitization to NUMA nodes on the CPU. Note that the relationship between GPU index and NUMA domain is inverted.

Table 2-1. 4-way NVIDIA A40 Mapping and GPU-CPU Affinitization

 

        GPU0   GPU1   GPU2   GPU3   HSN   CPU AFFINITY   NUMA AFFINITY
GPU0    X      SYS    SYS    SYS    SYS   48-63          3
GPU1    SYS    X      SYS    SYS    SYS   32-47          2
GPU2    SYS    SYS    X      SYS    SYS   16-31          1
GPU3    SYS    SYS    SYS    X      PHB   0-15           0
HSN     SYS    SYS    SYS    PHB    X

Table 2-1 Legend

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
NV# = Connection traversing a bonded set of # NVLinks

Table 2-2. 4-way NVIDIA A100 Mapping and GPU-CPU Affinitization

 

        GPU0   GPU1   GPU2   GPU3   HSN   CPU AFFINITY   NUMA AFFINITY
GPU0    X      NV4    NV4    NV4    SYS   48-63          3
GPU1    NV4    X      NV4    NV4    SYS   32-47          2
GPU2    NV4    NV4    X      NV4    SYS   16-31          1
GPU3    NV4    NV4    NV4    X      PHB   0-15           0
HSN     SYS    SYS    SYS    PHB    X

Table 2-2 Legend

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
NV# = Connection traversing a bonded set of # NVLinks

Table 2-3. 8-way NVIDIA A100 Mapping and GPU-CPU Affinitization

 

        GPU0   GPU1   GPU2   GPU3   GPU4   GPU5   GPU6   GPU7   HSN   CPU AFFINITY   NUMA AFFINITY
GPU0    X      NV12   NV12   NV12   NV12   NV12   NV12   NV12   SYS   48-63          3
GPU1    NV12   X      NV12   NV12   NV12   NV12   NV12   NV12   SYS   48-63          3
GPU2    NV12   NV12   X      NV12   NV12   NV12   NV12   NV12   SYS   16-31          1
GPU3    NV12   NV12   NV12   X      NV12   NV12   NV12   NV12   SYS   16-31          1
GPU4    NV12   NV12   NV12   NV12   X      NV12   NV12   NV12   SYS   112-127        7
GPU5    NV12   NV12   NV12   NV12   NV12   X      NV12   NV12   SYS   112-127        7
GPU6    NV12   NV12   NV12   NV12   NV12   NV12   X      NV12   SYS   80-95          5
GPU7    NV12   NV12   NV12   NV12   NV12   NV12   NV12   X      SYS   80-95          5
HSN     SYS    SYS    SYS    SYS    SYS    SYS    SYS    SYS    X

Table 2-3 Legend

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
NV# = Connection traversing a bonded set of # NVLinks

Login Nodes

Login nodes provide interactive support for code compilation.

Specialized Nodes

Delta will support data transfer nodes (serving the "NCSA Delta" Globus Online collection) and nodes in support of other services. (docs pending)

Network

Delta is connected to the NPCF core router and exit infrastructure via two 100 Gbps connections; NCSA's 400+ Gbps of WAN connectivity carries traffic to/from users over optimal peering.

Delta resources will be interconnected with HPE/Cray's 100/200 Gbps Slingshot interconnect.

File Systems

Note: Users of Delta have access to three file systems at the time of system launch; a fourth, relaxed-POSIX file system will be made available at a later date. (docs pending)

Delta

The Delta storage infrastructure provides users with their $HOME and $SCRATCH areas. These file systems are mounted across all Delta nodes and are accessible on the Delta DTN Endpoints. The aggregate performance of this subsystem is 70 GB/s and it has 6 PB of usable space. These file systems run Lustre via DDN's ExaScaler 6 stack (Lustre 2.14 based).

Hardware

DDN SFA7990XE (Quantity: 3), each unit contains:

  • One additional SS9012 enclosure

  • 168 x 16TB SAS Drives

  • 7 x 1.92TB SAS SSDs

The $HOME file system has 4 OSTs and is set with a default stripe count of 1.

The $SCRATCH file system has 8 OSTs and has Lustre Progressive File Layout (PFL) enabled which automatically restripes a file as the file grows. The thresholds for PFL striping for $SCRATCH are:

FILE SIZE    STRIPE COUNT
0-32M        1 OST
32M-512M     4 OST
512M+        8 OST
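To inspect or override a file layout, the standard Lustre lfs tool can be used; a brief sketch (paths are illustrative):

lfs getstripe /scratch/abcd/$USER/bigfile.dat     # show the layout Lustre chose for a file
lfs setstripe -c 8 /scratch/abcd/$USER/wide_dir   # stripe new files in this directory across 8 OSTs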

Best Practices

  • To reduce the load on the file system metadata services, the ls option for context dependent font coloring, --color, is disabled by default.

Future Hardware

An additional pool of NVMe flash from DDN was installed in early summer 2022. This flash is initially deployed as a tier for "hot" data in scratch. The subsystem has an aggregate performance of 500 GB/s and 3 PB of raw capacity. As noted above, it will transition to an independent, relaxed-POSIX-namespace file system; communications on that timeline will be announced as updates are available.

Taiga

Taiga is NCSA's global file system which provides users with their $WORK area. This file system is mounted across all Delta systems at /taiga (also /taiga/nsf/delta is bind mounted at /projects) and is accessible on both the Delta and Taiga DTN endpoints. For NCSA & Illinois researchers, Taiga is also mounted on NCSA's HAL and Radiant systems. This storage subsystem has an aggregate performance of 140 GB/s and 1 PB of its capacity allocated to users of the Delta system. /taiga is a Lustre file system running DDN ExaScaler software.

Hardware

DDN SFA400NVXE (Quantity: 2), each unit contains:

  • 4 x SS9012 enclosures

  • NVME for metadata and small files

DDN SFA18XE (Quantity: 1) coming soon, unit contains:

  • 10 x SS9012 enclosures

  • NVME for metadata and small files

$WORK and $SCRATCH
A "module reset" in a job script will populate $WORK and $SCRATCH environment variables automatically, or you may set them as WORK=/projects/<account>/$USER, SCRATCH=/scratch/<account>/$USER.

Table 3. Filesystems Feature Comparison

HOME
  Quota: 25 GB, 400,000 files per user
  Snapshots: No/TBA
  Purged: No
  Key features: Area for software, scripts, job files, etc. NOT intended as a
  source/destination for I/O during jobs.

WORK
  Quota: 500 GB; up to 1-25 TB by allocation request
  Snapshots: No/TBA
  Purged: No
  Key features: Area for shared data for a project, common data sets, software,
  results, etc.

SCRATCH
  Quota: 1000 GB; up to 100 TB by allocation request
  Snapshots: No
  Purged: Yes; files older than 30 days (access time)
  Key features: Area for computation, largest allocations; where I/O from jobs
  should occur.

/tmp
  Quota: 0.74-1.50 TB, shared or dedicated depending on node usage by job(s); no
  quotas in place
  Snapshots: No
  Purged: After each job
  Key features: Locally attached disk for fast small-file I/O.

Quota usage

The quota command lets you view your usage of the file systems, as well as usage by your projects. Below is sample output for a user "user" who is in two projects: aaaa and bbbb. The home directory quota does not depend on which project group a file is written with.

quota command

@dt-login01 ~]$ quota
Quota usage for user :
-------------------------------------------------------------------------------
| Directory Path  | User  | User   | User   | User  | User   | User   |
|                 | Block | Soft   | Hard   | File  | Soft   | Hard   |
|                 | Used  | Quota  | Limit  | Used  | Quota  | Limit  |
-------------------------------------------------------------------------------
| /u/             | 20k   | 25G    | 27.5G  | 5     | 300000 | 330000 |
-------------------------------------------------------------------------------
Quota usage for groups user is a member of:
-------------------------------------------------------------------------------
| Directory Path  | Group | Group  | Group  | Group | Group  | Group  |
|                 | Block | Soft   | Hard   | File  | Soft   | Hard   |
|                 | Used  | Quota  | Limit  | Used  | Quota  | Limit  |
-------------------------------------------------------------------------------
| /projects/aaaa  | 8k    | 500G   | 550G   | 2     | 300000 | 330000 |
| /projects/bbbb  | 24k   | 500G   | 550G   | 6     | 300000 | 330000 |
| /scratch/aaaa   | 8k    | 552G   | 607.2G | 2     | 500000 | 550000 |
| /scratch/bbbb   | 24k   | 9.766T | 10.74T | 6     | 500000 | 550000 |
-------------------------------------------------------------------------------

Accessing the System

Direct Access

Direct access to the Delta login nodes is via ssh using your NCSA username, password, and NCSA Duo MFA. Please see the User Services NCSA Allocation and Account Management page for links to NCSA Identity and NCSA Duo services. The login nodes provide access to the CPU and GPU resources on Delta.

LOGIN NODE HOSTNAME                    EXAMPLE USAGE WITH SSH
dt-login01.delta.ncsa.illinois.edu     ssh -Y username@dt-login01.delta.ncsa.illinois.edu
                                       (-Y allows X11 forwarding from Linux hosts)
dt-login02.delta.ncsa.illinois.edu     ssh -l username dt-login02.delta.ncsa.illinois.edu
                                       (-l username is alternate syntax for user@host)
login.delta.ncsa.illinois.edu          ssh username@login.delta.ncsa.illinois.edu
(round-robin DNS name for the
set of login nodes)

If you need to set an NCSA password for direct access, please contact help@ncsa.illinois.edu for assistance.

Use of ssh-key pairs is disabled for general use. Please contact NCSA Help at help@ncsa.illinois.edu for key-pair use by Gateway allocations.

maintaining persistent sessions: tmux
tmux is available on the login nodes to maintain persistent sessions. See the tmux man page for more information. Make note of the hostname where you start tmux, and use the targeted login hostnames (dt-login01 or dt-login02) to return to that login node and attach. Avoid the round-robin hostname when using tmux.
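A typical tmux workflow looks like the following (the session name is illustrative):

# start a named session on a specific login node
ssh username@dt-login01.delta.ncsa.illinois.edu
tmux new -s mysession

# later: return to the same login node and reattach
ssh username@dt-login01.delta.ncsa.illinois.edu
tmux attach -t mysession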

ACCESS

ACCESS users can connect to Delta via an SSH client. Note that Delta does not support the use of ssh keys but requires the use of NCSA Duo multi-factor authentication.

To set up NCSA Duo once you have been added to an ACCESS Delta allocation, please see the User Services NCSA Allocation and Account Management page. You will need to know your NCSA username.

With NCSA Duo configured and your NCSA username available, an ACCESS user can login using the ssh command:

$ ssh yourNCSAUserName@login.delta.ncsa.illinois.edu

When reporting a problem to the help desk, please execute the ssh command with the -vvv option and include the verbose output in your problem description.

Citizenship

You share Delta with thousands of other users, and what you do on the system affects others. Exercise good citizenship to ensure that your activity does not adversely impact the system and the research community with whom you share it. Here are some rules of thumb:

  • Don't run production jobs on the login nodes (very short time debug tests are fine)

  • Don't stress filesystems with known-harmful access patterns (many thousands of small files in a single directory)

  • Submit an informative help-desk ticket including loaded modules (module list) and stdout/stderr messages

Managing and Transferring Files

File Systems

Each user has a home directory, $HOME, located at /u/$USER.

For example, a user (with username auser) who has an allocated project with a local project serial code abcd will see the following entries in their $HOME and in the project and scratch file systems. To determine the mapping of an ACCESS project to a local project, please use the accounts command or the userinfo command.

Directory access changes can be made using the facl command. Contact help@ncsa.illinois.edu if you need assistance with enabling access to specific users and projects.

$ ls -ld /u/$USER
drwxrwx---+ 12 root root 12345 Feb 21 11:54 /u/$USER

$ ls -ld /projects/abcd
drwxrws---+ 45 root delta_abcd 4096 Feb 21 11:54 /projects/abcd

$ ls -l /projects/abcd
total 0
drwxrws---+ 2 auser delta_abcd 6 Feb 21 11:54 auser
drwxrws---+ 2 buser delta_abcd 6 Feb 21 11:54 buser
...

$ ls -ld /scratch/abcd
drwxrws---+ 45 root delta_abcd 4096 Feb 21 11:54 /scratch/abcd

$ ls -l /scratch/abcd
total 0
drwxrws---+ 2 auser delta_abcd 6 Feb 21 11:54 auser
drwxrws---+ 2 buser delta_abcd 6 Feb 21 11:54 buser
...
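As a sketch of the directory access changes mentioned above, using the standard getfacl/setfacl tools (usernames and paths are illustrative):

getfacl /scratch/abcd/auser                # show the current access control list
setfacl -m u:buser:rx /scratch/abcd/auser  # grant user buser read and traverse access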

To avoid issues when file systems become unstable or non-responsive, we recommend not putting symbolic links from $HOME to the project and scratch spaces.

/tmp on compute nodes (job duration)
The high-performance SSD storage (740 GB on CPU nodes, 1.5 TB on GPU nodes) is available in /tmp (unique to each node and job; not a shared filesystem) and may contain less than the expected free space if the node(s) are running multiple jobs. Codes that need to perform I/O on many small files should target /tmp on each node of the job and save results to other filesystems before the job ends.
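A minimal staging sketch for a job script (paths and program name are illustrative):

cp -r $SCRATCH/my_inputs /tmp/    # stage inputs to node-local SSD
cd /tmp/my_inputs
./my_app                          # small-file I/O now hits local SSD, not Lustre
cp -r results $SCRATCH/my_run/    # save results; /tmp is cleared after the job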

Transferring your Files

Files can be transferred to and from Delta using the "NCSA Delta" Globus Online collection served by the data transfer nodes, or with standard command-line tools (scp, sftp, rsync) via the login nodes.

Sharing Files with Collaborators

Please email help@ncsa.illinois.edu if you want to share with other groups and projects on Delta.
See Globus collection sharing for sharing with collaborators not on Delta.

Building Software

The Delta programming environment supports the GNU, AMD (AOCC), Intel and NVIDIA HPC compilers. Support for the HPE/Cray Programming environment is forthcoming. (docs pending)

Modules provide access to the compiler + MPI environment.

The default environment includes the GCC 11.2.0 compiler + OpenMPI with support for CUDA and GDRCopy. nvcc is provided by the cuda module, which is loaded by default.

AMD-recommended compiler flags for the GNU, AOCC, and Intel compilers for Milan processors can be found in the AMD Compiler Options Quick Reference Guide for EPYC 7xx3 processors [PDF].

Serial

To build (compile and link) a serial program in Fortran, C, and C++:

GCC                  AOCC                  NVHPC
gfortran myprog.f    flang myprog.f        nvfortran myprog.f
gcc myprog.c         clang myprog.c        nvc myprog.c
g++ myprog.cc        clang++ myprog.cc     nvc++ myprog.cc

Table 4. Compile Commands for Serial Codes

MPI

To build (compile and link) an MPI program in Fortran, C, and C++:

MPI IMPLEMENTATION             MODULEFILE FOR MPI/COMPILER   BUILD COMMANDS
OpenMPI                        aocc/3.2.0  openmpi           Fortran 77:  mpif77 myprog.f
(Home Page | Documentation)    gcc/11.2.0  openmpi           Fortran 90:  mpif90 myprog.f90
                               nvhpc/22.2  openmpi           C:           mpicc myprog.c
                                                             C++:         mpic++ myprog.cc

Table 5. Compile Commands for MPI Codes

OpenMP

To build an OpenMP program, use the -fopenmp / -mp option:

GCC                          AOCC                           NVHPC
gfortran -fopenmp myprog.f   flang -fopenmp myprog.f        nvfortran -mp myprog.f
gcc -fopenmp myprog.c        clang -fopenmp myprog.c        nvc -mp myprog.c
g++ -fopenmp myprog.cc       clang++ -fopenmp myprog.cc     nvc++ -mp myprog.cc

Table 6. Compile Commands for OpenMP Codes

Hybrid MPI/OpenMP

To build an MPI/OpenMP hybrid program, use the -fopenmp / -mp option with the MPI compiling commands:

GCC                            PGI/NVHPC
mpif77 -fopenmp myprog.f       mpif77 -mp myprog.f
mpif90 -fopenmp myprog.f90     mpif90 -mp myprog.f90
mpicc -fopenmp myprog.c        mpicc -mp myprog.c
mpic++ -fopenmp myprog.cc      mpic++ -mp myprog.cc

Table 7. Compile Commands for Hybrid MPI/OpenMP Codes

Cray xthi.c sample code

Document - XC Series User Application Placement Guide CLE6.0UP01 S-2496 | HPE Support

This code can be compiled using the methods shown above. The code appears in some of the batch script examples below to demonstrate core placement options.

xthi.c source

#define _GNU_SOURCE

#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sched.h>
#include <mpi.h>
#include <omp.h>

/* Borrowed from util-linux-2.13-pre7/schedutils/taskset.c */
static char *cpuset_to_cstr(cpu_set_t *mask, char *str)
{
  char *ptr = str;
  int i, j, entry_made = 0;
  for (i = 0; i < CPU_SETSIZE; i++) {
    if (CPU_ISSET(i, mask)) {
      int run = 0;
      entry_made = 1;
      for (j = i + 1; j < CPU_SETSIZE; j++) {
        if (CPU_ISSET(j, mask)) run++;
        else break;
      }
      if (!run)
        sprintf(ptr, "%d,", i);
      else if (run == 1) {
        sprintf(ptr, "%d,%d,", i, i + 1);
        i++;
      } else {
        sprintf(ptr, "%d-%d,", i, i + run);
        i += run;
      }
      while (*ptr != 0) ptr++;
    }
  }
  ptr -= entry_made;
  *ptr = 0;
  return(str);
}

int main(int argc, char *argv[])
{
  int rank, thread;
  cpu_set_t coremask;
  char clbuf[7 * CPU_SETSIZE], hnbuf[64];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  memset(clbuf, 0, sizeof(clbuf));
  memset(hnbuf, 0, sizeof(hnbuf));
  (void)gethostname(hnbuf, sizeof(hnbuf));
  #pragma omp parallel private(thread, coremask, clbuf)
  {
    thread = omp_get_thread_num();
    (void)sched_getaffinity(0, sizeof(coremask), &coremask);
    cpuset_to_cstr(&coremask, clbuf);
    #pragma omp barrier
    printf("Hello from rank %d, thread %d, on %s. (core affinity = %s)\n",
           rank, thread, hnbuf, clbuf);
  }
  MPI_Finalize();
  return(0);
}

A version of xthi is also available from ORNL.

% git clone https://github.com/olcf/XC30-Training
% ls XC30-Training/affinity/Xthi.c

OpenACC

To build an OpenACC program, use the -acc option, and add the -mp option for multi-threaded code:

NON-MULTITHREADED           MULTITHREADED
nvfortran -acc myprog.f     nvfortran -acc -mp myprog.f
nvc -acc myprog.c           nvc -acc -mp myprog.c
nvc++ -acc myprog.cc        nvc++ -acc -mp myprog.cc

Table 8. Compile Commands for OpenACC Codes

CUDA

The CUDA compiler (nvcc) is included in the cuda module, which is loaded by default under modtree/gpu. For the CUDA Fortran compiler and other NVIDIA development tools, load the nvhpc module.

nv* commands when nvhpc is loaded:

[arnoldg@dt-login03 namd]$ nv
nvaccelerror             nvidia-bug-report.sh     nvlink
nvaccelinfo              nvidia-cuda-mps-control  nv-nsight-cu
nvc                      nvidia-cuda-mps-server   nv-nsight-cu-cli
nvc++                    nvidia-debugdump         nvprepro
nvcc                     nvidia-modprobe          nvprof
nvcpuid                  nvidia-persistenced      nvprune
nvcudainit               nvidia-powerd            nvsize
nvdecode                 nvidia-settings          nvunzip
nvdisasm                 nvidia-sleep.sh          nvvp
nvextract                nvidia-smi               nvzip
nvfortran                nvidia-xconfig

See also: NVIDIA HPC SDK

HIP / ROCM (AMD MI100)

To access the development environment for the gpuMI100x8 partition, load the pytorch container from AMD's Infinity Hub. The paths to compilers will then be set for you (this works from a login node for compiling; use the partition and srun to run/test):

AMD HIP development environment (container):

[arnoldg@dt-login03 vectorAdd]$ apptainer run /sw/external/MI100/pytorch_rocm5.0_ubuntu18.04_py3.7_pytorch_1.10.0.sif
Singularity> hipcc vectoradd_hip.cpp
Singularity> which hipcc
/opt/rocm/hip/bin/hipcc
Singularity>

See also: AMD Infinity Hub.

Software

Delta software is provisioned, when possible, using spack to produce modules for use via the lmod based module system. Select NVIDIA NGC containers are made available (see the container section below) and are periodically updated from the NVIDIA NGC site. An automated list of available software can be found on the ACCESS website.

Managing Your Environment (Modules)

Delta provides two sets of modules and a variety of compilers in each set. The default environment is modtree/gpu, which loads a recent version of the GNU compilers, the OpenMPI implementation of MPI, and CUDA. The GPU environment builds binaries that run on both the GPU nodes (with CUDA) and the CPU nodes (potentially with warning messages, because those nodes lack CUDA drivers). For situations where the same version of software is to be deployed on both GPU and CPU nodes but with separate builds, the modtree/cpu environment provides the same default compiler and MPI but without CUDA. Use module spider package_name to search for software in Lmod and see the steps to load it for your environment.

Useful Modules commands

  1. module list command: (display the currently loaded modules)

$ module list

Currently Loaded Modules:
  1) gcc/11.2.0   3) openmpi/4.1.2   5) modtree/gpu
  2) ucx/1.11.2   4) cuda/11.6.1
  2. module load package_name command: (loads a package or metamodule such as modtree/gpu or netcdf-c)

$ module load modtree/cpu

Due to MODULEPATH changes, the following have been reloaded:
  1) gcc/11.2.0   2) openmpi/4.1.2   3) ucx/1.11.2

The following have been reloaded with a version change:
  1) modtree/gpu => modtree/cpu
  3. module spider package_name command: (finds modules and displays the ways to load them)

$ module spider openblas

----------------------------------------------------------------------------
  openblas: openblas/0.3.20
----------------------------------------------------------------------------

    You will need to load all module(s) on any one of the lines below before
    the "openblas/0.3.20" module is available to load.

      aocc/3.2.0
      gcc/11.2.0

    Help:
      OpenBLAS: An optimized BLAS library
  4. module -r spider regular_expression command:

$ module -r spider "^r$"

----------------------------------------------------------------------------
  r:
----------------------------------------------------------------------------
     Versions:
        r/4.1.3
...

See also: User Guide for Lmod

Please open a service request ticket by sending email to help@ncsa.illinois.edu for help with software not currently installed on Delta. For single-user or single-project use cases, the preference is for the user to use the Spack software package manager to install software locally against the system Spack installation, as documented (docs pending). Delta support staff are available to provide limited assistance. For general installation requests, the Delta project office will review requests for broad use and installation effort.

Python

On Delta, you may install your own python software stacks as needed. There are a couple of choices when customizing your python setup. You may use any of these methods with any of the python versions or instances described below (or you may install your own python versions):

  • pip3: pip3 install --user <python_package>

    • useful when you need just 1 python environment per python version or instance

  • venv (python virtual environment)

    • can name environments (metadata) and have multiple environments per python version or instance (see the venv sketch after this list)

  • conda environments

    • similar to venv but with more flexibility; See this comparison (halfway down the page)
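A minimal venv sketch (the environment name and package are illustrative):

module load gcc python                  # the basic python installation described below
python3 -m venv $HOME/venvs/myproj      # create a named environment
source $HOME/venvs/myproj/bin/activate  # activate it
pip install numpy                       # installs into the venv only
deactivate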

NGC containers for gpu nodes
The NVIDIA NGC containers on Delta provide optimized python frameworks built for Delta's A100 and A40 GPUs. Delta staff recommend using an NGC container when possible with the GPU nodes (or use the anaconda3_gpu module described later).

The default gcc (latest version) programming environment for either modtree/cpu or modtree/gpu includes the python options described below.

Anaconda

anaconda3_cpu

Use python from the anaconda3_cpu module if you need some of the modules provided by Anaconda in your python workflow. See the "managing environments" section of the Conda getting-started guide to learn how to customize Conda for your workflow and add extra python modules to your environment. We recommend starting with anaconda3_cpu for modtree/cpu and the CPU nodes. Do not use this module with GPUs; use anaconda3_gpu instead.
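A minimal sketch of a customized environment on top of anaconda3_cpu (the environment name and packages are illustrative):

module load anaconda3_cpu
conda create -y -n myenv numpy pandas   # a named environment with extra packages
source activate myenv                   # older activation form; "conda activate" may require conda init first
python -c "import pandas; print(pandas.__version__)"
source deactivate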

anaconda and containers
If you use anaconda with NGC containers, take care to use the python from the container and not the python from anaconda or one of its environments. The container's python should be first in $PATH. You may --bind the anaconda directory or other paths into the container so that you can start your conda environments, but with the container's python (/usr/bin/python).

$ module load modtree/cpu
$ module load gcc anaconda3_cpu
$ which conda
/sw/external/python/anaconda3_cpu/conda
$ module list

Currently Loaded Modules:
  1) cue-login-env/1.0   6) libfabric/1.14.0      11) ucx/1.11.2
  2) default             7) lustre/2.14.0_ddn23   12) openmpi/4.1.2
  3) gcc/11.2.0          8) openssh/8.0p1         13) modtree/cpu
  4) knem/1.1.4          9) pmix/3.2.3            14) anaconda3_cpu/4.13.0
  5) libevent/2.1.8     10) rdma-core/32.0

List of modules in anaconda3_cpu

The current list of modules available in anaconda3_cpu (including tensorflow, pytorch, etc.) is shown via:

$ conda list

anaconda3_gpu

Similar to the setup for anaconda3_cpu, we have a GPU version of anaconda3 (module load anaconda3_gpu) with CUDA-aware pytorch and tensorflow python modules installed. You may use this module when working with the GPU nodes. Run conda list after loading the module to review what is already installed. As with anaconda3_cpu, let Delta staff know if there are generally useful modules you would like us to try to install for the broader community.

A sample tensorflow test script:

#!/bin/bash
#SBATCH --mem=64g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2    # <- match to OMP_NUM_THREADS
#SBATCH --partition=gpuA100x4-interactive
#SBATCH --time=00:10:00
#SBATCH --account=YOUR_ACCOUNT-delta-gpu
#SBATCH --job-name=tf_anaconda
### GPU options ###
#SBATCH --gpus-per-node=1
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=verbose,per_task:1
###SBATCH --gpu-bind=none    # <- or closest

module purge   # drop modules and explicitly load the ones needed
               # (good job metadata and reproducibility)
module load anaconda3_gpu
module list    # job documentation and metadata
echo "job is starting on `hostname`"
which python3
conda list tensorflow
srun python3 \
  tf_gpu.py
exit

Jupyter notebooks

The Delta Open OnDemand portal provides an easier way to start a Jupyter notebook. Please see OpenOnDemand to access the portal.

The Jupyter notebook executables are in your $PATH after loading the anaconda3 module. Don't run Jupyter on the shared login nodes. Instead, follow these steps to attach a Jupyter notebook running on a compute node to your local web browser:

Step 1. Start a Jupyter job via srun and note the hostname (you pick the port number for --port).

$ srun --account=bbka-delta-cpu --partition=cpu-interactive \
    --time=00:30:00 --mem=32g \
    jupyter-notebook --no-browser \
    --port=8991 --ip=0.0.0.0
...

Or copy and paste one of these URLs:

http://cn093.delta.internal.ncsa.edu:8991/?token=e5b500e5aef67b1471ed1842b2676e0c0ae4b5652656feea

or

http://127.0.0.1:8991/?token=e5b500e5aef67b1471ed1842b2676e0c0ae4b5652656feea

Use the second URL in Step 3. Note the internal cluster hostname for Step 2.

When using a container with a gpu node, run the container's jupyter-notebook:

NGC container for gpus, jupyter-notebook, bind a directory

# container notebook example showing how to access a directory outside
# of $HOME ( /projects/bbka in the example )
$ srun --account=bbka-delta-gpu --partition=gpuA100x4-interactive \
    --time=00:30:00 --mem=64g --gpus-per-node=1 \
    singularity run --nv --bind /projects/bbka \
    /sw/external/NGC/pytorch:22.02-py3 jupyter-notebook \
    --notebook-dir /projects/bbka --no-browser --port=8991 --ip=0.0.0.0
...
http://hostname:8888/?token=73d96b99f2cfc4c3932a3433d1b8003c052081c5411795d5

In Step 3, to start the notebook in your browser, replace http://hostname:8888/ with http://127.0.0.1:8991/.

You may not see the job hostname when running with a container; find it with squeue:

squeue -u $USER

$ squeue -u $USER
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 156071 gpuA100x4 singular  arnoldg  R       1:00      1 gpua045

Then specify the host your job is using in the next step (gpua045, for example).

Step 2. From your local desktop or laptop create an ssh tunnel to the compute node via a login node of delta.

ssh tunnel for jupyter

$ ssh -l my_delta_username \
    -L 127.0.0.1:8991:cn093.delta.internal.ncsa.edu:8991 \
    dt-login.delta.ncsa.illinois.edu

Authenticate with your login and 2-factor as usual.

Step 3. Paste the second URL (containing 127.0.0.1:port_number and the token string) from Step 1 into your browser.

You will be connected to the Jupyter instance running on your compute node of Delta.

Other Pythons (recent or latest versions)

If you do not need all of the extra modules provided by Anaconda, use the basic python installation under the gcc module. You can add modules via pip3 install --user <modulename>, set up virtual environments, and customize as needed for your workflow, starting from a smaller installed base of python than Anaconda.

$ module load gcc python
$ which python
/sw/spack/delta-2022-03/apps/python/3.10.4-gcc-11.2.0-3cjjp6w/bin/python
$ module list

Currently Loaded Modules:
  1) modtree/gpu   3) gcc/11.2.0    5) ucx/1.11.2      7) python/3.10.4
  2) default       4) cuda/11.6.1   6) openmpi/4.1.2

This is the list of modules available in this python, from pip3 list:

Package            Version
------------------ ---------
certifi            2021.10.8
cffi               1.15.0
charset-normalizer 2.0.12
click              8.1.2
cryptography       36.0.2
globus-cli         3.4.0
globus-sdk         3.5.0
idna               3.3
jmespath           0.10.0
pip                22.0.4
pycparser          2.21
PyJWT              2.3.0
requests           2.27.1
setuptools         58.1.0
urllib3            1.26.9

Launching Applications

See the Sample Job Scripts section for examples of different execution configurations.

Running Jobs

Job Accounting

The charge unit for Delta is the Service Unit (SU). One SU corresponds to the use of one compute core with up to 2 GB of memory for one hour, or one GPU or fractional GPU with up to the corresponding amount of memory or cores for one hour (see the table below). Keep in mind that your charges are based on the resources reserved for your job and do not necessarily reflect how the resources are used. Charges are based on either the number of cores or the fraction of the memory requested, whichever is larger. The minimum charge for any job is 1 SU.

NODE TYPE            SERVICE UNIT EQUIVALENCE
                     CORES   GPU FRACTION   HOST MEMORY
CPU node             1       N/A            2 GB
GPU node:
  Quad A100          2       1/7 A100       8 GB
  Quad A40           16      1 A40          64 GB
  8-way A100         2       1/7 A100       32 GB
  8-way MI100        16      1 MI100        256 GB

Please note that a weighting factor will discount the charge for the reduced-precision A40 nodes, as well as for the novel AMD MI100-based node; this will be documented through the ACCESS SU converter.

Local Account Charging

Use the accounts command to list the accounts available for charging. CPU and GPU resources have individual charge names. For example, in the following, abcd-delta-cpu and abcd-delta-gpu are available for user gbauer to use for the CPU and GPU resources:

$ accounts
available Slurm accounts for user gbauer:
   abcd-delta-cpu  my_prefix my project
   abcd-delta-gpu  my_prefix my project

Job Accounting Considerations

  • A node-exclusive job that runs on a CPU compute node for one hour will be charged 128 SUs (128 cores x 1 hour)

  • A node-exclusive job that runs on a 4-way GPU node for one hour will be charged 4 SUs (4 GPUs x 1 hour)

  • A node-exclusive job that runs on an 8-way GPU node for one hour will be charged 8 SUs (8 GPUs x 1 hour)

  • A shared job that runs on an A100 node will be charged for the fractional usage of the A100 (e.g., using 1/7 of an A100 for one hour will be 1/7 GPU x 1 hour, or 1/7 SU per hour), except that the first hour is charged as 1 SU (the minimum job charge)
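  • Because charges are based on the larger of the cores and memory requested, a shared CPU job requesting 16 cores and 128 GB of memory is charged on its memory footprint: 128 GB / 2 GB per core-equivalent = 64 SUs per hour, not 16 (an illustrative application of the rule stated above)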

Accessing the Compute Nodes

Delta implements the Slurm batch environment to manage access to the compute nodes. Use the Slurm commands to run batch jobs or for interactive access to compute nodes. See an introduction to Slurm. There are two ways to access compute nodes on Delta.

Batch jobs can be used to access compute nodes. Slurm provides a convenient direct way to submit batch jobs. See Heterogeneous Job Support in the Slurm guide.

Sample Slurm batch job scripts are provided in the Job Scripts section below.

Direct ssh access to a compute node in a running batch job from a dt-loginNN node is enabled once the job has started.

$ squeue --job jobid
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  12345       cpu     bash   gbauer  R       0:17      1 cn001

Then in a terminal session:

$ ssh cn001
cn001.delta.internal.ncsa.edu (172.28.22.64)
  OS: RedHat 8.4   HW: HPE   CPU: 128x   RAM: 252 GB
  Site: mgmt   Role: compute
$

Scheduler

For information, consult the Slurm Quick Start User Guide and the Slurm quick reference guide [PDF].

Partitions (Queues)

PARTITION/QUEUE          NODE TYPE    MAX NODES   MAX DURATION    MAX RUNNING IN    CHARGE
                                      PER JOB                     QUEUE/USER*       FACTOR
cpu                      CPU          TBD         24 hr / 48 hr   8448 cores        1.0
cpu-interactive          CPU          TBD         30 min          in total          2.0
gpuA100x4 (default)      quad-A100    TBD         24 hr / 48 hr   3200 cores and    1.0
                                                                  200 gpus
gpuA100x4-interactive    quad-A100    TBD         30 min          in total          2.0
gpuA100x8                octa-A100    TBD         24 hr / 48 hr   TBD               2.0
gpuA100x8-interactive    octa-A100    TBD         30 min          TBD               4.0
gpuA40x4                 quad-A40     TBD         24 hr / 48 hr   3200 cores and    0.6
                                                                  200 gpus
gpuA40x4-interactive     quad-A40     TBD         30 min          in total          1.2
gpuMI100x8               octa-MI100   TBD         24 hr / 48 hr   TBD               1.5
gpuMI100x8-interactive   octa-MI100   TBD         30 min          TBD               3.0

gpuA100x4 is the default queue; even so, specify gpuA100x4 explicitly when submitting jobs.

Table 9. Delta Early Access Period Production Partitions/Queues

Figure: sview view of the Slurm partitions.

Node Policies

Node-sharing is the default for jobs. Node-exclusive mode can be obtained by specifying all the consumable resources for that node type or adding the following Slurm options:

--exclusive --mem=0

GPU NVIDIA MIG (GPU slicing) for the A100 will be supported at a future date.

Pre-emptive jobs will be supported at a future date.

Interactive Sessions

Interactive sessions can be implemented in several ways depending on what is needed.

To start up a bash shell terminal on a cpu or gpu node

  • single core with 16 GB of memory, with one task on a CPU node

srun --account=account_name --partition=cpu-interactive \
    --nodes=1 --tasks=1 --tasks-per-node=1 \
    --cpus-per-task=1 --mem=16g \
    --pty bash
  • single core with 20 GB of memory, with one task on an A40 GPU node

srun --account=account_name --partition=gpuA40x4-interactive \
    --nodes=1 --gpus-per-node=1 --tasks=1 \
    --tasks-per-node=1 --cpus-per-task=1 --mem=20g \
    --pty bash

interactive jobs: a case for mpirun
Since interactive jobs are already a child process of srun, one cannot srun applications from within them. Use mpirun to launch MPI jobs from within an interactive job. Within standard batch jobs submitted via sbatch, use srun to launch MPI codes.
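For example, inside an interactive session that was requested with --tasks=4 (the application name is illustrative):

mpirun -n 4 ./my_mpi_app   # use mpirun here; srun within srun is not supported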

Interactive X11 Support

To run an X11-based application on a compute node in an interactive session, use the --x11 switch with srun. For example, to run a single-core job that uses 16 GB of memory with X11 (in this case an xterm), do the following:

srun -A abcd-delta-cpu --partition=cpu-interactive \
    --nodes=1 --tasks=1 --tasks-per-node=1 \
    --cpus-per-task=1 --mem=16g \
    --x11 xterm

Sample Job Scripts

Serial Example Script

$ cat job.slurm
#!/bin/bash
#SBATCH --mem=16g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1    # match to OMP_NUM_THREADS
#SBATCH --partition=cpu      # or one of: gpuA100x4 gpuA40x4 gpuA100x8 gpuMI100x8
#SBATCH --account=account_name
#SBATCH --job-name=myjobtest
#SBATCH --time=00:10:00      # hh:mm:ss for the job
### GPU options ###
##SBATCH --gpus-per-node=2
##SBATCH --gpu-bind=none     # or closest
##SBATCH --mail-user=you@yourinstitution.edu
##SBATCH --mail-type="BEGIN,END"
# See sbatch or srun man pages for more email options

module reset          # drop modules and explicitly load the ones needed
                      # (good job metadata and reproducibility)
                      # $WORK and $SCRATCH are now set
module load python    # ... or any appropriate modules
module list           # job documentation and metadata
echo "job is starting on `hostname`"
srun python3 myprog.py

MPI Example Script

#!/bin/bash
#SBATCH --mem=16g
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1    # match to OMP_NUM_THREADS
#SBATCH --partition=cpu      # or one of: gpuA100x4 gpuA40x4 gpuA100x8 gpuMI100x8
#SBATCH --account=account_name
#SBATCH --job-name=mympi
#SBATCH --time=00:10:00      # hh:mm:ss for the job
### GPU options ###
##SBATCH --gpus-per-node=2
##SBATCH --gpu-bind=none     # or closest
##SBATCH --mail-user=you@yourinstitution.edu
##SBATCH --mail-type="BEGIN,END"
# See sbatch or srun man pages for more email options

module reset                    # drop modules and explicitly load the ones needed
                                # (good job metadata and reproducibility)
                                # $WORK and $SCRATCH are now set
module load gcc/11.2.0 openmpi  # ... or any appropriate modules
module list                     # job documentation and metadata
echo "job is starting on `hostname`"
srun osu_reduce

OpenMP Example Script

#!/bin/bash
#SBATCH --mem=16g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32   # <- match to OMP_NUM_THREADS
#SBATCH --partition=cpu      # <- or one of: gpuA100x4 gpuA40x4 gpuA100x8 gpuMI100x8
#SBATCH --account=account_name
#SBATCH --job-name=myopenmp
#SBATCH --time=00:10:00      # hh:mm:ss for the job
### GPU options ###
##SBATCH --gpus-per-node=2
##SBATCH --gpu-bind=none     # <- or closest
##SBATCH --mail-user=you@yourinstitution.edu
##SBATCH --mail-type="BEGIN,END"
# See sbatch or srun man pages for more email options

module reset             # drop modules and explicitly load the ones needed
                         # (good job metadata and reproducibility)
                         # $WORK and $SCRATCH are now set
module load gcc/11.2.0   # ... or any appropriate modules
module list              # job documentation and metadata
echo "job is starting on `hostname`"
export OMP_NUM_THREADS=32
srun stream_gcc

Hybrid (MPI + OpenMP) Example Script

#!/bin/bash
#SBATCH --mem=16g
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4    # <- match to OMP_NUM_THREADS
#SBATCH --partition=cpu      # <- or one of: gpuA100x4 gpuA40x4 gpuA100x8 gpuMI100x8
#SBATCH --account=account_name
#SBATCH --job-name=mympi+x
#SBATCH --time=00:10:00      # hh:mm:ss for the job
### GPU options ###
##SBATCH --gpus-per-node=2
##SBATCH --gpu-bind=none     # <- or closest
##SBATCH --mail-user=you@yourinstitution.edu
##SBATCH --mail-type="BEGIN,END"
# See sbatch or srun man pages for more email options

module reset                    # drop modules and explicitly load the ones needed
                                # (good job metadata and reproducibility)
                                # $WORK and $SCRATCH are now set
module load gcc/11.2.0 openmpi  # ... or any appropriate modules
module list                     # job documentation and metadata
echo "job is starting on `hostname`"
export OMP_NUM_THREADS=4
srun xthi

Parametric / Array / HTC jobs Example Script (docs pending)

Job Management

Batch jobs are submitted through a job script using the sbatch command. Job scripts generally start with a series of SLURM directives that describe requirements of the job, such as the number of nodes and the wall time required, to the batch system/scheduler. (SLURM directives can also be specified as options on the sbatch command line; command-line options take precedence over those in the script.) The rest of the batch script consists of user commands.

The syntax for sbatch is:

sbatch [options] <script_name>

Refer to the man page for options.
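For example, a directive in the script can be overridden at submission time (option values are illustrative):

$ sbatch --time=02:00:00 --job-name=longer_run job.slurm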

squeue/scontrol/sinfo

Commands that display batch job and partition information.

SLURM EXAMPLE COMMAND      DESCRIPTION
squeue -a                  List the status of all jobs on the system.
squeue -u $USER            List the status of all your jobs in the batch system.
squeue -j jobID            List nodes allocated to a running job in addition to basic information.
scontrol show job jobID    List detailed information on a particular job.
sinfo -a                   List summary information on all the partitions.

Job Status

NODELIST(REASON)

MaxGRESPerAccount: a user has exceeded the number of cores or GPUs allotted per user or project for a given partition.

Useful Batch Job Environment Variables

See the sbatch man page for additional environment variables available.

DESCRIPTION                SLURM ENVIRONMENT VARIABLE   DETAILED DESCRIPTION
JobID                      $SLURM_JOB_ID                Job identifier assigned to the job
Job Submission Directory   $SLURM_SUBMIT_DIR            By default, jobs start in the directory
                                                        the job was submitted from, so the
                                                        "cd $SLURM_SUBMIT_DIR" command is not
                                                        needed.
Machine (node) list        $SLURM_NODELIST              Contains the list of nodes assigned to
                                                        the batch job
Array JobID                $SLURM_ARRAY_JOB_ID          Each member of a job array is assigned
                           $SLURM_ARRAY_TASK_ID         a unique identifier

srun

The srun command initiates an interactive job on the compute nodes.

For example, the following command:

srun -A account_name --time=00:30:00 --nodes=1 --ntasks-per-node=64 --mem=16g --pty /bin/bash

will run an interactive job in the default queue with a wall-clock limit of 30 minutes, using one node, 64 tasks per node, and 16 GB of memory. You can also use other sbatch options, such as those documented above.

After you enter the command, you will have to wait for SLURM to start the job. As with any job, your interactive job will wait in the queue until the specified number of nodes is available. If you specify a small number of nodes for smaller amounts of time, the wait should be shorter because your job will backfill among larger jobs. You will see something like this:

srun: job 123456 queued and waiting for resources

Once the job starts, you will see:

srun: job 123456 has been allocated resources

and will be presented with an interactive shell prompt on the launch node. At this point, you can use the appropriate command to start your program.

When you are done with your work, you can use the exit command to end the job.

scancel

The scancel command deletes a queued job or terminates a running job.

  • scancel JobID deletes/terminates a job.

Refunds

Refunds are considered, when appropriate, for jobs that failed due to circumstances beyond user control.

Projects wishing to request a refund should email help@ncsa.illinois.edu. Please include the batch job ids and the standard error and output files produced by the job(s).

Visualization (docs pending)

Delta A40 nodes support NVIDIA raytracing hardware. This section will describe:

  • visualization capabilities & software

  • how to establish VNC/DVC remote desktop

Containers

Apptainer (formerly Singularity)

Container support on Delta is provided by Apptainer/Singularity.

Docker images can be converted to the Singularity sif format via the singularity pull command. Commands can be run from within a container using the singularity run command (or apptainer run).

If you encounter quota issues with Apptainer/Singularity caching in ~/.singularity, the environment variable SINGULARITY_CACHEDIR can be set to use a different location, such as a scratch space.
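A short sketch (the cache path is illustrative):

export SINGULARITY_CACHEDIR=/scratch/abcd/$USER/singularity_cache
singularity pull docker://ubuntu:22.04   # writes ubuntu_22.04.sif, caching layers in scratch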

Your $HOME is automatically available from containers run via Apptainer/Singularity. You can run pip3 install --user against a container's python, set up virtual environments, or similar while using a containerized application. Just run the container's /bin/bash to get a Singularity prompt. Here's an srun example of that with tensorflow:

srun the bash from a container to interact with programs inside it

$ srun \
  --mem=32g \
  --nodes=1 \
  --ntasks-per-node=1 \
  --cpus-per-task=1 \
  --partition=gpuA100x4-interactive \
  --account=bbka-delta-gpu \
  --gpus-per-node=1 \
  --gpus-per-task=1 \
  --gpu-bind=verbose,per_task:1 \
  --pty \
  apptainer run --nv \
  /sw/external/NGC/tensorflow:22.06-tf2-py3 /bin/bash

# job starts ...
Singularity> hostname
gpua068.delta.internal.ncsa.edu
Singularity> which python   # the python in the container
/usr/bin/python
Singularity> python --version
Python 3.8.10
Singularity>

NVIDIA NGC Containers

Delta provides NVIDIA NGC Docker containers that we have pre-built with Apptainer/Singularity. Look for the latest binary containers in /sw/external/NGC. The containers are used as shown in the sample scripts below:

PyTorch example script

#!/bin/bash
#SBATCH --mem=64g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=64      # match to OMP_NUM_THREADS, 64 requests whole node
#SBATCH --partition=gpuA100x4   # one of: gpuA100x4 gpuA40x4 gpuA100x8 gpuMI100x8
#SBATCH --account=bbka-delta-gpu
#SBATCH --job-name=pytorchNGC
### GPU options ###
#SBATCH --gpus-per-node=1
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=verbose,per_task:1

module reset   # drop modules and explicitly load the ones needed
               # (good job metadata and reproducibility)
               # $WORK and $SCRATCH are now set
module list    # job documentation and metadata
echo "job is starting on `hostname`"

# run the container binary with arguments: python3 program.py
apptainer run --nv \
  /sw/external/NGC/pytorch:22.02-py3 python3 tensor_gpu.py

Tensorflow example script

#!/bin/bash
#SBATCH --mem=64g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=64      # match to OMP_NUM_THREADS
#SBATCH --partition=gpuA100x4   # one of: gpuA100x4 gpuA40x4 gpuA100x8 gpuMI100x8
#SBATCH --account=bbka-delta-gpu
#SBATCH --job-name=tfNGC
### GPU options ###
#SBATCH --gpus-per-node=1
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=verbose,per_task:1

module reset   # drop modules and explicitly load the ones needed
               # (good job metadata and reproducibility)
               # $WORK and $SCRATCH are now set
module list    # job documentation and metadata
echo "job is starting on `hostname`"

# run the container binary with arguments: python3 program.py
singularity run --nv \
  /sw/external/NGC/tensorflow:22.06-tf2-py3 python3 \
  tf_matmul.py

Container list (as of March 2022)

catalog.txt

caffe:20.03-py3
caffe2:18.08-py3
cntk:18.08-py3 , Microsoft Cognitive Toolkit
digits:21.09-tensorflow-py3
lammps:patch_4May2022
matlab:r2021b
mxnet:21.09-py3
namd_3.0-alpha11.sif
paraview_egl-py3-5.9.0.sif  # /opt/paraview/*
pytorch:22.02-py3
tensorflow:22.06-tf1-py3
tensorflow:22.06-tf2-py3
tensorrt:22.02-py3
theano:18.08
torch:18.08-py2

AMD Infinity Hub containers for MI100

The AMD node in the gpuMI100x8 (and gpuMI100x8-interactive) partition will run containers from the AMD Infinity Hub. The Delta team has pre-loaded the following containers in /sw/external/MI100 and will retrieve others upon request.

AMD MI100 containers in /sw/external/MI100

cp2k_8.2.sif
gromacs_2021.1.sif
lammps_2021.5.14_121.sif
milc_c30ed15e1-20210420.sif
namd_2.15a2-20211101.sif
namd3_3.0a9.sif
openmm_7.7.0_49.sif
pytorch_rocm5.0_ubuntu18.04_py3.7_pytorch_1.10.0.sif
tensorflow_rocm5.0-tf2.7-dev.sif

MI100 sample pytorch script

A sample batch script for pytorch resembles:

#!/bin/bash
#SBATCH --mem=64g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=gpuMI100x8
#SBATCH --account=bbka-delta-gpu
#SBATCH --job-name=tfAMD
#SBATCH --reservation=amd
#SBATCH --time=00:15:00
### GPU options ###
#SBATCH --gpus-per-node=1
##SBATCH --gpus-per-task=1
##SBATCH --gpu-bind=none    # or closest

module purge   # drop modules and explicitly load the ones needed
               # (good job metadata and reproducibility)
module list    # job documentation and metadata
echo "job is starting on `hostname`"

# https://apptainer.org/docs/user/1.0/gpu.html#amd-gpus-rocm
# https://pytorch.org/docs/stable/notes/hip.html
time \
  apptainer run --rocm \
  ~arnoldg/delta/AMD/pytorch_rocm5.0_ubuntu18.04_py3.7_pytorch_1.10.0.sif \
  python3 tensor_gpu.py
exit

Other Containers

Extreme-scale Scientific Software Stack (E4S)

The E4S container with GPU (CUDA and ROCm) support is provided for users of specific ECP packages made available by the E4S project. The singularity image is available as:

/sw/external/E4S/e4s-gpu-x86_64.sif

To use E4S with NVIDIA GPUs:

$ srun --account=account_name --partition=gpuA100x4-interactive \
    --nodes=1 --gpus-per-node=1 --tasks=1 --tasks-per-node=1 \
    --cpus-per-task=1 --mem=20g \
    --pty bash
$ singularity exec --cleanenv /sw/external/E4S/e4s-gpu-x86_64.sif \
    /bin/bash --rcfile /etc/bash.bashrc

The spack package inside the image will interact with a local spack installation. If a ~/.spack directory exists, it might need to be renamed.
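For example, a simple rename keeps the old configuration out of the container's way (restore it afterward if needed):

$ mv ~/.spack ~/.spack.bak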

More information can be found at Acquiring E4S Containers.

Science Gateway and Open OnDemand

Open OnDemand

The Delta Open OnDemand portal is now available for use. Currently supported interactive apps: Jupyter notebooks.

To connect to the Open OnDemand portal, access NCSA OnDemand with your NCSA username, password, and NCSA Duo via the CILogon page.

Science Gateways

The Delta Science Gateway is currently under development, and opportunities for Delta Gateway Allocations will be announced soon. Send email to help@ncsa.illinois.edu for questions.

Help

For assistance with the use of Delta:

  • ACCESS users can create a ticket via the Help Desk

  • All other users (Illinois allocations, Diversity Allocations, etc.), please send email to help@ncsa.illinois.edu.

Acknowledge

To acknowledge the NCSA Delta system in particular, please include the following:

This research is part of the Delta research computing project, which is supported by the National Science Foundation (award OAC-2005572) and the State of Illinois. Delta is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.

To include acknowledgement of ACCESS contributions to a publication or presentation please see:

  • How to Acknowledge ACCESS

  • Acknowledgement for ACCESS Users

References

Supporting documentation resources:

Purdue Anvil User Guide

Stanford SLURM Guide