KyRIC - Kentucky
If you encounter issues while using the KyRIC cluster, please submit tickets through the ACCESS portal with information detailing your problems.
Introduction
The KyRIC cluster has large memory nodes that are increasingly needed by a wide range of ACCESS researchers, particularly researchers working with big data. Each KyRIC node has a large 6TB SSD array suited to big data analytics, along with traditional NFS-mounted scratch.
The system is well suited to high-throughput genome sequencing, natural language processing of large datasets, work on massive data graphs, and big data analytics. Note that the cluster's networking backend limits the cluster to single-node jobs (not multi-node parallel jobs).
Innovative Components: Large memory nodes with local SSD drives and NFS-mounted scratch.
Award Number: NSF MRI infrastructure award (ACI-1626364)
ACCESS hostname: kxc.ccs.uky.edu
Allocation Information
As an ACCESS computing resource, KyRIC is accessible to ACCESS users who are given time on the system. To obtain an account, users may submit a proposal through the ACCESS Allocation Request System (XRAS) or request a Trial Account. Interested parties may contact ACCESS User Support for help with a KyRIC proposal.
System Architecture
The KyRIC cluster consists of two subsystems; the one available to ACCESS users is a 5-node cluster. Each node has four 10-core processors (40 cores in total; Broadwell-class Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz, 4 sockets with 10 cores per socket), 3TB of RAM, and a 6TB SSD array. These 5 dedicated ACCESS nodes have exclusive access to approximately 300 TB of network-attached disk storage. All compute nodes are interconnected through a 100 Gigabit Ethernet (100GbE) backbone, and the cluster login and data transfer nodes are connected through a 100Gb uplink to Internet2 for external connections. Because the interconnect is 100GbE, this cluster is intended for single-node jobs only and is not recommended for multi-node jobs, such as those using MPI.
Compute Nodes
These nodes are where jobs are actually executed after being submitted via the user-facing login nodes.
Table 1. Compute Node Specifications
MODEL | PowerEdge R930 |
---|---|
NUMBER OF NODES | 5 |
TOTAL CORES PER NODE | 40 cores |
THREADS PER CORE | 2 |
THREADS PER NODE | 80 |
CLOCK RATE | 2.00 GHz |
RAM | 3TB |
LOCAL STORAGE | 6TB (SSD) |
EXTENDED STORAGE | 300 TB (NFS-mounted) |
Login Nodes
Users log in directly to the login nodes to submit jobs, which are then forwarded to and executed on the compute nodes.
Table 2. Login Node Specifications
MODEL | Virtual Machines hosted in bare metal server |
---|---|
NUMBER OF NODES | 2 |
TOTAL CORES PER NODE | 4 |
THREADS PER CORE | 2 |
THREADS PER NODE | 8 |
CLOCK RATE | 2.00 GHz |
RAM | 16GB |
EXTENDED STORAGE | 300 TB (NFS-mounted) |
Data Transfer Node
This node facilitates the transfer of data in and out of the KyRIC system. Users will log in to this node with the same credentials as for the login nodes. Also, Globus endpoints are available only on this node for parallel transfers.
Table 3. Data Transfer Node Specifications
MODEL | Virtual Machines hosted in bare metal server |
---|---|
NUMBER OF NODES | 1 |
TOTAL CORES PER NODE | 8 |
THREADS PER CORE | 2 |
THREADS PER NODE | 16 |
CLOCK RATE | 2.00 GHz |
RAM | 32GB |
EXTENDED STORAGE | 300 TB (NFS-mounted) |
Network
All nodes are interconnected through a 100 Gigabit Ethernet (100GbE) backbone, and the cluster login and data transfer nodes are connected through a 100Gb uplink to Internet2 for external connections.
FILE SYSTEM | QUOTA | FILE RETENTION |
---|---|---|
| 10GB | No file deletion policy applied on this partition |
| 500GB | No file deletion policy applied on this partition |
| 10TB | 30-day file deletion policy |
Accessing the System
The login node for the cluster is kxc.ccs.uky.edu; authentication is accomplished via SSH keys. Users must generate and install their own SSH keys. For help with either of these, see the Generating SSH Keys and Uploading Your Public Key pages.
If all else fails, you may 'upload' your public key by contacting us. Please send an email to help-hpc@hpc.uky.edu with a subject line that begins with "[KXC] (your_XSEDE/ACCESS username)".
After the key is uploaded, you should be able to connect to the KyRIC system using an SSH client. For example, from a computer running Linux, macOS, Windows PowerShell, or Windows Subsystem for Linux, you may connect to KyRIC by opening a terminal and entering:
ssh -i path_to_private_key yourUserName@kxc.ccs.uky.edu
Third-party SSH clients that provide a GUI (e.g., Bitvise, MobaXterm, PuTTY) may also be used to connect to KyRIC.
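To avoid retyping the key path on every connection, the same settings as in the ssh command above can be kept in an OpenSSH client configuration file. The following is a minimal sketch that prints a Host stanza for KyRIC. The alias "kyric", the function name kyric_ssh_config, and the example key path are illustrative assumptions, not names defined by KyRIC.

```shell
# Print an OpenSSH client config stanza for KyRIC.
# Arguments: $1 = your username, $2 = path to your private key.
# The alias "kyric" is an arbitrary, illustrative choice.
kyric_ssh_config() {
    cat <<EOF
Host kyric
    HostName kxc.ccs.uky.edu
    User $1
    IdentityFile $2
    IdentitiesOnly yes
EOF
}

# Example: review the stanza, then append it to ~/.ssh/config yourself.
# The key path below is a placeholder for wherever your private key lives.
kyric_ssh_config yourUserName "~/.ssh/id_ed25519"
```

Once a stanza like this is in ~/.ssh/config, "ssh kyric" should behave like the full command shown above.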
Do not use the login nodes for computationally intensive processes. These nodes are meant for compilation, file editing, simple data analysis, and other tasks that use minimal compute resources. All computationally demanding jobs should be submitted and run through the batch queuing system.
Cluster Web GUI Access
You can log in to the cluster through a web GUI at https://kxc-ood.ccs.uky.edu, where you'll be prompted to log in using your ACCESS credentials. The web GUI allows you to navigate the cluster through a terminal as well as through a virtual desktop console in a web page. This method does not require any preparatory steps other than having an existing ACCESS account and an active project allocation.
Computing Environment
Modules
The Environment Modules package provides for dynamic modification of your shell environment. "module" commands set, change, or delete environment variables, typically in support of an application. They also let the user choose between different versions of the same software or different combinations of related codes. Several modules that determine the default KyRIC environment are loaded at login time.
Citizenship
You share KyRIC with other users, and what you do on the system affects others. Exercise good citizenship to ensure that your activity does not adversely impact the system and the research community with whom you share it. Here are some rules of thumb:
Don't run jobs on the login nodes.
Don't stress the filesystem.
Do use the debug partition to test out your job submission script.
Do submit an informative help-desk ticket.
Managing Files
No user data is backed up. Users are responsible for their own backups.
Each project and user is given a scratch space and a home space. A good practice is to write your job's output into your scratch space. Each compute node also has a local 5 TB SSD disk attached, but this local temporary space is shared among all jobs running on that node and is cleaned up (deleted) upon job completion.
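Because no user data is backed up and the largest quota carries a 30-day file deletion policy, it can help to review which files are getting old before they are removed. The sketch below lists such files. It assumes the policy is evaluated on modification time (this document does not say which timestamp is used) and takes the directory as an argument because the scratch path is not given here. list_old_files is a hypothetical helper name.

```shell
# List regular files under a directory that were last modified more
# than a given number of days ago (default: 30).
# ASSUMPTION: the deletion policy is based on modification time;
# confirm with KyRIC support before relying on this.
# Arguments: $1 = directory to scan, $2 = age threshold in days.
list_old_files() {
    find "$1" -type f -mtime +"${2:-30}" -print
}

# Usage (replace the path with your own scratch directory):
#   list_old_files /path/to/your/scratch 25
```

Running it with a threshold slightly below 30 days leaves time to copy results elsewhere before they become eligible for deletion.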
Transferring your Files
KyRIC nodes support the following file transfer protocols.
scp (if you have your SSH keys set up)
rsync (if you have your SSH keys set up)
Globus (Collection name = ACCESS_KXC_Collection)
Users are encouraged to transfer data using rclone, scp, globus, etc. through the high-speed data transfer node (DTN) and not through the login nodes.
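As an illustration of a transfer through the data transfer node, the sketch below wraps rsync over SSH. The DTN hostname is not given in this document, so it must be supplied through the DTN_HOST variable. The function name push_to_kyric, the RUN variable, and the destination path are assumptions made for this example.

```shell
# Copy a local file or directory to KyRIC through the data transfer node.
# ASSUMPTIONS: DTN_HOST must hold the DTN's hostname (not listed here),
# and your SSH keys are already set up for that node.
# Arguments: $1 = local source, $2 = your username, $3 = remote destination.
# Set RUN=echo to print the rsync command instead of executing it.
push_to_kyric() {
    : "${DTN_HOST:?set DTN_HOST to the KyRIC data transfer node hostname}"
    # -a preserves attributes, -v is verbose, --partial keeps partially
    # transferred files so an interrupted copy can be resumed.
    ${RUN:-} rsync -av --partial "$1" "$2@$DTN_HOST:$3"
}

# Usage (fill in the real DTN hostname and your own paths):
#   DTN_HOST=<data transfer node hostname>
#   push_to_kyric ./results yourUserName scratch/
```

Printing the command first with RUN=echo is a cheap way to check the source and destination before moving a large dataset.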
Building Software
Singularity containers are supported. Building a Singularity container requires root access, so it must be done outside of the cluster. Once you have a Singularity container ready, you can copy it into the cluster and run your jobs. Most of the software will be provided through Singularity containers. Standard GNU and Intel compilers will be provided.
Software
Discover installed software by running "module avail".
Running Jobs
Job Accounting
KyRIC allocations are made in core-hours. The recommended method for estimating your resource needs for an allocation request is to perform benchmark runs. The core-hours used for a job are calculated by multiplying the number of processor cores used by the wall-clock duration in hours. KyRIC core-hour calculations should assume that all jobs will run in the regular queue and that they are charged for use of all 40 cores on each node.
The Slurm scheduler tracks and charges for usage to a granularity of a few seconds of wall clock time. The system charges only for the resources you use, not those you request. If your job finishes early and exits properly, Slurm will release the node back into the pool of available nodes. Your job will only be charged for as long as you are using the node.
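The charging rule described above (all 40 cores of the node multiplied by the wall-clock hours) can be turned into a quick estimate for an allocation request. This is a minimal sketch; the function name core_hours is an assumption, and the figures are estimates rather than the scheduler's exact accounting.

```shell
# Estimate the core-hours charged for a single-node KyRIC job.
# Jobs are charged for all 40 cores of the node, so the estimate is
# 40 x wall-clock hours. awk is used because hours may be fractional.
CORES_PER_NODE=40

# Argument: $1 = wall-clock duration in hours (may be fractional).
core_hours() {
    awk -v cores="$CORES_PER_NODE" -v hours="$1" \
        'BEGIN { printf "%.1f\n", cores * hours }'
}

core_hours 72     # longest run allowed in the normal queue: prints 2880.0
core_hours 0.25   # a 15-minute test job: prints 10.0
```

Multiplying the per-job figure by the number of benchmark-sized runs you expect gives a starting point for the total request.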
Job Scheduler
KyRIC uses the Simple Linux Utility for Resource Management (SLURM) batch environment. When you run in batch mode, you submit jobs to be run on the compute nodes using the "sbatch" command as described below. Remember that computationally intensive jobs should be run only on the compute nodes and not the login nodes.
Create a Slurm job script ("jobscript") and submit it to the queue:
login$ sbatch jobscript
Table 5. Common Slurm Commands
Command | Description |
---|---|
sbatch script_file | Submit SLURM job script |
scancel job_id | Cancel job that has job_id |
squeue -u user_id | Show jobs that are on queue for user_id |
sinfo | Show partitions/queues, their time limits, number of nodes, and which compute nodes are running jobs or idle. |
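The standard Slurm commands sbatch, squeue, and scancel are typically used together: submit a script, note the job ID that sbatch reports, check the queue, and cancel if needed. The sketch below strings these steps together. The helper names job_id_from_sbatch and submit_and_watch are assumptions, and the parsing relies on sbatch's usual "Submitted batch job <id>" confirmation line.

```shell
# Extract the job ID from sbatch's confirmation line, which normally
# has the form "Submitted batch job <id>" (the ID is the last word).
job_id_from_sbatch() {
    echo "${1##* }"
}

# Submit a job script, report its ID, and show your queued jobs.
# Argument: $1 = path to the job script.
submit_and_watch() {
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found; run this on a KyRIC login node" >&2
        return 1
    fi
    reply=$(sbatch "$1") || return 1
    id=$(job_id_from_sbatch "$reply")
    echo "submitted job $id"
    squeue -u "${USER:-$(id -un)}"
    echo "to cancel: scancel $id"
}

# Usage:
#   submit_and_watch jobscript
```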
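Table 6. sbatch Options
PROPERTY & DESCRIPTION | SYNTAX | EXAMPLE USE |
---|---|---|
Job name | --job-name=<name> | --job-name=my_test_job |
Partition/queue | --partition=<name> | --partition=normal |
Time limit | --time=<hh:mm:ss> | --time=00:15:00 |
Memory (RAM) | --mem=<size> | --mem=32G |
Project account | --account=<project> | --account=<your project account> |
Standard output filename | --output=<filename> | --output=slurm-%j.out |
Number of nodes and cores | --nodes=<count> --ntasks=<count> | --nodes=1 --ntasks=1 |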
Partitions (Queues)
Table 7. KyRIC Production Queues
QUEUE NAME | NODE TYPE | MAX NODES PER JOB | MAX DURATION | MAX JOBS IN QUEUE* | CHARGE RATE |
---|---|---|---|---|---|
normal | compute | 1 node | 72 hrs. | 5* | 1 SU |
Interactive Sessions
You can also log in to a compute node and run jobs interactively, but only if that node is allocated to you.
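One common way to obtain a node allocation and an interactive shell in a single step on Slurm systems is srun with the --pty option. The sketch below assumes KyRIC permits this in the normal partition, which this document does not confirm. The function name interactive_session, the RUN variable, and the default 30-minute limit are illustrative choices.

```shell
# Request an interactive shell on a compute node through Slurm.
# ASSUMPTION: interactive srun sessions are permitted in the "normal"
# partition; check with KyRIC support if the request is rejected.
# Arguments: $1 = project account, $2 = time limit (default 00:30:00).
# Set RUN=echo to print the command instead of running it.
interactive_session() {
    ${RUN:-} srun --partition=normal --ntasks=1 \
        --time="${2:-00:30:00}" --account="$1" --pty bash
}

# Usage:
#   interactive_session <your project account> 01:00:00
```

The session is charged like any other job, so exit the shell as soon as the interactive work is done.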
Sample Job Script
#!/bin/bash
#SBATCH --time=00:15:00 # Max run time
#SBATCH --job-name=my_test_job # Job name
#SBATCH --ntasks=1 # Number of cores for the job. Same as SBATCH -n 1
#SBATCH --partition=normal # Specify partition/queue
#SBATCH -e slurm-%j.err # Error file for this job.
#SBATCH -o slurm-%j.out # Output file for this job.
#SBATCH -A <your project account> # Project allocation account name (REQUIRED)
./myprogram # This is the program that will be executed on the compute node. You will substitute this with your scientific application.
Help
Please submit tickets through the ACCESS portal with information detailing your problems.