Preliminary access essentials
Disclaimer
Helios is still under development, and despite our best efforts, it might experience unscheduled outages or even data loss.
Support
Please contact the PLGrid Helpdesk: https://helpdesk.plgrid.pl/ regarding any difficulties in using the cluster.
For important information and announcements, please follow this page and the messages displayed at login.
Access to Helios
We strongly suggest using SSH keys to access the machine! SSH key management can be done through the PLGrid portal. Password access will be disabled in the near future.
Computing resources on Helios are assigned based on PLGrid computing grants. To perform computations on Helios, you must obtain a computing grant through the PLGrid Portal (https://portal.plgrid.pl/) and apply for Helios access.
If your grant is active and you have applied for access to the service, the request should be accepted within about half an hour. Please report any issues through the Helpdesk.
Machine description
Available login nodes:
- ssh <login>@helios.cyfronet.pl
Note that Helios uses PLGrid accounts and grants. Make sure to request the "Helios access" access service in the PLGrid portal.
Helios currently uses the node-exclusive job policy. This means that each node is allocated to a single, dedicated job that uses its resources. This also impacts accounting: the minimum amount of resources billed equals one node.
The node-exclusive policy will be retired at the end of February 2025. Starting on 1 March 2025, Helios will use the user-exclusive policy, where a single node can be shared by multiple jobs submitted by the same user, and each job will be assigned exactly the resources it requested.
Helios is a hybrid cluster. CPU nodes use x86_64 CPUs, while the GPU partition is based on GH200 superchips, which combine an NVIDIA Grace ARM CPU with an NVIDIA Hopper GPU. HPE Slingshot is used as the interconnect. The login01 node uses an x86_64 CPU and runs RHEL 8. Please keep this in mind when compiling software: knowing the destination CPU architecture and operating system is important for selecting the proper modules and software. Each architecture has its own set of modules; to see the complete list of modules for a given node type, run module avail on a node of that type. Node specifications can be found below:
Partition | Number of nodes | Operating system | CPU | RAM | RAM available for job allocations | Default RAM per CPU | Proportional RAM for one CPU | Proportional RAM for one GPU | Proportional CPU for one GPU | Accelerator |
---|---|---|---|---|---|---|---|---|---|---|
plgrid (includes plgrid-long) | 272 | RHEL 8 | 192 cores, x86_64, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz (a total of 8 NUMA nodes, each with 24 cores; SMT/hyperthreading disabled) | 384GB | 386098MB | 1536MB | 2000MB | n/a | n/a | |
plgrid-bigmem | 120 | RHEL 8 | 192 cores, x86_64, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz | 768GB | 773170MB | 3072MB | 4000MB | n/a | n/a | |
plgrid-gpu-gh200 | 110 | CrayOS (SLES 15sp5) | 288 cores, aarch64, 4x NVIDIA Grace CPU 72-Core @ 3.1 GHz | 480GB | 489600MB | 1536MB | n/a | 120GB | 72 | 4x NVIDIA GH200 96GB |
Note that Helios will soon be upgraded to RHEL 9. This change will be applied to all CPU and GPU nodes.
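If in doubt, you can verify the architecture of a node directly. The following is an illustrative sketch (grantname-gpu-gh200 is a placeholder for your own account name):
uname -m   # on login01 and CPU nodes: x86_64
srun -p plgrid-gpu-gh200 -A grantname-gpu-gh200 --gres=gpu:1 -t 5:00 uname -m   # on a GH200 node: aarch64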
Job submission
Helios uses the Slurm resource manager; jobs should be submitted to the following partitions:
Name | Timelimit | Resource type (account suffix) | Access requirements | Description |
---|---|---|---|---|
plgrid | 72h | -cpu | Generally available. | Standard partition. |
plgrid-long | 168h | -cpu | Requires a grant with a maximum job runtime of 168h. | Used for jobs with extended runtime. |
plgrid-bigmem | 72h | -cpu-bigmem | Requires a grant with CPU-BIGMEM resources. | Resources used for jobs requiring an extended amount of memory. |
plgrid-gpu-gh200 | 48h | -gpu-gh200 | Requires a grant with GPGPU resources. | GPU partition. |
If you are unsure of how to properly configure your job on Helios, please consult this guide: Job configuration
Accounts and computing grants
Helios uses a new naming scheme for CPU and GPU computing accounts, which are supplied via the -A parameter of the sbatch command. Currently, accounts are named in the following manner:
Resource | account name |
---|---|
CPU | grantname-cpu |
CPU bigmem nodes | grantname-cpu-bigmem |
GPU | grantname-gpu-gh200 |
Please mind that sbatch -A grantname won't work on its own. You need to add the -cpu, -cpu-bigmem, or -gpu-gh200 suffix! Available computing grants, with respective account names (allocations), can be viewed using the hpc-grants command.
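For example, for a hypothetical grant named grantname (job.sh is a placeholder for your job script), submissions would look like this:
sbatch -A grantname-cpu -p plgrid job.sh                   # standard CPU job
sbatch -A grantname-cpu-bigmem -p plgrid-bigmem job.sh     # big-memory CPU job
sbatch -A grantname-gpu-gh200 -p plgrid-gpu-gh200 job.sh   # GPU job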
Resources allocated on Helios are not normalized, unlike on Prometheus and previous clusters. One hour of CPU time equals one hour spent on a computing core with a proportional amount of memory (consult the table above). The billing system accounts for jobs that use more memory than the proportional amount: if a job uses more memory per allocated CPU than the proportional amount, it is billed as if it had used more CPUs. The billed amount can be calculated by dividing the used memory by the proportional memory per core and rounding the result up to the nearest integer. Jobs on CPU partitions are always billed in CPU-hours.
The same principle applies to GPU resources: the GPU-hour is the billing unit, and proportional memory per GPU and proportional CPUs per GPU are defined (consult the table above).
The cost can be expressed as a simple algorithm:
cost_cpu = job_cpus_used * job_duration
cost_memory = ceil(job_memory_used / memory_per_cpu) * job_duration
final_cost = max(cost_cpu, cost_memory)
and for GPUs, using the defined proportional memory per GPU and proportional CPUs per GPU:
cost_gpu = job_gpus_used * job_duration
cost_cpu = ceil(job_cpus_used / cpus_per_gpu) * job_duration
cost_memory = ceil(job_memory_used / memory_per_gpu) * job_duration
final_cost = max(cost_gpu, cost_cpu, cost_memory)
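As a worked illustration with hypothetical job parameters: a job on the plgrid partition uses 4 CPUs and 16000 MB of memory for 10 hours, and the proportional memory on plgrid is 2000 MB per CPU (see the table above):
cost_cpu    = 4 * 10 = 40 CPU-hours
cost_memory = ceil(16000 / 2000) * 10 = 80 CPU-hours
final_cost  = max(40, 80) = 80 CPU-hours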
Storage
Available storage spaces are described in the following table:
Location | Location in the filesystem | Purpose |
---|---|---|
$HOME | /net/home/plgrid/<login> | Storing your own applications and configuration files. Limited to 10GB. |
$SCRATCH | /net/scratch/hscra/plgrid/<login> | High-speed storage for short-lived data used in computations. Data older than 30 days can be deleted without notice. It is best to rely on the $SCRATCH environment variable. |
$PLG_GROUPS_STORAGE/<group name> | /net/storage/pr3/plgrid/<group name> | Long-term storage for data kept for the duration of the computing grant. Should be used for storing significant amounts of data. |
Current usage, capacity, and other storage attributes can be checked by issuing the hpc-fs command.
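A typical pattern, sketched below with hypothetical directory names, is to run computations in $SCRATCH and copy results worth keeping to group storage afterwards:
cd $SCRATCH
mkdir -p my_job && cd my_job
# ... run the computation here ...
cp -r results $PLG_GROUPS_STORAGE/<group name>/   # keep results in long-term storage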
System Utilities
Please use the following commands to interact with the account and storage management system:
- hpc-grants - shows available grants, resource allocations, and consumed resources
- hpc-fs - shows available storage
- hpc-jobs - shows currently pending/running jobs
- hpc-jobs-history - shows information about past jobs
- hpc-modules - lists or searches modules in the module hierarchy
Sample job scripts
Example job scripts (without the -l option) are available on the page: Sample scripts.
Add "-l" option!
The Bash option -l is crucial for running jobs on Helios, especially on the plgrid-gpu-gh200 partition, as it makes Bash start as a login shell and properly initialize the environment (including the module system). Please use the following shebang in the first line of your scripts:
#!/bin/bash -l
Software
Applications and libraries are available through the modules system (lmod). When looking for software, please keep in mind the following points:
- modules for ARM and x86 CPUs are not interchangeable – selecting the right module for the destined architecture is critical for getting software to work
- load the proper modules inside of the job script – you should not rely on loading modules on the login node before submitting a job!
- not all modules are visible on the login node; some modules are available only on x86 nodes and some only on gh200 nodes
- modules related to MPI/distributed software cannot be loaded on the login node
Hierarchical structure
The modules on the Helios supercomputer are organized hierarchically. This means that in order to load a module, its main dependencies (like compiler and MPI) must be loaded first.
Our hpc-modules tool can help you explore the hierarchy.
Working with modules
The list of available modules can be obtained by issuing the command:
module spider
- the list is searchable by using the '/' key,
- to get a full list of modules available on a given architecture (node type), run this command on a compute node!
- more information can be found in Lmod documentation - using the module spider command.
A specific module can be loaded by the add/load command:
module load GCC/13.2.0 OpenMPI/5.0.3
- note that module names on Helios are case-sensitive!
The environment can be purged by:
module purge
Note that due to the hierarchical structure, the "module avail" command will not show all modules available on the cluster, but only those that can be loaded at the moment.
Check the Lmod User Guide to learn more about the available commands.
In addition to the standard "module" commands, we provide an extra tool for listing the module hierarchy and searching modules:
hpc-modules x86    # list modules prepared for standard CPUs (x86_64)
hpc-modules gh200  # list modules prepared for gh200 superchips (aarch64)
Below we provide an example of how to use "module spider" to search for specific software and discover which modules in the hierarchy are required in order to load it.
Python and Machine Learning
As noted in the previous sections, nodes with GH200 superchips have CPUs with the ARM (aarch64) architecture, thus requiring modules built specifically for this architecture.
Do not use conda!
Anaconda should NOT be used for virtual environment management when working with Python. This is because conda environments ship with separate Python interpreter installations, which may experience compatibility issues with the ARM architecture.
To create virtual environments, please use the Python standard venv module.
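A minimal sketch (my_venv_name is a placeholder; on GH200 nodes, load ML-bundle first, as described below):
python -m venv $SCRATCH/my_venv_name
source $SCRATCH/my_venv_name/bin/activate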
We provide a special module for deep learning applications, called ML-bundle. It contains software that is often used by AI libraries. Always load this module before installing/building any packages or running Python programs relying on GPUs on GH200 nodes - load it as the first step in the job, before activating any virtual environments.
module add ML-bundle/24.06a
We also provide a custom pip repository with popular machine learning packages pre-built in various versions with GPU support for the ARM architecture. The packages (wheels) from this repo can be installed directly via pip, simply by specifying the correct name, version, and tag of a package (pip install library==VER+TAG).
To see a list of available libraries with their tags, check the repo's content by listing the directory:
ls -1 /net/software/aarch64/el8/wheels/ML-bundle/24.06a
- this repo becomes automatically visible when ML-bundle is loaded (see the "$PIP_EXTRA_INDEX_URL" variable),
- wheel files in the repo are named according to the library-VER+TAG-suffix.whl pattern, so you can easily extract the correct name, version, and tag (see the example below).
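For instance, a hypothetical wheel file named torch-2.5.1+cu124.post3-cp312-cp312-linux_aarch64.whl would translate into:
pip install torch==2.5.1+cu124.post3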
Example scripts using ML-bundle
An example script that creates a virtual environment and installs packages from the Helios custom wheel repository and a requirements.txt file:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --account=<your-grant-account>
#SBATCH --partition=plgrid-gpu-gh200
#SBATCH --output=job-%j.out
#SBATCH --error=job-%j.err

# IMPORTANT: load the modules for machine learning tasks and libraries
ml ML-bundle/24.06a

cd $SCRATCH

# create and activate the virtual environment
python -m venv my_venv_name/
source my_venv_name/bin/activate

# install one of the torch versions available in the Helios wheel repo
pip install --no-cache-dir torch==2.5.1+cu124.post3

# install the rest of the requirements, for example via a requirements file
pip install --no-cache-dir -r requirements.txt
An example script that executes a Python program inside the created virtual environment:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --account=<your-grant-account>
#SBATCH --partition=plgrid-gpu-gh200
#SBATCH --output=job-%j.out
#SBATCH --error=job-%j.err

# IMPORTANT: load the modules for machine learning tasks and libraries
ml ML-bundle/24.06a

cd $SCRATCH

# activate the virtual environment
source my_venv_name/bin/activate

# run the program
python my_script_name.py
Examples of distributed AI jobs
We provide examples of using popular launchers (e.g. torchrun, deepspeed) to set up distributed training, both in single- and multi-node settings. They can be found on Helios under the following path:
/net/software/examples/gh200/ml_distributed
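As a minimal single-node sketch (train.py is a placeholder for your own training script; the maintained, Helios-specific examples live in the directory above), torchrun can drive all 4 GPUs of a GH200 node:
ml ML-bundle/24.06a
source my_venv_name/bin/activate
torchrun --standalone --nproc_per_node=4 train.py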
Multiprocessing - potential problems
In some libraries that use multiprocessing in Python, problems can be observed when spawning new processes. For example, in the case of PyTorch, the number of threads in new processes may be determined based on environment variables that do not always accurately reflect the number of resources allocated to a given job. As a result, hundreds or even thousands of threads can be spawned, leading to a huge performance drop.
The solution to this problem is to manually set the OMP_NUM_THREADS environment variable. For example, to limit the number of threads per spawned process to 1:
# limit number of threads to prevent excessive thread creation
export OMP_NUM_THREADS=1
Note that the above setting is applied automatically when you load ML-bundle.
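Relatedly, if your script sizes its own worker pool, a sketch that derives the count from the Slurm allocation (rather than from the node's total core count) could look like this; NUM_WORKERS is a hypothetical variable read by your own code:
export OMP_NUM_THREADS=1
export NUM_WORKERS=${SLURM_CPUS_PER_TASK:-1}   # set by Slurm when --cpus-per-task is requested
python my_script_name.py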
It is always advised to properly profile the script execution upon first use, especially when using multiprocessing. This can be done in an interactive session via simple tools, such as htop.
More information
Helios follows Prometheus's configuration and usage patterns. Prometheus documentation can be found here: https://kdm.cyfronet.pl/portal/Prometheus:Basics