For important information and announcements, please follow this page and the messages displayed at login.

Access to Helios

We strongly suggest using SSH keys to access the machine! SSH key management can be done through the PLGrid portal. Password access will be disabled in the near future.

Computing resources on Helios are assigned based on PLGrid computing grants. To perform computations on Helios, you must obtain a computing grant through the PLGrid Portal (https://portal.plgrid.pl/) and apply for Helios access.

...

Note that Helios uses PLGrid accounts and grants. Make sure to request the "Helios access" access service in the PLGrid portal.

Helios uses a job-exclusive node policy: each node is allocated to a single job, which has sole use of its resources. This also affects accounting, as the minimum amount of resources charged to a job is one node.
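
To illustrate the effect on accounting, the sketch below estimates what a job would be charged under this policy. It assumes that usage is counted in core-hours and that a standard CPU node has 192 cores (the figure given in the node specification on this page). The function name billed_core_hours is hypothetical, and the actual accounting units on Helios may differ.

```shell
# Hypothetical helper: estimate the core-hours charged under the node-exclusive policy.
# Assumptions: accounting is in core-hours; a plgrid CPU node has 192 cores.
billed_core_hours() {
  local nodes="$1" hours="$2" cores_per_node="${3:-192}"
  # A job is always charged for at least one whole node.
  if [ "$nodes" -lt 1 ]; then
    nodes=1
  fi
  echo $(( nodes * cores_per_node * hours ))
}

# A 2-hour job that uses only a few cores still occupies one full node.
billed_core_hours 1 2
```

Under these assumptions, a short job that needs only a handful of cores costs the same as one that uses the whole node, so it pays to pack work into full nodes.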

Helios is a hybrid cluster. CPU nodes use x86_64 CPUs, while the GPU partition is based on GH200 superchips, each of which combines an NVIDIA Grace ARM CPU and an NVIDIA Hopper GPU. HPE Slingshot is used as the interconnect. The login01 node uses an x86_64 CPU and RHEL 8. Please keep this in mind when compiling software: knowing the target CPU architecture and operating system is important for selecting the proper modules and software. Each architecture has its own set of modules; to see the complete list, run module avail on a node of the chosen type. Node specifications are listed below:
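
Because the login node and the GPU nodes differ in architecture, it can help to check where a script is running before compiling or loading software. The following is a minimal sketch that maps the output of uname -m to a node type. The function name node_type_for_arch and the labels it prints are hypothetical and used for illustration only.

```shell
# Map a machine architecture string (as printed by `uname -m`) to a Helios node type.
# The labels "cpu" and "gpu-gh200" are illustrative, not official names.
node_type_for_arch() {
  case "$1" in
    x86_64)  echo "cpu" ;;        # login01 and the CPU nodes
    aarch64) echo "gpu-gh200" ;;  # GH200 nodes (Grace ARM CPU)
    *)       echo "unknown" ;;
  esac
}

# Report the architecture of the node this script runs on.
echo "This node: $(uname -m) -> $(node_type_for_arch "$(uname -m)")"
```

A build script could branch on this value so that binaries compiled on login01 (x86_64) are not mistakenly reused on the aarch64 GPU nodes.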


Partition	Number of nodes	Operating system	CPU	RAM	RAM per CPU core	RAM per GPU	CPU cores per GPU	GPU
plgrid (includes plgrid-long)	272	RHEL 8	192 cores, x86_64, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz	384GB	2000MB	n/a	n/a	n/a
plgrid-bigmem	120	RHEL 8	192 cores, x86_64, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz	768GB	4000MB	n/a	n/a	n/a
plgrid-gpu-gh200	110	CrayOS (SLES 15sp5)	288 cores, aarch64, 4x NVIDIA Grace CPU 72-Core @ 3.1 GHz	480GB	n/a	120GB	72	4x NVIDIA GH200 96GB

Note that Helios will soon be upgraded to RHEL 9. This change will be applied to all CPU and GPU nodes.

Job submission

Helios uses the Slurm resource manager. Jobs should be submitted to the following partitions:

Name	Timelimit	Resource type (account suffix)	Access requirements	Description
plgrid	72h	-cpu	Generally available.	Standard partition.
~~plgrid-testing~~	1h	~~-cpu~~	~~Generally available.~~	~~High priority, testing jobs, limited to 1 running job.~~
~~plgrid-now~~	~~12h~~	~~-cpu~~	~~Generally available.~~	~~The highest priority, interactive jobs, limited to 1 running or queued job.~~
plgrid-long	168h	-cpu	Requires a grant with a maximum job runtime of 168h.	Used for jobs with extended runtime.
~~plgrid-bigmem~~	~~72h~~	~~-cpu-bigmem~~	~~Requires a grant with CPU-BIGMEM resources.~~	~~Resources used for jobs requiring an extended amount of memory.~~
plgrid-gpu-gh200	48h	-gpu-gh200	Requires a grant with GPGPU resources.	GPU partition.
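
The time limits in the table can be turned into a quick check before submission. The sketch below hard-codes the limits of the partitions that are not struck through; the function names partition_timelimit_hours and fits_partition are hypothetical, and the values should be updated if the table changes.

```shell
# Time limits in hours, copied from the partition table (rows not struck through).
partition_timelimit_hours() {
  case "$1" in
    plgrid)           echo 72 ;;
    plgrid-long)      echo 168 ;;
    plgrid-gpu-gh200) echo 48 ;;
    *)                return 1 ;;   # partition not listed here
  esac
}

# Succeeds if a job of the requested length (in hours) fits in the partition.
# Returns 2 when the partition is not known.
fits_partition() {
  local limit
  limit=$(partition_timelimit_hours "$1") || return 2
  [ "$2" -le "$limit" ]
}
```

For example, fits_partition plgrid 100 fails, which suggests that a 100-hour CPU job needs plgrid-long and a grant that allows a 168h runtime.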

If you are unsure how to properly configure your job on Helios, please consult this guide: Job configuration

...

Helios uses a new naming scheme for CPU and GPU computing accounts, which are passed via the -A parameter of the sbatch command. Currently, accounts are named in the following manner:

Resource	account name
CPU	grantname-cpu
~~CPU bigmem nodes~~	~~grantname-cpu-bigmem~~
GPU	grantname-gpu-gh200

Please note that sbatch -A grantname won't work on its own; you need to add the -cpu, -cpu-bigmem, or -gpu-gh200 suffix! Available computing grants, with their respective account names (allocations), can be viewed using the hpc-grants command.
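
The naming rule can be expressed as a small helper that appends the right suffix to a grant name. This is only a sketch: the function name helios_account and the resource keywords it accepts are hypothetical, and grantname and job.sh are placeholders. The account names reported by hpc-grants remain the authoritative source.

```shell
# Hypothetical helper: build the Slurm account name from a grant name and resource type.
helios_account() {
  local grant="$1" resource="$2"
  case "$resource" in
    cpu)        echo "${grant}-cpu" ;;
    cpu-bigmem) echo "${grant}-cpu-bigmem" ;;
    gpu)        echo "${grant}-gpu-gh200" ;;
    *)          echo "unknown resource type: ${resource}" >&2; return 1 ;;
  esac
}

# Example submission (commented out; grantname and job.sh are placeholders):
# sbatch -A "$(helios_account grantname cpu)" -p plgrid job.sh
```

Keeping the suffix logic in one place avoids submitting with a bare grant name, which the scheduler would reject.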

...

Available storage spaces are described in the following table:

Location	Location in the filesystem	Purpose
$HOME	/net/home/plgrid/<login>	Storing your own applications and configuration files. Limited to 10GB.
$SCRATCH	/net/scratch/hscra/plgrid/<login>	High-speed storage for short-lived data used in computations. Data older than 30 days can be deleted without notice. It is best to rely on the $SCRATCH environment variable.
$PLG_GROUPS_STORAGE/<group name>	/net/storage/pr3/plgrid/<group name>	Long-term storage for data kept for the duration of the computing grant. Should be used for storing significant amounts of data.

Current usage, capacity and other storage attributes can be checked by issuing the hpc-fs command.
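
Since the table recommends relying on the $SCRATCH variable rather than a hard-coded path, a job can derive its working directory from it. The sketch below creates one directory per job; the function name make_job_workdir and the job-<id> naming are hypothetical choices, while SLURM_JOB_ID is the variable Slurm sets inside a running job.

```shell
# Hypothetical helper: create and print a per-job working directory under $SCRATCH.
# Uses the given job ID, else $SLURM_JOB_ID, else the literal "manual".
make_job_workdir() {
  local job_id dir
  job_id="${1:-${SLURM_JOB_ID:-manual}}"
  # Abort with a message if $SCRATCH is not defined.
  dir="${SCRATCH:?SCRATCH is not set}/job-${job_id}"
  mkdir -p "$dir" && echo "$dir"
}

# Typical use inside a job script (commented out here):
# cd "$(make_job_workdir)"
```

Separate directories keep the output of concurrent jobs apart and make it easier to copy results to $PLG_GROUPS_STORAGE before scratch data expires.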

...

Applications and libraries are available through the modules system. Please note that the module structure was flattened, and module paths have changed compared to Prometheus. Modules for ARM and x86 CPUs are not interchangeable, and selecting the right module for the target architecture is critical for getting software to work! Please load the proper modules on the node, inside the job script! The list of available modules can be obtained by issuing the command:

module avail

This command should be run on a compute node to get the full list of modules available on the given architecture (node type)! The list is searchable by using the '/' key. A specific module can be loaded with the add command:

...

and the environment can be purged by:

module purge

Module names on Helios are case sensitive.
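
The purge and add commands can be combined into one step at the top of a job script, so that each job starts from a clean environment on the node where it runs. This is a minimal sketch: the function name load_job_modules is hypothetical, and the module names passed to it are placeholders that must match, including letter case, what module avail reports on the target node type.

```shell
# Hypothetical helper: reset the environment, then load the requested modules.
# Skips quietly when the module command is unavailable (e.g. outside the cluster).
load_job_modules() {
  if ! command -v module >/dev/null 2>&1; then
    echo "module command not available; skipping" >&2
    return 0
  fi
  module purge
  for m in "$@"; do
    module add "$m"
  done
}

# Usage inside a job script (placeholder module names):
# load_job_modules some-module another-module
```

Calling this inside the job script, rather than on the login node, ensures the modules are resolved for the architecture of the compute node.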

Sample job scripts

Example job scripts are available on this page: Sample scripts

...
