...

Helios is built with Slingshot interconnect and nodes of the following specification:

| Partition | Number of nodes | CPU | RAM | Proportional RAM for one CPU | Proportional RAM for one GPU | Proportional CPU for one GPU | Accelerator |
|---|---|---|---|---|---|---|---|
| plgrid (includes plgrid-long) | 272 | 192 cores, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz | 384 GB | 2000 MB | n/a | n/a | |
| plgrid-bigmem | 120 | 192 cores, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz | 768 GB | 4000 MB | n/a | n/a | |
| plgrid-gpu-gh200 | 110 | 288 cores, 4x | 480 GB | n/a | 120 GB | 72 | 4x NVIDIA GH200 96GB |
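
The proportional values in the table can be used to estimate a memory request that matches a given number of CPU cores. The following is a minimal sketch, assuming the proportional RAM figure is the per-core memory share and that the table uses decimal units (192 cores x 2000 MB = 384 GB). The helper name mem_for_cores is hypothetical and not part of any Helios tooling.

```shell
#!/bin/sh
# Memory (in MB) matching a CPU-core request, using the proportional
# per-core value from the node table. "mem_for_cores" is a hypothetical helper.
mem_for_cores() {
    cores="$1"        # number of CPU cores requested
    per_core_mb="$2"  # proportional RAM for one CPU, in MB
    echo $((cores * per_core_mb))
}

mem_for_cores 48 2000    # plgrid: 48 cores -> prints 96000
mem_for_cores 192 4000   # plgrid-bigmem: full node -> prints 768000
```

The same arithmetic applies to the GPU partition: 4 GPUs x 120 GB gives the 480 GB of node RAM, and 4 GPUs x 72 cores gives the 288 cores listed above.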

Job submission

Helios uses the Slurm resource manager. Jobs should be submitted to the following partitions:

| Name | Time limit | Resource type (account suffix) | Access requirements | Description |
|---|---|---|---|---|
| plgrid | 72h | -cpu | Generally available. | Standard partition. |
| plgrid-testing | 1h | -cpu | Generally available. | High priority, testing jobs, limited to 1 running job. |
| plgrid-now | 12h | -cpu | Generally available. | The highest priority, interactive jobs, limited to 1 running or queued job. |
| plgrid-long | 168h | -cpu | Requires a grant with a maximum job runtime of 168h. | Used for jobs with extended runtime. |
| plgrid-bigmem | 72h | -cpu-bigmem | Requires a grant with CPU-BIGMEM resources. | Resources used for jobs requiring an extended amount of memory. |
| plgrid-gpu-gh200 | 48h | -gpu-gh200 | Requires a grant with GPGPU resources. | GPU partition. |

If you are unsure how to properly configure your job on Helios, please consult this guide: Job configuration
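
As an illustration of the partition table, a short test job could be described by a batch script like the one below. This is a sketch under stated assumptions, not an official template: the job name, the resource amounts, and the grant name grantname are placeholders, and the memory value simply reuses the 2000 MB per-core figure listed for plgrid nodes.

```shell
#!/bin/bash
# Example values only: adjust the job name, grant name, and resources.
#SBATCH --job-name=example-test
#SBATCH --partition=plgrid-testing
#SBATCH --account=grantname-cpu
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2000M

# Slurm sets SLURM_JOB_PARTITION inside a job; the fallback text is
# only shown when the script is run outside of Slurm.
partition="${SLURM_JOB_PARTITION:-not-under-slurm}"
echo "Running on $(hostname), partition: $partition"
```

Saved as, for example, job.sh, the script would be submitted with sbatch job.sh. The requested time must stay within the limit of the chosen partition (1h for plgrid-testing).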

...

Helios uses a new naming scheme for CPU and GPU computing accounts, which are supplied with the -A parameter of the sbatch command. Currently, accounts are named in the following manner:

| Resource | Account name |
|---|---|
| CPU | grantname-cpu |
| CPU bigmem nodes | grantname-cpu-bigmem |
| GPU | grantname-gpu-gh200 |

Please note that sbatch -A grantname won't work on its own. You need to append the -cpu, -cpu-bigmem, or -gpu-gh200 suffix to the grant name. Available computing grants, with their respective account names (allocations), can be viewed using the hpc-grants command.
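
The suffix rule can be sketched as a small shell helper that derives the account name from a grant name and a partition. The function name helios_account is hypothetical and is not a Helios command; the partition-to-suffix mapping assumes the partitions and suffixes listed on this page.

```shell
#!/bin/sh
# Print the account name expected by "sbatch -A" for a given grant and
# partition. "helios_account" is a hypothetical helper for illustration.
helios_account() {
    grant="$1"
    partition="$2"
    case "$partition" in
        plgrid|plgrid-testing|plgrid-now|plgrid-long) suffix="-cpu" ;;
        plgrid-bigmem)                                suffix="-cpu-bigmem" ;;
        plgrid-gpu-gh200)                             suffix="-gpu-gh200" ;;
        *) echo "unknown partition: $partition" >&2; return 1 ;;
    esac
    printf '%s%s\n' "$grant" "$suffix"
}

# Only prints the command line; nothing is submitted here.
echo "sbatch -p plgrid-bigmem -A $(helios_account grantname plgrid-bigmem) job.sh"
```

The account names actually available to you are those reported by hpc-grants, so the output of such a helper should be checked against that list.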

...

Available storage spaces are described in the following table:

| Location | Location in the filesystem | Purpose |
|---|---|---|
| $HOME | /net/home/plgrid/<login> | Storing your own applications and configuration files. Limited to 10 GB. |
| $SCRATCH | /net/scratch/hscra/plgrid/<login> | High-speed storage for short-lived data used in computations. Data older than 30 days can be deleted without notice. It is best to rely on the $SCRATCH environment variable rather than the literal path. |
| $PLG_GROUPS_STORAGE/<group name> | /net/storage/pr3/plgrid/<group name> | Long-term storage for data kept for the duration of the computing grant. Should be used for storing significant amounts of data. |

Current usage, capacity and other storage attributes can be checked by issuing the hpc-fs command.
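
A job might use these locations by creating a working directory under $SCRATCH and copying the results it needs to keep into group storage. The sketch below assumes a directory layout (job-<id>-<pid>) that is purely illustrative. The fallback to a temporary directory exists only so that the snippet also runs outside the cluster.

```shell
#!/bin/bash
# Use $SCRATCH when it is set; otherwise fall back to a temporary
# directory so the sketch can be tried off-cluster.
scratch_root="${SCRATCH:-${TMPDIR:-/tmp}}"

# SLURM_JOB_ID is set by Slurm inside a job; "manual" is a placeholder
# used when the script is run by hand. The naming scheme is hypothetical.
workdir="$scratch_root/job-${SLURM_JOB_ID:-manual}-$$"
mkdir -p "$workdir"
echo "Working directory: $workdir"

# ... run the computation inside "$workdir" ...

# Results worth keeping could then be copied to group storage, e.g.:
# cp -r "$workdir/results" "$PLG_GROUPS_STORAGE/<group name>/"
```

Because data in $SCRATCH older than 30 days can be deleted without notice, anything that must survive the job should be copied out before that point.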

...