...

Computing resources on Athena are assigned based on PLGrid computing grants. To perform computations on Athena, you need to obtain a computing grant through the PLGrid Portal (https://portal.plgrid.pl/) and apply for Athena access through the PLGrid portal (https://aplikacje.plgrid.pl/service/dostep-do-klastra-athena-w-osrodku-cyfronet/).

Work on Athena's storage is still underway, so there is no dedicated storage for performing high-I/O computations. For the time being, please use the ramdisk $MEMFS functionality as the scratch space (https://kdm.cyfronet.pl/portal/Prometheus:Podstawy#Przestrze.C5.84_dyskowa_w_pami.C4.99ci_operacyjnej_MEMFS). Additionally, in the current setup the long-term storage is sourced from Ares, so you need a grant with storage resources on Ares to be able to use the group directory storage on Athena. Performing high-I/O computations on the group space is strictly forbidden!

If your grant is active and you have applied for the service access, the request should be accepted in about half an hour. Please report any issues through the helpdesk.

...

Partition | Number of nodes | CPU | RAM | Proportional RAM for one GPU | Proportional CPU for one GPU | Accelerator
plgrid-gpu-a100 | 48 | 128 cores, 2x AMD EPYC 7742 64-Core Processor @ 2.25 GHz | 1024 GB | 128000 MB | 16 | 8x NVIDIA A100-SXM4-40GB

Job submission

...

Name | Timelimit | Account suffix | Remarks
plgrid-gpu-a100 | 48h | -gpu-a100 | GPU A100 partition.

Please use Athena only for GPU-enabled jobs. Running extensive workloads not using GPUs will result in account suspension.

MEMFS RAM storage

MEMFS uses RAM to create a temporary disk for the duration of the job. This space is the fastest storage available and should be used for temporary files. To use MEMFS, please add the "-C memfs" parameter to your job specification. For example, use the following directive in your batch script: #SBATCH -C memfs
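A minimal batch-script sketch using MEMFS could look as follows (the account, partition, resources, and file names are placeholders, not prescribed values; consult the tables below for your actual account name):

```shell
#!/bin/bash
#SBATCH -A grantname-gpu-a100   # placeholder account name; use your own grant
#SBATCH -p plgrid-gpu-a100
#SBATCH --gres=gpu:1
#SBATCH -t 01:00:00
#SBATCH -C memfs                # request the MEMFS RAM disk

# $MEMFS points at the RAM-backed scratch directory created for this job.
cd "$MEMFS"

# ... run the computation here, writing temporary files to $MEMFS ...

# Copy any results you want to keep before the job ends;
# the RAM disk is released when the job finishes.
cp results.out "$PLG_GROUPS_STORAGE/<group name>/"
```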

...

Accounts and computing grants

Athena uses a new scheme for naming Slurm accounts for GPU computing grants. GPU computing grants using A100 GPU resources use accounts with the -gpu-a100 suffix, which are supplied via the -A parameter of the sbatch command. Currently, accounts are named in the following manner:

Resource | Account name
GPU | grantname-gpu-a100

Please mind that sbatch -A grantname won't work on its own. You need to add the -gpu-a100 suffix! Available computing grants, with their respective account names (allocations), can be viewed using the hpc-grants command.
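As a concrete illustration (with "plgexample" as a hypothetical grant name — substitute your own), a job on the A100 partition would be submitted as:

```shell
# "plgexample" is a placeholder grant name; the -gpu-a100 suffix is required.
# Plain "-A plgexample" will not be accepted by the scheduler.
sbatch -A plgexample-gpu-a100 -p plgrid-gpu-a100 --gres=gpu:1 job.sh
```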

Resource allocated on Athena doesn't use normalization, which was used on Prometheus. 1 hour of GPU time equals 1 hour spent using a GPU.on a GPU with a proportional amount of CPUs and memory (consult the table above). The billing system accounts for jobs that use more CPUs or memory than the proportional amount. If the job uses more CPU or memory for each allocated GPU than the proportional amount, it will be billed as it would have used more GPUs. The billed amount can be calculated by dividing the used number of CPUs or memory by the proportional amount per GPU and rounding the result to the closest and larger integer. Jobs on GPU partitions are always billed in GPU hours.

The cost can be expressed as a simple algorithm:

Code Block
cost_gpu    = job_gpus_used * job_duration
cost_cpu    = ceil(job_cpus_used/cpus_per_gpu) * job_duration
cost_memory = ceil(job_memory_used/memory_per_gpu) * job_duration
final_cost  = max(cost_gpu, cost_cpu, cost_memory)
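The algorithm above can be checked with plain shell arithmetic. The sketch below uses the per-GPU proportions from the partition table (16 CPUs and 128000 MB RAM per A100) and example job figures chosen for illustration; ceiling division is done with the usual `(a + b - 1) / b` integer idiom:

```shell
cpus_per_gpu=16          # proportional CPUs per A100 (from the table above)
memory_per_gpu=128000    # proportional RAM per A100, in MB

# Example job: 1 GPU, 32 CPUs, 64000 MB RAM, running for 10 hours.
job_gpus_used=1; job_cpus_used=32; job_memory_used=64000; job_duration=10

cost_gpu=$(( job_gpus_used * job_duration ))
# ceil(a/b) via (a + b - 1) / b in integer arithmetic:
cost_cpu=$(( (job_cpus_used + cpus_per_gpu - 1) / cpus_per_gpu * job_duration ))
cost_memory=$(( (job_memory_used + memory_per_gpu - 1) / memory_per_gpu * job_duration ))

final_cost=$cost_gpu
if [ "$cost_cpu" -gt "$final_cost" ]; then final_cost=$cost_cpu; fi
if [ "$cost_memory" -gt "$final_cost" ]; then final_cost=$cost_memory; fi

echo "$final_cost"   # 20: the 32 CPUs count as ceil(32/16) = 2 GPUs
```

Even though only one GPU was allocated, the job is billed 20 GPU-hours, because its CPU usage corresponds to two GPUs' worth of cores.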

Storage

Available storage spaces are described in the following table:

Location | Location in the filesystem | Description
$HOME | /net/people/plgrid/<login> | Storing own applications and configuration files
$SCRATCH | /net/tscratch/people/<login> | High-speed storage for short-lived data used in computations. Data older than 30 days can be deleted without notice. It is best to rely on the $SCRATCH environment variable.
$PLG_GROUPS_STORAGE/<group name> | /net/pr2/projects/plgrid/<group name> | Long-term storage, for data living for the period of the computing grant. This space is provided by Ares storage; if you need permanent space for data, please apply for storage on the Ares cluster.

...

Please use the following commands to interact with the account and storage management system:

  • hpc-grants - shows available grants, resource allocations
  • hpc-fs - shows available storage
  • hpc-jobs - shows currently pending/running jobs
  • hpc-jobs-history - shows information about past jobs

Software

Compilation should be done on a worker node inside a computing job. It is most convenient to use an interactive job for all compilation and application setup. The login node doesn't include development libraries!
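An interactive job suitable for compilation could be started as in the sketch below (the account name is a placeholder and the time and resource values are examples, not recommendations):

```shell
# Open an interactive shell on a worker node of the A100 partition.
# "plgexample-gpu-a100" is a hypothetical account name; use your own grant.
srun -p plgrid-gpu-a100 -A plgexample-gpu-a100 --gres=gpu:1 \
     -N 1 -n 1 -t 01:00:00 --pty /bin/bash -l
```

Once the shell opens on the worker node, load the required modules and build your software there.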

Warning
The module tree is still in the early phases of being constructed and can undergo substantial modifications. Please don't rely on the current tree for important computations.

Applications and libraries are available through the modules system. Please note that the module structure was flattened and module paths might have changed! The list of available modules can be obtained by issuing the command:

module avail

To search for a specific module, please use the "spider" (i.e. search) command:

module spider application_name

A specific module can be loaded with the add command:

module add openmpi/4.1.1-gcc-11.2.0

and the environment can be purged by:

module purge

Installing software on request on Athena is unsupported. For the time being, please install your own software in $HOME or the group directory.

Sample job scripts

Example job scripts are available on this page: Sample scripts. Please note that Athena is a GPU cluster, so please submit only jobs using GPUs!

More information

Athena follows Prometheus' configuration and usage patterns. Prometheus documentation can be found here: https://kdm.cyfronet.pl/portal/Prometheus:Basics