Configuration of cryoSPARC environment

License

Each user should obtain their own license from https://cryosparc.com/download

To get access to the cryoSPARC installation on the Athena cluster:

  1. Make sure you have active access to the Athena cluster.
  2. Create a ticket at the PLGrid Helpdesk and apply for membership in the plggcryospar team.
  3. Log in to the Athena login node.

    Log in to Athena login node
    ssh <login>@athena.cyfronet.pl
  4. Load the cryoSPARC module using the command

    Set cryoSPARC environment
    module add <module_name>/<version>

    cryoSPARC versions

    The following modules containing different cryoSPARC versions are currently available on the Athena cluster:

    • cryoSPARC/4.7.0
    • cryoSPARC/4.6.2 (default)
    • cryoSPARC/4.5.3-240807
    • cryoSPARC/4.5.1
    • cryoSPARC/4.4.1-240110
    • cryoSPARC/4.4.0-231114
    • cryoSPARC/4.3.0
    • cryoSPARC/4.2.1-230621
    • cryoSPARC/4.2.1-230403
    • cryoSPARC/4.2.0-230302

    Stalling jobs

    CryoSPARC version 4.6.0 (not listed above) introduced a regression, causing some jobs (especially 2D Classification) to stall and never finish.
    The newer version, 4.6.2, should fix this issue, but if you encounter such behavior, please let us know via the Helpdesk.
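
    For example, to load the current default version:

    Load the default version
    module add cryoSPARC/4.6.2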

  5. Run the cryoSPARC configuration script.
    It will configure your cryoSPARC environment, create a new user in the cryoSPARC database, and configure default cluster lanes.

    Pass your license ID, e-mail, password (which will be used to log in to the cryoSPARC web app), and first and last name as arguments for the script.

    Configure cryoSPARC
    cryosparc_configuration --license <license-id> --email <your-email> --password '<password>' --firstname <first-name> --lastname <last-name> 

    Note that the password has to be enclosed in single quotes so that the configuration script can properly handle special characters such as '$'.
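
    For example, a hypothetical invocation with placeholder values (note the single quotes around a password that contains '$'):

    Example configuration command
    cryosparc_configuration --license <license-id> --email jane.doe@example.com --password 'my$ecretPass' --firstname Jane --lastname Doe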

    Slurm account for configuration

    The cryosparc_configuration script runs an srun job underneath, so you may need to specify a Slurm account for that job. This can be done with the command

    export SLURM_ACCOUNT=<account_name>

    where <account_name> should be a grant name with a '-gpu-a100' suffix: <grant_name>-gpu-a100

    More details about the account naming scheme can be found on the documentation page.
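
    For example, for a hypothetical grant named plgexamplegrant:

    Example
    export SLURM_ACCOUNT=plgexamplegrant-gpu-a100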

    Access problems

    In case of a "cryosparc_configuration: command not found" error, run in the terminal

    newgrp plggcryospar

    to start a new subshell with the permissions of the plggcryospar team.

    Accounts/grants management

    Athena uses a new account naming scheme for computing grants as specified on the documentation page.

    Our cryoSPARC tools provide a utility, cryosparc-accounts, with which you can specify which account/grant should be used for cryoSPARC jobs. Please follow the instructions provided by the cryosparc-accounts help command.

    cryosparc-accounts show
        prints currently selected account for cryoSPARC jobs
    
    cryosparc-accounts set
        sets account/grant to be used by cryoSPARC jobs to ACCOUNT_NAME
    
    cryosparc-accounts clear
        clears custom account/grant for cryoSPARC jobs
       
    cryosparc-accounts help
        prints this message
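
    For example, to point cryoSPARC jobs at a specific grant account and then verify the setting (the account name is a placeholder; it is assumed that set takes the account name as its argument, as the help text suggests):

    Example usage
    cryosparc-accounts set <grant_name>-gpu-a100
    cryosparc-accounts show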
  6. Your cryoSPARC master setup is now complete. All cryoSPARC master instances should be run in batch jobs.


cryoSPARC master job

The cryoSPARC master process must not be run on the login nodes of the Athena cluster. It should be run in the plgrid-services partition through the Slurm batch job described below.

Automated cryoSPARC master in batch job

The cryoSPARC master should be started through a batch job.

The batch script is located at $CRYOSPARC_ADDITIONAL_FILES_DIR/cryosparc-master.slurm. The $CRYOSPARC_ADDITIONAL_FILES_DIR variable is available when the cryoSPARC module is loaded into the environment.
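
After loading the cryoSPARC module (see below), you can check that the variable is set and the batch script is in place:

Check script location
ls $CRYOSPARC_ADDITIONAL_FILES_DIR/cryosparc-master.slurm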

  1. Load cryoSPARC module

    Loading module
    module add cryoSPARC/<version>
  2. Submit job

    cryosparc-master job submission
    sbatch $CRYOSPARC_ADDITIONAL_FILES_DIR/cryosparc-master.slurm 

    cryoSPARC master job

    Each user should run at most one job with the cryoSPARC master in the plgrid-services partition.

  3. Check whether the job was started.

    Check job status
    squeue -j <JobID>
  4. Common states of jobs

    • PD - PENDING  - job is awaiting resource allocation,
    • R - RUNNING - job currently has an allocation and is running,
    • CF - CONFIGURING  - job has been allocated resources but is waiting for them to become ready for use (e.g., booting),
    • CG - COMPLETING  - job is in the process of completing. Some processes on some nodes may still be active.
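
    If you do not remember the job ID, you can also list all of your jobs in the plgrid-services partition (the --me flag requires a reasonably recent Slurm version):

    List your jobs in the plgrid-services partition
    squeue --me -p plgrid-services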
  5. Make a tunnel

    In your working directory, display the job log file:

    Listing of job's log
    cat cryosparc-master-log-<JobID>.txt

    where <JobID> is the batch job ID displayed when you submit the job, e.g. `cat cryosparc-master-log-49145683.txt`

    It will show you something like this:

    Example of job log
    Copy/Paste this in your local terminal to ssh tunnel with remote
    -----------------------------------------------------------------
    ssh -o ServerAliveInterval=300 -N -L 40100:172.20.68.193:40100 plgusername@athena.cyfronet.pl
    -----------------------------------------------------------------
    Then open a browser on your local machine to the following address
    ------------------------------------------------------------------
    localhost:40100 
    ------------------------------------------------------------------
  6. Execute the given command in another shell on your local computer to create the tunnel:

    Tunneling
    ssh -o ServerAliveInterval=300 -N -L 40100:172.20.68.193:40100 plgusername@athena.cyfronet.pl
  7. Log in to the cryoSPARC web application by opening `localhost:40100` in your browser.
  8. To gracefully stop the cryoSPARC master running in a batch job, send signal "2" using the scancel --batch --signal=2 <JobID> command.

    Ending master job
    scancel --batch --signal=2 <JobID>

    If the job isn't stopped by the user with the scancel command, it will be ended gracefully by sending signal "2" shortly before the maximum job time is reached (this is done through the #SBATCH --signal=B:2@240 directive in the script).

    The cryoSPARC master instance should not be stopped when cryoSPARC jobs are running or queued. Otherwise, jobs may fail or behave unexpectedly.

Storage space

We recommend storing project directories and input files in group storage space. Athena shares the same group storage (pr2) with Ares, and access to it can be obtained through the PLGrid grant system (ACK Cyfronet HPC Storage - STORAGE-01 resource).

Please note that Athena's SCRATCH space is high-performance SSD-based storage dedicated to short-lived data in IO-intensive tasks and is limited to 25TB per user. Any data stored in SCRATCH for over 60 days can be deleted without notice. Therefore, it is not suitable for storing large and permanent data.

SSD cache

Athena SCRATCH space is based on high-performance NVMe drives, so enabling the "Cache on SSD" option in all jobs that support it is highly recommended. Performing such jobs without an SSD cache enabled can lead to much longer computing times.

The SSD cache is located in the $SCRATCH/.cryosparc/cache directory, has a 10TB quota, and persists only for the lifetime of the cryosparc-master instance. As SCRATCH space on Athena is limited to 25TB per user, please try to use no more than 15TB for other data so cryoSPARC can fully use its cache space, especially if you work on very large datasets.
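
To check how much SCRATCH space the cache currently occupies, you can run, for example:

Check cache size
du -sh $SCRATCH/.cryosparc/cache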

More details about cryoSPARC SSD cache functionality can be found in the official documentation.

Cluster lanes

Default lanes

There are six default cluster lanes available on the Athena cluster.

  • athena-plgrid-12h - primary lane dedicated for CPU and GPU jobs, with GPU jobs limited to 12 hours and CPU jobs to 72 hours.
  • athena-plgrid-24h - similar to athena-plgrid-12h, but with maximum GPU job duration extended to 24 hours.
  • athena-plgrid-6h - similar to athena-plgrid-12h, but with the maximum GPU job duration shortened to 6 hours (versions >= 4.4.0).
  • athena-plgrid-bigmem-12h - same as athena-plgrid-12h but with doubled memory size (versions >= 4.2.1-230403).
  • athena-plgrid-bigmem-24h - same as athena-plgrid-24h but with doubled memory size.
  • athena-plgrid-bigmem-6h - same as athena-plgrid-6h but with doubled memory size (versions >= 4.4.0).

Bigmem lanes

As using longer or bigmem lanes may lead to longer queue waiting times and unnecessary resource reservations, it is recommended to conduct calculations on regular lanes whenever possible.

In version 4.2.0, default memory requirements have been readjusted for specific job types (like Non-uniform refinement and Topaz train), so they should fit in regular lanes in more cases.

If even the memory amount available with the athena-plgrid-bigmem lanes is insufficient for your needs, please use the custom variables described below. For cryoSPARC versions lower than 4.4.1, consider creating an additional custom lane or contacting support.

Custom variables

Starting from version 4.4.0, we began introducing additional options to support more flexible job configuration. As of now, the following variables are supported by all of the default cluster lanes:

  • max_hours - (versions >= 4.4.0) - overrides the default maximum job time for the selected cluster lane. With this variable, one can adjust the maximum job time to the expected execution time of a job. Shorter time reservations may improve queue waiting times. Remember that the scheduling system will kill the job if it isn't finished in time. 
  • mem_mult - (versions >= 4.4.1) - overrides the default memory multiplier for the selected cluster lane. The default mem_mult is 1 for ordinary lanes and 2 for bigmem lanes. For example, setting mem_mult to 4 reserves twice as much memory as a bigmem lane with its default setting.
  • slurm_account - (versions >= 4.4.1) - can be used to specify the Slurm account for the job. This variable overrides the account specified with the cryosparc-accounts utility.
  • notification_email - (versions >= 4.6.2) - the email address specified in this variable will be used to send notifications about job status.

Details on how to use custom variables can be found in the official cryoSPARC documentation.
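
As a purely illustrative sketch (not the actual Athena template), a Jinja-templated cluster script could consume such a variable roughly like this, falling back to the lane default when the variable is not set:

Hypothetical template fragment
#SBATCH --time={{ max_hours|default(12) }}:00:00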


In case you need access to new cluster lanes or cluster lane features on an older cryoSPARC version, please contact the Helpdesk.

Creating additional lanes

You can create additional cluster lanes to fulfill your specific requirements:

  1. Start cryosparc-master instance as usual (if not already running),
  2. Attach to the cryosparc-master instance job

    Interactive job
    srun -N1 -n1 --jobid <job_id> --pty /bin/bash
  3. Load cryoSPARC environment using modules

    Load cryoSPARC environment
    module add cryoSPARC/<version>
  4. Copy the cluster config cluster_info.json and the script template cluster_script.sh from $CRYOSPARC_ADDITIONAL_FILES_DIR to your working directory

    Copy files
    cp $CRYOSPARC_ADDITIONAL_FILES_DIR/cluster_info.json .
    cp $CRYOSPARC_ADDITIONAL_FILES_DIR/cluster_script.sh .
  5. Modify the files accordingly
    1. in cluster_info.json, change the name of the lane/cluster to avoid overwriting the default athena-* lanes
    2. in cluster_script.sh, change --time, --mem, or other parts of the script template
  6. Run the command (see the example sketch below)

    Connect cluster lane
    cryosparcm cluster connect
  7. Repeat steps 5 and 6 to create other lanes if necessary
  8. End interactive job

    end interactive job
    exit

For more details, please refer to the official cryoSPARC documentation.
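
A hypothetical end-to-end example of steps 5 and 6 (the lane name, time limit, and editor below are placeholders, not values taken from the provided templates):

Example: creating a custom lane
nano cluster_info.json        # change "name", e.g. to "athena-custom-48h", so the default lanes are not overwritten
nano cluster_script.sh        # adjust e.g. the "#SBATCH --time=..." line to the desired limit
cryosparcm cluster connect    # registers the lane described by the files in the current directory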

External utilities

Topaz

The Topaz utility is installed on the Athena cluster and is available for cryoSPARC Topaz jobs. The latest version of Topaz officially supported by cryoSPARC is 0.2.4, but version 0.2.5 is also expected to work. Please use one of the following paths to the topaz executable:

/net/software/v1/software/topaz/0.2.4/bin/topaz
or
/net/software/v1/software/topaz/0.2.5/bin/topaz

MotionCor2

The most recent version of MotionCor2 supported by cryoSPARC (as of cryoSPARC version 4.5.1) is 1.4.5. Please use the following path to the MotionCor2 executable:

/net/software/v1/software/MotionCor2/1.4.5-GCCcore-11.3.0/bin/motioncor2 

deepEMhancer

Version 0.15 of deepEMhancer is available on the Athena cluster. Please use the following path to the deepEMhancer executable:

/net/software/v1/software/deepEMhancer/0.15/bin/deepemhancer.sh

DeepEMhancer models are located in the following directory:

/net/software/v1/software/deepEMhancer/0.15/deepEMhancerModels/production_checkpoints 


Maintenance procedures

cryoSPARC database backup

A backup of the cryoSPARC database is created automatically during cryosparc-master startup whenever the previous backup is seven or more days old. Backups are stored in the $SCRATCH/.cryosparc/database_backup directory, and the three most recent backups are kept by default. A different location and other options can be specified in the $HOME/.cryosparc/cyfronet file.
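
To inspect the existing backups, you can, for example, list the backup directory:

List database backups
ls -lh $SCRATCH/.cryosparc/database_backup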

Please refer to the official cryoSPARC documentation for more details about the cryoSPARC database backup and how to restore it.

Upgrading to a newer version

cryoSPARC is updated frequently with new releases and patches (see https://cryosparc.com/updates). For each new version, a new cryoSPARC/<version> module is created. To upgrade your instance to a new version, you will have to:

  1. End all running cryoSPARC jobs.
  2. Shut down the cryosparc-master instance

    scancel --batch --signal=2 <cryosparc_master_jobid>
  3. Load module with the new version of cryoSPARC.
  4. Start a new cryosparc-master instance as usual.
  5. (Optional) Manually update all your custom lanes (specifically, the worker_bin_path entry in the cluster_info.json file should be adjusted and the lane re-added, as sketched below). Default lanes will be updated automatically.
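
    For a custom lane, this typically means editing the saved cluster_info.json so that worker_bin_path points at the new version and re-running the connect command from that directory. A rough sketch (run inside the cryosparc-master job with the new module loaded, as in the "Creating additional lanes" section; the path and editor are placeholders):

    Re-adding a custom lane after an upgrade
    cd <directory-with-your-custom-lane-files>
    nano cluster_info.json        # update "worker_bin_path" for the new cryoSPARC version
    cryosparcm cluster connect    # re-registers the lane under the same name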

Migrating from Ares cluster

Athena is a computing cluster designed to handle tasks that heavily use GPU resources and require the highest IO performance. With 384 NVIDIA A100 GPUs and access to SSD-based SCRATCH storage, we expect computing and queue waiting times to be much shorter than on Ares. For users with the largest cryoSPARC databases, we also expect the web interface to become more responsive. All of this should result in a vastly improved overall experience when using cryoSPARC, and thus, migrating from Ares is highly recommended.

The cryoSPARC setup on the Athena cluster is very similar to the one already established on Ares and previously on Prometheus, with the main difference being that SSD caching mechanisms are enabled.

As Athena shares group storage space (pr2) with Ares, it is easy to migrate existing projects from Ares via the built-in cryoSPARC import and export utilities, which were greatly improved in the 4.0 release. For details, please refer to the official guide on migrating projects to a new cryoSPARC instance.


With the new installation of cryoSPARC being well-tested in production, Athena is now the main cluster on which we support cryoSPARC. New versions are expected to be introduced only on Athena.
