This lesson is in the early stages of development (Alpha version)

SLURM Advanced Topics: Glossary

Key Points

Work with Graphical User Interfaces (GUIs)
  • It is possible to use Graphical User Interfaces when working on Hawk.

  • X11 is a system that enables the display of graphical windows from a remote server

  • Almost all popular SSH clients support X11 forwarding.

  • Hawk provides VNC capabilities that enable the use of a remote Linux desktop.

Common Linux CLI Text Editors
  • Nano, Vim and Emacs are the most common CLI Linux text editors

  • Gedit is also a common text editor with a graphical interface

  • Command line text editors might have a steep learning curve but are powerful

  • If you plan to spend a lot of time working on text files on Linux, it is worth mastering a CLI text editor.

Capture errors in shell scripts
  • Bash has built-in options to check whether the variables in a script are set before they are used.

  • Trapping errors early is very important and can save time and effort in the long run.

  • Writing maintainable shell scripts makes it easier to come back and read your code.
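
As a minimal sketch of the points above, the snippet below turns on two of Bash's built-in safety options. The helper name run_strict and the variable INPUT_DIR are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# run_strict: hypothetical helper whose body runs in a subshell (note the
# parentheses) so the strict options do not leak into the calling shell.
run_strict() (
    # -e: exit as soon as a command fails; -u: treat unset variables as errors.
    set -eu
    # INPUT_DIR is an assumed variable name. With -u the subshell stops here
    # if it was never set, instead of silently expanding to an empty string.
    echo "Reading from ${INPUT_DIR}"
)
```

Calling run_strict with INPUT_DIR unset fails with an "unbound variable" style message and a non-zero exit status. After setting INPUT_DIR=./data the same call prints the path. In a standalone script the set -eu line would normally sit near the top instead of inside a function.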

Best practices
  • Do NOT run long jobs on the login nodes

  • Check your quota regularly

  • Submit jobs to appropriate partitions

Requesting resources
  • srun is used to run an interactive job.

  • sbatch is used to queue a job script for later execution.

  • SLURM defines several environment variables that can be used within job scripts

  • Use replacement patterns to uniquely identify your log files

  • SLURM has several commands to help you interact with jobs and partitions
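
The job script below is a sketch of how these pieces might fit together, not a site-specific template. The job name and resource values are assumptions, and a real script on Hawk will probably also need partition and account settings that are not shown here.

```shell
#!/bin/bash
#SBATCH --job-name=example        # hypothetical job name
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=%x.%j.out        # %x = job name, %j = job ID
#SBATCH --error=%x.%j.err

# SLURM sets these variables inside a job. The defaults after ":-" only
# apply when the script is run outside SLURM, e.g. for a quick local check.
job_id="${SLURM_JOB_ID:-none}"
ntasks="${SLURM_NTASKS:-1}"
echo "Job ${job_id} running ${ntasks} task(s) on $(hostname)"
```

Saved under a hypothetical name such as job.sh, the script would be queued with sbatch job.sh, and squeue -u "$USER" would list it while it is pending or running. Because the job ID is part of the log file names, repeated submissions do not overwrite each other's output.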

Efficient job scripts
  • SLURM array jobs can be used to parallelize multiple tasks in a single script

  • GNU Parallel can be used similarly to array jobs but has additional features like resubmission capabilities

  • SLURM allows us to create job pipelines with its --dependency option

  • Hawk has a Quality of Service feature on partitions that limits how many jobs a user can queue or run at the same time, and the maximum number of CPUs and nodes they can use
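
A possible shape for an array job is sketched below. The job name, the array range and the input file naming scheme are assumptions made for illustration only.

```shell
#!/bin/bash
#SBATCH --job-name=array_example   # hypothetical job name
#SBATCH --array=1-3                # three tasks with IDs 1, 2 and 3
#SBATCH --ntasks=1
#SBATCH --output=%x.%A_%a.out      # %A = array job ID, %a = array task ID

# SLURM sets SLURM_ARRAY_TASK_ID for each task; default to 1 outside SLURM.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical naming scheme: each task works on its own input file.
input_file="input_${task_id}.txt"
echo "Task ${task_id} would process ${input_file}"
```

To chain a second script after the array, one option is to capture the job ID with sbatch --parsable and pass it to a later submission as --dependency=afterok:JOBID, so the second job starts only if the first finishes successfully.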

Working with GPUs
  • SCW systems offer P100 and V100 generation GPU devices

  • GPUs are treated as a consumable resource and need to be requested explicitly

  • CUDA libraries are only available on GPU partitions, dev and login nodes
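
Requesting a GPU explicitly might look like the sketch below. The partition name gpu and the count of one device are assumptions, so check the names used on the system you are working on.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_example    # hypothetical job name
#SBATCH --partition=gpu           # assumed partition name
#SBATCH --gres=gpu:1              # request one GPU as a consumable resource
#SBATCH --ntasks=1

# When GPUs are allocated through --gres, SLURM typically exports
# CUDA_VISIBLE_DEVICES; fall back to "none" when run outside a GPU job.
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: ${gpus}"
```

Without the --gres line the job would not be given a GPU even on a GPU node, which is why the request has to be explicit.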

Installing packages
  • Users can control their own software when needed for development.

Accessing webtools
  • Open OnDemand is a powerful way to access GUIs.

  • Accessing Hawk compute nodes directly requires some care but can be a powerful way to run some tools.

Glossary

FIXME