
CLUSTER MANUAL

Running jobs

What's on this page:

  1. General information
  2. Job Blueprint
  3. R example
  4. Python example
  5. MATLAB example

1. General information

The sections below provide general information on the SLURM workload management system. For more details on SLURM, visit the SLURM website or check the Advanced SLURM documentation in this manual.

SLURM Workload Manager

SLURM is the workload manager and job scheduler.
There are two ways of starting jobs with SLURM: either interactively with srun or as a script with sbatch. Interactive jobs are a good way to test your setup before you put it into a script, or to work with interactive applications like MATLAB or Python. You immediately see the results and can check whether all parts behave as you expected. See the Interactive Jobs section for more details.
Please note: at our site, if you submit SLURM task arrays, it is very important to throttle the number of tasks/CPUs dispatched at once, or you will take up all the available resources.
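For example, appending % and a limit to the --array range caps how many array tasks run at the same time; a minimal sketch of the relevant directive:

# run the 100 array tasks at most 10 at a time (the %10 is the throttle)
#SBATCH --array=1-100%10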

SLURM Parameters

SLURM supports a multitude of parameters. This enables you to tailor your script to your needs on psychp01, but it also means that it is easy to get lost and waste your time and quota.
Note: everything preceded by # followed by a space is treated as a comment. In contrast, #SBATCH followed by a single or double dash defines a parameter for the job.
The following parameters can be used as command line parameters with sbatch and srun, or in a job script (see the script examples below). To use a parameter in a job script, start a new line with #SBATCH followed by the parameter. For example, if you want to give a name to your job, use the argument --job-name=, like this:

#SBATCH --job-name=my_test_job_name

NOTE: Do not use spaces in the job name. Something like #SBATCH --job-name=my test job name would not work!
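The same parameter can also be passed on the command line when submitting, instead of inside the script; for example (assuming your job script is called run.sh):

sbatch --job-name=my_test_job_name run.sh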
See the Advanced SLURM section for a list of commands and examples of how to use them in batch scripts.

Differences between CPUs and tasks

As a new user writing your first SLURM job script, the difference between --ntasks and --cpus-per-task is typically quite confusing. Assuming you want to run your program on a single node with 16 cores, which SLURM parameters should you specify?
The answer is that it depends on whether your application supports MPI. MPI (Message Passing Interface) is a communication standard used for developing parallel programs on distributed-memory systems. It is what allows an application running on multiple computers (nodes) to share (intermediate) results.
To decide which set of parameters to use, check whether your application uses MPI and would therefore benefit from running on multiple nodes simultaneously. If you have a non-MPI-enabled application (or made a mistake in your setup), it does not make sense to request more than one node.
IMPORTANT: Currently, psychp01 has only a single node. When running your analyses, you therefore only need to choose carefully how many CPUs your analysis (task) requires, using --cpus-per-task.
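For a typical non-MPI analysis on psychp01, this means keeping a single task and raising only the CPU count; a sketch of the relevant directives for a 16-core run:

# one node, one task (process), 16 CPU cores for that task
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16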

2. Job Blueprint

You can save the following example to a file (e.g., run.sh) on psychp01. Comment out the two cp commands, which are only there for illustration purposes, and change the SBATCH directives where applicable. You can then run the script by typing:

sbatch run.sh

Please note that all values you define with SBATCH directives are hard limits. If you, for example, ask for 6000 MB of memory (--mem=6000MB) and your job uses more than that, the job will be automatically killed by the workload manager.

#!/bin/bash -l

##############################
#       Job blueprint        #
##############################

# Give your job a name, so you can recognize it in the queue overview
#SBATCH --job-name=example

# Define how many nodes you need. For now, psychp01 has only 1 node with 100 CPUs.

#SBATCH --nodes=1

# By default, SLURM will allocate 1 CPU per task. You can define the number of CPUs your job needs as follows:
#SBATCH --cpus-per-task=20

# You can further define the number of tasks with --ntasks or --ntasks-per-*.
# See "man sbatch" for details. E.g., --ntasks=4 will ask for 4 tasks
# (and, with the default of 1 CPU per task, 4 CPUs).
# Define how long the job will run in real time. This is a hard cap meaning
# that if the job runs longer than what is written here, it will be
# force-stopped by the server. If you make the expected time too long, it will
# take longer for the job to start. Here, we say the job will take 5 minutes.
#              d-hh:mm:ss

#SBATCH --time=0-00:05:00
 
# Define the partition on which the job shall run. May be omitted.
#SBATCH --partition=test

# How much memory you need.
# --mem will define memory per node and
# --mem-per-cpu will define memory per CPU/core. Choose one of those.
#SBATCH --mem-per-cpu=1500MB
##SBATCH --mem=5GB    # this one is not in effect, due to the double hash
 
# Turn on mail notification. There are many possible self-explaining values:
# NONE, BEGIN, END, FAIL, ALL (including all aforementioned)
# For more values, check "man sbatch"
#SBATCH --mail-type=END,FAIL
 
# You may not place any commands before the last SBATCH directive
 
# Define and create a unique scratch directory for this job
SCRATCH_DIRECTORY=/home/${USER}/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}
 
# You can copy everything you need to the scratch directory
# ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
cp ${SLURM_SUBMIT_DIR}/myfiles*.txt ${SCRATCH_DIRECTORY}
 
# This is where the actual work is done. In this case, the script only waits.
# The time command is optional, but it may give you a hint on how long the
# command worked
time sleep 10
#sleep 10
 
# After the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}
 
# In addition to the copied files, you will also find a file called
# slurm-1234.out in the submit directory. This file will contain all output that
# was produced during runtime, i.e. stdout and stderr.
 
# After everything is saved to the home directory, delete the work directory to
# save space on /home
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}
 
# Finish the script
exit 0
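After submitting, you can keep an eye on the job from the command line; a short sketch (the job ID 1234 is just a placeholder, use the ID printed by sbatch):

# list your own pending and running jobs
squeue -u $USER

# show state, run time and memory use of a finished job
sacct -j 1234 --format=JobID,JobName,State,Elapsed,MaxRSS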

Running many sequential jobs in parallel using job arrays

In this example, we wish to run many similar sequential jobs in parallel using job arrays. We take Python as an example, but the language does not matter for the job arrays:

#!/usr/bin/env python

import time

print('start at ' + time.strftime('%H:%M:%S'))

print('sleep for 10 seconds ...')
time.sleep(10)

print('stop at ' + time.strftime('%H:%M:%S'))

Save this to a file called “test.py” and try it out:

$ python test.py

start at 15:23:48
sleep for 10 seconds ...
stop at 15:23:58

Good. Now we would like to run this script 16 times at the same time. For this we use the following script:

#!/bin/bash -l

#####################
# job-array example #
#####################

#SBATCH --job-name=example

# 16 jobs will run in this array at the same time
#SBATCH --array=1-16

# run for five minutes
#              d-hh:mm:ss
#SBATCH --time=0-00:05:00

# 500MB memory per core
# this is a hard limit
#SBATCH --mem-per-cpu=500MB

# you may not place bash commands before the last SBATCH directive

# define and create a unique scratch directory
SCRATCH_DIRECTORY=/ptmp/${USER}/job-array-example/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}

cp ${SLURM_SUBMIT_DIR}/test.py ${SCRATCH_DIRECTORY}

# each job will see a different ${SLURM_ARRAY_TASK_ID}
echo "now processing task id:: " ${SLURM_ARRAY_TASK_ID}
python test.py > output_${SLURM_ARRAY_TASK_ID}.txt

# after the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp output_${SLURM_ARRAY_TASK_ID}.txt ${SLURM_SUBMIT_DIR}

# we step out of the scratch directory and remove it
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}

# happy end
exit 0

Submit the script and after a short while you should see 16 output files in your submit directory.
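In a real analysis, each array task would typically process a different input rather than run the identical script. A minimal sketch of the relevant lines, assuming hypothetical input files dataset_1.txt to dataset_16.txt in the submit directory and a script adapted to read the file given as its first argument (note the %8 throttle, as recommended above):

#SBATCH --array=1-16%8

# each task picks "its" input file via the array task ID
INPUT=${SLURM_SUBMIT_DIR}/dataset_${SLURM_ARRAY_TASK_ID}.txt
python test.py ${INPUT} > output_${SLURM_ARRAY_TASK_ID}.txt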

3. R example

Running R scripts on psychp01 is very easy. Save the following R script in your home directory (e.g., /home/gbellucci) as r_script.R:

# Create a 2000x2000 matrix with random values
A1 <- matrix(runif(2000*2000), nrow = 2000, ncol = 2000)
 
# Perform 1000 fft operations on it
start_time <- as.numeric(Sys.time())
for (i in 1:1000) {
    A1 <- fft(A1)
}
time1 <- as.numeric(Sys.time()) - start_time
 
# write result time to file:
cat(paste('CPU time:', round(time1*1000,2),'ms'),file='results.txt')

Save the following job script in your home directory as run_r_script.sh.

#!/bin/bash -l
# My job script to run R code
 
#SBATCH -o ./job_output.%A_%a
#SBATCH -e ./job_errors.%A_%a
#SBATCH -D ./
#SBATCH -J run_r_test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=6000
#SBATCH --time=00:05:00
 
# Use the command Rscript and set the path to the R script to run it. Replace /home/gbellucci with the location of your script on psychp01.
Rscript /home/gbellucci/r_script.R

Now on the command line, change your current directory to the home directory where you saved your run_r_script.sh and use the sbatch command to run the R script:

sbatch run_r_script.sh

When the job is finished you can display the output:

vim results.txt
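For a quick one-off run, you can also submit the same command without writing a job file by using sbatch's --wrap option; a sketch with the same resources as above:

sbatch --job-name=run_r_test --cpus-per-task=10 --mem=6000 --time=00:05:00 --wrap="Rscript /home/gbellucci/r_script.R"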

4. Python example

Below is an example of a Python script to be used on the cluster. Save the following Python script in your home directory (e.g., /home/gbellucci) as python_script.py:

import numpy as np
from time import time

# Create a 2000x2000 matrix with random values
A1 = np.random.rand(2000, 2000).astype('float32')

# Perform 1000 fft operations on it
start_time = time()
for i in range(1000):
    A1 = np.fft.fft(A1)
end_time = time()
time1 = end_time - start_time

# write result time to file:
with open('results.txt', 'w') as fh:
    fh.write(f'CPU time: {time1*1000:.2f} ms\n')

The job script below will run your Python script. Save it in your home directory as run_python_script.sh.

#!/bin/bash -l
# My job script to run Python code
 
#SBATCH -o ./job_output.%A_%a
#SBATCH -e ./job_errors.%A_%a
#SBATCH -D ./
#SBATCH -J run_test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=6000
#SBATCH --time=00:05:00

# Load any environment modules or activate the Python environment your script needs here

# Use the python command and set the path to your script to run it. Replace /home/gbellucci with the location of your script on psychp01.
python /home/gbellucci/python_script.py
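If your script needs packages that are not installed system-wide, you can activate a Python environment inside the job script before calling python; a minimal sketch, assuming a hypothetical virtual environment at /home/gbellucci/venvs/analysis:

# activate a (hypothetical) virtual environment that has the required packages
source /home/gbellucci/venvs/analysis/bin/activate
python /home/gbellucci/python_script.py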

Now on the command line, change your current directory to the home directory where you saved your run_python_script.sh and use the sbatch command to run the Python script:

sbatch run_python_script.sh

When the job is finished you can display the output:

vim results.txt

On page 37 of the PDF version of the manual, available on the main page, there is a short video that demonstrates what was just described.

5. MATLAB example

Simple example

Running MATLAB scripts on psychp01 is pretty straightforward. Save the following MATLAB script in your home directory (e.g., /home/gbellucci) as matlab_script.m:

% Create a 2000x2000 matrix with random values
A1 = rand(2000,2000,'single');
 
% Perform 1000 fft operations on it
tic;
for i=1:1000
    A1 = fft(A1);
end
time1 = toc;
 
% write result time to file:
fh = fopen('results.txt','w+');
fprintf(fh,'CPU time: %.2f ms\n',time1*1000);
fclose(fh);

Save the following job script in your home directory as run_matlab_script.sh:

#!/bin/bash -l
# This is a comment for this job script to run the above matlab script
 
#SBATCH -o ./job_output.%A_%a # this is just a label to name the output filename of your job
#SBATCH -e ./job_errors.%A_%a # this is just a label to name the error filename of your job
#SBATCH -D ./
#SBATCH -J run_test_for_matlab_script # this is just a label to name your job. When you run "squeue" in the command line, that's the name you'll see
# --- resource specification (which resources for how long) ---
#SBATCH --partition=test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=6000        # memory in MB required by the job
#SBATCH --time=00:05:00   # run time in h:m:s, up to 24h possible
 
# Use the srun command to run the MATLAB script. Replace /home/gbellucci with the location of your script on psychp01.
srun matlab -nodisplay -batch "run('/home/gbellucci/matlab_script.m')"

Now on the command line, change your current directory to the home directory where you saved your run_matlab_script.sh and use the sbatch command to run the MATLAB script:

sbatch run_matlab_script.sh

When the job is finished you can display the output:

vim results.txt

Example with Parallel Computing

For parallel computing, we would need to modify our MATLAB script matlab_script.m to run processes in parallel as follows:

% Open a parallel pool with 10 workers (no more than --cpus-per-task in the job script)
parpool(10);

% Create a 2000x2000 matrix with random values
A1 = rand(2000,2000,'single');
 
% Perform 1000 independent fft operations on it in parallel.
% Inside parfor each iteration must be independent, so unlike the sequential
% version we do not overwrite A1 from one iteration to the next.
tic;
parfor i=1:1000
    B = fft(A1);
end
time1 = toc;
 
% write result time to file:
fh = fopen('results.txt','w+');
fprintf(fh,'CPU time: %.2f ms\n',time1*1000);
fclose(fh);

No further changes are required to your job script run_matlab_script.sh in your home directory. You should only make sure that you are not requesting more workers in your MATLAB script than the CPUs you allow in the job script. That is, in MATLAB, the parpool command should request at most the number of CPUs you set with #SBATCH --cpus-per-task=10. Otherwise, your job will crash, as MATLAB will not find enough available CPUs at run time.
Now run your job script as before using the sbatch command:

sbatch run_matlab_script.sh

Example with GPU Computing

For now, GPU computing is not supported on psychp01. However, in general, to run MATLAB processes on a GPU partition, we would need to add another block to our MATLAB script. Save it as matlab_gpu.m:

% Create a 2000x2000 matrix with random values
A1 = rand(2000,2000,'single');
 
% Perform 1000 fft operations on it
tic;
for i=1:1000
    A1 = fft(A1);
end
time1 = toc;
 
% copy the matrix onto the GPU
A2 = gpuArray(A1);
% perform the same 1000 operations
tic;
for i=1:1000
    A2 = fft(A2);
end
time2 = toc;
 
% write result time to file:
fh = fopen('results.txt','w+');
fprintf(fh,'CPU time: %.2f ms GPU time: %.2f ms Speedup factor: %.2f\n',time1*1000,time2*1000,time1/time2);
fclose(fh);

We also have to modify the job script a bit. We will select the GPU partition. Save the new job script as run_matlab_gpu_script.sh:

#!/bin/bash -l
# This is a comment for this job script to run the above matlab script on the GPU partition
 
#SBATCH -o ./job_output.%A_%a
#SBATCH -e ./job_errors.%A_%a
#SBATCH -D ./
#SBATCH -J mat_gpu
# --- resource specification (which resources for how long) ---
#SBATCH --partition=gpu
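# On clusters where GPU nodes are available you would typically also request
# a GPU here, e.g. with "#SBATCH --gres=gpu:1" (not applicable on psychp01 for now)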
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=6000        # memory in MB required by the job
#SBATCH --time=00:05:00   # run time in h:m:s, up to 24h possible
 
# --- start from a clean state and load necessary environment modules ---

# run MATLAB
srun matlab -nodisplay -batch "run('/home/gbellucci/matlab_gpu.m')"

And now run it like before:

sbatch run_matlab_gpu_script.sh
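Should GPU nodes be added in the future, you can check which partitions provide them (and other generic resources) from the command line; a small sketch using sinfo:

# list partitions and their generic resources (GPUs would appear in the GRES column)
sinfo -o "%P %G"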
