====== cluster-jobs ======

===== 1. General information =====
Below here, there are a few introductory sections for your general information on the SLURM workload management system.

==== SLURM Workload Manager ====
SLURM is the workload manager and job scheduler.\\
There are two ways of starting jobs with SLURM: either interactively with ''srun'' or as a script with ''sbatch''.\\
Interactive jobs are a good way to test your setup before you put it into a script or to work with interactive applications like MATLAB or Python. You immediately see the results and can check if all parts behave as you expected. See the [[cluster-ijobs|Interactive Jobs]] page for more details.\\
__Please note__: at our site, if you submit …
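
As a minimal sketch of the two submission modes (the resource values and the script name ''my_script.sh'' are just placeholders):

  # Start an interactive session (here: 1 task, 4 CPUs, 1 hour) and open a shell
  srun --ntasks=1 --cpus-per-task=4 --time=01:00:00 --pty bash -i
  
  # Submit a batch script instead; sbatch returns immediately and prints a job ID
  sbatch my_script.sh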

==== SLURM Parameters ====
SLURM supports a multitude of different parameters. This enables you to effectively tailor your script to your needs when using psychp01, but it also means that it is easy to get lost and waste your time and quota.\\
__Note__: everything that is preceded by ''#'' followed by a space will be taken as a comment. On the contrary, ''#SBATCH'' followed by a dash or double dash will define the parameters for the job.\\
The following parameters can be used as command line parameters with ''srun'' and ''sbatch'' or inside your job script, for example:

  #SBATCH --job-name=my_test_job_name

__NOTE__: Do not use spaces in the job name. Something like ''--job-name=my test job name'' will not work.\\
See the [[cluster-advanced_slurm|Advanced SLURM]] page for a list of commands and for examples on how to use them in batch scripts.
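
The same parameter works on the command line and inside the script; if both are given, the command-line value takes precedence. For example:

  # inside the job script:
  #SBATCH --job-name=my_test_job_name
  
  # or, equivalently, on the command line (this overrides the #SBATCH line):
  sbatch --job-name=my_test_job_name run.sh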

==== Differences between CPUs and tasks ====
As a new user writing your first SLURM job script, the difference between ''--ntasks'' and ''--cpus-per-task'' is typically quite confusing. Assuming you want to run your program on a single node, do you need to ask for several tasks or for several CPUs per task?\\
The answer is: it depends on whether your application supports MPI. MPI (Message Passing Interface) is a communication interface used for developing parallel computing programs on distributed memory systems. It is necessary for applications running on multiple computers (nodes) to be able to share (intermediate) results.\\
To decide which set of parameters you should use, check if your application utilizes MPI and therefore would benefit from running on multiple nodes simultaneously. On the other hand, if you have a non-MPI-enabled application or made a mistake in your setup, it does not make sense to request more than one node.\\
__IMPORTANT__: Currently, psychp01 has only a single node, hence when running your analyses you only need to carefully choose how many CPUs your analysis (task) requires, by using ''--cpus-per-task''.
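
To illustrate the two request styles, here are two job-script excerpts (''my_mpi_app'' and ''my_threaded_app'' are hypothetical program names):

  # MPI application: several tasks, one CPU each, started via srun
  #SBATCH --ntasks=8
  srun ./my_mpi_app
  
  # non-MPI, multithreaded application: one task with several CPUs
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=8
  ./my_threaded_app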

===== 2. Job Blueprint =====
You can save the following example to a file (e.g. ''run.sh'') and submit it with:

  sbatch run.sh

Please note that all values that you define with SBATCH directives are hard values. When you, for example, ask for 6000 MB of memory (''--mem=6000MB'') and your job uses more than that, it will be automatically killed by the manager.

  #!/bin/bash -l
  
  ##############################
  # Job blueprint
  ##############################
  
  # Give your job a name, so you can recognize it in the queue overview
  #SBATCH --job-name=example
  
  # Define how many nodes you need. For now, psychp01 has only 1 node with 100 CPUs.
  #SBATCH --nodes=1
  
  # By default, SLURM will allocate 1 CPU to your tasks. You can define the number of CPUs you would need for your job as follows:
  #SBATCH --cpus-per-task=20
  
  # You can further define the number of tasks with --ntasks-per-*
  # See "man sbatch" for all the options.
  
  # Define how long the job will run in real time. This is a hard cap, meaning
  # that if the job runs longer than what is written here, it will be
  # force-stopped by the server. If you make the expected time too long, it will
  # take longer for the job to start. Here, we say the job will take 5 minutes.
  #               d-hh:mm:ss
  #SBATCH --time=0-00:05:00
  
  # Define the partition on which the job shall run. May be omitted.
  #SBATCH --partition=test
  
  # How much memory you need.
  # --mem will define memory per node and
  # --mem-per-cpu will define memory per CPU/core. Choose one of those.
  #SBATCH --mem-per-cpu=1500MB
  ##SBATCH --mem=5GB    # this one is not in effect, due to the double hash
  
  # Turn on mail notification. There are many possible self-explaining values:
  # NONE, BEGIN, END, FAIL, ALL (including all aforementioned)
  # For more values, check "man sbatch"
  #SBATCH --mail-type=END,FAIL
  
  # You may not place any commands before the last SBATCH directive
  
  # Define and create a unique scratch directory for this job
  # (the base path below is an example; adjust it to your site's scratch area)
  SCRATCH_DIRECTORY=/home/${USER}/scratch/${SLURM_JOBID}
  mkdir -p ${SCRATCH_DIRECTORY}
  cd ${SCRATCH_DIRECTORY}
  
  # You can copy everything you need to the scratch directory
  # ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
  cp ${SLURM_SUBMIT_DIR}/myfiles*.txt ${SCRATCH_DIRECTORY}
  
  # This is where the actual work is done. In this case, the script only waits.
  # The time command is optional, but it may give you a hint on how long the
  # command worked
  time sleep 10
  #sleep 10
  
  # After the job is done we copy our output back to $SLURM_SUBMIT_DIR
  cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}
  
  # In addition to the copied files, you will also find a file called
  # slurm-1234.out in the submit directory. This file will contain all output that
  # was produced during runtime, i.e. stdout and stderr.
  
  # After everything is saved to the home directory, delete the work directory to
  # save space on /home
  cd ${SLURM_SUBMIT_DIR}
  rm -rf ${SCRATCH_DIRECTORY}
  
  # Finish the script
  exit 0
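
Once submitted, you can follow the job with the standard SLURM commands (replace ''<jobid>'' with the ID that ''sbatch'' printed):

  squeue -u $USER            # list your pending and running jobs
  scontrol show job <jobid>  # detailed information on one job
  scancel <jobid>            # cancel the job if needed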

==== Running many sequential jobs in parallel using job arrays ====
In this example we wish to run many similar sequential jobs in parallel using job arrays. We take Python as an example, but this does not matter for the job arrays:

  #!/usr/bin/env python
  
  import time
  
  print('start at ' + time.strftime('%H:%M:%S'))
  
  print('sleep for 10 seconds ...')
  time.sleep(10)
  
  print('stop at ' + time.strftime('%H:%M:%S'))

Save this to a file called ''test.py'' and try it out:

  $ python test.py
  
  start at 15:23:48
  sleep for 10 seconds ...
  stop at 15:23:58

Good. Now we would like to run this script 16 times at the same time. For this we use the following script:

  #!/bin/bash -l
  
  #####################
  # job-array example #
  #####################
  
  #SBATCH --job-name=example
  
  # 16 jobs will run in this array at the same time
  #SBATCH --array=1-16
  
  # run for five minutes
  #              d-hh:mm:ss
  #SBATCH --time=0-00:05:00
  
  # 500MB memory per core
  # this is a hard limit
  #SBATCH --mem-per-cpu=500MB
  
  # you may not place bash commands before the last SBATCH directive
  
  # define and create a unique scratch directory
  # (the base path below is an example; adjust it to your site's scratch area)
  SCRATCH_DIRECTORY=/home/${USER}/job-array-example/${SLURM_JOBID}
  mkdir -p ${SCRATCH_DIRECTORY}
  cd ${SCRATCH_DIRECTORY}
  
  cp ${SLURM_SUBMIT_DIR}/test.py ${SCRATCH_DIRECTORY}
  
  # each job will see a different ${SLURM_ARRAY_TASK_ID}
  echo "now processing task id: ${SLURM_ARRAY_TASK_ID}"
  python test.py > output_${SLURM_ARRAY_TASK_ID}.txt
  
  # after the job is done we copy our output back to $SLURM_SUBMIT_DIR
  cp output_${SLURM_ARRAY_TASK_ID}.txt ${SLURM_SUBMIT_DIR}
  
  # we step out of the scratch directory and remove it
  cd ${SLURM_SUBMIT_DIR}
  rm -rf ${SCRATCH_DIRECTORY}
  
  # happy end
  exit 0

Submit the script and after a short while you should see 16 output files in your submit directory.
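
Besides redirecting the output, ''${SLURM_ARRAY_TASK_ID}'' is typically used to give each task its own input. A minimal sketch, assuming a hypothetical ''filelist.txt'' with one input file name per line and a script that accepts that file as an argument:

  # pick the N-th line of filelist.txt, where N is the array task ID
  INPUT_FILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)
  python test.py "${INPUT_FILE}" > output_${SLURM_ARRAY_TASK_ID}.txt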

===== 3. R example =====
Running R scripts on psychp01 is very easy. Save the following R script in your home directory (e.g., as ''/home/$USER/r_test.R''):

  # Create a 2000x2000 matrix with random values
  A1 <- matrix(runif(2000*2000), nrow=2000, ncol=2000)
  
  # Perform 1000 fft operations on it
  start_time <- as.numeric(Sys.time())
  for (i in 1:1000) {
    A1 <- fft(A1)
  }
  time1 <- as.numeric(Sys.time()) - start_time
  
  # write result time to file:
  cat(paste('1000 fft operations took', time1, 'seconds\n'), file='results.txt')

Save the following job script in your home directory as ''run_r_script.sh'':

  #!/bin/bash -l
  # My job script to run R code
  
  # standard output and error files (%j expands to the job ID)
  #SBATCH -o ./job.out.%j
  #SBATCH -e ./job.err.%j
  #SBATCH -D ./
  #SBATCH -J run_r_test
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=10
  #SBATCH --mem=6000
  #SBATCH --time=00:10:00
  
  # Use the command Rscript and set the path to the R script to run it.
  # Change /home/$USER/r_test.R to wherever you saved your script.
  Rscript /home/$USER/r_test.R

Now on the command line, change your current directory to the home directory where you saved your ''run_r_script.sh'' and ''r_test.R'' files, and submit the job:

  sbatch run_r_script.sh

When the job is finished you can display the output:

  vim results.txt
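
If a single run needs different resources, you do not have to edit the job script: options passed to ''sbatch'' on the command line override the corresponding ''#SBATCH'' lines. For example (the values are illustrative):

  sbatch --mem=12000 --time=00:30:00 run_r_script.sh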

===== 4. Python example =====
Below is an example of a Python script to be used on the cluster. Save the following Python script in your home directory (e.g., as ''/home/$USER/python_test.py''):

  import numpy as np
  from time import time
  
  # Create a 2000x2000 matrix with random values
  A1 = np.random.rand(2000, 2000)
  
  # Perform 1000 fft operations on it
  start_time = time()
  for i in range(1000):
      A1 = np.fft.fft(A1)
  end_time = time()
  time1 = end_time - start_time
  
  # write result time to file:
  with open('results.txt', 'w') as fh:
      fh.write(f'1000 fft operations took {time1} seconds\n')

The job script below will run your Python script. Save it in your home directory as ''run_python_script.sh'':

  #!/bin/bash -l
  # My job script to run Python code
  
  # standard output and error files (%j expands to the job ID)
  #SBATCH -o ./job.out.%j
  #SBATCH -e ./job.err.%j
  #SBATCH -D ./
  #SBATCH -J run_test
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=10
  #SBATCH --mem=6000
  #SBATCH --time=00:10:00
  
  # load below here any modules or Python environments that are required for your script
  
  # use the python command and set the path to your script to run it.
  # Change /home/$USER/python_test.py to wherever you saved your script.
  python /home/$USER/python_test.py
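
If your script needs Python packages that are not installed system-wide, you can activate an environment in the job script right before the ''python'' line. A sketch, assuming a virtual environment at the hypothetical path ''~/venvs/myenv'':

  # activate a Python virtual environment before running the script
  source ~/venvs/myenv/bin/activate
  python /home/$USER/python_test.py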

Now on the command line, change your current directory to the home directory where you saved your ''run_python_script.sh'' and ''python_test.py'' files, and submit the job:

  sbatch run_python_script.sh

When the job is finished you can display the output:

  vim results.txt
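
To check afterwards how long the job ran and how much memory it actually used, you can query the SLURM accounting database (replace ''<jobid>'' with your job's ID):

  sacct -j <jobid> --format=JobID,JobName,Elapsed,MaxRSS,State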

On page 37 of the pdf version of the manual on the [[cluster-guide|main page]], there is a short video that demonstrates what was just described.

===== 5. MATLAB example =====
==== Simple example ====
Running MATLAB scripts on psychp01 is pretty straightforward. Save the following MATLAB script in your home directory (e.g., as ''/home/$USER/matlab_test.m''):

  % Create a 2000x2000 matrix with random values
  A1 = rand(2000, 2000);
  
  % Perform 1000 fft operations on it
  tic;
  for i=1:1000
      A1 = fft(A1);
  end
  time1 = toc;
  
  % write result time to file:
  fh = fopen('results.txt', 'w');
  fprintf(fh, '1000 fft operations took %f seconds\n', time1);
  fclose(fh);

Save the following job script in your home directory as ''run_matlab_script.sh'':

  #!/bin/bash -l
  # This is a comment for this job script to run the above matlab script
  
  # standard output and error files (%j expands to the job ID)
  #SBATCH -o ./job.out.%j
  #SBATCH -e ./job.err.%j
  #SBATCH -D ./
  # -J is just a label to name your job. When you run "squeue" you will see this name.
  #SBATCH -J run_test_for_matlab_script
  # --- resource specification (which resources for how long) ---
  #SBATCH --partition=test
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=10
  #SBATCH --mem=6000
  #SBATCH --time=00:10:00
  
  # Use the srun command to run the MATLAB script. -batch expects a MATLAB
  # statement, so we call the script through run(). Change the path to
  # wherever you saved your script.
  srun matlab -nodisplay -batch "run('/home/$USER/matlab_test.m')"

Now on the command line, change your current directory to the home directory where you saved your ''run_matlab_script.sh'' and ''matlab_test.m'' files, and submit the job:

  sbatch run_matlab_script.sh

When the job is finished you can display the output:

  vim results.txt

==== Example with Parallel Computing ====
For **parallel computing**, the MATLAB script has to be modified slightly: we open a pool of workers with ''parpool'' and replace the ''for'' loop with a ''parfor'' loop. Save the following as e.g. ''/home/$USER/matlab_partest.m'':

  % Open a pool of 10 workers (this should match --cpus-per-task in the job script)
  parpool(10);
  
  % Create a 2000x2000 matrix with random values
  A1 = rand(2000, 2000);
  
  % Perform 1000 fft operations on it, distributed over the workers.
  % Note: parfor iterations must be independent, so we store the result
  % in a temporary variable instead of overwriting A1.
  tic;
  parfor i=1:1000
      B = fft(A1);
  end
  time1 = toc;
  
  % write result time to file:
  fh = fopen('results.txt', 'w');
  fprintf(fh, '1000 fft operations took %f seconds\n', time1);
  fclose(fh);

No further changes are required to your job script ''run_matlab_script.sh'', apart from pointing the ''srun matlab'' line to the new script (e.g., ''matlab_partest.m''): the 10 workers opened by ''parpool(10)'' match the 10 CPUs requested with ''--cpus-per-task=10''.\\
Now run your job script as before using the ''sbatch'' command:

  sbatch run_matlab_script.sh
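
To keep the pool size in sync with the requested resources, you can read the ''SLURM_CPUS_PER_TASK'' environment variable instead of hard-coding ''parpool(10)''. A sketch, assuming the ''parpool'' call is removed from the MATLAB script itself:

  # size the parallel pool from the allocation, then run the script
  srun matlab -nodisplay -batch "parpool(str2double(getenv('SLURM_CPUS_PER_TASK'))); run('/home/$USER/matlab_partest.m')"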

==== Example with GPU Computing ====
For now, **GPU computing** is not supported on psychp01. However, in general, to run MATLAB processes on a GPU partition, we would need to add another block to our MATLAB script. Save it as e.g. ''/home/$USER/matlab_gpu_test.m'':

  % Create a 2000x2000 matrix with random values
  A1 = rand(2000, 2000);
  
  % Perform 1000 fft operations on it
  tic;
  for i=1:1000
      A1 = fft(A1);
  end
  time1 = toc;
  
  % copy the matrix onto the GPU
  A2 = gpuArray(A1);
  % perform the same 1000 operations
  tic;
  for i=1:1000
      A2 = fft(A2);
  end
  time2 = toc;
  
  % write result time to file:
  fh = fopen('results.txt', 'w');
  fprintf(fh, '1000 fft operations took %f seconds on the CPU and %f seconds on the GPU\n', time1, time2);
  fclose(fh);

We also have to modify the job script a bit. We will select the GPU partition. Save the new job script as ''run_matlab_gpu_script.sh'':

  #!/bin/bash -l
  # This is a comment for this job script to run the above matlab script on the GPU partition
  
  # standard output and error files (%j expands to the job ID)
  #SBATCH -o ./job.out.%j
  #SBATCH -e ./job.err.%j
  #SBATCH -D ./
  #SBATCH -J mat_gpu
  # --- resource specification (which resources for how long) ---
  #SBATCH --partition=gpu
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=10
  #SBATCH --mem=6000
  #SBATCH --time=00:10:00
  
  # --- start from a clean state and load necessary environment modules ---
  # (the module names below are examples; adjust them to your system)
  module purge
  module load matlab
  
  # run MATLAB (change the path to wherever you saved your script)
  srun matlab -nodisplay -batch "run('/home/$USER/matlab_gpu_test.m')"

And now run it like before:

  sbatch run_matlab_gpu_script.sh
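
On clusters where GPUs are configured as SLURM generic resources, the job script would additionally request a GPU explicitly; whether this works (and the exact resource name) depends on the site configuration. A directive excerpt:

  #SBATCH --partition=gpu
  #SBATCH --gres=gpu:1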

~~DISCUSSION|Discussion~~