===== 2. Job Blueprint =====
You can save the following example to a file (e.g., ''run.sh'') and submit it with:

<code>
sbatch run.sh
</code>

Please note that all values that you define with ''#SBATCH'' directives inside the script act as defaults; the corresponding options can also be passed to ''sbatch'' on the command line, in which case they override the values in the script.

<code>
#!/bin/bash -l

##############################
#       Job blueprint        #
##############################

# Give your job a name, so you can recognize it in the queue overview
#SBATCH --job-name=example

# Define how many nodes you need. For now, psychp01 has only 1 node with 100 CPUs.
#SBATCH --nodes=1

# By default, SLURM will allocate 1 CPU to your tasks. You can define the number
# of CPUs you need for your job as follows:
#SBATCH --cpus-per-task=20

# You can further define the number of tasks with --ntasks-per-*
# See "man sbatch" for details.

# Define how long the job will run in real time. This is a hard cap meaning
# that if the job runs longer than what is written here, it will be
# force-stopped by the server. If you make the expected time too long, it will
# take longer for the job to start. Here, we say the job will take 5 minutes.
#                   d-hh:mm:ss
#SBATCH --time=0-00:05:00

# Define the partition on which the job shall run. May be omitted.
#SBATCH --partition=test

# How much memory you need.
# --mem will define memory per node and
# --mem-per-cpu will define memory per CPU/core. Choose one of those.
#SBATCH --mem-per-cpu=1500MB
##SBATCH --mem=5GB

# Turn on mail notification. There are many possible self-explaining values:
# NONE, BEGIN, END, FAIL, ALL (including all aforementioned)
# For more values, check "man sbatch"
#SBATCH --mail-type=END,FAIL

# You may not place any commands before the last SBATCH directive

# Define and create a unique scratch directory for this job
# (adjust the base path to the scratch area of your cluster)
SCRATCH_DIRECTORY=/scratch/${USER}/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}

# You can copy everything you need to the scratch directory
# ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
cp ${SLURM_SUBMIT_DIR}/myfiles*.txt ${SCRATCH_DIRECTORY}

# This is where the actual work is done. In this case, the script only waits.
# The time command is optional, but it may give you a hint on how long the
# command worked
time sleep 10
#sleep 10

# After the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}

# In addition to the copied files, you will also find a file called
# slurm-1234.out in the submit directory. This file will contain all output that
# was produced during runtime, i.e. stdout and stderr.

# After everything is saved to the home directory, delete the work directory to
# save space on /home
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}

# Finish the script
exit 0
</code>

| + | ==== Running many sequential jobs in parallel using job arrays ==== | ||
In this example we wish to run many similar sequential jobs in parallel using job arrays. We take Python as an example, but this does not matter for the job arrays:

<code>
#!/usr/bin/env python

import time

print('start at ' + time.strftime('%H:%M:%S'))

print('sleep for 10 seconds ...')
time.sleep(10)

print('stop at ' + time.strftime('%H:%M:%S'))
</code>

Save this to a file called “test.py” and try it out:

<code>
$ python test.py

start at 15:23:48
sleep for 10 seconds ...
stop at 15:23:58
</code>

Good. Now we would like to run this script 16 times at the same time. For this we use the following script:

<code>
#!/bin/bash -l

#####################
# job-array example #
#####################

#SBATCH --job-name=example

# 16 jobs will run in this array at the same time
#SBATCH --array=1-16

# run for five minutes
#              d-hh:mm:ss
#SBATCH --time=0-00:05:00

# 500MB memory per core
# this is a hard limit
#SBATCH --mem-per-cpu=500MB

# you may not place bash commands before the last SBATCH directive

# define and create a unique scratch directory
# (adjust the base path to the scratch area of your cluster)
SCRATCH_DIRECTORY=/scratch/${USER}/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}

cp ${SLURM_SUBMIT_DIR}/test.py ${SCRATCH_DIRECTORY}

# each job will see a different ${SLURM_ARRAY_TASK_ID}
echo "now processing task id:: " ${SLURM_ARRAY_TASK_ID}
python test.py > output_${SLURM_ARRAY_TASK_ID}.txt

# after the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp output_${SLURM_ARRAY_TASK_ID}.txt ${SLURM_SUBMIT_DIR}

# we step out of the scratch directory and remove it
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}

# happy end
exit 0
</code>

Submit the script and after a short while you should see 16 output files in your submit directory.

| + | ===== 3. R example ===== | ||
| + | Running R scripts on psychp01 is very easy. Save the following R script in your home directory (e.g., ''/ | ||
| + | |||
| + | # Create a 2000x2000 matrix with random values | ||
| + | A1 <- matrix(runif(2000*2000), | ||
| + | |||
| + | # Perform 1000 fft operations on it | ||
| + | start_time <- as.numeric(Sys.time()) | ||
| + | for (i in 1:1000) { | ||
| + | A1 <- fft(A1) | ||
| + | } | ||
| + | time1 <- as.numeric(Sys.time()) - start_time | ||
| + | |||
| + | # write result time to file: | ||
| + | cat(paste(' | ||
| + | |||
| + | Save the following job script in your home directory as '' | ||
| + | |||
| + | #!/bin/bash -l | ||
| + | # My job script to run R code | ||
| + | |||
| + | #SBATCH -o ./ | ||
| + | #SBATCH -e ./ | ||
| + | #SBATCH -D ./ | ||
| + | #SBATCH -J run_r_test | ||
| + | #SBATCH --ntasks=1 | ||
| + | #SBATCH --cpus-per-task=10 | ||
| + | #SBATCH --mem=6000 | ||
| + | #SBATCH --time=00: | ||
| + | |||
| + | # Use the command Rscript and set the path to the R script to run it. Change / | ||
| + | Rscript / | ||
| + | |||
| + | Now on the command line, change your current directory to the home directory where you saved your '' | ||
| + | |||
| + | sbatch run_r_script.sh | ||
| + | |||
| + | When the job is finished you can display the output: | ||
| + | |||
| + | vim results.txt | ||
| + | |||
| + | |||
| + | ===== 4. Python example ===== | ||
| + | Below is an example of a Python script to be used on the cluster. Save the following Python script in your home directory (e.g., ''/ | ||
| + | |||
| + | import numpy as np | ||
| + | from time import time | ||
| + | | ||
| + | # Create a 2000x2000 matrix with random values | ||
| + | A1 = np.random.rand(2000, | ||
| + | | ||
| + | # Perform 1000 fft operations on it | ||
| + | start_time = time() | ||
| + | for i in range(1000): | ||
| + | A1 = np.fft.fft(A1) | ||
| + | end_time = time() | ||
| + | time1 = end_time - start_time | ||
| + | | ||
| + | # write result time to file: | ||
| + | with open(' | ||
| + | fh.write(f' | ||
| + | |||
The job script below will run your Python script. Save it in your home directory as ''run_python_script.sh'':

<code>
#!/bin/bash -l
# My job script to run Python code

# stdout and stderr files (%j expands to the job ID)
#SBATCH -o ./job.%j.out
#SBATCH -e ./job.%j.err
#SBATCH -D ./
#SBATCH -J run_test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=6000
#SBATCH --time=00:10:00

# load here any environment modules or activate any Python environment
# that your script requires

# use the python command and set the path to your script to run it.
# Change /home/<your_username>/test.py to the path where you saved your Python script.
python /home/<your_username>/test.py
</code>

Now on the command line, change your current directory to the home directory where you saved your ''run_python_script.sh'' file and submit the job:

<code>
sbatch run_python_script.sh
</code>

When the job is finished you can display the output:

<code>
vim results.txt
</code>

| + | |||
| + | On page 37 of the pdf version of the manual in the [[cluster-guide|main page]], there is a short video that demonstrates what just described. | ||
| + | |||
| + | |||
| + | ===== 5. MATLAB example ===== | ||
| + | ==== Simple example ==== | ||
| + | Running MATLAB scripts on psychp01 is pretty straightforward. Save the following MATLAB script in your home directory (e.g., ''/ | ||
| + | |||
| + | % Create a 2000x2000 matrix with random values | ||
| + | A1 = rand(2000, | ||
| + | |||
| + | % Perform 1000 fft operations on it | ||
| + | tic; | ||
| + | for i=1:1000 | ||
| + | A1 = fft(A1); | ||
| + | end | ||
| + | time1 = toc; | ||
| + | |||
| + | % write result time to file: | ||
| + | fh = fopen(' | ||
| + | fprintf(fh,' | ||
| + | fclose(fh); | ||
| + | |||
| + | Save the following job script in your home directory as '' | ||
| + | |||
| + | #!/bin/bash -l | ||
| + | # This is a comment for this job script to run the above matlab script | ||
| + | |||
| + | #SBATCH -o ./ | ||
| + | #SBATCH -e ./ | ||
| + | #SBATCH -D ./ | ||
| + | #SBATCH -J run_test_for_matlab_script # this is just a label to name your job. When you run " | ||
| + | # --- resource specification (which resources for how long) --- | ||
| + | #SBATCH --partition=test | ||
| + | #SBATCH --ntasks=1 | ||
| + | #SBATCH --cpus-per-task=10 | ||
| + | #SBATCH --mem=6000 | ||
| + | #SBATCH --time=00: | ||
| + | |||
| + | # Use the srun command to run the MATLAB script. Change / | ||
| + | srun matlab -nodisplay -batch / | ||
| + | |||
| + | Now on the command line, change your current directory to the home directory where you saved your '' | ||
| + | |||
| + | sbatch run_matlab_script.sh | ||
| + | |||
| + | When the job is finished you can display the output: | ||
| + | |||
| + | vim results.txt | ||
| + | |||
| + | |||
| + | ==== Example with Parallel Computing ==== | ||
| + | For **parallel computing**, | ||
| + | |||
| + | % Create a 2000x2000 matrix with random values | ||
| + | parpool(10); | ||
| + | A1 = rand(2000, | ||
| + | |||
| + | % Perform 1000 fft operations on it | ||
| + | tic; | ||
| + | parfor i=1:1000 | ||
| + | A1 = fft(A1); | ||
| + | end | ||
| + | time1 = toc; | ||
| + | |||
| + | % write result time to file: | ||
| + | fh = fopen(' | ||
| + | fprintf(fh,' | ||
| + | fclose(fh); | ||
| + | |||
| + | No further changes are required to your job script '' | ||
| + | Now run your job script as before using the '' | ||
| + | |||
| + | sbatch run_matlab_script.sh | ||
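The number of workers requested with ''parpool(10)'' should not exceed the number of CPUs allocated to the job (''--cpus-per-task=10'' above). If you increase the pool size, remember to request more CPUs as well, either in the job script or at submission time; the values below are only illustrative:

<code>
# a pool of 20 workers needs at least 20 CPUs
sbatch --cpus-per-task=20 run_matlab_script.sh
</code>
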
| + | |||
| + | |||
| + | ==== Example with GPU Computing ==== | ||
| + | For now, **GPU computing** is not supported on psychp01. However, in general, to run MATLAB processes on a GPU partition, we would need to add another block to our MATLAB script. Save it as '' | ||
| + | |||
| + | % Create a 2000x2000 matrix with random values | ||
| + | A1 = rand(2000, | ||
| + | |||
| + | % Perform 1000 fft operations on it | ||
| + | tic; | ||
| + | for i=1:1000 | ||
| + | A1 = fft(A1); | ||
| + | end | ||
| + | time1 = toc; | ||
| + | |||
| + | % copy the matrix onto the GPU | ||
| + | A2 = gpuArray(A1); | ||
| + | % perform the same 1000 operations | ||
| + | tic; | ||
| + | for i=1:1000 | ||
| + | A2 = fft(A2); | ||
| + | end | ||
| + | time2 = toc; | ||
| + | |||
| + | % write result time to file: | ||
| + | fh = fopen(' | ||
| + | fprintf(fh,' | ||
| + | fclose(fh); | ||
| + | |||
| + | We also have to modify the job script a bit. We will select the GPU partition. Save the new job script as '' | ||
| + | |||
| + | #!/bin/bash -l | ||
| + | # This is a comment for this job script to run the above matlab script on the GPU partition | ||
| + | |||
| + | #SBATCH -o ./ | ||
| + | #SBATCH -e ./ | ||
| + | #SBATCH -D ./ | ||
| + | #SBATCH -J mat_gpu | ||
| + | # --- resource specification (which resources for how long) --- | ||
| + | #SBATCH --partition=gpu | ||
| + | #SBATCH --ntasks=1 | ||
| + | #SBATCH --cpus-per-task=10 | ||
| + | #SBATCH --mem=6000 | ||
| + | #SBATCH --time=00: | ||
| + | |||
| + | # --- start from a clean state and load necessary environment modules --- | ||
| + | | ||
| + | # run MATLAB | ||
| + | srun matlab -nodisplay -batch / | ||
| + | |||
| + | And now run it like before: | ||
| + | sbatch run_matlab_gpu_script.sh | ||
| + | |||
| + | |||
| + | [[{: | ||
| + | [[{: | ||
cluster-jobs.1715103344.txt.gz · Last modified: 2024/05/07 17:35 by gabriele