===== 2. Job Blueprint =====
You can save the following example to a file (e.g., ''run.sh'') and submit it with:

<code bash>
sbatch run.sh
</code>

Please note that all values that you define with ''#SBATCH'' are hard limits: if your job exceeds the requested run time or memory, it will be stopped by the scheduler.
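After submission, ''sbatch'' prints the ID of the new job (e.g., ''Submitted batch job 1234''). With that ID you can monitor or cancel the job using the standard SLURM commands:

<code bash>
squeue -u $USER    # list your pending and running jobs
scancel 1234       # cancel a job (replace 1234 with your job ID)
</code>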

<code bash>
#!/bin/bash -l

##############################
# Job blueprint              #
##############################

# Give your job a name, so you can recognize it in the queue overview
#SBATCH --job-name=example

# Define how many nodes you need. For now, psychp01 has only 1 node with 100 CPUs.
#SBATCH --nodes=1

# By default, SLURM will allocate 1 CPU to your tasks. You can define the
# number of CPUs you need for your job as follows:
#SBATCH --cpus-per-task=20

# You can further define the number of tasks with --ntasks-per-*
# See "man sbatch" for details.

# Define how long the job will run in real time. This is a hard cap meaning
# that if the job runs longer than what is written here, it will be
# force-stopped by the server. If you make the expected time too long, it will
# take longer for the job to start. Here, we say the job will take 5 minutes.
#                d-hh:mm:ss
#SBATCH --time=0-00:05:00

# Define the partition on which the job shall run. May be omitted.
#SBATCH --partition=test

# How much memory you need.
# --mem will define memory per node and
# --mem-per-cpu will define memory per CPU/core. Choose one of those.
#SBATCH --mem-per-cpu=1500MB
##SBATCH --mem=5GB    # this one is not in effect, due to the double hash

# Turn on mail notification. There are many possible self-explaining values:
# NONE, BEGIN, END, FAIL, ALL (including all aforementioned)
# For more values, check "man sbatch"
#SBATCH --mail-type=END,FAIL

# You may not place any commands before the last SBATCH directive

# Define and create a unique scratch directory for this job
# (adjust the base path if your cluster provides a dedicated scratch file system)
SCRATCH_DIRECTORY=/home/${USER}/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}

# You can copy everything you need to the scratch directory
# ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
cp ${SLURM_SUBMIT_DIR}/myfiles*.txt ${SCRATCH_DIRECTORY}

# This is where the actual work is done. In this case, the script only waits.
# The time command is optional, but it may give you a hint on how long the
# command worked
time sleep 10
#sleep 10

# After the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}

# In addition to the copied files, you will also find a file called
# slurm-1234.out in the submit directory. This file will contain all output that
# was produced during runtime, i.e. stdout and stderr.

# After everything is saved to the home directory, delete the work directory to
# save space on /home
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}

# Finish the script
exit 0
</code>
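
Once a job has finished, you can check how many resources it actually used. One way to do this is via SLURM's standard accounting tool (''1234'' is a placeholder for your real job ID):

<code bash>
# show elapsed time, peak memory and final state of a finished job
sacct -j 1234 --format=JobID,JobName,Elapsed,MaxRSS,State
</code>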

==== Running many sequential jobs in parallel using job arrays ====
In this example we wish to run many similar sequential jobs in parallel using job arrays. We take Python as an example, but the job-array mechanism works the same way for any language:

<code python>
#!/usr/bin/env python

import time

print('start at ' + time.strftime('%H:%M:%S'))

print('sleep for 10 seconds ...')
time.sleep(10)

print('stop at ' + time.strftime('%H:%M:%S'))
</code>
Save this to a file called ''test.py'' and try it out:

<code>
$ python test.py

start at 15:23:48
sleep for 10 seconds ...
stop at 15:23:58
</code>

Good. Now we would like to run this script 16 times at the same time. For this we use the following script:

<code bash>
#!/bin/bash -l

#####################
# job-array example #
#####################

#SBATCH --job-name=example

# 16 jobs will run in this array at the same time
#SBATCH --array=1-16

# run for five minutes
#              d-hh:mm:ss
#SBATCH --time=0-00:05:00

# 500MB memory per core
# this is a hard limit
#SBATCH --mem-per-cpu=500MB

# you may not place bash commands before the last SBATCH directive

# define and create a unique scratch directory
# (each array task gets its own ${SLURM_JOBID}, so the directories do not clash;
# adjust the base path if your cluster provides a dedicated scratch file system)
SCRATCH_DIRECTORY=/home/${USER}/job-array-example/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}

cp ${SLURM_SUBMIT_DIR}/test.py ${SCRATCH_DIRECTORY}

# each job will see a different ${SLURM_ARRAY_TASK_ID}
echo "now processing task id:: " ${SLURM_ARRAY_TASK_ID}
python test.py > output_${SLURM_ARRAY_TASK_ID}.txt

# after the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp output_${SLURM_ARRAY_TASK_ID}.txt ${SLURM_SUBMIT_DIR}

# we step out of the scratch directory and remove it
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}

# happy end
exit 0
</code>

Submit the script and after a short while you should see 16 output files in your submit directory.
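
Each array task appears as a separate entry in the queue, named ''<jobid>_<taskid>''. A quick way to inspect the run (job ID ''1234'' is a placeholder):

<code bash>
squeue -u $USER        # tasks show up as e.g. 1234_1 ... 1234_16
ls output_*.txt        # 16 output files once all tasks are done
scancel 1234_[9-16]    # cancel only tasks 9-16, if needed
</code>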

===== 3. R example =====
Running R scripts on psychp01 is very easy. Save the following R script in your home directory (e.g., as ''/home/<your_username>/test.R''):

<code r>
# Create a 2000x2000 matrix with random values
A1 <- matrix(runif(2000*2000), nrow=2000, ncol=2000)

# Perform 1000 fft operations on it
start_time <- as.numeric(Sys.time())
for (i in 1:1000) {
  A1 <- fft(A1)
}
time1 <- as.numeric(Sys.time()) - start_time

# write result time to file:
cat(paste('1000 fft operations took', time1, 'seconds\n'), file='results.txt')
</code>

Save the following job script in your home directory as ''run_r_script.sh'':

<code bash>
#!/bin/bash -l
# My job script to run R code

# standard output and error go to job-specific files in the submit
# directory (%j is replaced by the job ID)
#SBATCH -o ./job.out.%j
#SBATCH -e ./job.err.%j
#SBATCH -D ./
#SBATCH -J run_r_test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=6000
# maximum run time (hh:mm:ss); adjust as needed
#SBATCH --time=00:10:00

# Use the command Rscript and set the path to the R script to run it.
# Change /home/<your_username> to your own home directory.
Rscript /home/<your_username>/test.R
</code>

Now on the command line, change your current directory to the home directory where you saved your ''run_r_script.sh'' script and run:

<code bash>
sbatch run_r_script.sh
</code>

When the job is finished you can display the output:

<code bash>
vim results.txt
</code>
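
Besides ''results.txt'', anything R prints to the console ends up in the files set by the ''-o'' and ''-e'' directives of the job script above, which is the first place to look if something went wrong:

<code bash>
ls job.out.* job.err.*    # one pair of files per job ID
</code>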

===== 4. Python example =====
Below is an example of a Python script to be used on the cluster. Save the following Python script in your home directory (e.g., as ''/home/<your_username>/test.py''):

<code python>
import numpy as np
from time import time

# Create a 2000x2000 matrix with random values
A1 = np.random.rand(2000, 2000)

# Perform 1000 fft operations on it
start_time = time()
for i in range(1000):
    A1 = np.fft.fft(A1)
end_time = time()
time1 = end_time - start_time

# write result time to file:
with open('results.txt', 'w') as fh:
    fh.write(f'1000 fft operations took {time1} seconds\n')
</code>

The job script below will run your Python script. Save it in your home directory as ''run_python_script.sh'':

<code bash>
#!/bin/bash -l
# My job script to run Python code

# standard output and error go to job-specific files in the submit
# directory (%j is replaced by the job ID)
#SBATCH -o ./job.out.%j
#SBATCH -e ./job.err.%j
#SBATCH -D ./
#SBATCH -J run_test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=6000
# maximum run time (hh:mm:ss); adjust as needed
#SBATCH --time=00:10:00

# load here any modules or environments that your script requires

# use the python command and set the path to your script to run it.
# Change /home/<your_username> to your own home directory.
python /home/<your_username>/test.py
</code>

Now on the command line, change your current directory to the home directory where you saved your ''run_python_script.sh'' script and run:

<code bash>
sbatch run_python_script.sh
</code>

When the job is finished you can display the output:

<code bash>
vim results.txt
</code>

On page 37 of the pdf version of the manual on the [[cluster-guide|main page]], there is a short video that demonstrates what was just described.

===== 5. MATLAB example =====
==== Simple example ====
Running MATLAB scripts on psychp01 is pretty straightforward. Save the following MATLAB script in your home directory (e.g., as ''/home/<your_username>/test.m''):

<code matlab>
% Create a 2000x2000 matrix with random values
A1 = rand(2000, 2000);

% Perform 1000 fft operations on it
tic;
for i = 1:1000
    A1 = fft(A1);
end
time1 = toc;

% write result time to file:
fh = fopen('results.txt', 'w');
fprintf(fh, '1000 fft operations took %f seconds\n', time1);
fclose(fh);
</code>

Save the following job script in your home directory as ''run_matlab_script.sh'':

<code bash>
#!/bin/bash -l
# This is a comment for this job script to run the above matlab script

# standard output and error go to job-specific files in the submit
# directory (%j is replaced by the job ID)
#SBATCH -o ./job.out.%j
#SBATCH -e ./job.err.%j
#SBATCH -D ./
# -J is just a label to name your job; when you run "squeue" you will
# see this name in the queue overview
#SBATCH -J run_test_for_matlab_script
# --- resource specification (which resources for how long) ---
#SBATCH --partition=test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=6000
# maximum run time (hh:mm:ss); adjust as needed
#SBATCH --time=00:10:00

# Run the MATLAB script in non-interactive batch mode.
# Change /home/<your_username> to your own home directory.
srun matlab -nodisplay -batch "run('/home/<your_username>/test.m')"
</code>

Now on the command line, change your current directory to the home directory where you saved your ''run_matlab_script.sh'' script and run:

<code bash>
sbatch run_matlab_script.sh
</code>

When the job is finished you can display the output:

<code bash>
vim results.txt
</code>

==== Example with Parallel Computing ====
For **parallel computing**, you can use MATLAB's Parallel Computing Toolbox: open a parallel pool and replace the ''for'' loop with a ''parfor'' loop in ''test.m'':

<code matlab>
% Open a parallel pool with 10 workers
% (this should match --cpus-per-task in the job script)
parpool(10);

% Create a 2000x2000 matrix with random values
A1 = rand(2000, 2000);

% Perform 1000 fft operations on it
% (iterations of a parfor loop must be independent, so each iteration
% computes its own fft instead of repeatedly overwriting A1)
tic;
parfor i = 1:1000
    tmp = fft(A1);
end
time1 = toc;

% write result time to file:
fh = fopen('results.txt', 'w');
fprintf(fh, '1000 fft operations took %f seconds\n', time1);
fclose(fh);
</code>

No further changes are required to your job script ''run_matlab_script.sh'', since it already requests 10 CPUs with ''--cpus-per-task=10''.
Now run your job script as before using the ''sbatch'' command:

<code bash>
sbatch run_matlab_script.sh
</code>
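
If you want to verify that the requested CPUs were actually granted, one option is to add a line like the following to the job script, just before the MATLAB call; ''SLURM_CPUS_PER_TASK'' is set by SLURM inside the job environment:

<code bash>
# log the node name and the number of CPUs allocated to this task
echo "Running on $(hostname) with ${SLURM_CPUS_PER_TASK} CPUs"
</code>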

==== Example with GPU Computing ====
For now, **GPU computing** is not supported on psychp01. However, in general, to run MATLAB processes on a GPU partition, we would need to add another block to our MATLAB script. Save it, for example, as ''test_gpu.m'' (the file name is only an example):

<code matlab>
% Create a 2000x2000 matrix with random values
A1 = rand(2000, 2000);

% Perform 1000 fft operations on it
tic;
for i = 1:1000
    A1 = fft(A1);
end
time1 = toc;

% copy the matrix onto the GPU
A2 = gpuArray(A1);
% perform the same 1000 operations
tic;
for i = 1:1000
    A2 = fft(A2);
end
wait(gpuDevice);  % GPU calls are asynchronous; wait before stopping the timer
time2 = toc;

% write result times to file:
fh = fopen('results.txt', 'w');
fprintf(fh, 'CPU: %f seconds, GPU: %f seconds\n', time1, time2);
fclose(fh);
</code>

We also have to modify the job script a bit. We will select the GPU partition. Save the new job script as ''run_matlab_gpu_script.sh'':

<code bash>
#!/bin/bash -l
# This is a comment for this job script to run the above matlab script on the GPU partition

# standard output and error go to job-specific files in the submit
# directory (%j is replaced by the job ID)
#SBATCH -o ./job.out.%j
#SBATCH -e ./job.err.%j
#SBATCH -D ./
#SBATCH -J mat_gpu
# --- resource specification (which resources for how long) ---
#SBATCH --partition=gpu
# depending on the cluster configuration, the GPU may also have to be
# requested explicitly, e.g. with: #SBATCH --gres=gpu:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=6000
# maximum run time (hh:mm:ss); adjust as needed
#SBATCH --time=00:10:00

# --- start from a clean state and load necessary environment modules ---
# (module names are cluster-specific; adjust accordingly)
# module purge
# module load matlab

# run MATLAB; change /home/<your_username> to your own home directory
srun matlab -nodisplay -batch "run('/home/<your_username>/test_gpu.m')"
</code>

And now run it like before:

<code bash>
sbatch run_matlab_gpu_script.sh
</code>