CLUSTER MANUAL
Transferring files to/from psychp01
What's on this page:
- Transferring data
- Transferring code
- File Transfer Clients
1. Transferring data
You need to upload your data into the folder that IT created for you within the /MRIWork folder on the cluster. Your folder name will be something like MRIWork#, where # stands for a number assigned to you. Hence, if your number is “25”, the path to upload your data to on the cluster will be:
/MRIWork/MRIWork25
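As a minimal sketch of how this path is put together, the snippet below builds it from the assigned number. The variable names are hypothetical, and the number 25 is only the example value used above.

```shell
# Hypothetical values for illustration; replace with your own.
number="25"                               # the number assigned to you by IT
remote_dir="/MRIWork/MRIWork${number}"    # your data folder on the cluster

# Print the folder you should upload your data to.
echo "Upload your data to: ${remote_dir}"
```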
The most secure way to transfer data is to use File Transfer Clients like sftp and scp via the command line.
To learn more about /MRIWork, its folder structure, data archiving, data backup and data sharing, have a look at the CUBIC wiki.
Please NOTE: in principle, you can also upload your data into your home directory (see Transferring code below) and save your analysis results there. However, the home directory has a limited amount of storage space and is not periodically backed up. Hence, it is advisable to save your data and analysis results in your /MRIWork/MRIWork# folder, even if you do not have MRI data and do not think your data will occupy much space. This avoids problems with storage space and minimizes the risk of data loss.
2. Transferring code
You can upload your scripts into a folder within your home directory on the cluster. IT will probably have created a folder in the home directory named with the first letter of your first name followed by your surname (e.g., gbellucci). Hence, if your name is Gabriele Bellucci, the path to upload your code to on the cluster will be:
/home/gbellucci
Your folder in the home directory has the same name as the username you use to access the cluster. This name was provided to you by IT when you asked for access. Be aware that your folder name may deviate from this pattern (especially if you were granted access to the cluster before 2024). If you have access to the cluster, just log in using ssh and type pwd to see your folder name in the home directory. Something like the line above will appear on your command line.
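The naming convention described above can be sketched as follows. This only illustrates the usual pattern (first letter of the first name plus surname, in lowercase). The variable names are hypothetical, and your actual username may differ, so always confirm it with pwd on the cluster.

```shell
# Hypothetical example name; replace with your own.
first_name="Gabriele"
surname="Bellucci"

# First letter of the first name, followed by the surname, all lowercase.
initial="$(printf '%s' "$first_name" | cut -c1)"
username="$(printf '%s%s' "$initial" "$surname" | tr '[:upper:]' '[:lower:]')"

# Expected home directory under this convention (an assumption, not a guarantee).
echo "/home/${username}"
```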
The most secure way to transfer code is to use File Transfer Clients like sftp and scp via the command line. Please see below for how to use these clients, and page 20 of the pdf file on the home page for a demonstration video.
3. File Transfer Clients
Primary access to psychp01 is via ssh-based tools (on the command line). To upload or download data and code, File Transfer Clients such as scp and sftp can be used.
To transfer data to and from psychp01 use the following address:
psychp01.rhul.ac.uk
SFTP
sftp, which stands for SSH File Transfer Protocol (often expanded as Secure File Transfer Protocol), is an encrypted protocol that runs over SSH and provides commands for transferring files between your local machine and a remote system over a secure connection. There are many resources on the web on how to use sftp (e.g., here). Below, example applications to transfer data to and from psychp01 are shown.
First, you need to establish a secure connection with the server. This is very similar to how you would connect to the server using ssh (see here).
sftp username@psychp01.rhul.ac.uk
As with the ssh connection, “username” is the username provided to you by IT when you asked for access to the cluster. Hit enter and you will be asked for your password. Once you are connected, the prompt at the beginning of your command line shows that a connection has been established:
sftp>
Now, you can use sftp commands to (among other things) upload, download, remove, and move files. Type help to list all available commands.
sftp> help
The sftp connection puts you on the cluster. Here, you can use many of the common commands you would use on your local machine, for example to print the current directory (pwd), change directory (cd) or list files (ls). To run one of these commands on your local computer instead, add an “l” in front of it. This “l” stands for “local” and tells sftp to run the command on the local machine as opposed to the remote one.
For instance, when you establish an sftp connection, you will find yourself in your home directory. Hence, if your home directory path is /home/gbellucci, when you type pwd, you will see the second line of the code below appearing:
sftp> pwd
Remote working directory: /home/gbellucci
In contrast, if you type lpwd, you will see the second line of the code below appearing:
sftp> lpwd
Local working directory: /Users/Gab
where /Users/Gab is my (local) current directory on my computer. Type help to see how the remote and local versions of each command differ.
To download data from the cluster into a local directory, you need to use the get command, like this:
sftp> get remote_filename_path local_dirpath
For example, to get a file called results_matrix.mat from the folder results in your home directory /home/gbellucci and download it into your folder project_results in your local directory /Users/Gab, you would type:
sftp> get /home/gbellucci/results/results_matrix.mat /Users/Gab/project_results
Alternatively, you can cd to results (on the cluster), lcd to project_results (on your local machine), and then just type get results_matrix.mat, like this:
sftp> cd /home/gbellucci/results
sftp> lcd /Users/Gab/project_results
sftp> get results_matrix.mat
NOTE: If a folder name contains spaces, sftp will fail unless the path is wrapped in quotes. For instance, sftp> lcd /Users/Gab/project results (i.e., a results folder named “project results”, with a space) would not work, whereas sftp> lcd "/Users/Gab/project results" would.
If you have to download a folder, you will need to use the -r argument, like this:
sftp> get -r remote_dirpath local_dirpath
Conversely, if you have to upload data from your local machine to the cluster, you will need to use the put command:
sftp> put local_filename_path remote_dirpath
In the video on page 20 of the pdf file on the main page, you will see how to transfer a Python script and a bash file to psychp01 using put.
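The cd, lcd and get steps shown earlier in this section can also be collected in a batch file and passed to sftp with its -b option, which is handy for transfers you repeat often. The following is a minimal sketch, assuming the example paths and username used on this page. The batch file name is generated by mktemp, and the final command is only printed. Note that batch mode cannot prompt for a password, so it is meant to be used with non-interactive (e.g., key-based) authentication, which may or may not be set up for your account.

```shell
# Write the sftp commands to a temporary batch file.
batch_file="$(mktemp)"
cat > "$batch_file" <<'EOF'
cd /home/gbellucci/results
lcd /Users/Gab/project_results
get results_matrix.mat
EOF

# Print the command that would run the batch file.
# Remove the leading "echo" to actually connect and transfer.
echo sftp -b "$batch_file" gbellucci@psychp01.rhul.ac.uk
```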
SCP
scp (secure copy) is a command-line utility that allows you to securely copy files and directories between two locations. scp requires authentication (typically your password), and both the files and the password are encrypted so that data is transferred securely from one location to the other. scp uses the ssh protocol for both authentication and encryption. See here for more information.
When transferring data, scp takes two main arguments:
scp source destination
The first argument is the location of the source file to transfer, the second the location it has to be transferred to. A good way to memorize this is to think that scp needs to know what to send and where to send it.
For example, to transfer files from the remote cluster (source) to your local machine (destination), use:
scp username@address_name:pathname_remote_src pathname_local_dest
To transfer files from your local machine (source) to the remote cluster (destination), use:
scp pathname_local_src username@address_name:pathname_remote_dest
Suppose my username (the one given to me by IT when I got access to the cluster) is gbellucci, the name of the file I need to transfer (e.g., a MATLAB .m file) is best_analysis.m, the path to the folder containing that file on my local computer is /Users/Gab, and the path of the remote folder on the cluster I need to send my file to is /home/gbellucci/coolest_project. The line I need in the terminal to transfer my file will be:
scp /Users/Gab/best_analysis.m gbellucci@psychp01.rhul.ac.uk:/home/gbellucci/coolest_project
Remember, your data will not be in your folder in the home directory but in your MRIWork# folder in /MRIWork. Hence, to upload a data file (say, data.mat), you’d need to type:
scp /Users/Gab/data.mat gbellucci@psychp01.rhul.ac.uk:/MRIWork/MRIWork25/data_coolest_project
If you have to upload or download multiple files, or a folder that contains multiple files, you will have a directory path (and not a file path), and you can use the -r argument to copy the folder and all of its contents recursively, like this:
scp -r dirpath_local_src username@psychp01.rhul.ac.uk:dirpath_remote_dest
For example, if your directory path points to a folder called analyses_folder, you can type the following:
scp -r /Users/Gab/analyses_folder gbellucci@psychp01.rhul.ac.uk:/home/gbellucci/coolest_project
If you have a whole data folder to transfer, you will upload it into your /MRIWork/MRIWork# folder like this:
scp -r /Users/Gab/data gbellucci@psychp01.rhul.ac.uk:/MRIWork/MRIWork25/data_coolest_project
You would swap the two arguments if the folder is on the cluster and you need to get it onto your local computer:
scp -r username@psychp01.rhul.ac.uk:dirpath_remote_src dirpath_local_dest
For example, if your directory path points to a folder on the cluster called results_folder that you need to download into your analyses_folder on your local computer, you can type the following:
scp -r gbellucci@psychp01.rhul.ac.uk:/home/gbellucci/results_folder /Users/Gab/analyses_folder
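If you upload to the same places often, the scp calls in this section can be wrapped in a small shell function. The sketch below is only an illustration: the function name upload is hypothetical, the username and paths are the example values used on this page, and the function prints the scp command instead of running it.

```shell
# Example values from this page; replace with your own.
user="gbellucci"
host="psychp01.rhul.ac.uk"

# upload LOCAL_PATH REMOTE_DIR
# Hypothetical helper: prints the recursive scp command for an upload.
# Remove "echo" inside the function to actually run the transfer.
upload() {
    echo scp -r "$1" "${user}@${host}:$2"
}

# Code goes to the home directory, data goes to the MRIWork# folder.
upload /Users/Gab/analyses_folder /home/gbellucci/coolest_project
upload /Users/Gab/data /MRIWork/MRIWork25/data_coolest_project
```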
RSYNC
rsync, which stands for remote sync, is a remote and local file synchronization tool. It uses an algorithm that minimizes the amount of data copied by only moving the portions of files that have changed. Please see here for more information.
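As a hedged sketch of typical usage, the snippet below builds an rsync command that would synchronize a local data folder with a folder on the cluster. The paths and username are the example values used elsewhere on this page, not required settings. The -n option makes rsync list what it would transfer without copying anything, and the trailing slash on the source means "the contents of this folder".

```shell
# Example values from this page; replace with your own.
user="gbellucci"
host="psychp01.rhul.ac.uk"
src="/Users/Gab/data/"                                # trailing slash: copy the contents
dest="/MRIWork/MRIWork25/data_coolest_project/"

# -a  archive mode (recursive, preserves times and permissions)
# -v  verbose output
# -z  compress data during transfer
# -n  dry run: only list what would be transferred
cmd="rsync -avzn ${src} ${user}@${host}:${dest}"

# Print the command; run it yourself, then drop -n once the list looks right.
echo "$cmd"
```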
SSHFS
sshfs allows you to mount a remote file system (here, a directory on the cluster) on your local machine over SSH. See here for more details. Basic usage for Linux users:
sshfs username@psychp01.rhul.ac.uk:dirpath mountpoint [options]
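A concrete instance of the usage line above might look like the following sketch for Linux. The mount point location is a hypothetical choice, the username and remote path are the example values from this page, and the sshfs and fusermount commands are only printed so that nothing is mounted by accident.

```shell
# Example values; replace with your own.
remote="gbellucci@psychp01.rhul.ac.uk:/home/gbellucci"
mountpoint="${TMPDIR:-/tmp}/psychp01_home"   # hypothetical empty local folder

# The mount point must exist before mounting.
mkdir -p "$mountpoint"

# Print the mount and unmount commands; remove "echo" to run them.
echo sshfs "$remote" "$mountpoint"
echo fusermount -u "$mountpoint"
```

On macOS, unmounting is usually done with umount rather than fusermount -u.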
FileZilla
FileZilla is a free and open-source File Transfer Protocol (FTP) client that supports the ftp, ftps and sftp protocols. It offers the same kind of file transfers as the command-line programs above through a graphical interface. Please have a look at this step-by-step guide on how to use FileZilla.
ExpanDrive
An alternative to File Transfer Clients like the ones mentioned above is ExpanDrive. ExpanDrive is a network filesystem client for macOS, Microsoft Windows and Linux that maps remote storage (including many different types of cloud storage) to a local volume. It differs from the above File Transfer Clients because it is integrated into all applications on the operating system and does not require a file to be downloaded onto the local machine. Instead, remote files can be accessed, managed and changed as if they were stored locally.
The downside is that it is a commercial tool and not free.