MATLAB Parallel Server in Red Cloud
What You Need to Know About This Service
Currently the matlab-2024a image only works with 20 vCPUs (CAC is investigating this with MathWorks).
MATLAB Parallel Server must be used in conjunction with the Parallel Computing Toolbox (PCT) in your MATLAB client. Accordingly, it is necessary to have PCT and to be familiar with it. If you are a member of the Cornell community, you probably already have PCT, as it is included in Cornell's normal site license. Gaining PCT knowledge is to your advantage, because it is the best way for MATLAB to make effective use of multiple CPUs on any system—even the multiple cores in your laptop. One starting point for learning PCT is CAC's tutorial. Extending PCT's basic concepts to Red Cloud should be natural and easy.
Red Cloud offers the following advantages for your PCT computations.
- A parallel server instance will have as many workers as cores (current maximum, 64 cores).
- Licenses for the workers are included in your Red Cloud subscription.
- Workers have exclusive access to their allocated cores and memory.
- Data are readily transferred through the campus network at no extra cost.
Assumptions
- You are a member of an academic community with access to an academic MATLAB R2024a client.
- Your group has started a CAC project giving you access to Red Cloud, and you are familiar with the Horizon Web Console.
- The MATLAB R2024a client, including the Parallel Computing Toolbox (PCT), is installed on your local workstation.
- You have completed your first-time Red Cloud login.
- You have created a Red Cloud key pair.
Create a Security Group
In the Red Cloud Horizon Web Console, you will need to create a Security Group for your MATLAB instances. Its purpose is to open up certain TCP ports so your client has proper access to the MATLAB Job Scheduler (MJS) in your MATLAB Parallel Server(s).
Choose "Networks > Security Groups", then "Create Security Group". Then add the Rules below, following the instructions on the adding Security groups rule page. Where a port range is required, be sure to use the "Open Port" drop-down menu to change "Port" to "Port Range", and keep "Direction" as "Ingress" and "Remote" as "CIDR".
| Rule | Port Range | CIDR |
|---|---|---|
| Custom TCP Rule | 27350 - 28000 | <your client IP address>/32 |
| SSH (for terminal access, file transfers) | N/A (will be port 22) | <your client IP address>/32 |
You must click "Add" at the bottom of the pop-up after making each new entry. You must also click "Create Security Group" at the end, after adding all rules.
As indicated in the table, the most secure CIDR of all is your own client's IP address with a /32 suffix, since it admits connections from that single address only.
To make this process easier, CAC has created a security group called campus-only-ssh in your project; it appears on the Security Group page of the Horizon Web Console, with the rules required to SSH into your MATLAB instances already set up. However, you will still need to create a new security group for the Custom TCP Rule with port range 27350 - 28000. On the Security Group page, create a security group called campus-only-mjs and add a "Custom TCP Rule" with a port range of 27350 - 28000 for each CIDR you want to include, which, in this case, is all possible Cornell IP addresses. Here is the complete list of CIDR entries:
- 10.0.0.0/8 <-- this one is sufficient for on-campus eduroam Wi-Fi and CU VPN
- 128.84.0.0/16
- 128.253.0.0/16
- 132.236.0.0/16
- 192.35.82.0/24
- 192.122.235.0/24
- 192.122.236.0/24
If all of the above are included, then access is permitted from anywhere on the Cornell network (reference 1, reference 2, reference 3). However, be aware that in this case, any Cornell user who knows the IP address(es) of your Red Cloud instance(s) and the MJS (MATLAB Job Scheduler) name (see below) will be able to submit MATLAB PCT jobs to your instance(s).
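If you prefer the command line, the same group and rules can also be created with the standard OpenStack CLI. A minimal sketch, assuming the CLI is installed and your Red Cloud credentials (openrc file) have been sourced; repeat the rule-create command for each CIDR above:
openstack security group create campus-only-mjs
openstack security group rule create --protocol tcp \
    --dst-port 27350:28000 --remote-ip 10.0.0.0/8 campus-only-mjs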
Previously, these same port ranges had to be open on the client side as well. However, according to information from MathWorks, that requirement appears to have been relaxed.
Start the MATLAB Cluster
Currently only one-node clusters are supported. However, a single node can support many workers, up to the total number of cores that you assigned to your instance. (Multi-node clusters are also possible and could be supported if demand arises.)
In the Horizon Web console for Red Cloud:
- From the "Compute" > "Instances" tab, click on the "Launch Instance" button.
- Give your instance an easy-to-remember name; click "Next".
- For the Source, select the MATLAB image matlab-2024a:
  - Scroll to matlab-2024a and click its up-arrow.
  - Increase "Volume Size (GB)" if you think you will be uploading large data files to the instance.
  - Say "Yes" to "Delete Volume on Instance Delete" if you don't want a long-term backup of any customizations or data.
  - When you're done making your selections, click "Next".
- For the Flavor, select the desired instance type. One MATLAB worker will be started per CPU in the instance. Click "Next".
- In Networks, select the "public" network by clicking its up-arrow, and click "Next".
- No Network Ports need to be added, so click "Next".
- In Security Groups, find the MATLAB security group that you created (or the combination of campus-only-mjs and campus-only-ssh). Click its up-arrow, then "Next".
- Key Pair is the only other setting that is recommended.
- Assigning a key pair lets you access your instance via ssh, which can be handy for troubleshooting.
- It also lets you move files back and forth to your instance using sftp or scp.
- Click on "Launch Instance".
- After the instance is running, the MATLAB cluster should be reachable at the public IP address of the instance within a minute or two.
Note: after you finish using your MATLAB cluster, remember to Shelve or Delete the instance to stop charges against your Red Cloud subscription. You may read more about instances on the Compute page.
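If you work with the OpenStack CLI, shelving and unshelving can also be scripted. A sketch, assuming the CLI is configured for Red Cloud and "my-matlab-instance" is the name you gave your instance:
openstack server shelve my-matlab-instance
openstack server unshelve my-matlab-instance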
Connect to Your MATLAB Cluster
Perform the following steps in your local MATLAB client:
- Open Parallel > Create and Manage Clusters
- Choose Add Cluster Profile > MATLAB Job Scheduler (MJS).
- In the warning dialog that comes up, click "OK".
- In the lower right corner of the dialog, click "Edit".
- Enter at least the top three values under Properties:
- Description: Red Cloud or another name of your choosing.
- Host: <Your Red Cloud Instance's public IP Address> (e.g. 128.84.40.NNN)
- MJSName: EC2_job_manager
- Optional: if you want each worker to have more than the standard allotment of memory or disk per core, scroll down and set NumWorkersRange to have a maximum value which is less than the number of cores in your cluster. (In that case, you may also choose to set NumThreads > 1.)
- Click "Done". Click "Rename" in the toolbar to give a new scheduler a better name. This name will appear in your MATLAB client.
- Click "Validate" in the toolbar to ensure the scheduler is configured properly. As each stage completes, a green circle with a check mark in it should be displayed.
Possible validation issues
Validation may fail for a number of reasons. Here is a short list of things to try if it does:
- If the first validation stage fails, it is most likely because nothing in your list of Security Groups is allowing access from your client's IP address.
- Log into the web console, go to "Compute" > "Instances", and make sure your instance is in the Active and Running states.
- Click on the name of your instance and check to see if the Security Groups include the appropriate rules, as indicated above.
- If not, go to Network > Security Groups, choose "Manage Rules" for a Security Group that pertains to your instance, and add one or more of the above rules, based on your client's IP address.
- Wait a minute until the new rules take effect in your running instance (you may also want to restart your MATLAB client).
- Run the validation test again to ensure your MATLAB cluster passes all the stages.
- If the client is able to connect to the cluster, but the second stage of validation fails, check the results ("Show Details").
- If you see an error message saying, "This MATLAB Job Scheduler does not support running jobs for MATLAB release...", this just means that the workers are not yet ready.
- Wait a few more minutes and re-try validation.
- If you still cannot pass validation, and error messages such as "Could not contact an MJS lookup service on host..." persist, it means your network connection is being blocked.
- Double-check your Security Group and firewall settings as described above.
- Then contact your departmental IT support, as there may be port blocking in effect on departmental routers. (Campus Wi-Fi connections should be sufficiently open.)
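Before digging deeper, it can also help to verify basic reachability from your client machine. A quick sketch using netcat (assuming nc is available on your client; replace the IP with your instance's address):
nc -vz 128.84.40.NNN 22       # checks the SSH rule
nc -vz 128.84.40.NNN 27350    # checks the low end of the MJS port range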
Test Your MATLAB Cluster
Finally, you can run this quick "Hello, world" test from the command line in your client. In the first line, supply the name of your scheduler. If you did not rename the scheduler when you created it, its name appears in the Cluster Profiles Manager dialog (the default is 'MJSProfile1').
pool = parpool('Red Cloud')
spmd; fprintf('Hello from lab %d of %d\n', labindex, numlabs); end
delete(pool)
The number of replies should equal the number of workers ("labs"), which by default is equal to the number of cores in your instance. Note that the labs are numbered starting from 1, not 0.
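As a further quick check that work is really being distributed, you can time a trivial parfor loop. A sketch, again assuming your scheduler is named 'Red Cloud':
pool = parpool('Red Cloud');
n = pool.NumWorkers;
tic
parfor i = 1:n
    pause(1);   % each iteration sleeps for one second
end
toc             % elapsed time near 1 s (not n seconds) confirms parallel execution
delete(pool)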
Upload Large Files to Your MATLAB Cluster
MATLAB PCT provides built-in mechanisms for uploading data files so they can be accessed by your MATLAB Parallel Server's workers. The primary ones are the AttachedFiles keyword in functions like parpool() and createJob(), and the addAttachedFiles() function for an existing parallel pool. Unfortunately, these mechanisms are not suitable for large files, because they generate a separate copy of the file for each worker. This is inefficient and unnecessary in Red Cloud, where in most cases, all the workers share a file system on the same instance. Here we present two alternatives that should help you to make data files available to your MATLAB Parallel Server's workers.
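For small files, the built-in mechanism remains convenient. A minimal sketch, where helper.m is a hypothetical script in your client's current folder:
pool = parpool('Red Cloud');
addAttachedFiles(pool, {'helper.m'});  % copies helper.m to every worker's path
spmd; helper; end                      % each worker can now call it
delete(pool)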
Prerequisites:
- You must have created a Red Cloud key pair before starting your instance, and you must have specified this key pair when the instance was launched.
- You should also be familiar with how public key authentication works in Linux.
- Finally, in order to connect to the instance using ssh, sftp, or scp, the Red Cloud security group should include a rule to allow incoming connections to port 22 from the address of the computer that is trying to connect.
Alternative 1: Upload to /tmp on your instance
This method is probably the simpler of the two. Any files you upload will persist on your instance until you terminate it. The only tricky part is knowing how to authenticate with the key pair when you connect to your instance with a file transfer client. It is straightforward to do this type of authentication from the command line in Linux or MacOS, if you use either sftp or scp:
sftp -i ~/.ssh/myCACid-key.pem rocky@128.84.40.NNN
sftp> put file.txt /tmp
scp -i ~/.ssh/myCACid-key.pem file.txt rocky@128.84.40.NNN:/tmp
The above examples assume you have stored the key pair (or at least the private-key portion of it) in your local .ssh folder in Linux or MacOS. If sftp or scp does not accept the -i option in your OS, you can try using ssh-agent and ssh-add to make the private key available to these commands.
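For example, on Linux or MacOS the agent-based approach looks like this:
eval "$(ssh-agent -s)"             # start an agent for this shell session
ssh-add ~/.ssh/myCACid-key.pem     # load the private key into the agent
sftp rocky@128.84.40.NNN           # no -i option needed now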
In Windows, if you happen to use the PuTTY client, it comes with a psftp client that you might want to try. (First you'll have to use the PuTTYgen application to import your .pem file and save it as a .ppk private key.) In the Windows cmd environment, the syntax for psftp would look something like this:
C:\Users\srl6>"C:\Program Files (x86)\PuTTY\psftp.exe" -i C:\Users\srl6\SSHkeys\srl6-key.ppk rocky@128.84.40.NNN
For exceptionally large files, you can make use of your instance's ephemeral storage, which is located at /dev/vdb. You will need to format it and create a mount point for it. The volume persists only as long as the instance is running, but it is large (100 GB minimum) and fast (local RAID 5).
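A sketch of the one-time setup, run on the instance itself; the mount point /mnt/scratch is a hypothetical choice, and note that formatting erases anything already on the volume:
sudo mkfs -t xfs /dev/vdb             # format the ephemeral volume (destructive!)
sudo mkdir -p /mnt/scratch            # create a mount point
sudo mount /dev/vdb /mnt/scratch
sudo chown rocky:rocky /mnt/scratch   # make it writable by the rocky user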
Alternative 2: Upload to your CAC home folder
Your Red Cloud subscription comes with 50GB of storage, part of which can be used to store data files in your home folder at CAC. On Cornell networks, your home folder is available as a network share located at //linuxlogin.cac.cornell.edu/myCACid, where myCACid is your CAC username. (More storage can be added to your subscription if desired.) To upload files to your home folder, use your favorite file transfer client such as WinSCP, or a command-line utility such as sftp or scp. Point your file transfer client or utility to the above address, making sure to provide your CAC username and password.
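For example, from a Linux or MacOS command line, an sftp upload to your home folder might look like this (authenticating with your CAC password):
sftp myCACid@linuxlogin.cac.cornell.edu
sftp> put file.txt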
But this CAC home folder is not automatically available to your Red Cloud instances. The preferred way to make it accessible is to mount the network share using Samba/CIFS. First log in to your instance as rocky, which you do with your private key:
ssh -i ~/.ssh/myCACid-key.pem rocky@128.84.40.NNN
The above example again assumes you have stored the key pair in your .ssh folder in Linux or MacOS. In Windows, you may wish to use PuTTY as the ssh client (in which case you will have to generate a .ppk file from the .pem file using PuTTYgen). After you are logged in, issue the following commands:
sudo dnf install cifs-utils
sudo mkdir -p /home/<myCACid>/CAC
sudo mount -t cifs //storage03.cac.cornell.edu/<myCACid> /home/<myCACid>/CAC -o user=<myCACid>,domain=CTC_ITH,vers=2.1
<supply your CAC password when prompted>
Note that there should be no spaces after the commas in the -o option list.
At this point all files in your home folder should be available to all MATLAB workers, via a path starting with /home/<myCACid>/CAC.
If you stop this Red Cloud instance and start it back up, the mount command will have to be executed anew. To make the Samba mount automatic during restarts, add an appropriate entry to /etc/fstab in the instance.
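One hedged way to set that up is with a credentials file, so your password is not stored in /etc/fstab itself; the file name .cacsmb below is a hypothetical choice:
# Put these two lines in /home/rocky/.cacsmb, then run: chmod 600 /home/rocky/.cacsmb
username=<myCACid>
password=<your CAC password>
Then append an entry like this to /etc/fstab:
//storage03.cac.cornell.edu/<myCACid> /home/<myCACid>/CAC cifs credentials=/home/rocky/.cacsmb,domain=CTC_ITH,vers=2.1 0 0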
Fast examples of file I/O
Example 1. Let's say you have copied a file, file.txt, to /tmp on your instance by using scp as described in Alternative 1 above. Let's also suppose this file contains 3 lines (or any arbitrary number) with 1 integer per line. If you'd like to have all your MATLAB workers read this file into vector b and print b to the MATLAB console, you can do the following:
spmd; fid=fopen('/tmp/file.txt'); b=fscanf(fid,'%d'); fclose(fid); disp(b); end
Vector b is now available in the workspace of all the workers, where it can be used for further parallel computations. Note: from your MATLAB client, you can also use spmd in combination with system(), pwd, etc., in order to explore the environment of your MATLAB workers in Red Cloud. (Or you can just use ssh to take a look around.)
Example 2. Now let's say file.txt is located in your CAC home folder, which you have mounted on your instance as shown in Alternative 2 above. If the following MATLAB function is assigned to a task in a parallel job, then a parallel worker in Red Cloud will read file.txt and return its contents:
function b = echofile()
% Read a vector of integers from the CIFS-mounted CAC home folder
fid = fopen('/home/<myCACid>/CAC/file.txt');
b = fscanf(fid,'%d');
fclose(fid);
end
Assuming the above function is saved into a local file named echofile.m, you can enter the following commands in your MATLAB client to run echofile() on your cluster in Red Cloud, then fetch and display the contents of file.txt:
clust = parcluster('MJSProfile1') % replace with the name of your scheduler
job = createJob(clust)
task = createTask(job,@echofile,1,{})
submit(job)
wait(job)
bvals = fetchOutputs(job)
bvals{1}
Again, the echofile() function is just one of many ways that you can imagine interacting with the files in your CAC home folder, using either MATLAB built-in commands or shell commands invoked through system().
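For instance, a small spmd sketch (assuming a pool on your Red Cloud cluster is open) that lists the mounted folder from every worker:
spmd
    [status, out] = system('ls -l /home/<myCACid>/CAC');  % run a shell command on each worker
    disp(out)
end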
Manually check and restart your cluster
The easiest way to correct an unresponsive cluster in Red Cloud is to do a soft or hard reboot of the instance from the web console. Accordingly, the following manual steps should rarely be necessary; they are recorded here in case they are helpful for troubleshooting. First, note the IP address of your instance, log in via ssh, and change to the directory containing the command-line tools (the paths shown are for R2024a):
ssh -i ~/.ssh/myCACid-key.pem rocky@128.84.40.NNN
cd /opt/matlab/R2024a/toolbox/parallel/bin
Here are the commands to check, stop, and start the MJS service, scheduler, and workers in Red Cloud:
./nodestatus
sudo ./mjs stop
sudo ./mjs start
Historical footnote: in releases prior to R2019a, the MATLAB Job Scheduler (MJS) was called the MATLAB Distributed Computing Server or Engine (MDCS or MDCE), and the above MJS commands were shell scripts with different names.