CS 470

JMU CS 470 Cluster

Hardware

The CS 470 cluster is located in the Frye Building data center with the following hardware:

9x Dell PowerEdge R6525 w/ 2x AMD EPYC 7252 (16C, 3.1 Ghz, HT) 64 GB – compute nodes
8x Dell PowerEdge R6525 w/ 2x AMD EPYC 7252 (16C, 3.1 Ghz, HT) 64 GB and NVIDIA A2 GPU – compute nodes
Dell PowerEdge R6525 w/ 2x AMD EPYC 7252 (16C, 3.1 Ghz, HT) 64 GB – login node
Dell PowerEdge R730 w/ Xeon E5-2640v3 (16C, 2.6Ghz, HT) 32 GB – NFS server
(in above) 8x 1.2TB 10K SAS HDD w/ RAID - storage
Dell N3024 Switch 24x1GbE, 2xCombo, 2x10GbE SFP+

Software

All of the newer nodes are running RHEL8 with Slurm 20.11 for job management. An environment module is available for OpenMPI. Run module avail to see all available modules, and you can find additional software available in the /shared folder. In particular, you will find several useful utilities in /shared/cs470/bin, and I recommend either adding that folder to your PATH environment variable or making symlinks to a folder that is.

Several command-line text editors are installed by default, including nano, vim, and emacs.

If you need software that is not already installed or available via module, it is recommended that you build it from source in your home directory. Check the documentation for the software for instructions on how to do that. If you run into issues or your software is not available in source form, please email the system admin Pete Morris (morrispj) or the faculty contact Mike Lam (lam2mo) to request assistance.

On-campus Access

The login node of the newer cluster is accessible via SSH as login02.cluster.cs.jmu.edu from the campus network.

It is recommended that you set up public/private key SSH access from your most frequent point of access machines (e.g., your personal laptop). To do this, first generate a public/private keypair from a terminal if you have never done so on that machine:

ssh-keygen -t rsa

If prompted, accept the default location and passphrase options by pressing enter twice. Then, copy the public key to the login node using one of the following commands based on your machine's operating system (run in a terminal open in your home folder):

(on Linux or macOS)
  ssh-copy-id <eid>@login02.cluster.cs.jmu.edu

(on Windows)
  type .ssh\id_rsa.pub | ssh <eid>@login02.cluster.cs.jmu.edu "cat >> .ssh/authorized_keys"

Now you won't need to enter your password every time you log in from that machine. Here is a slightly longer tutorial if you'd like to learn a bit more about this process.

It is also recommended that you edit your ~/.ssh/config file to add an SSH alias. Here is an example entry:

Host cluster
    HostName login02.cluster.cs.jmu.edu
    User <eid>

Now you can log into the cluster from your machine simply by typing this command:

ssh cluster

The firewall settings for our data center eventually "time out" idle SSH connections. To prevent this, you can add the following to your ~/.ssh/config:

TCPKeepAlive yes
ServerAliveInterval 15

Off-campus Access

If you are off-campus, you will need to proxy your SSH connection through an on-campus point of access (for CS students, this will probably be stu). To transparently proxy ssh sessions through stu, you can use the "-J" option if it is available:

ssh -J <eid>@stu.cs.jmu.edu <eid>@login02.cluster.cs.jmu.edu

Obviously, it is also recommended that you set up your ~/.ssh/config on your home machine so that you don't have to type all that every time. Assuming your SSH client supports it, you can even do this transparently by adding "ProxyJump stu" to the ~/.ssh/config configuration for the cluster host. Here is an example full SSH config file:

Host stu
    HostName stu.cs.jmu.edu
    User <eid>

Host cluster
    HostName login02.cluster.cs.jmu.edu
    User <eid>
    ProxyJump stu

Properly configured, you should be able to log into the cluster from off-campus very easily and without having to enter your password with the following command in a terminal:

ssh cluster

In addition, with all of the above properly configured the VS Code Remote SSH extension should allow you to write programs for CS 470 using a graphical IDE on your personal computer (of course you can always just use a command-line text editor on the login node as well -- see the Software section above).

For more information on proxies and jump hosts, see this Wikibook page.

If you are on Windows, you can also use PuTTY and WinSCP, both of which can be configured with public/private key access (the keys generated above will need to be converted to a .ppk file first, with WinSCP does automatically) and transparent proxying through stu. Other popular Windows SSH/SCP clients include Bitvise and MobaXterm.

Home Directories

If you are a student in CS 470, you should have an account already on the cluster, with a 250MB disk quota in your home directory (/nfs/home/[eid]). To check your disk usage, use the following command:

quota -s

If you need more space temporarily, use your designated scratch space (/scratch/[eid]). CAUTION: The scratch storage space may be wiped between semesters! If you need more permanent space, please contact your instructor or the cluster admin.

You can connect directly to your cluster home directory or scratch directory from a Linux lab machine:

Open the file manager and select File -> Connect to server.

Enter the following settings:

Server:   login02.cluster.cs.jmu.edu
Type:     ssh
Folder:   /scratch/<eid> or /nfs/home/<eid>
Username: <eid>
Password: <eid password>

Transferring Files

If you need to transfer files back and forth between the cluster and another Unix-based machine (e.g., running Linux or macOS), you can use the scp command (here is a tutorial). If you are off campus, use the -o option to use stu as your jump host (e.g., -o 'ProxyJump stu.cs.jmu.edu' (and you should also consider adding stu to your SSH config as described above so that you can shorten the host name).

If you would prefer to use a graphical interface, I recommend FileZilla on Linux, CyberDuck on macOS, and WinSCP on Windows.

For a more seamless experience, you can also mount the remote filesystem locally using SSH. If you are doing this from off campus, use the following option to sshfs to jump through stu: -o ssh_command="ssh -J <eid>@stu.cs.jmu.edu"

Here is a script that you may find helpful: mount_cluster.sh

Submitting Interactive Jobs

You may use the login node to compile your programs and perform other incidental tasks. YOU SHOULD NOT EXECUTE HEAVY COMPUTATION ON THE LOGIN NODE! To properly run compute jobs, you must submit them using Slurm. You can find various Slurm tutorials on their website.

To run simple jobs interactively, use the srun command:

srun [Slurm options] /path/to/program [program options]

The most important Slurm options are the number of processes/tasks (-n) and the number of allocated nodes (-N). If not specified, the number of nodes will be set to the minimum number necessary to satisfy the process requirement.

The cluster has seventeen compute nodes, each of which has two eight-core AMD processors. Hyperthreading is enabled on the hardware but disabled in Slurm, so the maximum number of processes per node according to Slurm is sixteen. This minimizes unpredictable performance artifacts due to hyperthreading.

Here are some examples:

srun -n 4  hostname                         # 4  processes (single node)
srun -n 32 hostname                         # 32 processes (requires two nodes)
srun -N 4  hostname                         # 4  processes (4 nodes)
srun -N 4 -n 32 hostname                    # 32 processes (across 4 nodes)

Here are some examples of running an MPI program:

srun -n 1 /shared/cs470/mpi-hello/hello
srun -n 16 /shared/cs470/mpi-hello/hello
srun -n 32 /shared/cs470/mpi-hello/hello

If you'd like to open an interactive shell on a compute node for debugging purposes, you can do so using the following command (switch out "bash" if you prefer a different shell):

srun --pty /usr/bin/bash -i

Eight of the nodes have an NVIDIA A2 GPU suitable for running CUDA code. If you wish to take advantage of the GPUs, you must make sure your job is allocated to these nodes by adding --gres=gpu to the command line when you launch your job.

WARNING: The version of OpenMPI (the default MPI package) on our cluster does NOT have full support for multithreading. Thus, you must use MPICH for multithreaded projects. Run the following command to enable MPICH:

module load mpi/mpich-4.2.0-x86_64

You'll also need to use salloc instead of srun, and explicitly include the call to mpirun. Here's an example (note the use of "-Q" to silence the job allocation output from salloc:

$ salloc -Q -n 4 mpirun ./your_mpi_program

Submitting Batch Jobs

For longer or more complex jobs, you'll want to run them in batch mode so that you can do other things (or even log out) while your job runs. To run in batch mode, you must prepare a job submission script. This also has the added benefit that you won't have to keep typing long commands. Here is a simple job script:

#!/bin/bash
#
#SBATCH --job-name=hostname
#SBATCH --nodes=1
#SBATCH --ntasks=1

hostname

Assuming the above file has been saved as hostname.sh, it can be submitted using the sbatch command:

sbatch hostname.sh

The job control system will create the job and tell you the new job ID. The results will be saved to a file titled slurm-[id].out with the corresponding job ID. To see a list of jobs currently submitted or running, use the following command:

squeue

The results should look similar to this:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              4267     debug sleep_20   lam2mo PD       0:00      1 (None)
              4266     debug sleep_20   lam2mo  R       0:11      1 compute01

To cancel a job, use the scancel command and give it the ID of the job you wish to cancel:

scancel [id]

Please be considerate--do not run long jobs that require all of the nodes. Check regularly for runaway jobs and cancel them. If you find that someone else has a long-running job that you think may be in error, please email that person directly (USER@dukes.jmu.edu) and CC the instructor.

For more information on the use of Slurm, see their online tutorials or read the man pages (e.g., "man sbatch" or "man squeue").

Sample Batch Submit Scripts

Regular or Pthreads application (change NAME and EXENAME):

#!/bin/bash
#
#SBATCH --job-name=NAME
#SBATCH --nodes=1

./EXENAME

OpenMP application (change NAME, NTHREADS, and EXENAME):

#!/bin/bash
#
#SBATCH --job-name=NAME
#SBATCH --nodes=1

OMP_NUM_THREADS=NTHREADS ./EXENAME

To run with multiple thread counts, you can use a Bash loop. Here is an example for OpenMP:

#!/bin/bash
#
#SBATCH --job-name=NAME
#SBATCH --nodes=1

for t in 1 2 4 8 16 32; do
    OMP_NUM_THREADS=$t ./EXENAME
done

MPI application (change NAME, NTASKS, and EXENAME):

#!/bin/bash
#
#SBATCH --job-name=NAME
#SBATCH --ntasks=NTASKS

module load mpi
srun EXENAME

If you use zsh instead of bash, you may need to include the following line before running module load mpi:

source /usr/share/Modules/init/zsh

Finally, if you need to submit many batch MPI jobs with different process/task counts, you may find it convenient to parameterize the run script and then actually launch the jobs and view the results with different scripts. Here is an example setup:

#
# run.sh (PARAMETERIZED -- DO NOT RUN DIRECTLY)
#
#!/bin/bash
#SBATCH –job-name=<cmd>-MPI_NUM_TASKS
#SBATCH --output=<cmd>-MPI_NUM_TASKS.txt
#SBATCH --ntasks=MPI_NUM_TASKS

module load mpi
srun -n MPI_NUM_TASKS <cmd>

#
# launch.sh (run to submit all jobs)
#
#!/bin/bash
# TODO: customize for the number of tasks needed for your application
for n in 1 8 16 32 64 128; do
    sed -e "s/MPI_NUM_TASKS/$n/g" run.sh | sbatch
done

#
# view.sh (run to view full or partial results)
#
#!/bin/bash

# TODO: customize for the number of tasks needed for your application
for n in 1 8 16 32 64 128; do
    echo "== $n processes =="
    cat <cmd>-$n.txt
    echo
done

Debugging

GDB

It is possible to use GDB to debug multithreadjed and MPI applications; however, it is more tricky than serial debugging. The GDB manual contains a section on multithreaded debugging, and there is a short FAQ about debugging MPI applications.

Helgrind

Helgrind is a Valgrind-based tool for detecting synchronization errors in Pthreads applications. To run Helgrind, use the following command:

valgrind --tool=helgrind [your-exe]

To run Helgrind on a compute node, make sure you put srun at the beginning of the command:

srun valgrind --tool=helgrind [your-exe]

For more information about using the tool and interpreting its output, see the manual. Note that your program will run considerably slower with Helgrind because of the added analysis cost.

Performance Analysis

GNU Profiler

To run the GNU profiler, you must compile with the "-pg" command-line switch then run your program as usual. It will create a file called gmon.out in the working directory that contains the raw profiling results. To format the output in human-readable tables, use the gprof utility (note that you must pass it the original executable file for debug information):

gprof <exe-name>

The default output is self-documented; the first table contains flat profiling data and the second table contains profiling data augmented by call graph information. There are also many command-line parameters to control the output; use man gprof to see full documentation.

To see line-by-line information (execution counts only), you can use the gcov utility. To do this, you will also need to compile with the "-fprofile-arcs -ftest-coverage" command-line options and run the program as usual. This will create *.gcda and *.gcdo files containing code coverage results. You can then run gcov on the source code files to produce the final results:

gcov <src-names>

This will produce *.c.gcov files for each original source file with profiling annotations.

Callgrind/Cachegrind

You can run Valgrind-based tools without any special compilation flags; in fact, you should NOT include the GNU profiler flags because that will introduce irrelevant perturbation into your Valgrind-based results. To run Valgrind-based tools, simply call the valgrind utility and give it the appropriate tool name:

valgrind --tool=callgrind <exe-name>
valgrind --tool=cachegrind <exe-name>

This will produce callgrind.out.* and cachegrind.out.* files in the working directory containing the raw profiling results. To produce human-readable output, use the callgrind_annotate and cg_annotate utilities:

callgrind_annotate <output-file>
cg_annotate <output-file>

The Cachegrind output can take a little while to decipher if you're unfamiliar with it. Here are the most frequent abbreviations:

I	instruction
D	data
L1	L1 cache
LL	last-level cache (L3 on the cluster)
r	read
w	write
m	miss

For Cachegrind results, you can also obtain line-by-line information by passing the source file as a second parameter to cg_annotate. Note that you may need to specify the full path; check the output of the regular cg_annotate to see what file handle you should use.

For more information about all the reports that these tools can generate, see the Valgrind documentation (specifically, see the sections on Callgrind and Cachegrind).

External Resources

Slurm: Tutorials | Quickstart | QuickRef | srun | sbatch | squeue | scancel
Pthreads: LLNL tutorial | Randu.org tutorial | API standard
OpenMP: LLNL tutorial | QuickRef | API standard
MPI: LLNL tutorial | QuickRef | API standard

Parallel and Distributed Systems - Spring 2024