GPU nodes

Available GPUs

The following NVIDIA GPUs are currently available as part of the DCC managed HPC clusters:

# GPUs | Name                | Year | Architecture        | CUDA cap. | CUDA cores | Clock (MHz) | Mem (GiB) | SP peak (GFlops) | DP peak (GFlops) | Peak mem BW (GB/s)
8      | Tesla M2050         | 2012 | GF100 (Fermi)       | 2.0       | 448        | 575         | 2.62      | 1030             | 515              | 148.4
6      | Tesla M2070Q        | 2012 | GF100 (Fermi)       | 2.0       | 448        | 575         | 5.25      | 1030             | 515              | 150.3
2*     | GeForce GTX 680     | 2012 | GK104-400 (Kepler)  | 3.0       | 1536       | 1058        | 1.95      | 3090             | 128              | 192.2
3      | Tesla K20c          | 2013 | GK110 (Kepler)      | 3.5       | 2496       | 745         | 4.63      | 3524             | 1175             | 208
5      | Tesla K40c          | 2013 | GK110B (Kepler)     | 3.5       | 2880       | 745 / 875   | 11.17     | 4291 / 5040      | 1430 / 1680      | 288
8      | Tesla K80c (dual)   | 2014 | GK210 (Kepler)      | 3.7       | 2496       | 562 / 875   | 11.17     | 2796 / 4368      | 932 / 1456       | 240
1*     | GeForce GTX TITAN X | 2015 | GM200-400 (Maxwell) | 5.2       | 3072       | 1076        | 11.92     | 6144             | 192              | 336
8*     | TITAN X             | 2016 | GP102 (Pascal)      | 6.1       | 3584       | 1417 / 1531 | 11.90     | 10157 / 10974    | 317.4 / 342.9    | 480

*Please note that the NVIDIA consumer GPUs marked with an asterisk (GeForce GTX 680, GeForce GTX TITAN X, and TITAN X) do not support ECC.

In addition, we have one Xeon Phi node with 2×Intel Xeon Phi 5110P accelerators (60 cores and 8 GB of memory each), which can be used for testing purposes.


Running interactively on GPUs

There are currently two nodes available for running interactive jobs on NVIDIA GPUs.

Node n-62-17-44 is installed with 2×NVIDIA Tesla M2070Q, which are based on the Fermi architecture (same as NVIDIA Tesla M2050).

To run interactively on this node, you can use the following command:

hpclogin1: $ gpush

This command executes a bash script that submits an interactive job to the gpushqueue queue.

Node n-62-18-47 is installed with 1×NVIDIA GeForce GTX TITAN X (Maxwell architecture), as well as 2×NVIDIA Tesla K20c and 1×NVIDIA Tesla K40c, which are based on the Kepler architecture (same as the NVIDIA Tesla K80c and NVIDIA GeForce GTX 680).

To run interactively on this node, you can use the following command:

hpclogin1: $ k40sh

This command executes a bash script that submits an interactive job to the k40_interactive queue.
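A typical interactive session on this node could look like the following sketch (the CUDA module version and the chosen device index are assumptions; check module avail cuda and nvidia-smi for what is actually available):

hpclogin1: $ k40sh                  # get an interactive shell on the GPU node
$ nvidia-smi                        # check which GPUs are currently occupied
$ module load cuda/7.5              # CUDA version assumed; see "module avail cuda"
$ /appl/cuda/7.5/samples/bin/x86_64/linux/release/matrixMulCUBLAS --device=1
                                    # device index 1 is only an example; pick an unoccupied GPU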

Please note that multiple users are allowed on these nodes, and all users can access all the GPUs on the node. The GPUs are set to the “Exclusive process” runtime mode, which means that you will get a “device not available” (or similar) error if someone else is already using the GPU you are trying to access.

To avoid too many conflicts, we ask you to follow this code of conduct:

  • Please monitor which GPUs are currently occupied using the command nvidia-smi and preferably select unoccupied GPUs (e.g., using cudaSetDevice()) for your application; see the sketch after this list.
  • If you need to run on all CPU cores, e.g., for performance profiling, please make sure that you are not disturbing other users.
  • We kindly ask you to use the interactive nodes mainly for development, profiling, and short test jobs.
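
As a minimal sketch of the first point, you can query the occupancy with standard nvidia-smi options and then restrict your run to an idle device via the CUDA_VISIBLE_DEVICES environment variable (an alternative to calling cudaSetDevice() in your code). The idle device index and the program name below are of course just examples:

# list per-GPU utilization and memory use to spot occupied devices
nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used --format=csv

# suppose device 1 turns out to be idle: make only that GPU visible to your
# program, so device 0 inside the application maps to physical GPU 1
export CUDA_VISIBLE_DEVICES=1
./my_cuda_program        # placeholder for your own binary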

If you have further questions or issues using the GPUs, please write to support@hpc.dtu.dk.


Requesting GPUs under LSF9

We currently have two nodes with Kepler GPUs available for computation, managed by the LSF scheduler. Node n-62-18-49 has four GPUs (4×NVIDIA Tesla K40c), and node n-62-24-17 has eight GPUs (4×NVIDIA Tesla K80c, each a dual-GPU card). The two nodes have separate LSF queues, gpuqueuek40 and gpuqueuek80, respectively.

We also have one node (n-62-30-10) with Pascal GPUs (4×NVIDIA TITAN X) available for computation in the queue gpuqueuetitanx. Please note that the TITAN X cards are not efficient for double-precision calculations and do not support ECC.
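
The job-script template shown below for the K80 queue works for these queues as well; only the queue name (and the number of GPUs you request) changes, for example:

#BSUB -q gpuqueuek40
#BSUB -R "rusage[ngpus_excl_p=1]"

or

#BSUB -q gpuqueuetitanx
#BSUB -R "rusage[ngpus_excl_p=1]"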

To use these nodes, you first need to access the LSF part of the cluster by logging in to

login2.hpc.dtu.dk

An example script for using the K80 GPUs follows:

#!/bin/sh
### General options
### -- specify queue --
#BSUB -q gpuqueuek80
### -- set the job Name --
#BSUB -J K80_JOB
### -- ask for number of cores (default: 1) --
#BSUB -n 2
### -- Select the resources: 2 gpus in exclusive process mode --
#BSUB -R "rusage[ngpus_excl_p=2]"
### -- set walltime limit: hh:mm --
#BSUB -W 16:00
### -- set the email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
### -- send notification at start --
#BSUB -B
### -- send notification at completion--
#BSUB -N
### -- Specify the output and error file. %J is the job-id --
### -- -o and -e mean append, -oo and -eo mean overwrite --
#BSUB -o gpu_%J.out
#BSUB -e gpu_%J.err
### -- small workaround -- no comment ;)
#BSUB -L /bin/bash
# -- end of LSF options -- 

# Uncomment the next two lines to inspect the GPUs visible to the job
#nvidia-smi
#env | grep CUDA_VISIBLE_DEVICES
# Load the cuda module 
module load cuda/7.5 

# here follow the commands you want to execute 
/appl/cuda/7.5/samples/bin/x86_64/linux/release/matrixMulCUBLAS --device=2

For an explanation of the general BSUB options, refer to this page. To submit an LSF job, use the following syntax:

bsub < myjobscript.sh
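
Once the job is submitted, the standard LSF commands can be used to follow it (the job id below is just a placeholder):

bjobs                 # list your jobs and their current state
bpeek 123456          # look at the output of a running job
bkill 123456          # remove the job if something went wrong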

There are mainly two GPU-specific options:

#BSUB -q gpuqueuek80
#BSUB -R "rusage[ngpus_excl_p=2]"

The first line selects the queue with the K80 accelerators. The second line requests the GPU resources, in this specific case 2 GPUs in exclusive-process mode.
Then you need to load the CUDA runtime environment:

module load cuda/7.5

and finally you can add the command for your specific program. Just replace the line

/appl/cuda/7.5/samples/bin/x86_64/linux/release/matrixMulCUBLAS --device=2

with your command line.
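
For example, if you want to run your own CUDA code instead of the installed sample, a minimal sketch could be (file and program names are placeholders):

# compile your CUDA source with the nvcc compiler from the loaded module
nvcc -O2 -o my_program my_program.cu
# run it on the GPU(s) assigned to the job
./my_program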
If you uncomment the line

#env | grep CUDA_VISIBLE_DEVICES

you will see which CUDA devices are visible to your program, so you can check that your request was applied correctly.

Requesting GPUs under LSF10

The syntax for requesting GPUs in our setup has changed from LSF9 to LSF10.
To submit jobs to the LSF10 setup, please follow these instructions:
Using GPUs under LSF10
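
For reference, under LSF10 GPUs are requested with the -gpu option instead of the rusage string; a minimal sketch (the queue name is a placeholder, see the linked instructions for the actual queues and recommended options) looks like:

#BSUB -q gpuqueue                          # placeholder queue name; see the linked page
#BSUB -gpu "num=1:mode=exclusive_process"  # LSF10-style request for 1 GPU in exclusive-process mode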

If you have further questions or issues using the GPUs, please write to support@hpc.dtu.dk.