Cloud Deployments
TornadoVM can be executed on the cloud. This document explains how to run TornadoVM on Amazon AWS instances that contain GPUs or FPGAs.
1. Running on AWS for CPUs and GPUs
The installation and execution instructions for running on AWS CPUs and GPUs are identical to those for running locally. See the general installation steps here: Installation.
2. Running on AWS EC2 F1 Xilinx FPGAs
The following toolkit configuration comes with the AWS EC2 F1 instance:
FPGA DEV AMI: 1.10.0
Xilinx Vitis Tool: 2020.2
Pre-requisites:
You need to have a storage bucket with: (s3_bucket, s3_dcp_key and s3_logs_key) for Step 3.
You need to clone the aws-fpga repository and check out v1.4.18, as follows:
$ cd /home/centos
$ git clone https://github.com/aws/aws-fpga.git $AWS_FPGA_REPO_DIR
$ cd $AWS_FPGA_REPO_DIR
$ git checkout v1.4.18
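The prerequisites can be checked before continuing. The following is a minimal sketch and not part of TornadoVM or aws-fpga: `require_vars` and `repo_is_at_tag` are hypothetical helper names, and the second helper assumes that `git` is on the PATH.

```shell
# Hypothetical helpers (not shipped with TornadoVM or aws-fpga).

# Succeed only if every named environment variable is set and non-empty.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeed only if HEAD of the repository in directory $1 carries tag $2.
repo_is_at_tag() {
  (cd "$1" && git tag --points-at HEAD) 2>/dev/null | grep -Fqx "$2"
}

# Example usage:
#   require_vars AWS_FPGA_REPO_DIR && repo_is_at_tag "$AWS_FPGA_REPO_DIR" v1.4.18
```

The tag check changes directory in a subshell rather than using `git -C`, since older git releases do not provide that option.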
1. Install TornadoVM as the centos user. Note that the Xilinx FPGA is not exposed to non-root users, which is why Step 2 switches to a root shell.
$ git clone https://github.com/beehive-lab/TornadoVM.git
$ cd TornadoVM
$ source etc/sources.env
$ make
2. Follow these steps to get access to the Xilinx FPGA.
Enter a bash shell as root.
$ sudo -E /bin/bash
Note: If you face a failure regarding the generation of IP, try the patch here.
Load the environment variables for Xilinx HLS and runtime.
$ source $AWS_FPGA_REPO_DIR/vitis_setup.sh
Load the environment variables of TornadoVM for root.
$ cd /home/centos/TornadoVM
$ source etc/sources.env
$ tornado --devices
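As a quick sanity check, the device listing printed by the last command can be searched for the FPGA. The sketch below assumes only that the listing mentions a Xilinx platform name (such as the `xilinx_aws-vu9p-f1_shell-...` name that appears later in this document); the exact layout of the `tornado --devices` output is not assumed, and `lists_xilinx_device` is a hypothetical helper.

```shell
# Hypothetical helper: read a device listing on stdin and succeed if any
# line mentions a Xilinx device (case-insensitive match on "xilinx").
lists_xilinx_device() {
  grep -qi 'xilinx'
}

# Example usage (as root, after sourcing vitis_setup.sh and sources.env):
#   tornado --devices | lists_xilinx_device && echo "FPGA visible"
```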
3. Update the FPGA configuration file.
Update the $TORNADO_SDK/etc/xilinx-fpga.conf file or create your own (e.g. $TORNADO_SDK/etc/aws-fpga.conf), and append the necessary information, i.e. the FPGA platform name (DEVICE_NAME), the HLS compiler flags (FLAGS), the HLS directory (DIRECTORY_BITSTREAM), and the AWS S3 configuration (s3_bucket, s3_dcp_key and s3_logs_key).
$ vim $TORNADO_SDK/etc/aws-fpga.conf
Example of configuration file:
[device]
DEVICE_NAME = /home/centos/src/project_data/aws-fpga/Vitis/aws_platform/xilinx_aws-vu9p-f1_shell-v04261818_201920_2/xilinx_aws-vu9p-f1_shell-v04261818_201920_2.xpfm
[options]
COMPILER=v++
FLAGS = -O3 -j12 # Configure the compilation flags. You can also pass the HLS configuration file (e.g. --config conf.cfg).
DIRECTORY_BITSTREAM = fpga-source-comp/
# If the FPGA is in AWS EC2 F1 Instance
AWS_ENV = yes
[AWS S3 configuration]
AWS_S3_BUCKET = tornadovm-fpga-bucket
AWS_S3_DCP_KEY = outputfolder
AWS_S3_LOGS_KEY = logfolder
You can run TornadoVM with your configuration file by using the -Dtornado.fpga.conf.file=FILE flag. If this flag is not used, the default configuration file is $TORNADO_SDK/etc/xilinx-fpga.conf.
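Because a bitstream build takes hours, it can be worth checking the configuration file for omissions before starting one. The following sketch only verifies that each key from the example configuration above is present; which keys TornadoVM strictly requires is not confirmed here, and `check_fpga_conf` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if the configuration file $1 defines
# every key used in the example configuration; report missing keys on stderr.
check_fpga_conf() {
  local file="$1" key missing=0
  for key in DEVICE_NAME COMPILER FLAGS DIRECTORY_BITSTREAM \
             AWS_ENV AWS_S3_BUCKET AWS_S3_DCP_KEY AWS_S3_LOGS_KEY; do
    # A key counts as defined if some line starts with "KEY =" or "KEY=".
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
      echo "missing key: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
#   check_fpga_conf "$TORNADO_SDK/etc/aws-fpga.conf" && echo "configuration looks complete"
```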
4. Run a program that offloads a task to the FPGA.

The following example uses a custom configuration file (aws-fpga.conf) to execute the DFT on the AWS F1 FPGA:
$ tornado --jvm "-Ds0.t0.device=0:0 -Dtornado.fpga.conf.file=/home/centos/TornadoVM/etc/aws-fpga.conf -Xmx20g -Xms20g" --printKernel --threadInfo -m tornado.examples/uk.ac.manchester.tornado.examples.dynamic.DFTMT --params="256 default 1" >> output.log
$ Ctrl-Z (^Z)
$ bg
$ disown
This command triggers TornadoVM to automatically compile Java to OpenCL and use the AWS FPGA Hardware Development Kit (HDK) to generate a bitstream. The command redirects standard output to a file (output.log), and the job is then suspended, moved to the background and disowned, because the compilation may take a few hours and the connection may be terminated with a broken pipe (e.g. packet_write_wait: Connection to 174.129.48.160 port 22: Broken pipe).
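An alternative to suspending and disowning the job is to start it detached from the terminal in the first place. The sketch below uses the standard `nohup` utility for this; `run_detached` is a hypothetical wrapper and not a TornadoVM command.

```shell
# Hypothetical wrapper: run a command immune to hangups, in the background,
# appending both stdout and stderr to the log file given as $1.
run_detached() {
  local log="$1"
  shift
  nohup "$@" >> "$log" 2>&1 &
  echo "started PID $! (log: $log)"
}

# Example usage, with the same arguments as the tornado command above:
#   run_detached output.log tornado --jvm "..." --printKernel --threadInfo \
#     -m tornado.examples/uk.ac.manchester.tornado.examples.dynamic.DFTMT \
#     --params="256 default 1"
```

Unlike the original command, this sketch also captures standard error in the log, which may help when diagnosing a failed build after the SSH session has ended.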
Read the output.log file to monitor the outcome of the TornadoVM execution. To monitor the outcome of the HLS compilation, read the outputFPGA.log file, which is automatically generated in the DIRECTORY_BITSTREAM (e.g. fpga-source-comp). After the bitstream generation, TornadoVM automatically invokes the creation of an Amazon FPGA Image (AFI) and uploads a file related to the kernel to the Amazon S3 bucket (configured in Step 3). The execution of the program ends with an error, because the bitstream is loaded before the AFI is ready. E.g.:
[TornadoVM-OCL-JNI] ERROR : clCreateProgramWithBinary -> Returned: -44
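Since this error is expected on the first run, a script that drives the build can tell it apart from other failures. The following is a minimal sketch, assuming the message appears in output.log exactly as shown above; `afi_not_ready` is a hypothetical helper. The code -44 corresponds to `CL_INVALID_PROGRAM` in the OpenCL headers, so the same message could in principle have other causes.

```shell
# Hypothetical helper: succeed if the log file $1 contains the
# clCreateProgramWithBinary failure (-44) that is expected while the
# AFI is still being created.
afi_not_ready() {
  grep -Fq 'clCreateProgramWithBinary -> Returned: -44' "$1"
}

# Example usage:
#   afi_not_ready output.log && echo "Bitstream built; wait for the AFI (Step 5)"
```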
5. You can monitor the status of your Amazon FPGA Image.
Instructions are given in outputFPGA.log. Ensure that you use the correct FPGAImageId (e.g. afi-0c1bb6821ccc766fe).
$ cat fpga-source-comp/outputFPGA.log
$ aws ec2 describe-fpga-images --fpga-image-ids afi-0c1bb6821ccc766fe
This command will return the following message:
{
    "FpgaImages": [
        {
            "UpdateTime": "2021-05-27T23:55:15.000Z",
            "Name": "lookupBufferAddress",
            "Tags": [],
            "PciId": {
                "SubsystemVendorId": "0xfedd",
                "VendorId": "0x1d0f",
                "DeviceId": "0xf010",
                "SubsystemId": "0x1d51"
            },
            "FpgaImageGlobalId": "agfi-045c5d8825f920edc",
            "Public": false,
            "State": {
                "Code": "pending"
            },
            "ShellVersion": "0x04261818",
            "OwnerId": "813381863415",
            "FpgaImageId": "afi-0c1bb6821ccc766fe",
            "CreateTime": "2021-05-27T23:15:21.000Z",
            "Description": "lookupBufferAddress"
        }
    ]
}
When the state changes from pending to available, the awsxclbin binary can be executed via TornadoVM on the AWS FPGA.
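Rather than rerunning the query by hand, the state can be polled until it changes. The sketch below uses the same `aws ec2 describe-fpga-images` command together with the AWS CLI `--query` and `--output text` options to extract `State.Code`; the function names `afi_state` and `wait_for_afi` and the 60-second default interval are assumptions made for illustration.

```shell
# Print the State.Code of the AFI with ID $1 (e.g. pending or available).
afi_state() {
  aws ec2 describe-fpga-images --fpga-image-ids "$1" \
    --query 'FpgaImages[0].State.Code' --output text
}

# Hypothetical helper: poll every $2 seconds (default 60) until the AFI $1
# becomes available. Returns 0 once available, and 1 if the query fails or
# the AFI is in any state other than pending or available.
wait_for_afi() {
  local id="$1" interval="${2:-60}" state
  while :; do
    state=$(afi_state "$id") || return 1
    case "$state" in
      available) echo "AFI $id is available"; return 0 ;;
      pending)   sleep "$interval" ;;
      *)         echo "AFI $id is in state: $state" >&2; return 1 ;;
    esac
  done
}

# Example usage:
#   wait_for_afi afi-0c1bb6821ccc766fe && echo "Continue with Step 6"
```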
6. Now that the AFI is available, you can execute the program and run the OpenCL kernel on the AWS FPGA.
If you have logged out, ensure that you rerun Steps 2 and 4.
$ tornado --jvm="-Ds0.t0.device=0:0 -Dtornado.fpga.conf.file=/home/centos/TornadoVM/etc/aws-fpga.conf -Xmx20g -Xms20g" --debug --printKernel -m tornado.examples/uk.ac.manchester.tornado.examples.dynamic.DFTMT --params="256 default 1" >> output.log
The result is the following:
tornado --jvm="-Ds0.t0.device=0:0 -Dtornado.fpga.conf.file=/home/centos/TornadoVM-Internal-feat-removeBufferCache/etc/aws-fpga.conf --threadInfo -Xmx20g -Xms20g" --printKernel -m tornado.examples/uk.ac.manchester.tornado.examples.dynamic.DFTMT --parms "256 default 1"
Initialization time: 705795966 ns
__attribute__((reqd_work_group_size(64, 1, 1)))
__kernel void computeDft(__global long *_kernel_context, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics, __global uchar *inreal, __global uchar *inimag, __global uchar *outreal, __global uchar *outimag, __global uchar *inputSize)
{
int i_8, i_29, i_35, i_5, i_4, i_36;
float f_6, f_7, f_24, f_25, f_26, f_27, f_28, f_16, f_17, f_18, f_19, f_20, f_21, f_22, f_23, f_13, f_15;
ulong ul_12, ul_3, ul_2, ul_34, ul_14, ul_1, ul_33, ul_0;
long l_9, l_10, l_11, l_30, l_31, l_32;
// BLOCK 0
ul_0 = (ulong) inreal;
ul_1 = (ulong) inimag;
ul_2 = (ulong) outreal;
ul_3 = (ulong) outimag;
i_4 = get_global_id(0);
// BLOCK 1 MERGES [0 5 ]
i_5 = i_4;
// BLOCK 2
// BLOCK 3 MERGES [2 4 ]
f_6 = 0.0F;
f_7 = 0.0F;
i_8 = 0;
__attribute__((xcl_pipeline_loop(1)))
for(;i_8 < 256;)
{
// BLOCK 4
l_9 = (long) i_8;
l_10 = l_9 << 2;
l_11 = l_10 + 24L;
ul_12 = ul_0 + l_11;
f_13 = *((__global float *) ul_12);
ul_14 = ul_1 + l_11;
f_15 = *((__global float *) ul_14);
f_16 = (float) i_8;
f_17 = f_16 * 6.2831855F;
f_18 = (float) i_5;
f_19 = f_17 * f_18;
f_20 = f_19 / 256.0F;
f_21 = native_sin(f_20);
f_22 = native_cos(f_20);
f_23 = f_22 * f_15;
f_24 = fma(f_21, f_13, f_23);
f_25 = f_7 - f_24;
f_26 = f_21 * f_15;
f_27 = fma(f_22, f_13, f_26);
f_28 = f_6 + f_27;
i_29 = i_8 + 1;
f_6 = f_28;
f_7 = f_25;
i_8 = i_29;
} // B4
// BLOCK 5
l_30 = (long) i_5;
l_31 = l_30 << 2;
l_32 = l_31 + 24L;
ul_33 = ul_2 + l_32;
*((__global float *) ul_33) = f_6;
ul_34 = ul_3 + l_32;
*((__global float *) ul_34) = f_7;
i_35 = get_global_size(0);
i_36 = i_35 + i_5;
i_5 = i_36;
// BLOCK 6
return;
} // kernel
Task info: s0.t0
Backend : OPENCL
Device : xilinx_aws-vu9p-f1_shell-v04261818_201920_2 CL_DEVICE_TYPE_ACCELERATOR (available)
Dims : 1
Global work offset: [0]
Global work size : [256]
Local work size : [64, 1, 1]
Number of workgroups : [4]
Total time: 4532676526 ns
Is valid?: true
Validation: SUCCESS
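When the run is scripted, the outcome can be read back from the log instead of by eye. The following sketch assumes the log contains the "Total time" and "Validation" lines in the form shown above; `run_validated` and `total_time_seconds` are hypothetical helpers.

```shell
# Hypothetical helper: succeed if the log file $1 reports a successful validation.
run_validated() {
  grep -q 'Validation: SUCCESS' "$1"
}

# Hypothetical helper: print the "Total time" from the log file $1 in seconds,
# converted from the nanosecond value printed by the example program.
total_time_seconds() {
  awk '/Total time:/ { printf "%.3f\n", $3 / 1000000000 }' "$1"
}

# Example usage:
#   run_validated output.log && total_time_seconds output.log
```

For the output shown above, the total time of 4532676526 ns is printed as 4.533.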