TornadoVM Changelog

This file summarizes the new features and major changes for each TornadoVM version.

TornadoVM 1.0.2



  • #323: Set Accelerator Memory Limit per Execution Plan at the API level

  • #328: Javadoc API to run with concurrent devices and memory limits

  • #340: New API calls to enable threadInfo and printKernel from the Execution Plan API.

  • #334: Dynamically enable/disable profiler after first run


  • #337: Initial support for Graal and JDK 21.0.2

Bug Fixes

  • #322: Fix duplicate thread-info debug message when the debug option is also enabled.

  • #325: Set/Get accesses for the MatrixVectorFloat4 type fixed

  • #326: Fix installation script for running with Python >= 3.12

  • #327: Fix Memory Limits for all supported Panama off-heap types.

  • #329: Fix timers for the dynamic reconfiguration policies

  • #330: Fix the profiler logs when silent mode is enabled

  • #332: Fix Batch processing when having multiple task-graphs in a single execution plan.

TornadoVM 1.0.1



  • #305: On-demand data transfer for custom data ranges.

  • #313: Initial support for Half-Precision (FP16) data types.

  • #311: Enable Multi-Task Multiple Device (MTMD) model from the TornadoExecutionPlan API:

  • #315: Math Ceil function added


  • #294: Separation of the OpenCL Headers from the code base.

  • #297: Separation of the LevelZero JNI API in a separate repository.

  • #301: Temurin configuration supported.

  • #304: Refactor of the common phases for the JIT compiler.

  • #316: Beehive SPIR-V Toolkit version updated.

Bug Fixes

  • #298: Open-close brackets fixed in the OpenCL code generator.

  • #300: Python Dependencies fixed for AWS

  • #308: Runtime check for Grid-Scheduler names

  • #309: Fix check-style to support STR templates

  • #314: Emit Vector16 Capability for 16-width vectors

TornadoVM 1.0



  • Brand-new API for allocating off-heap objects and array collections using the Panama Memory Segment API.

    • New Arrays, Matrix and Vector type objects are allocated using the Panama API.

    • Migration of existing applications to use the new Panama-based types:

  • Handling of the TornadoVM’s internal bytecode improved to avoid write-only copies from host to device.

  • cospi and sinpi math operations supported for OpenCL, PTX and SPIR-V.

  • Vector 16 data types supported for float, double and int.

  • Support for Mesa’s rusticl.

  • Device default ordering improved based on maximum thread size.

  • Move all the installation and configuration scripts from Bash to Python.

  • The installation process has been improved for Linux and macOS with M1/M2 chips.

  • Documentation improved.

  • Add profiling information for the testing scripts.


  • Integration with the Graal 23.1.0 JIT Compiler.

  • Integration with OpenJDK 21.

  • Integration with Truffle Languages (Python, Ruby and JavaScript) using Graal 23.1.0.

  • TornadoVM API Refactored.

  • Backport bug-fixes for branch using OpenJDK 17: master-jdk17

Bug fixes:

  • Multiple SPIR-V Devices fixed.

  • Runtime exception when no SPIR-V devices are present fixed.

  • Issue with the kernel context API when invoking multiple kernels fixed.

  • MTMD mode is fixed when running multiple backends on the same device.

  • long type as a constant parameter for a kernel fixed.

  • FPGA Compilation and Execution fixed for AWS and Xilinx devices.

  • Batch processing fixed for different data types of the same size.

TornadoVM 0.15.2



  • Initial Support for Multi-Tasks on Multiple Devices (MTMD): This mode enables the execution of multiple independent tasks on more than one hardware accelerator. Documentation in link:

  • Support for trigonometric radian, cospi and sinpi functions for the OpenCL/PTX and SPIR-V backends.

  • Clean-up Java modules not being used and TornadoVM core classes refactored.
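For readers unfamiliar with the cospi and sinpi functions mentioned above: in OpenCL C they compute cos(πx) and sin(πx). The following is a minimal plain-Java sketch of that definition only. The class name PiTrigSketch is hypothetical and is not part of the TornadoVM API, and this naive version does not reproduce the accuracy guarantees of a native backend implementation.

```java
// Hypothetical reference class; not part of TornadoVM.
final class PiTrigSketch {

    // cospi(x) is defined as cos(pi * x).
    static double cospi(double x) {
        return Math.cos(Math.PI * x);
    }

    // sinpi(x) is defined as sin(pi * x).
    static double sinpi(double x) {
        return Math.sin(Math.PI * x);
    }

    public static void main(String[] args) {
        System.out.println("cospi(1.0) = " + cospi(1.0));
        System.out.println("sinpi(0.5) = " + sinpi(0.5));
    }
}
```

Because Math.PI is itself a rounded value, results such as sinpi(1.0) come out very close to zero rather than exactly zero in this sketch.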


  • Initial integration with ComputeAorta (part of Codeplay’s oneAPI Construction Kit for RISC-V) to run on RISC-V with Vector Instructions (OpenCL backend) in emulation mode.

  • Beehive SPIR-V Toolkit dependency updated.

  • Tests for prebuilt SPIR-V kernels fixed to dispatch SPIR-V binaries through the Level Zero and OpenCL runtimes.

  • Deprecated script removed.

Bug fixes:

  • TornadoVM OpenCL Runtime throws an exception when the detected hardware does not support FP64.

  • Fix the installer for older x86 Apple machines using AMD GPUs.

  • Installer for ARM based systems fixed.

  • Installer fixed for Microsoft WSL and NVIDIA GPUs.

  • OpenCL code generator fixed to avoid using the reserved OpenCL keywords from Java function parameters.

  • Dump profiler option fixed.

TornadoVM 0.15.1



  • Introduction of a device selection heuristic based on the computing capabilities of devices. TornadoVM selects, as the default device, the fastest device based on its computing capability.

  • Optimisation that removes redundant data copies for Read-Only and Write-Only buffers between the host (CPU) and the device (GPU), based on the Tornado Data Flow Graph.

  • New installation script for TornadoVM.

  • Option to dump the TornadoVM bytecodes for the unit tests.

  • Full debug option improved. Use --fullDebug.
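The device selection heuristic listed above can be pictured with a small plain-Java sketch. Everything here is an assumption made for illustration: the Device class, its fields, and the scoring formula (compute units multiplied by clock frequency) are hypothetical and are not taken from the TornadoVM sources, which may weigh device capabilities differently.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of "pick the fastest device by computing capability".
final class DeviceSelectionSketch {

    static final class Device {
        final String name;
        final int computeUnits;
        final int maxClockMHz;

        Device(String name, int computeUnits, int maxClockMHz) {
            this.name = name;
            this.computeUnits = computeUnits;
            this.maxClockMHz = maxClockMHz;
        }

        // Assumed capability metric: compute units times clock frequency.
        long score() {
            return (long) computeUnits * maxClockMHz;
        }
    }

    // Returns the device with the highest score, or empty if none is available.
    static Optional<Device> selectDefault(List<Device> devices) {
        return devices.stream().max(Comparator.comparingLong(Device::score));
    }

    public static void main(String[] args) {
        List<Device> devices = List.of(
                new Device("cpu", 16, 3000),
                new Device("gpu", 80, 1500));
        selectDefault(devices).ifPresent(d -> System.out.println("default device: " + d.name));
    }
}
```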


  • Integration and compatibility with the Graal 22.3.2 JIT Compiler.

  • Improved compatibility with Apple M1 and Apple M2 through the OpenCL Backend.

  • GraalVM/Truffle programs integration improved. Use --truffle in the tornado script to run guest programs with Truffle. Example: tornado --truffle python Full documentation:

Bug fixes:

TornadoVM 0.15



  • New TornadoVM API:

  • Launch a new website for the documentation

  • Improved documentation

  • Initial support for Intel ARC discrete GPUs.

  • Improved TornadoVM installer for Linux

  • Improved TornadoVM launch script with optional parameters

  • Support of large buffer allocations with Intel Level Zero. Use: tornado.spirv.levelzero.extended.memory=True

Bug fixes:

  • Vector and Matrix types

  • TornadoVM Floating Replacement compiler phase fixed

  • Fix CMAKE for Intel ARC GPUs

  • Device query tool fixed for the PTX backend

  • Documentation for Windows 11 fixed

TornadoVM 0.14.1



  • The tornado command has been migrated from a Bash script to a Python script.

    • Use tornado --help to check the new options and examples.

  • Support of native tests for the SPIR-V backend.

  • Improvement of the OpenCL and PTX tests of the internal APIs.


  • Integration and compatibility with the Graal 22.2.0 JIT Compiler.

  • Compatibility with JDK 18 and JDK 19.

  • Compatibility with Apple M1 Pro using the OpenCL backend.

Bug Fixes

  • CUDA PTX generated header fixed to target NVIDIA 30xx GPUs and CUDA 11.7.

  • The signature of generated PTX kernels fixed for NVIDIA driver >= 510 and 30XX GPUs when using the TornadoVM Kernel API.

  • Tests of virtual OpenCL devices fixed.

  • Thread deployment information for the OpenCL backend is fixed.

  • TornadoVMRuntimeCI moved to TornadoVMRuntimeInterface.

TornadoVM 0.14


New Features

  • New device memory management for addressing the memory allocation limitations of OpenCL and enabling pinned memory of device buffers.

    • The execution of task-schedules will still automatically allocate/deallocate memory every time a task-schedule is executed, unless lock/unlock functions are invoked explicitly at the task-schedule level.

    • One heap per device has been replaced with a device buffer per input variable.

    • A new API call has been added for releasing memory: unlockObjectFromMemory

    • A new API call has been added for locking objects to the device: lockObjectInMemory. This requires the user to release memory by invoking unlockObjectFromMemory at the task-schedule level.

  • Enhanced Live Task migration by supporting multi-backend execution (PTX <-> OpenCL <-> SPIR-V).
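The memory-management behaviour described in the bullets above (one device buffer per input variable, freed after each execution unless the object is locked) can be modelled with a toy plain-Java class. This is only a conceptual sketch: DeviceBufferSketch and its method signatures are hypothetical, the "device buffers" are ordinary Java arrays, and nothing here reflects how TornadoVM implements lockObjectInMemory or unlockObjectFromMemory internally.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model; float[] arrays stand in for device buffers.
final class DeviceBufferSketch {

    // One buffer per input variable, keyed by object identity.
    private final Map<Object, float[]> buffers = new IdentityHashMap<>();
    private final Set<Object> locked = Collections.newSetFromMap(new IdentityHashMap<>());

    // Marks an object so its buffer survives across executions.
    void lockObjectInMemory(Object hostObject) {
        locked.add(hostObject);
    }

    // Removes the lock and releases the buffer.
    void unlockObjectFromMemory(Object hostObject) {
        locked.remove(hostObject);
        buffers.remove(hostObject);
    }

    void execute(float[]... inputs) {
        // Allocate a buffer for every input that does not have one yet.
        for (float[] in : inputs) {
            buffers.computeIfAbsent(in, k -> new float[in.length]);
        }
        // ... a kernel would run here ...
        // Deallocate automatically unless the object is locked.
        for (float[] in : inputs) {
            if (!locked.contains(in)) {
                buffers.remove(in);
            }
        }
    }

    int allocatedBuffers() {
        return buffers.size();
    }

    public static void main(String[] args) {
        DeviceBufferSketch device = new DeviceBufferSketch();
        float[] a = new float[8];
        device.lockObjectInMemory(a);
        device.execute(a);
        System.out.println("buffers kept after execute: " + device.allocatedBuffers());
        device.unlockObjectFromMemory(a);
        System.out.println("buffers after unlock: " + device.allocatedBuffers());
    }
}
```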


  • Integration with the Graal 22.1.0 JIT Compiler

  • JDK 8 deprecated

  • Azul Zulu JDK supported

  • OpenCL 2.1 as a default target for the OpenCL Backend

  • Single Docker Image for Intel XPU platforms, including the SPIR-V backend (using the Intel Integrated Graphics), and OpenCL (using the Intel Integrated Graphics, Intel CPU and Intel FPGA in emulation mode). Image:

Improvements/Bug Fixes

  • SIGNUM Math Function included for all three backends.

  • SPIR-V optimizer enabled by default (3x reduction in binary size).

  • Extended Memory Mode enabled for the SPIR-V Backend via Level Zero.

  • Phi instructions fixed for the SPIR-V Backend.

  • SPIR-V Vector Select instructions fixed.

  • Duplicated IDs for Non-Inlined SPIR-V Functions fixed.

  • Refactoring of the TornadoVM Math Library.

  • FPGA Configuration files fixed.

  • Bitwise operations for OpenCL fixed.

  • Code Generation Times and Backend information are included in the profiling info.

TornadoVM 0.13


TornadoVM 0.12


  • New backend: initial support for SPIR-V and Intel Level Zero

    • Level-Zero dispatcher for SPIR-V integrated

    • SPIR-V Code generator framework for Java

  • Benchmarking framework improved to accommodate all three backends

  • Driver metrics, such as kernel time and data transfers included in the benchmarking framework

  • TornadoVM profiler improved:

    • Command line options added: --enableProfiler <silent|console> and --dumpProfiler <jsonFile>

    • Logging improved for debugging purposes: JIT compiler, JNI calls and code generation

  • New math intrinsic operations supported

  • Several bug fixes:

    • Duplicated barriers removed. TornadoVM BARRIER bytecode fixed when running multi-context

    • Copy in when having multiple reductions fixed

    • TornadoVM profiler fixed for multiple context switching (device switching)

  • Pretty printer for device information

TornadoVM 0.11


TornadoVM 0.10


  • TornadoVM JIT Compiler sync with Graal 21.1.0

  • Experimental support for OpenJDK 16

  • Tracing the TornadoVM thread distribution and device information with a new option --threadInfo instead of --debug

  • Refactoring of the new API:

    • TornadoVMExecutionContext renamed to KernelContext

    • GridTask renamed to GridScheduler

  • AWS F1 AMI version upgraded to 1.10.0 and generation of the AFI image automated

  • Xilinx OpenCL backend expanded with:

      1. Initial integration of Xilinx OpenCL attributes for loop pipelining in the TornadoVM compiler

      2. Support for multiple compute units

  • Logging FPGA compilation option added to dump FPGA HLS compilation to a file

  • TornadoVM profiler enhanced for including data transfers for the stack-frame and kernel dispatch time

  • Initial support for 2D Arrays added

  • Several bug fixes and stability support for the OpenCL and PTX backends

TornadoVM 0.9


TornadoVM 0.8


  • Added PTX backend for NVIDIA GPUs

    • Build TornadoVM using make BACKEND=ptx,opencl to obtain the two supported backends.

  • TornadoVM JIT Compiler aligned with Graal 20.2.0

  • Support for other JDKs:

    • Red Hat Mandrel 11.0.9

    • Amazon Corretto 11.0.9

    • GraalVM LabsJDK 11.0.8

    • OpenJDK 11.0.8

    • OpenJDK 12.0.2

    • OpenJDK 13.0.2

    • OpenJDK 14.0.2

  • Support for hybrid (CPU-GPU) parallel reductions

  • New API for generic kernel dispatch. It introduces the concept of WorkerGrid and GridTask

    • A WorkerGrid is an object that stores how threads are organized on an OpenCL device: WorkerGrid1D worker1D = new WorkerGrid1D(4096);

    • A GridTask is a map that relates a task-name with a worker-grid: GridTask gridTask = new GridTask(); gridTask.set("s0.t0", worker1D);

    • A TornadoVM Task-Schedule can be executed using a GridTask: ts.execute(gridTask);

    • More info: link

  • TornadoVM profiler improved

    • Profiler metrics added

    • Code features per task-graph

  • Lazy device initialisation moved to early initialisation of PTX and OpenCL devices

  • Initial support for Atomics (OpenCL backend)

  • Task Schedules with 11-14 parameters supported

  • Documentation improved

  • Bug fixes for code generation, numeric promotion, basic block traversal, Xilinx FPGA compilation.

TornadoVM 0.7


  • Support for ARM Mali GPUs.

  • Support for parallel reductions on FPGAs

  • Agnostic FPGA vendor compilation via configuration files (Intel & Xilinx)

  • Support for AWS on Xilinx FPGAs

  • Recompilation for different input data sizes supported

  • New TornadoVM API calls:

    1. Update references for re-compilation: taskSchedule.updateReferences(oldRef, newRef);

    2. Use the default OpenCL scheduler: taskSchedule.useDefaultThreadScheduler(true);

  • Use of JMH for benchmarking

  • Support for Fused Multiply-Add (FMA) instructions

  • Easy selection of different devices for unit tests: -V --debug -J"-Dtornado.unittests.device=0:1"

  • Bailout mechanism improved from parallel to sequential

  • Thread scheduling improved

  • Support for private memory allocation

  • Assertion mode included

  • Documentation improved

  • Several bug fixes
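As background for the Fused Multiply-Add entry above: an FMA computes a * b + c with a single rounding step, whereas the separate multiply and add round twice. The plain-Java sketch below uses the standard Math.fma method (available since Java 9) to show the difference on the host. The class name FmaSketch is hypothetical, and the sketch says nothing about how TornadoVM emits FMA instructions for its backends.

```java
// Hypothetical demo class; uses only java.lang.Math.
final class FmaSketch {

    // Single rounding: the exact product a * b is added to c before rounding.
    static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    // Two roundings: the product is rounded first, then the sum.
    static double unfused(double a, double b, double c) {
        return a * b + c;
    }

    public static void main(String[] args) {
        // a * a = 1 + 2^-29 + 2^-60; the 2^-60 term is lost when the product is rounded.
        double a = 1.0 + 0x1p-30;
        double c = -(1.0 + 0x1p-29);
        System.out.println("fused   = " + fused(a, a, c));
        System.out.println("unfused = " + unfused(a, a, c));
    }
}
```

With these inputs the fused result keeps the 2^-60 term, while the unfused result is zero because that term was rounded away in the intermediate product.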

TornadoVM 0.6


  • TornadoVM compatible with GraalVM 19.3.0 using JDK 8 and JDK 11

  • TornadoVM compiler update for using Graal 19.3.0 compiler API

  • Support for dynamic languages on top of Truffle

  • Support for multiple tasks per task-schedule on FPGAs (Intel and Xilinx)

  • Support for macOS Mojave and Catalina

  • Task-schedule name handling for FPGAs improved

  • Exception handling improved

  • Reductions for long type supported

  • Bug fixes for ternary conditions, reductions and code generator

  • Documentation improved

TornadoVM 0.5


  • Initial support for Xilinx FPGAs

  • TornadoVM API classes are now Serializable

  • Initial support for local memory for reductions

  • JVMCI built with local annotation patch removed. Now TornadoVM requires unmodified JDK8 with JVMCI support

  • Support for multiple reductions within the same task-schedule

  • Emulation mode on Intel FPGAs is fixed

  • Fix reductions on Intel Integrated Graphics

  • TornadoVM driver OpenCL initialization and OpenCL code cache improved

  • Refactoring of the FPGA execution modes (full JIT and emulation modes improved).

TornadoVM 0.4


  • Profiler supported

    • Use -Dtornado.profiler=True to enable profiler

    • Use -Dtornado.profiler=True to dump the profiler logs

  • Feature extraction added

    • Use -Dtornado.feature.extraction=True to enable code feature extraction

  • macOS support

  • Automatic reductions composition (map-reduce) within the same task-schedule

  • Bug related to a memory leak when running on GPUs solved

  • Bug fixes and stability improvements

TornadoVM 0.3


  • New Matrix 2D and Matrix 3D classes with type specializations.

  • New API-call TaskSchedule#batch for batch processing. It allows programmers to run with more data than the maximum capacity of the accelerator by creating batches of executions.

  • FPGA full automatic compilation pipeline.

  • FPGA options simplified:

    • -Dtornado.precompiled.binary=<binary> for loading the bitstream.

    • -Dtornado.opencl.userelative=True for using relative addresses.

    • -Dtornado.opencl.codecache.loadbin=True removed.

  • Reductions support enhanced and fully automated on GPUs and CPUs.

  • Initial support for reductions on FPGAs.

  • Initial API for profiling tasks integrated.
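The idea behind the batch processing entry above (running with more data than the accelerator can hold by splitting the work into batches) can be illustrated in plain Java. This is a host-only sketch under stated assumptions: BatchSketch, batchRanges and scaleInBatches are hypothetical names, the temporary array stands in for a device buffer, and the real TaskSchedule#batch call is not used or modelled in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of batched execution; not TornadoVM code.
final class BatchSketch {

    // Splits [0, totalElements) into consecutive {offset, length} ranges.
    static List<int[]> batchRanges(int totalElements, int batchElements) {
        if (batchElements <= 0) {
            throw new IllegalArgumentException("batchElements must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int offset = 0; offset < totalElements; offset += batchElements) {
            int length = Math.min(batchElements, totalElements - offset);
            ranges.add(new int[] { offset, length });
        }
        return ranges;
    }

    // Processes the data one batch at a time, as if each batch were copied
    // to a device buffer, computed on, and copied back.
    static void scaleInBatches(float[] data, int batchElements, float factor) {
        for (int[] range : batchRanges(data.length, batchElements)) {
            float[] deviceBuffer = Arrays.copyOfRange(data, range[0], range[0] + range[1]);
            for (int i = 0; i < deviceBuffer.length; i++) {
                deviceBuffer[i] *= factor; // stand-in for the kernel
            }
            System.arraycopy(deviceBuffer, 0, data, range[0], range[1]);
        }
    }

    public static void main(String[] args) {
        float[] data = new float[10];
        Arrays.fill(data, 1.0f);
        scaleInBatches(data, 4, 2.0f);
        System.out.println(Arrays.toString(data));
    }
}
```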

TornadoVM 0.2


  • Rename to TornadoVM

  • Device selection for better performance (CPU, multi-core, GPU, FPGA) via an API for Dynamic Reconfiguration

    • Added methods executeWithProfiler and executeWithProfilerSequential with an input policy.

    • Policies: Policy.PERFORMANCE, Policy.END_2_END, and Policy.LATENCY implemented.

  • Basic heuristic for predicting the highest performing target device with Dynamic Reconfiguration

  • Initial FPGA integration for Altera FPGAs:

    • Full JIT compilation mode

    • Ahead of time compilation mode

    • Emulation/debug mode

  • FPGA JIT compiler specializations

  • Added support for Java reductions:

    • Compiler specializations for CPU and GPU reductions

  • Performance and stability fixes

Tornado 0.1.0


  • Initial Implementation of the Tornado compiler

  • Initial GPU/CPU code generation for OpenCL

  • Initial support in the runtime to execute OpenCL programs generated by the Tornado JIT compiler

  • Initial Tornado-API release (@Parallel Java annotation and TaskSchedule API)

  • Multi-GPU enabled through multiple task-schedules