TornadoVM Changelog

This file summarizes the new features and major changes for each TornadoVM version.

TornadoVM 0.15

27/01/2023

Improvements

  • New TornadoVM API:

  • Launch a new website https://tornadovm.readthedocs.io/en/latest/ for the documentation

  • Improved documentation

  • Initial support for Intel ARC discrete GPUs.

  • Improved TornadoVM installer for Linux

  • ImprovedTornadoVM launch script with optional parameters

  • Support of large buffer allocations with Intel Level Zero. Use: tornado.spirv.levelzero.extended.memory=True

Bug fixes:

  • Vector and Matrix types

  • TornadoVM Floating Replacement compiler phase fixed

  • Fix CMAKE for Intel ARC GPUs

  • Device query tool fixed for the PTX backend

  • Documentation for Windows 11 fixed

TornadoVM 0.14.1

29/09/2022

Improvements

  • The tornado command is replaced from a Bash to a Python script.

    • Use tornado --help to check the new options and examples.

  • Support of native tests for the SPIR-V backend.

  • Improvement of the OpenCL and PTX tests of the internal APIs.

Compatibility/Integration

  • Integration and compatibility with the Graal 22.2.0 JIT Compiler.

  • Compatibility with JDK 18 and JDK 19.

  • Compatibility with Apple M1 Pro using the OpenCL backend.

Bug Fixes

  • CUDA PTX generated header fixed to target NVIDIA 30xx GPUs and CUDA 11.7.

  • The signature of generated PTX kernels fixed for NVIDIA driver >= 510 and 30XX GPUs when using the TornadoVM Kernel API.

  • Tests of virtual OpenCL devices fixed.

  • Thread deployment information for the OpenCL backend is fixed.

  • TornadoVMRuntimeCI moved to TornadoVMRutimeInterface.

TornadoVM 0.14

15/06/2022

New Features

  • New device memory management for addressing the memory allocation limitations of OpenCL and enabling pinned memory of device buffers.

    • The execution of task-schedules will still automatically allocate/deallocate memory every time a task-schedule is executed, unless lock/unlock functions are invoked explicitly at the task-schedule level.

    • One heap per device has been replaced with a device buffer per input variable.

    • A new API call has been added for releasing memory: unlockObjectFromMemory

    • A new API call has been added for locking objects to the device: lockObjectInMemory This requires the user to release memory by invoking unlockObjectFromMemory at the task-schedule level.

  • Enhanced Live Task migration by supporting multi-backend execution (PTX <-> OpenCL <-> SPIR-V).

Compatibility/Integration

  • Integration with the Graal 22.1.0 JIT Compiler

  • JDK 8 deprecated

  • Azul Zulu JDK supported

  • OpenCL 2.1 as a default target for the OpenCL Backend

  • Single Docker Image for Intel XPU platforms, including the SPIR-V backend (using the Intel Integrated Graphics), and OpenCL (using the Intel Integrated Graphics, Intel CPU and Intel FPGA in emulation mode). Image: https://github.com/beehive-lab/docker-tornado#intel-integrated-graphics

Improvements/Bug Fixes

  • SIGNUM Math Function included for all three backends.

  • SPIR-V optimizer enabled by default (3x reduce in binary size).

  • Extended Memory Mode enabled for the SPIR-V Backend via Level Zero.

  • Phi instructions fixed for the SPIR-V Backend.

  • SPIR-V Vector Select instructions fixed.

  • Duplicated IDs for Non-Inlined SPIR-V Functions fixed.

  • Refactoring of the TornadoVM Math Library.

  • FPGA Configuration files fixed.

  • Bitwise operations for OpenCL fixed.

  • Code Generation Times and Backend information are included in the profiling info.

TornadoVM 0.13

21/03/2022

TornadoVM 0.12

17/11/2021

  • New backend: initial support for SPIR-V and Intel Level Zero

    • Level-Zero dispatcher for SPIR-V integrated

    • SPIR-V Code generator framework for Java

  • Benchmarking framework improved to accommodate all three backends

  • Driver metrics, such as kernel time and data transfers included in the benchmarking framework

  • TornadoVM profiler improved:

    • Command line options added: --enableProfiler <silent|console> and --dumpProfiler <jsonFile>

    • Logging improve for debugging purposes. JIT Compiler, JNI calls and code generation

  • New math intrinsincs operations supported

  • Several bug fixes:

    • Duplicated barriers removed. TornadoVM BARRIER bytecode fixed when running multi-context

    • Copy in when having multiple reductions fixed

    • TornadoVM profiler fixed for multiple context switching (device switching)

  • Pretty printer for device information

TornadoVM 0.11

29/09/2021

TornadoVM 0.10

29/06/2021

  • TornadoVM JIT Compiler sync with Graal 21.1.0

  • Experimental support for OpenJDK 16

  • Tracing the TornadoVM thread distribution and device information with a new option --threadInfo instead of --debug

  • Refactoring of the new API:

    • TornadoVMExecutionContext renamed to KernelContext

    • GridTask renamed to GridScheduler

  • AWS F1 AMI version upgraded to 1.10.0 and automated the generation of AFI image

  • Xilinx OpenCL backend expanded with:

      1. Initial integration of Xilinx OpenCL attributes for loop

        pipelining in the TornadoVM compiler

      1. Support for multiple compute units

  • Logging FPGA compilation option added to dump FPGA HLS compilation to a file

  • TornadoVM profiler enhanced for including data transfers for the stack-frame and kernel dispatch time

  • Initial support for 2D Arrays added

  • Several bug fixes and stability support for the OpenCL and PTX backends

TornadoVM 0.9

15/04/2021


TornadoVM 0.8

19/11/2020

  • Added PTX backend for NVIDIA GPUs

    • Build TornadoVM using make BACKEND=ptx,opencl to obtain the two supported backends.

  • TornadoVM JIT Compiler aligned with Graal 20.2.0

  • Support for other JDKs:

    • Red Hat Mandrel 11.0.9

    • Amazon Coretto 11.0.9

    • GraalVM LabsJDK 11.0.8

    • OpenJDK 11.0.8

    • OpenJDK 12.0.2

    • OpenJDK 13.0.2

    • OpenJDK 14.0.2

  • Support for hybrid (CPU-GPU) parallel reductions

  • New API for generic kernel dispatch. It introduces the concept of WorkerGrid and GridTask

    • A WorkerGrid is an object that stores how threads are organized on an OpenCL device: java       WorkerGrid1D worker1D = new WorkerGrid1D(4096);

    • A GridTask is a map that relates a task-name with a worker-grid. java       GridTask gridTask = new GridTask();       gridTask.set("s0.t0", worker1D);

    • A TornadoVM Task-Schedule can be executed using a GridTask: java     ts.execute(gridTask);

    • More info: link

  • TornadoVM profiler improved

    • Profiler metrics added

    • Code features per task-graph

  • Lazy device initialisation moved to early initialisation of PTX and OpenCL devices

  • Initial support for Atomics (OpenCL backend)

  • Task Schedules with 11-14 parameters supported

  • Documentation improved

  • Bug fixes for code generation, numeric promotion, basic block traversal, Xilinx FPGA compilation.


TornadoVM 0.7

22/06/2020

  • Support for ARM Mali GPUs.

  • Support parallel reductions on FPGAs

  • Agnostic FPGA vendor compilation via configuration files (Intel & Xilinx)

  • Support for AWS on Xilinx FPGAs

  • Recompilation for different input data sizes supported

  • New TornadoVM API calls:

    1. Update references for re-compilation: taskSchedule.updateReferences(oldRef, newRef);

    2. Use the default OpenCL scheduler: taskSchedule.useDefaultThreadScheduler(true);

  • Use of JMH for benchmarking

  • Support for Fused Multiply-Add (FMA) instructions

  • Easy-selection of different devices for unit-tests tornado-test.py -V --debug -J"-Dtornado.unittests.device=0:1"

  • Bailout mechanism improved from parallel to sequential

  • Improve thread scheduling

  • Support for private memory allocation

  • Assertion mode included

  • Documentation improved

  • Several bug fixes

TornadoVM 0.6

21/02/2020

  • TornadoVM compatible with GraalVM 19.3.0 using JDK 8 and JDK 11

  • TornadoVM compiler update for using Graal 19.3.0 compiler API

  • Support for dynamic languages on top of Truffle

  • Support for multiple tasks per task-schedule on FPGAs (Intel and Xilinx)

  • Support for OSX Mojave and Catalina

  • Task-schedule name handling for FPGAs improved

  • Exception handling improved

  • Reductions for long type supported

  • Bug fixes for ternary conditions, reductions and code generator

  • Documentation improved

TornadoVM 0.5

16/12/2019

  • Initial support for Xilinx FPGAs

  • TornadoVM API classes are now Serializable

  • Initial support for local memory for reductions

  • JVMCI built with local annotation patch removed. Now TornadoVM requires unmodified JDK8 with JVMCI support

  • Support of multiple reductions within the same task-schedules

  • Emulation mode on Intel FPGAs is fixed

  • Fix reductions on Intel Integrated Graphics

  • TornadoVM driver OpenCL initialization and OpenCL code cache improved

  • Refactoring of the FPGA execution modes (full JIT and emulation modes improved).

TornadoVM 0.4

14/10/2019

  • Profiler supported

    • Use -Dtornado.profiler=True to enable profiler

    • Use -Dtornado.profiler=True -Dtornado.profiler.save=True to dump the profiler logs

  • Feature extraction added

    • Use -Dtornado.feature.extraction=True to enable code extraction features

  • Mac OSx support

  • Automatic reductions composition (map-reduce) within the same task-schedule

  • Bug related to a memory leak when running on GPUs solved

  • Bug fixes and stability improvements

TornadoVM 0.3

22/07/2019

  • New Matrix 2D and Matrix 3D classes with type specializations.

  • New API-call TaskSchedule#batch for batch processing. It allows programmers to run with more data than the maximum capacity of the accelerator by creating batches of executions.

  • FPGA full automatic compilation pipeline.

  • FPGA options simplified:

    • -Dtornado.precompiled.binary=<binary> for loading the bitstream.

    • -Dtornado.opencl.userelative=True for using relative addresses.

    • -Dtornado.opencl.codecache.loadbin=True removed.

  • Reductions support enhanced and fully automated on GPUs and CPUs.

  • Initial support for reductions on FPGAs.

  • Initial API for profiling tasks integrated.

TornadoVM 0.2

25/02/2019

  • Rename to TornadoVM

  • Device selection for better performance (CPU, multi-core, GPU, FPGA) via an API for Dynamic Reconfiguration

    • Added methods executeWithProfiler and executeWithProfilerSequential with an input policy.

    • Policies: Policy.PERFORMANCE, Policy.END_2_END, and Policy.LATENCY implemented.

  • Basic heuristic for predicting the highest performing target device with Dynamic Reconfiguration

  • Initial FPGA integration for Altera FPGAs:

    • Full JIT compilation mode

    • Ahead of time compilation mode

    • Emulation/debug mode

  • FPGA JIT compiler specializations

  • Added support for Java reductions:

    • Compiler specializations for CPU and GPU reductions

  • Performance and stability fixes

Tornado 0.1.0

07/09/2018

  • Initial Implementation of the Tornado compiler

  • Initial GPU/CPU code generation for OpenCL

  • Initial support in the runtime to execute OpenCL programs generated by the Tornado JIT compiler

  • Initial Tornado-API release (@Parallel Java annotation and TaskSchedule API)

  • Multi-GPU enabled through multiple tasks-schedules