TornadoVM Changelog

This file summarizes the new features and major changes for each TornadoVM version.

TornadoVM 5.1.0-jdk21

17/07/26

Improvements

#929: Add FP8 (E4M3/E5M2) support for the CUDA backend + fix NaN/Infinity float constants
#928: Add FP8 (E4M3/E5M2) support for the CUDA backend + fix NaN/Infinity float constants
#934: Auto-disable intra-plan concurrency on serial task graphs
#937: Stage large H2D transfers through a pinned host buffer ring
#936: Stage large H2D transfers through a pinned host buffer ring
#935: Auto-disable intra-plan concurrency on serial task graphs
#919: [cuda] NVTX instrumentation for library tasks + cuSPARSE hybrid provider (SpMV/SpMM)
#912: [cuda] CUTLASS hybrid-API provider: FP32/FP16 GEMM + fused bias/ReLU/GELU epilogues as library tasks
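For readers unfamiliar with the two FP8 layouts named in #929/#928, the following is a minimal, self-contained sketch of how an 8-bit value in each layout maps to a Java `float`. It is not TornadoVM's implementation: the class `Fp8Decode` and its method names are hypothetical, and the E4M3 variant assumed here is the common "finite-only" one (no infinities, a single NaN bit pattern per sign), which may differ from what a given backend uses.

```java
public final class Fp8Decode {

    private Fp8Decode() {
    }

    /** Decodes E4M3: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits. */
    public static float e4m3ToFloat(int bits) {
        int sign = (bits >> 7) & 0x1;
        int exp = (bits >> 3) & 0xF;
        int man = bits & 0x7;
        float magnitude;
        if (exp == 0xF && man == 0x7) {
            // Assumed finite-only variant: this is the only NaN pattern, no infinities.
            magnitude = Float.NaN;
        } else if (exp == 0) {
            // Subnormal: no implicit leading one, fixed exponent of 1 - bias.
            magnitude = Math.scalb(man / 8.0f, -6);
        } else {
            magnitude = Math.scalb(1.0f + man / 8.0f, exp - 7);
        }
        return sign == 0 ? magnitude : -magnitude;
    }

    /** Decodes E5M2: 1 sign bit, 5 exponent bits (bias 15), 2 mantissa bits. */
    public static float e5m2ToFloat(int bits) {
        int sign = (bits >> 7) & 0x1;
        int exp = (bits >> 2) & 0x1F;
        int man = bits & 0x3;
        float magnitude;
        if (exp == 0x1F) {
            // IEEE-style: all-ones exponent encodes infinity (man == 0) or NaN.
            magnitude = (man == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else if (exp == 0) {
            magnitude = Math.scalb(man / 4.0f, -14);
        } else {
            magnitude = Math.scalb(1.0f + man / 4.0f, exp - 15);
        }
        return sign == 0 ? magnitude : -magnitude;
    }
}
```

E4M3 trades range for precision (largest finite value 448 under the assumption above), while E5M2 keeps the FP16 exponent range with fewer mantissa bits.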

Compatibility

#938: [docs] Revision of the documentation page for v5.0.0
#925: [ci] Fix release pipeline triggers, rename pipeline stages, skip redundant CI

Bug Fixes

#945: [fix] Restore cutlass-jni build on GCC 15+ hosts (-Wtemplate-body)
#932: Fix vmDeps path to dispatch KernelContext/WorkerGrid kernels
#933: Fix vmDeps path to dispatch KernelContext/WorkerGrid kernels
#910: [cuda] Fix misleading NVRTC compilation errors: memoized FP16 header resolution + explanatory driver/toolkit diagnostics

Other Changes

#946: [fix] Restore cutlass-jni build on GCC 15+ hosts (-Wtemplate-body)
#921: [fix] Windows support for the CUDA backend (build, runtime DLL resolution, correctness)
#943: Backport Cuda windows to jdk25
#944: Backport Docs to jdk25
#927: Backport Nvtx cusparse to jdk25
#926: Backport Hybrid cutlass to jdk25
#916: [docs] Complete hybrid API guide (HYBRID_API_GUIDE.md): all providers, examples, flags, custom-provider walkthrough
#923: [cleanup] remove deprecated files
#924: [cleanup] remove deprecated files
#911: [cuda] Fix misleading NVRTC compile-error logging and drop the cuda_fp16.h include dependency
#915: [docs] Complete hybrid API guide (HYBRID_API_GUIDE.md): all providers, examples, flags, custom-provider walkthrough

TornadoVM 5.0.0-jdk21

09/07/26

Improvements

#888: [hybrid] Hybrid API: native library tasks with cuDNN provider
#887: [hybrid] Hybrid API: native library tasks with cuFFT provider
#869: [JDK25][PTX] Tensor Core MMA intrinsics for FP16 and INT8
#850: [Metal] Add hardware matrix units (`simdgroup_float8x8`) for fast gemms to M-silicon
#867: [PTX][CUDA] Tensor Core MMA intrinsics for FP16 and INT8
#854: [PTX] Add atan2 and asin math intrinsics and software atan implementation
#843: [jdk25] Add swizzled local memory accessors for FP16 and INT8 on the PTX backend
#841: Add swizzled local memory accessors for FP16 and INT8 on the PTX backend

Compatibility

#857: [ci] Automate multi-platform SDK release pipeline (+ Ray-Tracer/Metal CI and launcher exit-code fix)

Bug Fixes

#903: Fix for stride-16 swizzled FP16 shared-memory load and store that caused segfault in tests
#874: [hotfix] Multiple fixes after enabling CUDA backend into develop
#886: Fix batch processing with a remainder chunk under device buffer reuse
#883: Fix repeated execution of batched plans
#882: Fix repeated execution of batched plans
#877: [hotfix] Resolving unit-tests failure in CUDA backend
#856: [fix] Handle ObjectStamp compatibility for vector types in OpenCL, Metal and SPIRV backends
#844: [fix] NPE when calling HalfFloat.getFloat32() on a value read from a local HalfFloat[]
#838: [fix] Honour compile-time local work size in SPIR-V Level Zero launcher
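As background for the batch-processing fixes above (#886, #883/#882): batch processing splits a large input into fixed-size chunks, and the final chunk is shorter whenever the batch size does not divide the input evenly. The sketch below only illustrates that partitioning; it is not TornadoVM code, and `BatchChunks` and `split` are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

public final class BatchChunks {

    private BatchChunks() {
    }

    /**
     * Returns {offset, length} pairs that cover [0, totalElements) in chunks of
     * at most batchElements. The last pair is the remainder chunk when the
     * batch size does not divide the total evenly.
     */
    public static List<long[]> split(long totalElements, long batchElements) {
        if (totalElements < 0 || batchElements <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and the batch size positive");
        }
        List<long[]> chunks = new ArrayList<>();
        for (long offset = 0; offset < totalElements; offset += batchElements) {
            long length = Math.min(batchElements, totalElements - offset);
            chunks.add(new long[] { offset, length });
        }
        return chunks;
    }
}
```

For example, `BatchChunks.split(10, 4)` yields three chunks, the last being the two-element remainder at offset 8. A device buffer sized for a full batch and reused across chunks must still transfer and process only `length` elements for that last chunk.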

Other Changes

#904: Fix for stride-16 swizzled FP16 shared-memory load and store that caused segfault in tests
#893: Fix CUDA-backend build on develop (cuFFT broken merge + cudnn SDPA)
#901: Feat/update logo
#897: [JDK25] Support for CUDA streams for overlapping data transfers with TaskGraph execution
#800: [JDK21] Support for CUDA streams for overlapping data transfers with TaskGraph execution
#902: Feat/update logo
#852: Revise README to include latest TornadoVM updates
#900: Backport Cuda fixes to jdk25
#892: Fix batch processing with a remainder chunk under device buffer reuse
#896: Backport Readme to jdk25
#898: Backport Hybrid API to jdk25
#879: [hybrid] Hybrid API: native library tasks with cuBLAS/cuBLASLt provider
#861: [cuda] Add CUDA C backend via NVRTC to complement the PTX backend
#885: [OpenCL] Fix pointer-to-ulong conversions rejected by strict OpenCL compilers (Intel iGPUs)
#891: [hotfix][PTX] Align compileTask exception handling with other backends to preserve original exception on bailout-disabled path
#890: [hotfix][PTX] Align compileTask exception handling with other backends to preserve original exception on bailout-disabled path
#881: Port the CUDA backend (CUDA C / NVRTC) from develop to jdk25
#884: [OpenCL] Fix pointer-to-ulong conversions rejected by strict OpenCL compilers (Intel iGPUs)
#848: Replace heuristic brace-placement with structured control-flow recovery (OpenCL + Metal)
#849: [jdk25] Replace CFG block visitors with structured control-flow recovery (OpenCL + Metal)
#870: Fix nested persisted-object list on task-graph reuse and empty consume
#880: Backport Persist obj to jdk25
#876: Metal mma tiled jdk25
#866: Update level-zero library version to v1.18.1
#864: Update level-zero library version to v1.18.1
#834: Release 4.0.1-jdk25

TornadoVM 4.0.1-jdk21

29/04/26

Compatibility

#832: Adopt jdk21 version suffix on develop and update release workflows
#827: Bump org.apache.logging.log4j:log4j-core from 2.25.3 to 2.25.4

Bug Fixes

#831: [fix] F16 miscompilation for Metal, PTX, SPIR-V backends with JDK 25
#830: [fix] Fix F16 miscompilation in PTX and Metal backends
#829: [fix] Backport jdk25 OCL plugin fixes: 64-bit address arithmetic, node insertion, and atomics

Other Changes

#823: Update SDKMAN! versions for backend options
#825: [feat] Add workflow to automatically create mirroring PRs for JDK25
#824: Update master

TornadoVM 4.0.0

02/04/26

Improvements

#811: Add support for CUDA Graphs to replay bytecodes to reduce dispatch overhead

Bug Fixes

#817: [hotfix] Fix double-deletion of PiNode when multiple OffsetAddressNodes share the same PiNode

Other Changes

#813: Add SIMD Shuffle/Reduction Support to PTX Backend
#814: [docs] Update readme to include metal
#819: Sync master with develop
#818: [fix] Make cmake to always set CMAKE_OSX_SYSROOT explicitly, making i…
#796: [JDK21] Add `Apple Metal` backend to run natively on Apple Silicon
#808: Add TornadoVM developer skill (build, test, debug, Java 21+ idioms) for Claude
#806: Release 3.0.0-jdk25
#807: Post release minor fixes for mvn deploy and readme badges

TornadoVM 3.0.0

24/02/26

Improvements

#790: [feat] Move --intellijinit from CLI to developer-only Makefile target with dynamic backend selection

Compatibility

#804: Refactor GH actions to split JDK21 and JDK25 testing, packaging and deployment
#777: Bump org.apache.logging.log4j:log4j-core from 2.17.1 to 2.25.3
#775: [docs] Revise TornadoVM installation instructions
#776: [feat] Add new action to push default tornadovm version

Bug Fixes

#785: [fix] Added scripts in dist directory to resolve issue with intellijinit

Other Changes

#803: Whitelist `TestInheritedFields` for non-OpenCL backends
#802: [fix] Handle Windows CRLF line endings in virtual device tests and native tests & Update Makefile.mak
#801: Add GitHub workflows for JDK 25 build, deployment, and release prepar…
#799: Add JDK 25.0.2 release automation workflows
#787: [refactor] Prepare compiler and API infrastructure for Jdk25 migration
#781: Fix OCLFieldBuffer to include inherited instance fields

TornadoVM 2.2.0

17/12/25

Improvements

#765: Add cross-platform SDK compatibility checks and fix launcher issues
#713: [ptx] Support for CUDA JIT compiler flags

Compatibility

#764: [cicd] Prevent workflows from running on forks

Other Changes

#773: [build] Replace TORNADO_SDK with TORNADOVM_HOME
#772: [docs] Refactor license table in README.md for clarity and conciseness
#771: [CI] Publish archives to sdkman action
#769: Update README.md for TornadoVM version 2.1.0 for SDKs

TornadoVM 2.1.0

09/12/25

Improvements

#754: Support to express Q8_0 tensors as Tornado ByteArray

Compatibility

#756: [CI] Add nightly workflow to build and test all supported JDKs (Zulu, OpenJDK, GraalVM, Corretto, Mandrel, etc.) on Linux x64 runner
#755: [docs] Refining README and simplifying instructions

Bug Fixes

#753: [hotfix] Fix Conversion Error from FP16 to FP32

Other Changes

#752: Update POM files: bump parent version to `2.0.1-dev` across all modules
#758: [CI] Pre and post release workflows to automate release deployments

TornadoVM 2.0.0

02/12/25

Improvements

#722: Simplify running tornadovm with a Java argfile.
#732: [types] Support for GPU-native Int8 types for PTX and OpenCL.
#736: Implement support for compressed oops (coops).
#738: [feat] Update TornadoVM to be packaged as SDK across multiple platforms.
#739: [feat] Zero-copy TornadoNativeArray type instances with shallow memory segments.
#740: Add support for byte and half-float arrays in local memory across all backends.
#748: Support FP32 to FP16 conversion across all backends.

Compatibility

#704: Bump org.apache.commons:commons-lang3 from 3.12.0 to 3.18.0 in tornado-benchmark.
#709: Fix Python dependency installation issues in installer.
#717: [feat] Added streamlit python dependency for compatibility with TornadoViz.

Bug Fixes

#705: [fix] Support for ShortCircuits in OpenCL and PTX.
#706: [fix] Codegen support for IntegerBelowNode & Fix of closing bracket in OpenCL.
#712: [fix] Fix for Loop Partial Unroll Phase.
#714: [fix] Handle the changes that CUDA 13 introduced to the cuCtxCreate CUDA API.
#721: [fix] Integer overflow in TornadoNativeArray implementations that caused IllegalArgumentException when allocating large arrays.
#723: [fix] Prevent NullPointerException when trying to dump bytecodes during warm-up.
#746: [fix] Remove @ prefix from inline Truffle export flags.
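The overflow fixed in #721 belongs to a familiar class of bug: a byte size computed in 32-bit `int` arithmetic wraps around for large element counts and then fails validation downstream. The sketch below illustrates the general pattern only and is not the actual TornadoVM code; `SegmentSizing` and its methods are hypothetical names.

```java
public final class SegmentSizing {

    private SegmentSizing() {
    }

    /** Buggy pattern: the multiplication is done in int and silently wraps. */
    public static int byteSizeInt(int numElements, int elementBytes) {
        return numElements * elementBytes;
    }

    /** Safer pattern: widen to long before multiplying. */
    public static long byteSizeLong(int numElements, int elementBytes) {
        return (long) numElements * elementBytes;
    }

    /** Alternative: keep int but fail loudly with ArithmeticException on overflow. */
    public static int byteSizeChecked(int numElements, int elementBytes) {
        return Math.multiplyExact(numElements, elementBytes);
    }
}
```

With 600,000,000 four-byte elements, the `int` version yields a negative size, whereas the `long` version yields the intended 2,400,000,000 bytes.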

Refactors & Infrastructure

#703: Updated build instructions for using a single thread for maven.
#708: [test] Mark unsupported tests for SPIRV.
#711: Add mvn test configuration to ease unit-testing when porting TornadoVM to third-party projects.
#716: [build-infra] Add missing checksums and script to generate checksum files for TornadoVM Maven artifacts.
#720: Increase default memory size on device.
#725: [refactor] Move Dynamic Reconfiguration to research features
#730: [build] Revamp build infrastructure by adding Maven wrapper mvnw.
#733: Move argfile generation to python from bash.
#742: [CI] Migrate build & test workflows (OpenCL, PTX, SPIR-V) to GitHub Actions for Linux (x64) and macOS (arm64).
#737: [tests] Marked Quantization Tests as whitelisted due to NVIDIA driver issues.
#743: [build] Use shared export-list files instead of verbose inline --add-exports in pom.xml.
#744: Add licences in pom files - prep work for migrating release to maven central.
#745: [deploy] Add release profile in maven to prepare maven central release.
#747: [CI] Add deploy-maven-central workflow for deploying artifacts to Maven central.
#749: [CI] Add GPG key configuration and Maven settings to deploy-maven-central.

TornadoVM 1.1.1

07/07/25

Improvements

#657: Optimize to reuse the allocated buffers for batch processing.
#659: Fixed object state to be the one from the last executed TaskGraph.
#660: New PERSIST bytecode to improve object lifecycle tracking.
#661: Saving the TornadoVM Bytecodes in a log file.
#660: Distinguish the data transfer mode when logging the execution of the TRANSFER_TO_DEVICE_ONCE Bytecode.
#667: Update documentation of the TornadoVM flags.
#670: Refactoring of the Matrix4x4Float type.
#674: Updated project links in README.
#675: Avoid rescheduling IfNodes used for loop-bound evaluation.
#676: Added unit-tests for Transformer Compute Kernels.
#679: Added Matrix-Vector Row-Major compute example.
#683: Mark flash attention unittest unsupported for SPIR-V.
#684: Performance improvements for processing with Dynamic Reconfiguration.
#685: Dynamic reconfiguration refactored.
#686: New API Functions for warmup.
#693: Disabling fast math to support FMA in PTX.
#695: Update tornadovm-installer script to be interactive.
#696: Increase sizes for auxiliary data structures related with the number of Tasks in a TaskGraph.
#697: Added auto-deps mode in tornadovm-installer and restored backend and jdk console arguments.
#698: Update tornadovm-installer changes in README.

Compatibility

#668: Updated build instructions for RISC-V systems.

Bug Fixes

#664: Fix kernel name in PTX with sanitizer check.
#666: Fix GridScheduler for execution plans that have multiple TaskGraphs.
#671: Fix ANSI escape characters for logging TornadoVM Bytecodes.
#677: Fix 1.0/sqrt(x) replacement with native rsqrt(x) function.
#678: Fix profiling on macOS systems, regarding accessing UPS metrics.
#681: Fix closing bracket for flash attention.
#688: Fix state after warmup phase.

TornadoVM 1.1.0

31/03/25

Improvements

#620: Support of computation with mixed precision FP16 to FP32 for matrix operations.
#622: New API to allow buffer mapping between two different buffers on the hardware accelerator.
#624: Enhanced TornadoVM profiler with correct information for the UNDER_DEMAND transfer to host data.
#627: New feature to persist data on the hardware accelerator, and consume data already allocated on the hardware accelerator.
#630: Support for atomics using the kernel API for OpenCL and PTX backends.
#636: TornadoVM bytecode logging improved.
#642: Math functions extended: acosh and asinh supported for OpenCL and SPIR-V.
#645: Memory deallocation improved: it is now the default action when closing the TornadoExecutionPlan resource.

Compatibility

#625: Documentation to build on RISC-V updated.
#632: Add maven build with Single thread.
#633: Add tests for running multiple task graphs with different grid schedulers.
#638: Add tests to check force copy in buffers and persist buffers on the hardware accelerator.
#640: Rename XPUBuffer to FieldBuffer for all backends.
#649: Update the fast mode to live mode for testing.
#654: Add loop condition test in white list.

Bug Fixes

#626: Fix data accessors when using the UNDER_DEMAND transfer to host invocation from the task-graph.
#628: Device filtering API fixed to use device type and device names.
#635: Update nodes for local memory to be subtype of ValueNode instead of ConstantNode in the TornadoVM IR.
#639: Fix subgraph execution when combining with the GridScheduler.
#644: Fix TornadoVM execution frame setter.
#646: Fix shared memory buffers across task-graphs when no new allocation is present as new parameters for the following task-graphs.
#647: Fix UNDER_DEMAND invocation for the batch processor mode and read-write arrays.
#651: Fix memory mapping regions for the PTX Backend.
#653: Object repetition with shared buffers on ON_DEVICE bytecodes.

TornadoVM 1.0.10

31/01/25

Improvements

#608: Selective execution with multiple SPIR-V runtimes (either OpenCL, Intel Level Zero, or both) to unlock execution on RISC-V systems.
#611: Support of HalfFloat for Matrix Types (FP16 -> FP16).

Compatibility

#607: WSL installation and configuration updated for WSL Ubuntu 24 LTS and Windows 11.
#609: Documentation and patch for RISC-V64 updated.
#610: Maven dependency updated.
#612: Re-enable colours in maven builds on Linux.

Bug Fixes

#606: Fix data sizes in benchmark suite.
#613: Fix code formatter.
#614: Fix flags for the benchmark pipeline in Jenkins.
#615: Fix code style based on the formatter.
#616: Fix atomics for the Kernel API and the OpenCL backend.

TornadoVM 1.0.9

20th December 2024

Improvements

#573: Enhanced output of unit-tests with a summary of pass-rates and fail-rates.
#576: Extended support for 3D matrices.
#580: Extended debug information for execution plans.
#584: Added helper menu for the tornado launcher script when no arguments are passed.
#589: Enable partial loop unrolling for all backends.
#594: Added RISC-V 64 CPU port support to run OpenCL with vector instructions RVV 1.0 (using the Codeplay OCK Toolkit).
#598: OpenCL low-level buffers tagged as read, write and read/write based on the data dependency analysis.
#601: Feature to select an immutable task graph to execute from a multi-task graph execution plan.

Compatibility

#570: Extended timeout for all suite of unit-tests.
#579: Removed legacy JDK 8 and JDK11 build options from the TornadoVM installer.
#582: Restored tornado runner scripts for IntelliJ.
#583: Automatic generation of IDE IntelliJ configuration runner files from the TornadoVM command.
#597: Updated white-list of unit-test and checkstyle improved.

Bug Fixes

#571: Fix issues with bracket closing for if/loops conditions.
#572: Fix for printing default execution plans (execution plans with default parameters).
#575: Fix the Level Zero version used for building the SPIR-V backend.
#577: Fix checkstyle.
#587: Fix thread scheduler for new NVIDIA Drivers.
#592: Fix Float.POSITIVE_INFINITY and Float.NEGATIVE_INFINITY constants for the OpenCL, CUDA and SPIR-V backends.
#596: Fix extra closing bracket during the code-generation for the FPGAs.
Remove the intermediate CUDA pinned memory regions in the JNI code.
Fix bitwise negation operations for the PTX backend.
Make GetBackendImpl::getAllDevices thread-safe.
Check size elements for memory segments.

TornadoVM 1.0.8

30th September 2024

Improvements

#565: New API call in the Execution Plan to log/trace the executed configuration plans.
#563: Expand the TornadoVM profiler with Level Zero Sysman Energy Metrics.
#559: Refactoring Power Metric handlers for PTX and OpenCL.
#548: Benchmarking improvements.
#549: Prebuilt API tests added using multiple backend-setup.
Add internal tests for monitoring memory management.

Compatibility

#561: Build for OSx 14.6 and OSx 15 fixed.

Bug Fixes

#564: Jenkins configuration fixed to run KFusion per backend.
#562: Warmup action from the Execution Plan fixed to run with correct internal IDs.
#557: Shared Execution Plans Context fixed.
#553: OpenCL compiler flags for Intel Integrated GPUs fixed.
#552: Fixed runtime to select any device among multiple SPIR-V devices.
Fixed zero extend arithmetic operations.

TornadoVM 1.0.7

30th August 2024

Improvements

#468: Cleanup Abstract Metadata Class.
#473: Add maven plugin to build TornadoVM source for the releases.
#474: Refactor <X>TornadoDevice to place common methods in the TornadoXPUInterface.
#482: Help messages improved when an out-of-memory exception is raised.
#484: Double-type for the trigonometric functions added in the TornadoMath class.
#487: Prebuilt API simplified.
#494: Add test to trigger unsupported features related to direct use of Memory Segments.
#509: Add a quick pass configuration to skip the heavy tests during active development.
#532: Improve thread scheduler to support RISC-V Accelerators from Codeplay.
#533: Support for scalar values to be passed via lambda expressions as tasks.
#538: README file updated.
#539: Refactor core classes and add new API methods to pass compilation flags to the low-level driver compilers (OpenCL, PTX and Level Zero).
#542: Tagged LevelZero JNI and Beehive Toolkit dependencies added in the build and installer.

Compatibility

#465: Support for JDK 22 and GraalVM 24.0.2.
#486: Temurin for Windows added in the list of supported JDKs.
#525: Revert usage of String Templates in preparation for JDK 23.
#527: SPIR-V version parameter added. TornadoVM may run previous SPIR-V versions (e.g., ComputeAorta from Codeplay).
#513: LevelZero JNI Library updated to v0.1.4.

Bug Fixes

#470: README documentation fixed.
#478: Fix the test names that are present in the white list.
#488: FP64 Kind for radian operations and the PTX backend fixed.
#493: Tests Whitelist for PTX backend fixed.
#502: Fix barrier type in the documentation regarding programmability of reductions.
#514: Installer script fixed.
#540: Fix issue with clean-up execution IDs function.
#541: Fix Data Accessors for the prebuilt API.
#543: Fix checkstyle condition and FP16 error message improved.

TornadoVM 1.0.6

27th June 2024

Improvements

#442: Support for multiple SPIR-V device versions (>= 1.2).
#444: Enabling automatic device memory clean-up after each run from the execution plan.
#448: API extension to query device memory consumption at the TaskGraph granularity.
#451: Option to select the default SPIR-V runtime.
#455: Refactoring the API and documentation updated.
#460: Refactoring all examples to use try-with-resources execution plans by default.
#462: Support for copy array references from private to private memory on the hardware accelerator.

Compatibility

#438: No writes for intermediate files to avoid permissions issues with Jenkins.
#440: Update Jenkinsfile for CI/CD testing.
#443: Level Zero and OpenCL runtimes for SPIR-V included in the Jenkins CI/CD.
#450: TornadoVM benchmark script improved to report dimensions and sizes.
#453: Update Jenkinsfile with regards to the runtime for SPIR-V.

Bug Fixes

#434: Fix for building TornadoVM on OSx after integration with SPIR-V binaries for OpenCL.
#441: Fix PTX unit-tests.
#446: Fix NVIDIA thread-block scheduler for new GPU drivers.
#447: Fix recompilation when batch processing is not triggered.
#463: Fix unit-tests for CPU virtual devices.

TornadoVM 1.0.5

26th May 2024

Improvements

#402: Support for TornadoNativeArrays from FFI buffers.
#403: Clean-up and refactoring for the code analysis of the loop-interchange.
#405: Disable Loop-Interchange for CPU offloading.
#407: Debugging OpenCL Kernels builds improved.
#410: CPU block scheduler disabled by default and option to switch between different thread-schedulers added.
#418: TornadoOptions and TornadoLogger improved.
#423: MxM using ns instead of ms to report performance.
#425: Vector types for Float<Width> and Int<Width> supported.
#429: Documentation of the installation process updated and improved.
#432: Support for SPIR-V code generation and dispatcher using the TornadoVM OpenCL runtime.

Compatibility

#409: Guidelines to build the documentation.
#411: Windows installer improved.
#412: Python installer improved to check and download all Python dependencies before the main installer.
#413: Improved documentation for installing all configurations of backends and OS.
#424: Use Generic GPU Scheduler for some older NVIDIA Drivers for the OpenCL runtime.
#430: Improved the installer by checking that the TornadoVM environment is loaded upfront.

Bug Fixes

#400: Fix batch computation when the global thread indexes are used to compute the outputs.
#414: Recover Test-Field unit-tests using Panama types.
#415: Check style errors fixed.
#416: FPGA execution with multiple tasks in a task-graph fixed.
#417: Lazy-copy out fixed for Java fields.
#420: Fix Mandelbrot example.
#421: OpenCL 2D thread-scheduler fixed for NVIDIA GPUs.
#422: Compilation for NVIDIA Jetson Nano fixed.
#426: Fix Logger for all backends.
#428: Math cos/sin operations supported for vector types.
#431: Jenkins files fixed.

TornadoVM 1.0.4

30th April 2024

Improvements

#369: Introduction of Tensor types in TornadoVM API and interoperability with ONNX Runtime.
#370: Array concatenation operation for TornadoVM native arrays.
#371: TornadoVM installer script ported for Windows 10/11.
#372: Add support for HalfFloat (Float16) in vector types.
#374: Support for TornadoVM array concatenations from the constructor-level.
#375: Support for TornadoVM native arrays using slices from the Panama API.
#376: Support for lazy copy-outs in the batch processing mode.
#377: Expand the TornadoVM profiler with power metrics for NVIDIA GPUs (OpenCL and PTX backends).
#384: Auto-closable Execution Plans for automatic memory management.

Compatibility

#386: OpenJDK 17 support removed.
#390: SapMachine OpenJDK 21 supported.
#395: OpenJDK 22 and GraalVM 22.0.1 supported.
TornadoVM tested with Apple M3 chips.

Bug Fixes

#367: Fix for Graal/Truffle languages in which some Java modules were not visible.
#373: Fix for data copies of the HalfFloat types for all backends.
#378: Fix free memory markers when running multi-thread execution plans.
#379: Refactoring package of vector api unit-tests.
#380: Fix event list sizes to accommodate profiling of large applications.
#385: Fix code check style.
#387: Fix TornadoVM internal events in OpenCL, SPIR-V and PTX for running multi-threaded execution plans.
#388: Fix of expected and actual values of tests.
#392: Fix installer for using existing JDKs.
#389: Fix DataObjectState for multi-thread execution plans.
#396: Fix JNI code for the CUDA NVML library access with OpenCL.

TornadoVM 1.0.3

27th March 2024

Improvements

#344: Support for Multi-threaded Execution Plans.
#347: Enhanced examples.
#350: Obtain internal memory segment for the Tornado Native Arrays without the object header.
#357: API extensions to query and apply filters to backends and devices from the TornadoExecutionPlan.
#359: Support Factory Methods for FFI-based array collections to be used/composed in TornadoVM Task-Graphs.

Compatibility

#351: Compatibility of TornadoVM Native Arrays with the Java Vector API.
#352: Refactor memory limit to take into account primitive types and wrappers.
#354: Add DFT-sample benchmark in FP32.
#356: Initial support for Windows 11 using Visual Studio Development tools.
#361: Compatibility with the SPIR-V toolkit v0.0.4.
#366: Level Zero JNI Dependency updated to 0.1.3.

Bug Fixes

#346: Computation of local-work group sizes for the Level Zero/SPIR-V backend fixed.
#360: Fix native tests to check the JIT compiler for each backend.
#355: Fix custom exceptions when a driver/device is not found.

TornadoVM 1.0.2

29/02/2024

Improvements

#323: Set Accelerator Memory Limit per Execution Plan at the API level
#328: Javadoc API to run with concurrent devices and memory limits
#340: New API calls to enable threadInfo and printKernel from the Execution Plan API.
#334: Dynamically enable/disable profiler after first run

Compatibility

#337: Initial support for Graal and JDK 21.0.2

Bug Fixes

#322: Fix duplicate thread-info debug message when the debug option is also enabled.
#325: Set/Get accesses for the MatrixVectorFloat4 type fixed
#326: Fix installation script for running with Python >= 3.12
#327: Fix Memory Limits for all supported Panama off-heap types.
#329: Fix timers for the dynamic reconfiguration policies
#330: Fix the profiler logs when silent mode is enabled
#332: Fix Batch processing when having multiple task-graphs in a single execution plan.

TornadoVM 1.0.1

30/01/2024

Improvements

#305: Under-demand data transfer for custom data ranges.
#313: Initial support for Half-Precision (FP16) data types.
#311: Enable Multi-Task Multiple Device (MTMD) model from the TornadoExecutionPlan API.
#315: Math Ceil function added

Compatibility/Integration

#294: Separation of the OpenCL Headers from the code base.
#297: Separation of the LevelZero JNI API in a separate repository.
#301: Temurin configuration supported.
#304: Refactor of the common phases for the JIT compiler.
#316: Beehive SPIR-V Toolkit version updated.

Bug Fixes

#298: OpenCL Codegen fixed open-close brackets.
#300: Python Dependencies fixed for AWS
#308: Runtime check for Grid-Scheduler names
#309: Fix check-style to support STR templates
#314: Emit Vector16 Capability for 16-width vectors

TornadoVM 1.0

05/12/2023

Improvements

Brand-new API for allocating off-heap objects and array collections using the Panama Memory Segment API.
- New Arrays, Matrix and Vector type objects are allocated using the Panama API.
- Migration of existing applications to use the new Panama-based types: https://tornadovm.readthedocs.io/en/latest/offheap-types.html
Handling of the TornadoVM’s internal bytecode improved to avoid write-only copies from host to device.
cospi and sinpi math operations supported for OpenCL, PTX and SPIR-V.
Vector 16 data types supported for float, double and int.
Support for Mesa’s rusticl.
Device default ordering improved based on maximum thread size.
Move all the installation and configuration scripts from Bash to Python.
The installation process has been improved for Linux and OSx with M1/M2 chips.
Documentation improved.
Add profiling information for the testing scripts.

Compatibility/Integration

Integration with the Graal 23.1.0 JIT Compiler.
Integration with OpenJDK 21.
Integration with Truffle Languages (Python, Ruby and Javascript) using Graal 23.1.0.
TornadoVM API Refactored.
Backport bug-fixes for branch using OpenJDK 17: master-jdk17

Bug Fixes

Multiple SPIR-V Devices fixed.
Runtime Exception when no SPIR-V devices are present.
Issue with the kernel context API when invoking multiple kernels fixed.
MTMD mode is fixed when running multiple backends on the same device.
long type as a constant parameter for a kernel fixed.
FPGA Compilation and Execution fixed for AWS and Xilinx devices.
Batch processing fixed for different data types of the same size.

TornadoVM 0.15.2

26/07/2023

Improvements

Initial Support for Multi-Tasks on Multiple Devices (MTMD): This mode enables the execution of multiple independent tasks on more than one hardware accelerator. Documentation: https://tornadovm.readthedocs.io/en/latest/multi-device.html
Support for trigonometric radian, cospi and sinpi functions for the OpenCL/PTX and SPIR-V backends.
Clean-up Java modules not being used and TornadoVM core classes refactored.

Compatibility/Integration

Initial integration with ComputeAorta (part of the Codeplay’s oneAPI Construction Kit for RISC-V) to run on RISC-V with Vector Instructions (OpenCL backend) in emulation mode.
Beehive SPIR-V Toolkit dependency updated.
Tests for prebuilt SPIR-V kernels fixed to dispatch SPIR-V binaries through the Level Zero and OpenCL runtimes.
Deprecated javac.py script removed.

Bug Fixes

TornadoVM OpenCL Runtime throws an exception when the detected hardware does not support FP64.
Fix the installer for older Apple machines with the x86 architecture using AMD GPUs.
Installer for ARM based systems fixed.
Installer fixed for Microsoft WSL and NVIDIA GPUs.
OpenCL code generator fixed to avoid using the reserved OpenCL keywords from Java function parameters.
Dump profiler option fixed.

TornadoVM 0.15.1

15/05/2023

Improvements

Introduction of a device selection heuristic based on the computing capabilities of devices. TornadoVM selects, as the default device, the fastest device based on its computing capability.
Optimisation that removes redundant data copies for Read-Only and Write-Only buffers between the host (CPU) and the device (GPU) based on the Tornado Data Flow Graph.
New installation script for TornadoVM.
Option to dump the TornadoVM bytecodes for the unit tests.
Full debug option improved. Use --fullDebug.

Compatibility/Integration

Integration and compatibility with the Graal 22.3.2 JIT Compiler.
Improved compatibility with Apple M1 and Apple M2 through the OpenCL Backend.
GraalVM/Truffle programs integration improved. Use --truffle in the tornado script to run guest programs with Truffle.
- Example: tornado --truffle python myProgram.py
- Full documentation: https://tornadovm.readthedocs.io/en/latest/truffle-languages.html

Bug Fixes

Documentation that resets the device’s memory: https://github.com/beehive-lab/TornadoVM/blob/master/tornado-api/src/main/java/uk/ac/manchester/tornado/api/TornadoExecutionPlan.java#L282
Append the Java CLASSPATH to the cp option from the tornado script.
Dependency for the cmake-maven plugin fixed for the ARM-64 architecture.
Fixed the automatic installation for Apple M1/M2, ARM-64 and NVIDIA Jetson Nano computing systems.
Integration with IGV fixed. Use the --igv option for the tornado and tornado-test scripts.

TornadoVM 0.15

27/01/2023

Improvements

New TornadoVM API:
- API refactoring (TaskSchedule has been renamed to TaskGraph)
- Introduction of the Immutable TaskGraphs
- Introduction of the TornadoVM Execution Plans: (TornadoExecutionPlan)
- The documentation of migration of existing TornadoVM applications to the new API can be found here: https://tornadovm.readthedocs.io/en/latest/programming.html#migration-to-tornadovm-v0-15
Launch a new website https://tornadovm.readthedocs.io/en/latest/ for the documentation
Improved documentation
Initial support for Intel ARC discrete GPUs.
Improved TornadoVM installer for Linux
Improved TornadoVM launch script with optional parameters
Support of large buffer allocations with Intel Level Zero. Use: tornado.spirv.levelzero.extended.memory=True

Bug Fixes

Vector and Matrix types fixed
TornadoVM Floating Replacement compiler phase fixed
Fix CMAKE for Intel ARC GPUs
Device query tool fixed for the PTX backend
Documentation for Windows 11 fixed

TornadoVM 0.14.1

29/09/2022

Improvements

The tornado command has been migrated from a Bash script to a Python script.
- Use tornado --help to check the new options and examples.
Support of native tests for the SPIR-V backend.
Improvement of the OpenCL and PTX tests of the internal APIs.

Compatibility/Integration

Integration and compatibility with the Graal 22.2.0 JIT Compiler.
Compatibility with JDK 18 and JDK 19.
Compatibility with Apple M1 Pro using the OpenCL backend.

Bug Fixes

CUDA PTX generated header fixed to target NVIDIA 30xx GPUs and CUDA 11.7.
The signature of generated PTX kernels fixed for NVIDIA driver >= 510 and 30XX GPUs when using the TornadoVM Kernel API.
Tests of virtual OpenCL devices fixed.
Thread deployment information for the OpenCL backend is fixed.
TornadoVMRuntimeCI moved to TornadoVMRuntimeInterface.

TornadoVM 0.14

15/06/2022

New Features

New device memory management for addressing the memory allocation limitations of OpenCL and enabling pinned memory of device buffers.
- The execution of task-schedules will still automatically allocate/deallocate memory every time a task-schedule is executed, unless lock/unlock functions are invoked explicitly at the task-schedule level.
- One heap per device has been replaced with a device buffer per input variable.
- A new API call has been added for releasing memory: unlockObjectFromMemory
- A new API call has been added for locking objects to the device: lockObjectInMemory. This requires the user to release memory by invoking unlockObjectFromMemory at the task-schedule level.
Enhanced Live Task migration by supporting multi-backend execution (PTX <-> OpenCL <-> SPIR-V).
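
The allocation behaviour described above can be modelled with a small toy class. This is a sketch under stated assumptions, not TornadoVM code: DeviceMemorySketch, its lock/unlock/execute methods, and the use of variable names as strings are all hypothetical; they only mirror the rule that buffers are allocated per input variable and released after each execution unless the object has been locked.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: it mimics the rule from the changelog, not the real runtime.
final class DeviceMemorySketch {

    // Variables that currently own a device buffer.
    private final Set<String> allocated = new HashSet<>();
    // Variables the user asked to keep on the device.
    private final Set<String> locked = new HashSet<>();

    // Stands in for lockObjectInMemory (hypothetical signature).
    void lock(String variable) {
        locked.add(variable);
    }

    // Stands in for unlockObjectFromMemory: removes the lock and releases the buffer.
    void unlock(String variable) {
        locked.remove(variable);
        allocated.remove(variable);
    }

    // Models one execution and returns how many buffers had to be allocated.
    int execute(String... variables) {
        int newAllocations = 0;
        for (String variable : variables) {
            if (allocated.add(variable)) {
                newAllocations++;
            }
        }
        // After the run, only buffers of locked variables stay allocated.
        allocated.retainAll(locked);
        return newAllocations;
    }

    public static void main(String[] args) {
        DeviceMemorySketch memory = new DeviceMemorySketch();
        System.out.println("unlocked run: " + memory.execute("a", "b") + " allocations");
        memory.lock("a");
        memory.execute("a", "b");
        System.out.println("run with 'a' locked: " + memory.execute("a", "b") + " allocations");
    }
}
```

In this model, locking an object trades device memory for fewer allocations on repeated executions, which is the motivation the entry above suggests.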

Compatibility/Integration

Integration with the Graal 22.1.0 JIT Compiler
JDK 8 deprecated
Azul Zulu JDK supported
OpenCL 2.1 as a default target for the OpenCL Backend
Single Docker Image for Intel XPU platforms, including the SPIR-V backend (using the Intel Integrated Graphics), and OpenCL (using the Intel Integrated Graphics, Intel CPU and Intel FPGA in emulation mode). Image: https://github.com/beehive-lab/docker-tornado#intel-integrated-graphics

Improvements/Bug Fixes

SIGNUM Math Function included for all three backends.
SPIR-V optimizer enabled by default (3x reduction in binary size).
Extended Memory Mode enabled for the SPIR-V Backend via Level Zero.
Phi instructions fixed for the SPIR-V Backend.
SPIR-V Vector Select instructions fixed.
Duplicated IDs for Non-Inlined SPIR-V Functions fixed.
Refactoring of the TornadoVM Math Library.
FPGA Configuration files fixed.
Bitwise operations for OpenCL fixed.
Code Generation Times and Backend information are included in the profiling info.

TornadoVM 0.13

21/03/2022

Integration with JDK 17 and Graal 21.3.0
- JDK 11 is the default version and support for JDK 8 has been deprecated
Support for extended intrinsics regarding math operations
Native functions are enabled by default
Support for 2D arrays for PTX and SPIR-V backends:
- https://github.com/beehive-lab/TornadoVM/commit/2ef32ca97941410672720f9dfa15f0151ae2a1a1
Integer Test Move operation supported:
- https://github.com/beehive-lab/TornadoVM/pull/177
Improvements in the SPIR-V Backend:
- Experimental SPIR-V optimizer. Binary size reduction of up to 3x
  - https://github.com/beehive-lab/TornadoVM/commit/394ca94dcdc3cb58d15a17046e1d22c6389b55b7
- Fix malloc functions for Level-Zero
- Support for pre-built SPIR-V binary modules using the TornadoVM runtime for OpenCL
- Performance increase due to cached buffers on GPUs by default
- Disassembler option for SPIR-V binary modules. Use --printKernel
Improved Installation:
- Full automatic installer script integrated
Documentation about the installation for Windows 11
Refactoring and several bug fixes
- https://github.com/beehive-lab/TornadoVM/commit/57694186b42ec28b16066fb549ab8fcf9bff9753
- Vector types fixed:
  - https://github.com/beehive-lab/TornadoVM/pull/181/files
  - https://github.com/beehive-lab/TornadoVM/commit/004d61d6d26945b45ebff66641b60f90f00486be
- Fix AtomicInteger get for OpenCL:
  - https://github.com/beehive-lab/TornadoVM/pull/177
Dependencies for Math3 and Lang3 updated

TornadoVM 0.12

17/11/2021

New backend: initial support for SPIR-V and Intel Level Zero
- Level-Zero dispatcher for SPIR-V integrated
- SPIR-V Code generator framework for Java
Benchmarking framework improved to accommodate all three backends
Driver metrics, such as kernel time and data transfers included in the benchmarking framework
TornadoVM profiler improved:
- Command line options added: --enableProfiler <silent|console> and --dumpProfiler <jsonFile>
- Logging improved for debugging purposes: JIT compiler, JNI calls, and code generation
New math intrinsic operations supported
Several bug fixes:
- Duplicated barriers removed. TornadoVM BARRIER bytecode fixed when running multi-context
- Copy in when having multiple reductions fixed
- TornadoVM profiler fixed for multiple context switching (device switching)
Pretty printer for device information

TornadoVM 0.11

29/09/2021

TornadoVM JIT Compiler upgrade to work with Graal 21.2.0 and JDK 8 with JVMCI 21.2.0
Refactoring of the Kernel Parallel API for Heterogeneous Programming:
- Methods getLocalGroupSize(index) and getGlobalGroupSize replaced by public fields, for consistency with the rest of the thread properties in the KernelContext class.
  - Changeset: https://github.com/beehive-lab/TornadoVM/commit/e1ebd66035d0722ca90eb0121c55dbc744840a74
Compiler update to register the global number of threads: https://github.com/beehive-lab/TornadoVM/pull/133/files
Simplification of the TornadoVM events handler: https://github.com/beehive-lab/TornadoVM/pull/135/files
Renaming the Profiler API method from event.getExecutionTime to event.getElapsedTime: https://github.com/beehive-lab/TornadoVM/pull/134
Deprecating OCLWriteNode and PTXWriteNode and fixing stores for bytes: https://github.com/beehive-lab/TornadoVM/pull/131
Refactoring of the FPGA IR extensions, from the high-tier to the low-tier of the JIT compiler
- Utilizing the FPGA Thread-Attributes compiler phase for the FPGA execution
- Using the GridScheduler object (if present), or otherwise a default value (e.g., 64, 1, 1), to define the FPGA OpenCL local workgroup
Several bugs fixed:
- Codegen for sequential kernels fixed
- Function parameters with non-inlined method calls fixed

TornadoVM 0.10

29/06/2021

TornadoVM JIT Compiler sync with Graal 21.1.0
Experimental support for OpenJDK 16
Tracing the TornadoVM thread distribution and device information with a new option --threadInfo instead of --debug
Refactoring of the new API:
- TornadoVMExecutionContext renamed to KernelContext
- GridTask renamed to GridScheduler
AWS F1 AMI version upgraded to 1.10.0 and automated the generation of AFI image
Xilinx OpenCL backend expanded with:
- Initial integration of Xilinx OpenCL attributes for loop pipelining in the TornadoVM compiler
- Support for multiple compute units
Logging FPGA compilation option added to dump FPGA HLS compilation to a file
TornadoVM profiler enhanced for including data transfers for the stack-frame and kernel dispatch time
Initial support for 2D Arrays added
Several bug fixes and stability support for the OpenCL and PTX backends

TornadoVM 0.9

15/04/2021

Expanded API for expressing kernel parallelism within Java. It can work with the existing loop parallelism in TornadoVM.
- Direct access to thread-ids, OpenCL local memory (PTX shared memory), and barriers
- TornadoVMContext added:
  - See https://github.com/beehive-lab/TornadoVM/blob/5bcd3d6dfa2506032322c32d72b7bbd750623a95/tornado-api/src/main/java/uk/ac/manchester/tornado/api/TornadoVMContext.java
- Code examples:
  - https://github.com/beehive-lab/TornadoVM/tree/master/examples/src/main/java/uk/ac/manchester/tornado/examples/tornadovmcontext
- Documentation:
  - https://github.com/beehive-lab/TornadoVM/blob/master/assembly/src/docs/21_TORNADOVM_CONTEXT.md
Profiler integrated with Chrome debug:
- Use flags: -Dtornado.chrome.event.tracer.enabled=True -Dtornado.chrome.event.tracer.filename=userFile.json
- See https://github.com/beehive-lab/TornadoVM/pull/41
Added support for Windows 10:
- See https://github.com/beehive-lab/TornadoVM/blob/develop/assembly/src/docs/20_INSTALL_WINDOWS_WITH_GRAALVM.md
TornadoVM running with Windows JDK 11 supported (Linux & Windows)
Xilinx FPGAs workflow supported for Vitis 2020.2
Pre-compiled tasks for Xilinx/Intel FPGAs fixed
Slambench fixed when compiling for PTX and OpenCL backends
Several bug fixes for the runtime, JIT compiler and data management.

TornadoVM 0.8

19/11/2020

Added PTX backend for NVIDIA GPUs
- Build TornadoVM using make BACKEND=ptx,opencl to obtain the two supported backends.
TornadoVM JIT Compiler aligned with Graal 20.2.0
Support for other JDKs:
- Red Hat Mandrel 11.0.9
- Amazon Corretto 11.0.9
- GraalVM LabsJDK 11.0.8
- OpenJDK 11.0.8
- OpenJDK 12.0.2
- OpenJDK 13.0.2
- OpenJDK 14.0.2
Support for hybrid (CPU-GPU) parallel reductions
New API for generic kernel dispatch. It introduces the concept of WorkerGrid and GridTask
- A WorkerGrid is an object that stores how threads are organized on an OpenCL device: WorkerGrid1D worker1D = new WorkerGrid1D(4096);
- A GridTask is a map that relates a task-name with a worker-grid: GridTask gridTask = new GridTask(); gridTask.set("s0.t0", worker1D);
- A TornadoVM Task-Schedule can be executed using a GridTask: ts.execute(gridTask);
- More info: link
TornadoVM profiler improved
- Profiler metrics added
- Code features per task-graph
Lazy device initialisation moved to early initialisation of PTX and OpenCL devices
Initial support for Atomics (OpenCL backend)
- Link to examples
Task Schedules with 11-14 parameters supported
Documentation improved
Bug fixes for code generation, numeric promotion, basic block traversal, Xilinx FPGA compilation.
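
To make the WorkerGrid and GridTask entry of this release more concrete, here is a minimal sketch of the idea: a grid object that records how threads are organized, and a map from task names to grids. Grid1D and TaskGrids are hypothetical stand-ins written for this example, not the TornadoVM classes, and the local work-group size field is an assumption that the changelog does not mention.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in types only: these are not the TornadoVM WorkerGrid/GridTask classes.
final class GridMappingSketch {

    // A 1D thread organization: total threads plus an assumed work-group size.
    static final class Grid1D {
        final int globalSize;
        final int localSize;

        Grid1D(int globalSize, int localSize) {
            if (globalSize <= 0 || localSize <= 0) {
                throw new IllegalArgumentException("sizes must be positive");
            }
            this.globalSize = globalSize;
            this.localSize = localSize;
        }

        // Number of work-groups needed to cover all threads (rounded up).
        int workGroups() {
            return (globalSize + localSize - 1) / localSize;
        }
    }

    // Relates a task name such as "s0.t0" to the grid it should run with.
    static final class TaskGrids {
        private final Map<String, Grid1D> grids = new HashMap<>();

        void set(String taskName, Grid1D grid) {
            grids.put(taskName, grid);
        }

        Grid1D get(String taskName) {
            return grids.get(taskName);
        }
    }

    public static void main(String[] args) {
        TaskGrids taskGrids = new TaskGrids();
        taskGrids.set("s0.t0", new Grid1D(4096, 256));
        Grid1D grid = taskGrids.get("s0.t0");
        System.out.println("s0.t0 runs " + grid.globalSize + " threads in "
                + grid.workGroups() + " work-groups");
    }
}
```

Keying the grids by task name keeps the kernel code free of launch parameters, so the same task can be dispatched with a different thread organization without being rewritten.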

TornadoVM 0.7

22/06/2020

Support for ARM Mali GPUs.
Support for parallel reductions on FPGAs
Agnostic FPGA vendor compilation via configuration files (Intel & Xilinx)
Support for AWS on Xilinx FPGAs
Recompilation for different input data sizes supported
New TornadoVM API calls:
1. Update references for re-compilation: taskSchedule.updateReferences(oldRef, newRef);
2. Use the default OpenCL scheduler: taskSchedule.useDefaultThreadScheduler(true);
Use of JMH for benchmarking
Support for Fused Multiply-Add (FMA) instructions
Easy selection of different devices for unit-tests: tornado-test.py -V --debug -J"-Dtornado.unittests.device=0:1"
Bailout mechanism improved from parallel to sequential
Thread scheduling improved
Support for private memory allocation
Assertion mode included
Documentation improved
Several bug fixes

TornadoVM 0.6

21/02/2020

TornadoVM compatible with GraalVM 19.3.0 using JDK 8 and JDK 11
TornadoVM compiler update for using Graal 19.3.0 compiler API
Support for dynamic languages on top of Truffle
- examples
Support for multiple tasks per task-schedule on FPGAs (Intel and Xilinx)
Support for macOS Mojave and Catalina
Task-schedule name handling for FPGAs improved
Exception handling improved
Reductions for long type supported
Bug fixes for ternary conditions, reductions and code generator
Documentation improved

TornadoVM 0.5

16/12/2019

Initial support for Xilinx FPGAs
TornadoVM API classes are now Serializable
Initial support for local memory for reductions
JVMCI build with the local annotation patch removed. TornadoVM now requires an unmodified JDK 8 with JVMCI support
Support of multiple reductions within the same task-schedules
Emulation mode on Intel FPGAs is fixed
Fix reductions on Intel Integrated Graphics
TornadoVM driver OpenCL initialization and OpenCL code cache improved
Refactoring of the FPGA execution modes (full JIT and emulation modes improved).

TornadoVM 0.4

14/10/2019

Profiler supported
- Use -Dtornado.profiler=True to enable profiler
- Use -Dtornado.profiler=True -Dtornado.profiler.save=True to dump the profiler logs
Feature extraction added
- Use -Dtornado.feature.extraction=True to enable code feature extraction
macOS support
Automatic reductions composition (map-reduce) within the same task-schedule
Bug related to a memory leak when running on GPUs solved
Bug fixes and stability improvements

TornadoVM 0.3

22/07/2019

New Matrix 2D and Matrix 3D classes with type specializations.
New API-call TaskSchedule#batch for batch processing. It allows programmers to run with more data than the maximum capacity of the accelerator by creating batches of executions.
FPGA full automatic compilation pipeline.
FPGA options simplified:
- -Dtornado.precompiled.binary=<binary> for loading the bitstream.
- -Dtornado.opencl.userelative=True for using relative addresses.
- -Dtornado.opencl.codecache.loadbin=True removed.
Reductions support enhanced and fully automated on GPUs and CPUs.
Initial support for reductions on FPGAs.
Initial API for profiling tasks integrated.
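
The batch processing entry in this release describes splitting an input that exceeds the accelerator's capacity into several executions. The sketch below shows that idea in plain Java. BatchSketch, doubleInBatches, and the element-count capacity are assumptions made for illustration; the real TaskSchedule#batch call and its arguments are not reproduced here.

```java
// Generic illustration of batching; it does not call the TornadoVM API.
final class BatchSketch {

    // Number of passes needed when at most `capacity` elements fit per pass.
    static int batchCount(int length, int capacity) {
        return (length + capacity - 1) / capacity;
    }

    // Doubles every element, touching at most `capacity` elements per pass.
    static int[] doubleInBatches(int[] input, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        int[] output = new int[input.length];
        for (int start = 0; start < input.length; start += capacity) {
            int end = Math.min(input.length, start + capacity);
            // In a real system this inner loop would be one device execution.
            for (int i = start; i < end; i++) {
                output[i] = 2 * input[i];
            }
        }
        return output;
    }

    public static void main(String[] args) {
        int[] data = new int[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        int[] result = doubleInBatches(data, 4);
        System.out.println(batchCount(data.length, 4) + " batches, last element " + result[9]);
    }
}
```

The last batch is usually smaller than the others, which is why the end index is clamped to the input length.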

TornadoVM 0.2

25/02/2019

Rename to TornadoVM
Device selection for better performance (CPU, multi-core, GPU, FPGA) via an API for Dynamic Reconfiguration
- Added methods executeWithProfiler and executeWithProfilerSequential with an input policy.
- Policies: Policy.PERFORMANCE, Policy.END_2_END, and Policy.LATENCY implemented.
Basic heuristic for predicting the highest performing target device with Dynamic Reconfiguration
Initial FPGA integration for Altera FPGAs:
- Full JIT compilation mode
- Ahead of time compilation mode
- Emulation/debug mode
FPGA JIT compiler specializations
Added support for Java reductions:
- Compiler specializations for CPU and GPU reductions
Performance and stability fixes
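
For readers unfamiliar with parallel reductions, the following sketch shows a common two-stage scheme in plain Java: each chunk of the input produces a partial result, and the partial results are then combined. This is a generic technique chosen for illustration and is not claimed to be what the TornadoVM compiler specializations generate; ReductionSketch and the chunk size parameter are hypothetical.

```java
// Generic two-stage sum reduction; chunks stand in for groups of parallel threads.
final class ReductionSketch {

    // Stage 1: one partial sum per chunk (each chunk could run in parallel).
    static long[] partialSums(int[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = (data.length + chunkSize - 1) / chunkSize;
        long[] partial = new long[chunks];
        for (int c = 0; c < chunks; c++) {
            int end = Math.min(data.length, (c + 1) * chunkSize);
            for (int i = c * chunkSize; i < end; i++) {
                partial[c] += data[i];
            }
        }
        return partial;
    }

    // Stage 2: combine the partial sums into the final result.
    static long combine(long[] partial) {
        long total = 0;
        for (long value : partial) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        long total = combine(partialSums(data, 4));
        System.out.println("sum = " + total);
    }
}
```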

Tornado 0.1.0

07/09/2018

Initial Implementation of the Tornado compiler
Initial GPU/CPU code generation for OpenCL
Initial support in the runtime to execute OpenCL programs generated by the Tornado JIT compiler
Initial Tornado-API release (@Parallel Java annotation and TaskSchedule API)
Multi-GPU enabled through multiple task-schedules