.. _changelog:

TornadoVM Changelog
===================

This file summarizes the new features and major changes for each *TornadoVM* version.

TornadoVM 4.0.0
---------------

02/04/26

Improvements
~~~~~~~~~~~~

- `#811 `_: Add support for CUDA Graphs to replay bytecodes to reduce dispatch overhead

Bug Fixes
~~~~~~~~~~~~

- `#817 `_: [hotfix] Fix double-deletion of PiNode when multiple OffsetAddressNodes share the same PiNode

Other Changes
~~~~~~~~~~~~~

- `#813 `_: Add SIMD Shuffle/Reduction Support to PTX Backend
- `#814 `_: [docs] Update readme to include metal
- `#819 `_: Sync master with develop
- `#818 `_: [fix] Make cmake to always set CMAKE_OSX_SYSROOT explicitly, making i…
- `#796 `_: [JDK21] Add \`Apple Metal\` backend to run natively on Apple Silicon
- `#808 `_: Add TornadoVM developer skill (build, test, debug, Java 21+ idioms) for Claude
- `#806 `_: Release 3.0.0-jdk25
- `#807 `_: Post release minor fixes for mvn deploy and readme badges

TornadoVM 3.0.0
---------------

24/02/26

Improvements
~~~~~~~~~~~~

- `#790 `_: [feat] Move --intellijinit from CLI to developer-only Makefile target with dynamic backend selection

Compatibility
~~~~~~~~~~~~~

- `#804 `_: Refactor GH actions to split JDK21 and JDK25 testing, packaging and deployment
- `#777 `_: Bump org.apache.logging.log4j:log4j-core from 2.17.1 to 2.25.3
- `#775 `_: [docs] Revise TornadoVM installation instructions
- `#776 `_: [feat] Add new action to push default tornadovm version

Bug Fixes
~~~~~~~~~~~~

- `#785 `_: [fix] Added scripts in dist directory to resolve issue with intellijinit

Other Changes
~~~~~~~~~~~~~

- `#803 `_: Whitelist \`TestInheritedFields\` for non-OpenCL backends
- `#802 `_: [fix] Handle Windows CRLF line endings in virtual device tests and native tests & Update Makefile.mak
- `#801 `_: Add GitHub workflows for JDK 25 build, deployment, and release prepar…
- `#799 `_: Add JDK 25.0.2 release automation workflows
- `#787 `_: [refactor] Prepare compiler and API infrastructure for
  Jdk25 migration
- `#781 `_: Fix OCLFieldBuffer to include inherited instance fields

TornadoVM 2.2.0
---------------

17/12/25

Improvements
~~~~~~~~~~~~

- `#765 `_: Add cross-platform SDK compatibility checks and fix launcher issues
- `#713 `_: [ptx] Support for CUDA JIT compiler flags

Compatibility
~~~~~~~~~~~~~

- `#764 `_: [cicd] Prevent workflows from running on forks

Other Changes
~~~~~~~~~~~~~

- `#773 `_: [build] Replace TORNADO_SDK with TORNADOVM_HOME
- `#772 `_: [docs] Refactor license table in README.md for clarity and conciseness
- `#771 `_: [CI] Publish archives to sdkman action
- `#769 `_: Update README.md for TornadoVM version 2.1.0 for SDKs

TornadoVM 2.1.0
---------------

09/12/25

Improvements
~~~~~~~~~~~~

- `#754 `_: Support to express Q8_0 tensors as Tornado ByteArray

Compatibility
~~~~~~~~~~~~~

- `#756 `_: [CI] Add night workflow to build and test all supported JDKs (Zulu, OpenJDK, GraalVM, Corretto, Mandrel etc) on Linux x64 runner
- `#755 `_: [docs] Refining README and simplifying instructions

Bug Fixes
~~~~~~~~~~~~

- `#753 `_: [hotfix] Fix Conversion Error from FP16 to FP32

Other Changes
~~~~~~~~~~~~~

- `#752 `_: Update POM files: bump parent version to \`2.0.1-dev\` across all modules
- `#758 `_: [CI] Pre and post release workflows to automate release deployments

TornadoVM 2.0.0
---------------

02/12/25

Improvements
~~~~~~~~~~~~

- `#722 `_: Simplify running tornadovm with a Java argfile.
- `#732 `_: [types] Support for GPU-native Int8 types for PTX and OpenCL.
- `#736 `_: Implement support for compressed oops (coops).
- `#738 `_: [feat] Update TornadoVM to be packaged as SDK across multiple platforms.
- `#739 `_: [feat] Zero-copy TornadoNativeArray type instances with shallow memory segments.
- `#740 `_: Add support for byte and half-float arrays in local memory across all backends.
- `#748 `_: Support FP32 to FP16 conversion across all backends.

Compatibility
~~~~~~~~~~~~~

- `#704 `_: Bump org.apache.commons:commons-lang3 from 3.12.0 to 3.18.0 in tornado-benchmark.
- `#709 `_: Fix Python dependency installation issues in installer.
- `#717 `_: [feat] Added streamlit python dependency for compatibility with TornadoViz.

Bug Fixes
~~~~~~~~~~~~

- `#705 `_: [fix] Support for ShortCircuits in OpenCL and PTX.
- `#706 `_: [fix] Codegen support for IntegerBelowNode & Fix of closing bracket in OpenCL.
- `#712 `_: [fix] Fix for Loop Partial Unroll Phase.
- `#714 `_: [fix] The differences CUDA 13 introduced to CUDA API cuCtxCreate.
- `#721 `_: [fix] Integer overflow in TornadoNativeArray implementations that caused IllegalArgumentException when allocating large arrays.
- `#723 `_: [fix] Prevent NullPointerException when trying to dump bytecodes during warm-up.
- `#746 `_: [fix] Remove @ prefix from inline Truffle export flags.

Refactors & Infrastructure
~~~~~~~~~~~~~~~~~~~~~~~~~~

- `#703 `_: Updated build instructions for using a single thread for maven.
- `#708 `_: [test] Mark unsupported tests for SPIRV.
- `#711 `_: Add mvn test configuration to ease unit-testing when porting TornadoVM to third-party projects.
- `#716 `_: [build-infra] Add missing checksums and script to generate checksum files for TornadoVM Maven artifacts.
- `#720 `_: Increase default memory size on device.
- `#725 `_: [refactor] Move Dynamic Reconfiguration to research features
- `#730 `_: [build] Revamp build infrastructure by adding Maven wrapper mvnw.
- `#733 `_: Move argfile generation to python from bash.
- `#742 `_: [CI] Migrate build & test workflows (OpenCL, PTX, SPIR-V) to GitHub Actions for Linux (x64) and macOS (arm64).
- `#737 `_: [tests] Marked Quantization Tests as whitelisted due to NVIDIA driver issues.
- `#743 `_: [build] Use shared export‑list files instead of verbose inline --add-exports in pom.xml.
- `#744 `_: Add licences in pom files - prep work for migrating release to maven central.
- `#745 `_: [deploy] Add release profile in maven to prepare maven central release.
- `#747 `_: [CI] Add deploy-maven-central workflow for deploying artifacts to Maven central.
- `#749 `_: [CI] Add GPG key configuration and Maven settings to deploy-maven-central.

TornadoVM 1.1.1
---------------

07/07/25

Improvements
~~~~~~~~~~~~

- `#657 `_: Optimize to reuse the allocated buffers for batch processing.
- `#659 `_: Fixed object state to be the one from the last executed TaskGraph.
- `#660 `_: New ``PERSIST`` bytecode to improve object lifecycle tracking.
- `#661 `_: Saving the TornadoVM Bytecodes in a log file.
- `#660 `_: Distinguish the data transfer mode when logging the execution of the ``TRANSFER_TO_DEVICE_ONCE`` Bytecode.
- `#667 `_: Update documentation of the TornadoVM flags.
- `#670 `_: Refactoring of the ``Matrix4x4Float`` type.
- `#674 `_: Updated project links in ``README``.
- `#675 `_: Avoid rescheduling ``IfNodes`` used for loop-bound evaluation.
- `#676 `_: Added unit-tests for Transformer Compute Kernels.
- `#679 `_: Added Matrix-Vector Row-Major compute example.
- `#683 `_: Mark flash attention unittest unsupported for SPIR-V.
- `#684 `_: Performance improvements for processing with Dynamic Reconfiguration.
- `#685 `_: Dynamic reconfiguration refactored.
- `#686 `_: New API Functions for warmup.
- `#693 `_: Disabling fast math to support FMA in PTX.
- `#695 `_: Update ``tornadovm-installer`` script to be interactive.
- `#696 `_: Increase sizes for auxiliary data structures related with the number of Tasks in a TaskGraph.
- `#697 `_: Added auto-deps mode in ``tornadovm-installer`` and restored backend and jdk console arguments.
- `#698 `_: Update tornadovm-installer changes in README.

Compatibility
~~~~~~~~~~~~~

- `#668 `_: Updated build instructions for RISC-V systems.

Bug Fixes
~~~~~~~~~~~~

- `#664 `_: Fix kernel name in PTX with sanitizer check.
- `#666 `_: Fix GridScheduler for execution plans that have multiple TaskGraphs.
- `#671 `_: Fix ANSI escape characters for logging TornadoVM Bytecodes.
- `#677 `_: Fix 1.0/sqrt(x) replacement with native rsqrt(x) function.
- `#678 `_: Fix profiling on macOS systems, regarding accessing UPS metrics.
- `#681 `_: Fix closing bracket for flash attention.
- `#688 `_: Fix state after warmup phase.

TornadoVM 1.1.0
---------------

31/03/25

Improvements
~~~~~~~~~~~~

- `#620 `_: Support of computation with mixed precision ``FP16`` to ``FP32`` for matrix operations.
- `#622 `_: New API to allow buffer mapping between two different buffers on the hardware accelerator.
- `#624 `_: Enhanced TornadoVM profiler with correct information for the ``UNDER_DEMAND`` transfer to host data.
- `#627 `_: New feature to persist data on the hardware accelerator, and consume data already allocated on the hardware accelerator.
- `#630 `_: Support for atomics using the kernel API for OpenCL and PTX backends.
- `#636 `_: TornadoVM bytecode logging improved.
- `#642 `_: Math functions extended: ``acosh`` and ``asinh`` supported for OpenCL and SPIR-V.
- `#645 `_: Memory deallocations improved. Action by default when closing the ``TornadoExecutionPlan`` resource.

Compatibility
~~~~~~~~~~~~~

- `#625 `_: Documentation to build on RISC-V updated.
- `#632 `_: Add maven build with Single thread.
- `#633 `_: Add tests for running multiple task graphs with different grid schedulers.
- `#638 `_: Add tests to check force copy in buffers and persist buffers on the hardware accelerator.
- `#640 `_: Rename XPUBuffer to FieldBuffer for all backends.
- `#649 `_: Update the fast mode to live mode for testing.
- `#654 `_: Add loop condition test in white list.

Bug Fixes
~~~~~~~~~~~~

- `#626 `_: Fix data accessors when using the ``UNDER_DEMAND`` transfer to host invocation from the task-graph.
- `#628 `_: Device filtering API fixed to use device type and device names.
- `#635 `_: Update nodes for local memory to be subtype of ``ValueNode`` instead of ``ConstantNode`` in the TornadoVM IR.
- `#639 `_: Fix subgraph execution when combining with the ``GridScheduler``.
- `#644 `_: Fix TornadoVM execution frame setter.
- `#646 `_: Fix shared memory buffers across task-graphs when no new allocation is present as new parameters for the following task-graphs.
- `#647 `_: Fix ``UNDER_DEMAND`` invocation for the batch processor mode and read-write arrays.
- `#651 `_: Fix memory mapping regions for the PTX Backend.
- `#653 `_: Object repetition with shared buffers on ``ON_DEVICE`` bytecodes.

TornadoVM 1.0.10
----------------

31/01/25

Improvements
~~~~~~~~~~~~

- `#608 `_: Selective execution with multiple SPIR-V runtimes (either OpenCL, Intel Level Zero, or both) to unlock execution on RISC-V systems.
- `#611 `_: Support of ``HalfFloat`` for Matrix Types (``FP16`` -> ``FP16``).

Compatibility
~~~~~~~~~~~~~

- `#607 `_: WSL installation and configuration updated for WSL Ubuntu 24 LTS and Windows 11.
- `#609 `_: Documentation and patch for RISC-V64 updated.
- `#610 `_: Maven dependency updated.
- `#612 `_: Re-enable colours in maven builds on Linux.

Bug Fixes
~~~~~~~~~~~~

- `#606 `_: Fix data sizes in benchmark suite.
- `#613 `_: Fix code formatter.
- `#614 `_: Fix flags for the benchmark pipeline in Jenkins.
- `#615 `_: Fix code style based on the formatter.
- `#616 `_: Fix atomics for the Kernel API and the OpenCL backend.

TornadoVM 1.0.9
---------------

20th December 2024

Improvements
~~~~~~~~~~~~

- `#573 `_: Enhanced output of unit-tests with a summary of pass-rates and fail-rates.
- `#576 `_: Extended support for 3D matrices.
- `#580 `_: Extended debug information for execution plans.
- `#584 `_: Added helper menu for the ``tornado`` launcher script when no arguments are passed.
- `#589 `_: Enable partial loop unrolling for all backends.
- `#594 `_: Added RISC-V 64 CPU port support to run OpenCL with vector instructions RVV 1.0 (using the Codeplay OCK Toolkit).
- `#598 `_: OpenCL low-level buffers tagged as read, write and read/write based on the data dependency analysis.
- `#601 `_: Feature to select an immutable task graph to execute from a multi-task graph execution plan.

Compatibility
~~~~~~~~~~~~~

- `#570 `_: Extended timeout for all suite of unit-tests.
- `#579 `_: Removed legacy JDK 8 and JDK11 build options from the TornadoVM installer.
- `#582 `_: Restored tornado runner scripts for IntelliJ.
- `#583 `_: Automatic generation of IDE IntelliJ configuration runner files from the TornadoVM command.
- `#597 `_: Updated white-list of unit-test and checkstyle improved.

Bug Fixes
~~~~~~~~~

- `#571 `_: Fix issues with bracket closing for if/loops conditions.
- `#572 `_: Fix for printing default execution plans (execution plans with default parameters).
- `#575 `_: Fix the Level Zero version used for building the SPIR-V backend.
- `#577 `_: Fix checkstyle.
- `#587 `_: Fix thread scheduler for new NVIDIA Drivers.
- `#592 `_: Fix ``Float.POSITIVE_INFINITY`` and ``Float.NEGATIVE_INFINITY`` constants for the OpenCL, CUDA and SPIR-V backends.
- `#596 `_: Fix extra closing bracket during the code-generation for the FPGAs.
- Remove the intermediate CUDA pinned memory regions in the JNI code: `link `_
- Fix bitwise negation operations for the PTX backend: `link `_
- ``GetBackendImpl::getAllDevices`` thread-safe: `link `_
- Check size elements for memory segments: `link `_.

TornadoVM 1.0.8
---------------

30th September 2024

Improvements
~~~~~~~~~~~~

- `#565 `_: New API call in the Execution Plan to log/trace the executed configuration plans.
- `#563 `_: Expand the TornadoVM profiler with Level Zero Sysman Energy Metrics.
- `#559 `_: Refactoring Power Metric handlers for PTX and OpenCL.
- `#548 `_: Benchmarking improvements.
- `#549 `_: Prebuilt API tests added using multiple backend-setup.
- Add internal tests for monitoring memory management `(link) `_.

Compatibility
~~~~~~~~~~~~~

- `#561 `_: Build for OSx 14.6 and OSx 15 fixed.

Bug Fixes
~~~~~~~~~

- `#564 `_: Jenkins configuration fixed to run KFusion per backend.
- `#562 `_: Warmup action from the Execution Plan fixed to run with correct internal IDs.
- `#557 `_: Shared Execution Plans Context fixed.
- `#553 `_: OpenCL compiler flags for Intel Integrated GPUs fixed.
- `#552 `_: Fixed runtime to select any device among multiple SPIR-V devices.
- Fixed zero extend arithmetic operations: `link `_

TornadoVM 1.0.7
----------------

30th August 2024

Improvements
~~~~~~~~~~~~

- `#468 `_: Cleanup Abstract Metadata Class.
- `#473 `_: Add maven plugin to build TornadoVM source for the releases.
- `#474 `_: Refactor TornadoDevice to place common methods in the ``TornadoXPUInterface``.
- `#482 `_: Help messages improved when an out-of-memory exception is raised.
- `#484 `_: Double-type for the trigonometric functions added in the ``TornadoMath`` class.
- `#487 `_: Prebuilt API simplified.
- `#494 `_: Add test to trigger unsupported features related to direct use of Memory Segments.
- `#509 `_: Add a quick pass configuration to skip the heavy tests during active development.
- `#532 `_: Improve thread scheduler to support RISC-V Accelerators from Codeplay.
- `#533 `_: Support for scalar values to be passed via lambda expressions as tasks.
- `#538 `_: ``README`` file updated.
- `#539 `_: Refactor core classes and add new API methods to pass compilation flags to the low-level driver compilers (OpenCL, PTX and Level Zero).
- `#542 `_: Tagged LevelZero JNI and Beehive Toolkit dependencies added in the build and installer.

Compatibility
~~~~~~~~~~~~~

- `#465 `_: Support for JDK 22 and GraalVM 24.0.2.
- `#486 `_: Temurin for Windows added in the list of supported JDKs.
- `#525 `_: Revert usage of String Templates in preparation for JDK 23.
- `#527 `_: SPIR-V version parameter added. TornadoVM may run previous SPIR-V versions (e.g., ComputeAorta from Codeplay).
- `#513 `_: LevelZero JNI Library updated to v0.1.4.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- `#470 `_: README documentation fixed.
- `#478 `_: Fix the test names that are present in the white list.
- `#488 `_: FP64 Kind for radian operations and the PTX backend fixed.
- `#493 `_: Tests Whitelist for PTX backend fixed.
- `#502 `_: Fix barrier type in the documentation regarding programmability of reductions.
- `#514 `_: Installer script fixed.
- `#540 `_: Fix issue with clean-up execution IDs function.
- `#541 `_: Fix Data Accessors for the prebuilt API.
- `#543 `_: Fix checkstyle condition and FP16 error message improved.

TornadoVM 1.0.6
----------------

27th June 2024

Improvements
~~~~~~~~~~~~~~~~~~

- `#442 `_: Support for multiple SPIR-V device versions (>= 1.2).
- `#444 `_: Enabling automatic device memory clean-up after each run from the execution plan.
- `#448 `_: API extension to query device memory consumption at the TaskGraph granularity.
- `#451 `_: Option to select the default SPIR-V runtime.
- `#455 `_: Refactoring the API and documentation updated.
- `#460 `_: Refactoring all examples to use try-with-resources execution plans by default.
- `#462 `_: Support for copy array references from private to private memory on the hardware accelerator.

Compatibility
~~~~~~~~~~~~~~~~~~

- `#438 `_: No writes for intermediate files to avoid permissions issues with Jenkins.
- `#440 `_: Update Jenkinsfile for CI/CD testing.
- `#443 `_: Level Zero and OpenCL runtimes for SPIR-V included in the Jenkins CI/CD.
- `#450 `_: TornadoVM benchmark script improved to report dimensions and sizes.
- `#453 `_: Update Jenkinsfile with regards to the runtime for SPIR-V.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- `#434 `_: Fix for building TornadoVM on OSx after integration with SPIR-V binaries for OpenCL.
- `#441 `_: Fix PTX unit-tests.
- `#446 `_: Fix NVIDIA thread-block scheduler for new GPU drivers.
- `#447 `_: Fix recompilation when batch processing is not triggered.
- `#463 `_: Fix unit-tests for CPU virtual devices.

TornadoVM 1.0.5
----------------

26th May 2024

Improvements
~~~~~~~~~~~~~~~~~~

- `#402 `_: Support for TornadoNativeArrays from FFI buffers.
- `#403 `_: Clean-up and refactoring for the code analysis of the loop-interchange.
- `#405 `_: Disable Loop-Interchange for CPU offloading.
- `#407 `_: Debugging OpenCL Kernels builds improved.
- `#410 `_: CPU block scheduler disabled by default and option to switch between different thread-schedulers added.
- `#418 `_: TornadoOptions and TornadoLogger improved.
- `#423 `_: MxM using ns instead of ms to report performance.
- `#425 `_: Vector types for ``Float`` and ``Int`` supported.
- `#429 `_: Documentation of the installation process updated and improved.
- `#432 `_: Support for SPIR-V code generation and dispatcher using the TornadoVM OpenCL runtime.

Compatibility
~~~~~~~~~~~~~~~~~~

- `#409 `_: Guidelines to build the documentation.
- `#411 `_: Windows installer improved.
- `#412 `_: Python installer improved to check and download all Python dependencies before the main installer.
- `#413 `_: Improved documentation for installing all configurations of backends and OS.
- `#424 `_: Use Generic GPU Scheduler for some older NVIDIA Drivers for the OpenCL runtime.
- `#430 `_: Improved the installer by checking that the TornadoVM environment is loaded upfront.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- `#400 `_: Fix batch computation when the global thread indexes are used to compute the outputs.
- `#414 `_: Recover Test-Field unit-tests using Panama types.
- `#415 `_: Check style errors fixed.
- `#416 `_: FPGA execution with multiple tasks in a task-graph fixed.
- `#417 `_: Lazy-copy out fixed for Java fields.
- `#420 `_: Fix Mandelbrot example.
- `#421 `_: OpenCL 2D thread-scheduler fixed for NVIDIA GPUs.
- `#422 `_: Compilation for NVIDIA Jetson Nano fixed.
- `#426 `_: Fix Logger for all backends.
- `#428 `_: Math cos/sin operations supported for vector types.
- `#431 `_: Jenkins files fixed.

TornadoVM 1.0.4
----------------

30th April 2024

Improvements
~~~~~~~~~~~~~~~~~~

- `#369 `_: Introduction of Tensor types in TornadoVM API and interoperability with ONNX Runtime.
- `#370 `_: Array concatenation operation for TornadoVM native arrays.
- `#371 `_: TornadoVM installer script ported for Windows 10/11.
- `#372 `_: Add support for ``HalfFloat`` (``Float16``) in vector types.
- `#374 `_: Support for TornadoVM array concatenations from the constructor-level.
- `#375 `_: Support for TornadoVM native arrays using slices from the Panama API.
- `#376 `_: Support for lazy copy-outs in the batch processing mode.
- `#377 `_: Expand the TornadoVM profiler with power metrics for NVIDIA GPUs (OpenCL and PTX backends).
- `#384 `_: Auto-closable Execution Plans for automatic memory management.

Compatibility
~~~~~~~~~~~~~~~~~~

- `#386 `_: OpenJDK 17 support removed.
- `#390 `_: SapMachine OpenJDK 21 supported.
- `#395 `_: OpenJDK 22 and GraalVM 22.0.1 supported.
- TornadoVM tested with Apple M3 chips.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- `#367 `_: Fix for Graal/Truffle languages in which some Java modules were not visible.
- `#373 `_: Fix for data copies of the ``HalfFloat`` types for all backends.
- `#378 `_: Fix free memory markers when running multi-thread execution plans.
- `#379 `_: Refactoring package of vector api unit-tests.
- `#380 `_: Fix event list sizes to accommodate profiling of large applications.
- `#385 `_: Fix code check style.
- `#387 `_: Fix TornadoVM internal events in OpenCL, SPIR-V and PTX for running multi-threaded execution plans.
- `#388 `_: Fix of expected and actual values of tests.
- `#392 `_: Fix installer for using existing JDKs.
- `#389 `_: Fix ``DataObjectState`` for multi-thread execution plans.
- `#396 `_: Fix JNI code for the CUDA NVML library access with OpenCL.

TornadoVM 1.0.3
----------------

27th March 2024

Improvements
~~~~~~~~~~~~~~~~~~

- `#344 `_: Support for Multi-threaded Execution Plans.
- `#347 `_: Enhanced examples.
- `#350 `_: Obtain internal memory segment for the Tornado Native Arrays without the object header.
- `#357 `_: API extensions to query and apply filters to backends and devices from the ``TornadoExecutionPlan``.
- `#359 `_: Support Factory Methods for FFI-based array collections to be used/composed in TornadoVM Task-Graphs.

Compatibility
~~~~~~~~~~~~~~~~~~

- `#351 `_: Compatibility of TornadoVM Native Arrays with the Java Vector API.
- `#352 `_: Refactor memory limit to take into account primitive types and wrappers.
- `#354 `_: Add DFT-sample benchmark in FP32.
- `#356 `_: Initial support for Windows 11 using Visual Studio Development tools.
- `#361 `_: Compatibility with the SPIR-V toolkit v0.0.4.
- `#366 `_: Level Zero JNI Dependency updated to 0.1.3.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- `#346 `_: Computation of local-work group sizes for the Level Zero/SPIR-V backend fixed.
- `#360 `_: Fix native tests to check the JIT compiler for each backend.
- `#355 `_: Fix custom exceptions when a driver/device is not found.

TornadoVM 1.0.2
----------------

29/02/2024

Improvements
~~~~~~~~~~~~~~~~~~

- `#323 `_: Set Accelerator Memory Limit per Execution Plan at the API level
- `#328 `_: Javadoc API to run with concurrent devices and memory limits
- `#340 `_: New API calls to enable ``threadInfo`` and ``printKernel`` from the Execution Plan API.
- `#334 `_: Dynamically enable/disable profiler after first run

Compatibility
~~~~~~~~~~~~~~~~~~

- `#337 `_: Initial support for Graal and JDK 21.0.2

Bug Fixes
~~~~~~~~~~~~~~~~~~

- `#322 `_: Fix duplicate thread-info debug message when the debug option is also enabled.
- `#325 `_: Set/Get accesses for the ``MatrixVectorFloat4`` type fixed
- `#326 `_: Fix installation script for running with Python >= 3.12
- `#327 `_: Fix Memory Limits for all supported Panama off-heap types.
- `#329 `_: Fix timers for the dynamic reconfiguration policies
- `#330 `_: Fix the profiler logs when silent mode is enabled
- `#332 `_: Fix Batch processing when having multiple task-graphs in a single execution plan.

TornadoVM 1.0.1
----------------

30/01/2024

Improvements
~~~~~~~~~~~~~~~~~~

- `#305 `_: Under-demand data transfer for custom data ranges.
- `#313 `_: Initial support for Half-Precision (FP16) data types.
- `#311 `_: Enable Multi-Task Multiple Device (MTMD) model from the ``TornadoExecutionPlan`` API.
- `#315 `_: Math ``Ceil`` function added

Compatibility/Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~

- `#294 `_: Separation of the OpenCL Headers from the code base.
- `#297 `_: Separation of the LevelZero JNI API in a separate repository.
- `#301 `_: Temurin configuration supported.
- `#304 `_: Refactor of the common phases for the JIT compiler.
- `#316 `_: Beehive SPIR-V Toolkit version updated.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- `#298 `_: OpenCL Codegen fixed open-close brackets.
- `#300 `_: Python Dependencies fixed for AWS
- `#308 `_: Runtime check for Grid-Scheduler names
- `#309 `_: Fix check-style to support STR templates
- `#314 `_: Emit Vector16 Capability for 16-width vectors

TornadoVM 1.0
----------------

05/12/2023

Improvements
~~~~~~~~~~~~~~~~~~

- Brand-new API for allocating off-heap objects and array collections using the Panama Memory Segment API.
- New Arrays, Matrix and Vector type objects are allocated using the Panama API.
- Migration of existing applications to use the new Panama-based types: https://tornadovm.readthedocs.io/en/latest/offheap-types.html
- Handling of the TornadoVM's internal bytecode improved to avoid write-only copies from host to device.
- ``cospi`` and ``sinpi`` math operations supported for OpenCL, PTX and SPIR-V.
- Vector 16 data types supported for ``float``, ``double`` and ``int``.
- Support for Mesa's ``rusticl``.
- Device default ordering improved based on maximum thread size.
- Move all the installation and configuration scripts from Bash to Python.
- The installation process has been improved for Linux and OSx with M1/M2 chips.
- Documentation improved.
- Add profiling information for the testing scripts.

Compatibility/Integration
~~~~~~~~~~~~~~~~~~~~~~~~~

- Integration with the Graal 23.1.0 JIT Compiler.
- Integration with OpenJDK 21.
- Integration with Truffle Languages (Python, Ruby and Javascript) using Graal 23.1.0.
- TornadoVM API Refactored.
- Backport bug-fixes for branch using OpenJDK 17: ``master-jdk17``

Bug fixes:
~~~~~~~~~~~~~~~~~

- Multiple SPIR-V Devices fixed.
- Runtime Exception when no SPIR-V devices are present.
- Issue with the kernel context API when invoking multiple kernels fixed.
- MTMD mode is fixed when running multiple backends on the same device.
- ``long`` type as a constant parameter for a kernel fixed.
- FPGA Compilation and Execution fixed for AWS and Xilinx devices.
- Batch processing fixed for different data types of the same size.

TornadoVM 0.15.2
----------------

26/07/2023

Improvements
~~~~~~~~~~~~~~~~~~

- Initial Support for Multi-Tasks on Multiple Devices (MTMD): This mode enables the execution of multiple independent tasks on more than one hardware accelerator. Documentation in link: https://tornadovm.readthedocs.io/en/latest/multi-device.html
- Support for trigonometric ``radian``, ``cospi`` and ``sinpi`` functions for the OpenCL/PTX and SPIR-V backends.
- Clean-up Java modules not being used and TornadoVM core classes refactored.

Compatibility/Integration
~~~~~~~~~~~~~~~~~~~~~~~~~

- Initial integration with ComputeAorta (part of the Codeplay's oneAPI Construction Kit for RISC-V) to run on RISC-V with Vector Instructions (OpenCL backend) in emulation mode.
- Beehive SPIR-V Toolkit dependency updated.
- Tests for prebuilt SPIR-V kernels fixed to dispatch SPIR-V binaries through the Level Zero and OpenCL runtimes.
- Deprecated ``javac.py`` script removed.

Bug fixes:
~~~~~~~~~~~~~~~~~

- TornadoVM OpenCL Runtime throws an exception when the detected hardware does not support FP64.
- Fix the installer for the older Apple with the x86 architecture using AMD GPUs.
- Installer for ARM based systems fixed.
- Installer fixed for Microsoft WSL and NVIDIA GPUs.
- OpenCL code generator fixed to avoid using the reserved OpenCL keywords from Java function parameters.
- Dump profiler option fixed.

TornadoVM 0.15.1
----------------

15/05/2023

Improvements
~~~~~~~~~~~~~~~~~~

- Introduction of a device selection heuristic based on the computing capabilities of devices. TornadoVM selects, as the default device, the fastest device based on its computing capability.
- Optimisation that removes redundant data copies of Read-Only and Write-Only buffers between the host (CPU) and the device (GPU), based on the Tornado Data Flow Graph.
- New installation script for TornadoVM.
- Option to dump the TornadoVM bytecodes for the unit tests.
- Full debug option improved. Use ``--fullDebug``.

Compatibility/Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Integration and compatibility with the Graal 22.3.2 JIT Compiler.
- Improved compatibility with Apple M1 and Apple M2 through the OpenCL Backend.
- GraalVM/Truffle programs integration improved. Use ``--truffle`` in the ``tornado`` script to run guest programs with Truffle. Example: ``tornado --truffle python myProgram.py``. Full documentation: https://tornadovm.readthedocs.io/en/latest/truffle-languages.html

Bug fixes:
~~~~~~~~~~~~~~~~~

- Documentation that resets the device's memory: https://github.com/beehive-lab/TornadoVM/blob/master/tornado-api/src/main/java/uk/ac/manchester/tornado/api/TornadoExecutionPlan.java#L282
- Append the Java ``CLASSPATH`` to the ``cp`` option from the ``tornado`` script.
- Dependency for the ``cmake-maven`` plugin fixed for the ARM-64 architecture.
- Fixed the automatic installation for Apple M1/M2 and ARM-64 and NVIDIA Jetson nano computing systems.
- Integration with IGV fixed. Use the ``--igv`` option for the ``tornado`` and ``tornado-test`` scripts.

TornadoVM 0.15
----------------

27/01/2023

Improvements
~~~~~~~~~~~~~~~~~~

- New TornadoVM API:

  - API refactoring (``TaskSchedule`` has been renamed to ``TaskGraph``)
  - Introduction of the Immutable ``TaskGraphs``
  - Introduction of the TornadoVM Execution Plans: (``TornadoExecutionPlan``)
  - The documentation of migration of existing TornadoVM applications to the new API can be found here: https://tornadovm.readthedocs.io/en/latest/programming.html#migration-to-tornadovm-v0-15

- Launch a new website https://tornadovm.readthedocs.io/en/latest/ for the documentation
- Improved documentation
- Initial support for Intel ARC discrete GPUs.
- Improved TornadoVM installer for Linux
- Improved TornadoVM launch script with optional parameters
- Support of large buffer allocations with Intel Level Zero. Use: ``tornado.spirv.levelzero.extended.memory=True``

Bug fixes:
~~~~~~~~~~~~~~~~~

- Vector and Matrix types
- TornadoVM Floating Replacement compiler phase fixed
- Fix ``CMAKE`` for Intel ARC GPUs
- Device query tool fixed for the PTX backend
- Documentation for Windows 11 fixed

TornadoVM 0.14.1
----------------

29/09/2022

Improvements
~~~~~~~~~~~~~~~~~~~~~

- The ``tornado`` command has been migrated from a Bash script to a Python script.
- Use ``tornado --help`` to check the new options and examples.
- Support of native tests for the SPIR-V backend.
- Improvement of the OpenCL and PTX tests of the internal APIs.

Compatibility/Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Integration and compatibility with the Graal 22.2.0 JIT Compiler.
- Compatibility with JDK 18 and JDK 19.
- Compatibility with Apple M1 Pro using the OpenCL backend.

Bug Fixes
~~~~~~~~~~~~~~~~~~~~~

- CUDA PTX generated header fixed to target NVIDIA 30xx GPUs and CUDA 11.7.
- The signature of generated PTX kernels fixed for NVIDIA driver >= 510 and 30XX GPUs when using the TornadoVM Kernel API.
- Tests of virtual OpenCL devices fixed.
- Thread deployment information for the OpenCL backend is fixed.
- ``TornadoVMRuntimeCI`` moved to ``TornadoVMRuntimeInterface``.

TornadoVM 0.14
--------------

15/06/2022

New Features
~~~~~~~~~~~~

- New device memory management for addressing the memory allocation limitations of OpenCL and enabling pinned memory of device buffers.
- The execution of task-schedules will still automatically allocate/deallocate memory every time a task-schedule is executed, unless lock/unlock functions are invoked explicitly at the task-schedule level.
- One heap per device has been replaced with a device buffer per input variable.
- A new API call has been added for releasing memory: ``unlockObjectFromMemory``
- A new API call has been added for locking objects to the device: ``lockObjectInMemory``. This requires the user to release memory by invoking ``unlockObjectFromMemory`` at the task-schedule level.
- Enhanced Live Task migration by supporting multi-backend execution (PTX <-> OpenCL <-> SPIR-V).

.. _compatibilityintegration-1:

Compatibility/Integration
~~~~~~~~~~~~~~~~~~~~~~~~~

- Integration with the Graal 22.1.0 JIT Compiler
- JDK 8 deprecated
- Azul Zulu JDK supported
- OpenCL 2.1 as a default target for the OpenCL Backend
- Single Docker Image for Intel XPU platforms, including the SPIR-V backend (using the Intel Integrated Graphics), and OpenCL (using the Intel Integrated Graphics, Intel CPU and Intel FPGA in emulation mode). Image: https://github.com/beehive-lab/docker-tornado#intel-integrated-graphics

Improvements/Bug Fixes
~~~~~~~~~~~~~~~~~~~~~~

- ``SIGNUM`` Math Function included for all three backends.
- SPIR-V optimizer enabled by default (3x reduction in binary size).
- Extended Memory Mode enabled for the SPIR-V Backend via Level Zero.
- Phi instructions fixed for the SPIR-V Backend.
- SPIR-V Vector Select instructions fixed.
- Duplicated IDs for Non-Inlined SPIR-V Functions fixed.
- Refactoring of the TornadoVM Math Library.
- FPGA Configuration files fixed.
- Bitwise operations for OpenCL fixed.
- Code Generation Times and Backend information are included in the profiling info.

TornadoVM 0.13
--------------

21/03/2022

- Integration with JDK 17 and Graal 21.3.0
- JDK 11 is the default version and the support for the JDK 8 has been deprecated
- Support for extended intrinsics regarding math operations
- Native functions are enabled by default
- Support for 2D arrays for PTX and SPIR-V backends:

  - https://github.com/beehive-lab/TornadoVM/commit/2ef32ca97941410672720f9dfa15f0151ae2a1a1

- Integer Test Move operation supported:

  - https://github.com/beehive-lab/TornadoVM/pull/177

- Improvements in the SPIR-V Backend:

  - Experimental SPIR-V optimizer. Binary size reduction of up to 3x

    - https://github.com/beehive-lab/TornadoVM/commit/394ca94dcdc3cb58d15a17046e1d22c6389b55b7

  - Fix malloc functions for Level-Zero
  - Support for pre-built SPIR-V binary modules using the TornadoVM runtime for OpenCL
  - Performance increase due to cached buffers on GPUs by default
  - Disassembler option for SPIR-V binary modules.
    Use ``--printKernel``

- Improved Installation:

  - Full automatic installer script integrated
  - Documentation about the installation for Windows 11

- Refactoring and several bug fixes

  - https://github.com/beehive-lab/TornadoVM/commit/57694186b42ec28b16066fb549ab8fcf9bff9753

- Vector types fixed:

  - https://github.com/beehive-lab/TornadoVM/pull/181/files
  - https://github.com/beehive-lab/TornadoVM/commit/004d61d6d26945b45ebff66641b60f90f00486be

- Fix AtomicInteger get for OpenCL:

  - https://github.com/beehive-lab/TornadoVM/pull/177

- Dependencies for Math3 and Lang3 updated

TornadoVM 0.12
--------------

17/11/2021

- New backend: initial support for SPIR-V and Intel Level Zero

  - Level-Zero dispatcher for SPIR-V integrated
  - SPIR-V Code generator framework for Java

- Benchmarking framework improved to accommodate all three backends
- Driver metrics, such as kernel time and data transfers, included in the benchmarking framework
- TornadoVM profiler improved:

  - Command line options added: ``--enableProfiler`` and ``--dumpProfiler``

- Logging improved for debugging purposes: JIT Compiler, JNI calls and code generation
- New math intrinsics operations supported
- Several bug fixes:

  - Duplicated barriers removed. TornadoVM BARRIER bytecode fixed when running multi-context
  - Copy-in with multiple reductions fixed
  - TornadoVM profiler fixed for multiple context switching (device switching)

- Pretty printer for device information

TornadoVM 0.11
--------------

29/09/2021

- TornadoVM JIT Compiler upgrade to work with Graal 21.2.0 and JDK 8 with JVMCI 21.2.0
- Refactoring of the Kernel Parallel API for Heterogeneous Programming:

  - Methods ``getLocalGroupSize(index)`` and ``getGlobalGroupSize`` moved to public fields to keep consistency with the rest of the thread properties within the ``KernelContext`` class.
  - Changeset: https://github.com/beehive-lab/TornadoVM/commit/e1ebd66035d0722ca90eb0121c55dbc744840a74

- Compiler update to register the global number of threads: https://github.com/beehive-lab/TornadoVM/pull/133/files
- Simplification of the TornadoVM events handler: https://github.com/beehive-lab/TornadoVM/pull/135/files
- Renaming the Profiler API method from ``event.getExecutionTime`` to ``event.getElapsedTime``: https://github.com/beehive-lab/TornadoVM/pull/134
- Deprecating ``OCLWriteNode`` and ``PTXWriteNode`` and fixing stores for bytes: https://github.com/beehive-lab/TornadoVM/pull/131
- Refactoring of the FPGA IR extensions, from the high-tier to the low-tier of the JIT compiler

  - Utilizing the FPGA Thread-Attributes compiler phase for the FPGA execution
  - Using the ``GridScheduler`` object (if present), or a default value (e.g., 64, 1, 1), to define the FPGA OpenCL local workgroup

- Several bugs fixed:

  - Codegen for sequential kernels fixed
  - Function parameters with non-inlined method calls fixed

TornadoVM 0.10
--------------

29/06/2021

- TornadoVM JIT Compiler sync with Graal 21.1.0
- Experimental support for OpenJDK 16
- Tracing the TornadoVM thread distribution and device information with a new option ``--threadInfo`` instead of ``--debug``
- Refactoring of the new API:

  - ``TornadoVMExecutionContext`` renamed to ``KernelContext``
  - ``GridTask`` renamed to ``GridScheduler``

- AWS F1 AMI version upgraded to 1.10.0 and generation of the AFI image automated
- Xilinx OpenCL backend expanded with:

  - a) Initial integration of Xilinx OpenCL attributes for loop pipelining in the TornadoVM compiler
  - b) Support for multiple compute units

- Logging FPGA compilation option added to dump FPGA HLS compilation to a file
- TornadoVM profiler enhanced to include data transfers for the stack-frame and kernel dispatch time
- Initial support for 2D Arrays added
- Several bug fixes and stability support for the OpenCL and PTX backends

TornadoVM 0.9
-------------
15/04/2021

- Expanded API for expressing kernel parallelism within Java. It can work with the existing loop parallelism in TornadoVM.

  - Direct access to thread-ids, OpenCL local memory (PTX shared memory), and barriers
  - ``TornadoVMContext`` added: See https://github.com/beehive-lab/TornadoVM/blob/5bcd3d6dfa2506032322c32d72b7bbd750623a95/tornado-api/src/main/java/uk/ac/manchester/tornado/api/TornadoVMContext.java
  - Code examples:

    - https://github.com/beehive-lab/TornadoVM/tree/master/examples/src/main/java/uk/ac/manchester/tornado/examples/tornadovmcontext

  - Documentation:

    - https://github.com/beehive-lab/TornadoVM/blob/master/assembly/src/docs/21_TORNADOVM_CONTEXT.md

- Profiler integrated with Chrome debug:

  - Use flags: ``-Dtornado.chrome.event.tracer.enabled=True -Dtornado.chrome.event.tracer.filename=userFile.json``
  - See https://github.com/beehive-lab/TornadoVM/pull/41

- Added support for Windows 10:

  - See https://github.com/beehive-lab/TornadoVM/blob/develop/assembly/src/docs/20_INSTALL_WINDOWS_WITH_GRAALVM.md

- TornadoVM running with Windows JDK 11 supported (Linux & Windows)
- Xilinx FPGAs workflow supported for Vitis 2020.2
- Pre-compiled tasks for Xilinx/Intel FPGAs fixed
- Slambench fixed when compiling for PTX and OpenCL backends
- Several bug fixes for the runtime, JIT compiler and data management.

--------------

TornadoVM 0.8
-------------

19/11/2020

- Added PTX backend for NVIDIA GPUs

  - Build TornadoVM using ``make BACKEND=ptx,opencl`` to obtain the two supported backends.

- TornadoVM JIT Compiler aligned with Graal 20.2.0
- Support for other JDKs:

  - Red Hat Mandrel 11.0.9
  - Amazon Corretto 11.0.9
  - GraalVM LabsJDK 11.0.8
  - OpenJDK 11.0.8
  - OpenJDK 12.0.2
  - OpenJDK 13.0.2
  - OpenJDK 14.0.2

- Support for hybrid (CPU-GPU) parallel reductions
- New API for generic kernel dispatch.
  It introduces the concept of ``WorkerGrid`` and ``GridTask``

  - A ``WorkerGrid`` is an object that stores how threads are organized on an OpenCL device: ``WorkerGrid1D worker1D = new WorkerGrid1D(4096);``
  - A ``GridTask`` is a map that relates a task-name with a worker-grid: ``GridTask gridTask = new GridTask(); gridTask.set("s0.t0", worker1D);``
  - A TornadoVM Task-Schedule can be executed using a ``GridTask``: ``ts.execute(gridTask);``
  - More info: `link `__

- TornadoVM profiler improved

  - Profiler metrics added
  - Code features per task-graph

- Lazy device initialisation moved to early initialisation of PTX and OpenCL devices
- Initial support for Atomics (OpenCL backend)

  - `Link to examples `__

- Task Schedules with 11-14 parameters supported
- Documentation improved
- Bug fixes for code generation, numeric promotion, basic block traversal, Xilinx FPGA compilation.

--------------

TornadoVM 0.7
-------------

22/06/2020

- Support for ARM Mali GPUs.
- Support for parallel reductions on FPGAs
- Agnostic FPGA vendor compilation via configuration files (Intel & Xilinx)
- Support for AWS on Xilinx FPGAs
- Recompilation for different input data sizes supported
- New TornadoVM API calls:

  a) Update references for re-compilation: ``taskSchedule.updateReferences(oldRef, newRef);``
  b) Use the default OpenCL scheduler: ``taskSchedule.useDefaultThreadScheduler(true);``

- Use of JMH for benchmarking
- Support for Fused Multiply-Add (FMA) instructions
- Easy selection of different devices for unit tests: ``tornado-test.py -V --debug -J"-Dtornado.unittests.device=0:1"``
- Bailout mechanism improved from parallel to sequential
- Improved thread scheduling
- Support for private memory allocation
- Assertion mode included
- Documentation improved
- Several bug fixes

TornadoVM 0.6
-------------

21/02/2020

- TornadoVM compatible with GraalVM 19.3.0 using JDK 8 and JDK 11
- TornadoVM compiler update for using Graal 19.3.0 compiler API
- Support for dynamic languages on top of
  Truffle

  - `examples `__

- Support for multiple tasks per task-schedule on FPGAs (Intel and Xilinx)
- Support for OSX Mojave and Catalina
- Task-schedule name handling for FPGAs improved
- Exception handling improved
- Reductions for ``long`` type supported
- Bug fixes for ternary conditions, reductions and code generator
- Documentation improved

TornadoVM 0.5
-------------

16/12/2019

- Initial support for Xilinx FPGAs
- TornadoVM API classes are now ``Serializable``
- Initial support for local memory for reductions
- JVMCI built with local annotation patch removed. Now TornadoVM requires unmodified JDK8 with JVMCI support
- Support for multiple reductions within the same ``task-schedule``
- Emulation mode on Intel FPGAs is fixed
- Fix reductions on Intel Integrated Graphics
- TornadoVM driver OpenCL initialization and OpenCL code cache improved
- Refactoring of the FPGA execution modes (full JIT and emulation modes improved).

TornadoVM 0.4
-------------

14/10/2019

- Profiler supported

  - Use ``-Dtornado.profiler=True`` to enable profiler
  - Use ``-Dtornado.profiler=True -Dtornado.profiler.save=True`` to dump the profiler logs

- Feature extraction added

  - Use ``-Dtornado.feature.extraction=True`` to enable code extraction features

- Mac OS X support
- Automatic reductions composition (map-reduce) within the same task-schedule
- Bug related to a memory leak when running on GPUs solved
- Bug fixes and stability improvements

TornadoVM 0.3
-------------

22/07/2019

- New Matrix 2D and Matrix 3D classes with type specializations.
- New API-call ``TaskSchedule#batch`` for batch processing. It allows programmers to run with more data than the maximum capacity of the accelerator by creating batches of executions.
- FPGA full automatic compilation pipeline.
- FPGA options simplified:

  - ``-Dtornado.precompiled.binary=`` for loading the bitstream.
  - ``-Dtornado.opencl.userelative=True`` for using relative addresses.
  - ``-Dtornado.opencl.codecache.loadbin=True`` *removed*.
- Reductions support enhanced and fully automated on GPUs and CPUs.
- Initial support for reductions on FPGAs.
- Initial API for profiling tasks integrated.

TornadoVM 0.2
-------------

25/02/2019

- Renamed to TornadoVM
- Device selection for better performance (CPU, multi-core, GPU, FPGA) via an API for Dynamic Reconfiguration

  - Added methods ``executeWithProfiler`` and ``executeWithProfilerSequential`` with an input policy.
  - Policies: ``Policy.PERFORMANCE``, ``Policy.END_2_END``, and ``Policy.LATENCY`` implemented.

- Basic heuristic for predicting the highest performing target device with Dynamic Reconfiguration
- Initial FPGA integration for Altera FPGAs:

  - Full JIT compilation mode
  - Ahead of time compilation mode
  - Emulation/debug mode
  - FPGA JIT compiler specializations

- Added support for Java reductions:

  - Compiler specializations for CPU and GPU reductions

- Performance and stability fixes

Tornado 0.1.0
-------------

07/09/2018

- Initial Implementation of the Tornado compiler
- Initial GPU/CPU code generation for OpenCL
- Initial support in the runtime to execute OpenCL programs generated by the Tornado JIT compiler
- Initial Tornado-API release (``@Parallel`` Java annotation and ``TaskSchedule`` API)
- Multi-GPU enabled through multiple task-schedules