TornadoVM Flags
TornadoVM provides runtime and compiler flags to enable experimental features, tuning, and profiling. These flags fall into two categories:
JVM Flags (passed with the -D prefix via the --jvm option)
TornadoVM CLI Flags (passed directly to the tornado Python wrapper)
Note
In the examples below, s0 refers to a task graph and t0 to a specific task within that graph.
Example Usage
$ tornado --jvm "-Dtornado.fullDebug=true" -m tornado.examples/uk.ac.manchester.examples.compute.Montecarlo 1024
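When several JVM flags are needed, it can help to assemble the command programmatically. The following is a minimal sketch, not part of TornadoVM: `build_tornado_command` is a hypothetical helper, and it assumes that multiple `-D` properties are passed as one space-separated string to `--jvm`, as the single-flag example above suggests.

```python
import shlex


def build_tornado_command(module_and_class, program_args=(), jvm_properties=None):
    """Return the argv list for a `tornado` invocation (hypothetical helper)."""
    argv = ["tornado"]
    if jvm_properties:
        # Assumption: all -D properties travel inside ONE string given to --jvm.
        jvm = " ".join(f"-D{key}={value}" for key, value in jvm_properties.items())
        argv += ["--jvm", jvm]
    # -m takes the module/class to run, followed by the program's own arguments.
    argv += ["-m", module_and_class, *map(str, program_args)]
    return argv


argv = build_tornado_command(
    "tornado.examples/uk.ac.manchester.examples.compute.Montecarlo",
    program_args=[1024],
    jvm_properties={"tornado.fullDebug": "true"},
)
# shlex.join adds shell quoting only where an argument needs it.
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv)` on a machine where the `tornado` launcher is on the `PATH`.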
Debugging and Logging
CLI Flags
| Flag | Description |
|---|---|
| | Enables full debug mode (maps to |
| | Enables basic debug output such as compilation status and device info. |
| | Prints generated OpenCL/PTX/SPIR-V kernels. |
| | Displays the number of threads used. |
| | Lists available hardware devices. |
JVM Flags
| Flag | Description |
|---|---|
| | Enables full debug output including bytecode and runtime internals. |
| | Dumps FPGA HLS compilation logs. |
| | Prints generated OpenCL/PTX/SPIR-V kernels. |
| | Saves generated kernels to the specified file. |
| | Displays the number of threads used. |
| | Prints TornadoVM Internal Bytecodes to stdout. |
| | Dumps TornadoVM Internal Bytecodes to the specified file. |
Profiling
CLI Flags
| Flag | Description |
|---|---|
| | Prints profiling metrics as JSON to stdout. |
| | Collects profiling metrics internally (see TornadoVM Profiler API). |
| | Saves profiling output to the specified file. |
JVM Flags
| Flag | Description |
|---|---|
| | Enables profiling and prints metrics as JSON to stdout. |
| | Collects profiling metrics internally for logging. |
| | Saves profiling output to the specified file. |
Performance & Scheduling
JVM Flags
| Flag | Description |
|---|---|
| | Uses nanoseconds for timing instead of milliseconds (default: true). |
| | Sets a custom global workgroup size. |
| | Sets a custom local workgroup size. |
| | Enables concurrent execution across devices (default: false). |
| | Sets driver priority (default: PTX=1, OpenCL=0). |
Precompiled and FPGA Options
JVM Flags
| Flag | Description |
|---|---|
| | Path to precompiled kernel or FPGA bitstream. |
| | Path to the FPGA configuration file. |
Optimizations
JVM Flags
| Flag | Description |
|---|---|
| | Enables fused multiply-add (default: true). May cause issues on some platforms. |
| | Enables math simplifications (e.g., |
| | Enables loop partial unrolling (default: false). Use |
| | Enables native math functions (default: false). |
CUDA (PTX Specific)
Flags
| Flag | Description |
|---|---|
| | Optimization level applied to the generated code (0-4, where 4 is the highest; default: 4). |
| | Maximum number of registers that a thread may use (default: none). |
| | Target microarchitecture (default: none). The available targets depend on the CUDA version. CUDA 13.0 currently supports the following: 30, 32, 35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75, 80, 86, 87, 89, 90, 100, 103, 110, 120, 121. Older CUDA versions may support fewer targets; for example, CUDA 12.0 supports up to 90. |
| | Controls caching explicitly (-dlcm). 0: compile without the -dlcm flag. 1: compile with L1 cache disabled (use only L2 cache). 2: compile with L1 cache enabled (use both L1 and L2 cache) (default: none). |
| | Specifies whether to create debug information in the output (-g) (0: false) (default: none). |
| | Generates verbose log messages (0: false) (default: none). |
| | Generates line number information (-lineinfo) (0: false) (default: none). |
Level Zero (SPIR-V Specific)
JVM Flags
| Flag | Description |
|---|---|
| | Sets memory alignment (in bytes) for Level Zero buffers (default: 64). |
| | Uses Level Zero’s thread block suggestion (default: true). |
| | Optimizes loads/stores and simplifies the generated SPIR-V binary (experimental; default: false). |
| | Enables shared memory buffers (default: false). |
Notes
All Java flags (those beginning with -Dtornado.) are defined in the TornadoOptions.java file.
TornadoVM CLI flags (those beginning with --) are mapped to Java flags by the Python interface for ease of use.
For example, --printKernel maps internally to -Dtornado.printKernel=true.
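The mapping described in these notes can be pictured with a short sketch. This is not the actual TornadoVM wrapper code: `CLI_TO_JVM` and `translate` are hypothetical names, and the only table entry is the `--printKernel` mapping stated above.

```python
# Hypothetical lookup table; only the --printKernel entry comes from the notes above.
CLI_TO_JVM = {
    "--printKernel": "tornado.printKernel",
}


def translate(cli_args):
    """Split arguments into JVM -D flags and untouched pass-through arguments."""
    jvm_flags, passthrough = [], []
    for arg in cli_args:
        prop = CLI_TO_JVM.get(arg)
        if prop is not None:
            # A boolean CLI switch becomes -D<property>=true.
            jvm_flags.append(f"-D{prop}=true")
        else:
            passthrough.append(arg)
    return jvm_flags, passthrough


jvm_flags, rest = translate(["--printKernel", "-m", "example.module/Main"])
print(jvm_flags)  # ['-Dtornado.printKernel=true']
print(rest)       # ['-m', 'example.module/Main']
```

A table-driven design like this keeps the CLI surface and the JVM property names in one place, so a new switch only needs a new dictionary entry.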