Functions And Control Flow
Ave programs are Python functions compiled with the Ave JIT. Programs usually import the top-level package as avelang and the language package as al:
import avelang
import avelang.language as al
JIT Functions
@avelang.jit marks a Python function as an Ave JIT function. A JIT function can be launched as a GPU kernel, and it can also be called as a helper from another JIT function.
@avelang.jit
def axpy(
a: al.i32,
b: al.i32,
n: al.i32,
x: al.Tensor((32,), al.i32),
y: al.Tensor((32,), al.i32),
):
idx = al.block_id(0) * al.block_dim(0) + al.thread_id(0)
if idx < n:
y[idx] = a * x[idx] + b
Functions called from JIT code must also be decorated with @avelang.jit. JIT functions must be defined in a Python file so the compiler can recover stable source code.
Arguments
Function arguments use Ave type annotations. Scalar annotations such as al.i32 describe scalar values. al.Tensor(shape, type) describes tensor-like memory with a compile-time shape. al.Pointer(type) describes raw pointer arguments whose logical shape will be recovered inside the kernel. See Builtin Types for the type forms.
Launch
Launch a decorated kernel by providing grid and block dimensions. The launch provider returns ((grid_x, grid_y, grid_z), (block_x, block_y, block_z)).
axpy[lambda: ((1, 1, 1), (256, 1, 1))](a, b, n, x, y)
Backend options are passed as keyword arguments:
kernel[lambda: ((1, 1, 1), (256, 1, 1))](x, num_warps=8)
GPU Coordinates
Ave exposes GPU coordinates through al.block_id, al.thread_id, al.block_dim, and al.grid_dim. The dimension argument must be a constant integer 0, 1, or 2.
idx = al.block_id(0) * al.block_dim(0) + al.thread_id(0)
Conditionals
Ave uses Python if syntax for conditional execution. Conditions can depend on runtime values such as thread coordinates and problem sizes.
if idx < n:
out[idx] = value
Conditionals lower to GPU control flow.
Loops
Use al.range for compiler-lowered loops.
for k in al.range(128):
acc = acc + x[k]
Loop bounds may use supported runtime expressions:
for i in al.range((n + al.block_dim(0) - 1) // al.block_dim(0)):
...
Use al.constexpr for tile sizes, static shapes, and loop bounds that should be known during specialization. Specialization lets the compiler see static storage sizes and unroll-friendly bounds while still accepting runtime problem sizes.
Runtime Shapes
Generic kernels combine runtime problem sizes with specialization-time constants. Runtime sizes are passed as scalar arguments, while tile sizes are often passed as al.constexpr.
@avelang.jit
def matrix_kernel(
A_ptr: al.Pointer(al.bf16),
C_ptr: al.Pointer(al.f32),
m: al.u32,
k: al.u32,
BLOCK_M: al.constexpr,
):
...
Launch-Time Tuning
The current tuning workflow is explicit. Choose launch geometry, specialize tile sizes, and pass backend options at launch.
for block in [64, 128, 256]:
kernel[lambda block=block: ((1, 1, 1), (block, 1, 1))](
x,
n,
BLOCK=block,
num_warps=block // 32,
)
Tunable inputs include grid and block dimensions, tile sizes passed as al.constexpr, layout shapes and strides, and backend options such as num_warps.
Python Subset
Ave parses a supported subset of Python source. Supported patterns include local variables, assignments, tuple values for shapes and launches, calls to compiler-registered language functions, calls to other JIT functions, and selected Python builtins such as range, len, int, float, min, and max.