/// Whether to offload the KQV ops (including the KV cache) to the GPU.
@ffi.Bool() external bool offload_kqv;