/// Proportion of the model (layers or rows) to offload to each GPU.
/// Points to an array of llama_max_devices() floats.
external ffi.Pointer<ffi.Float> tensor_split;