llama_set_n_threads method
Sets the number of threads used for decoding. `n_threads` is the number of threads used for generation (a single token at a time); `n_threads_batch` is the number of threads used for prompt and batch processing (multiple tokens at once).
Implementation
void llama_set_n_threads(
  ffi.Pointer<llama_context> ctx,
  int n_threads,
  int n_threads_batch,
) {
  return _llama_set_n_threads(
    ctx,
    n_threads,
    n_threads_batch,
  );
}
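A minimal usage sketch, assuming `ctx` is a valid `ffi.Pointer<llama_context>` obtained elsewhere (for example from the context-creation call of these bindings) and that the generated bindings and `dart:io` are in scope. The half/full split of cores is an illustrative heuristic, not something the API prescribes:

```dart
import 'dart:io';

// Assumption: `ctx` was created earlier and is still live.
final cores = Platform.numberOfProcessors;

// Generation produces one token at a time and often saturates with
// fewer threads; prompt/batch processing parallelizes across tokens,
// so give it more threads.
llama_set_n_threads(ctx, cores ~/ 2, cores);
```

Thread counts can be changed at any time between decode calls, so an application can raise `n_threads_batch` while ingesting a long prompt and lower it afterwards.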