llamadart library

Classes

GenerationParams
Parameters for text generation.
ggml_backend_buffer
ggml_backend_buffer_type
ggml_backend_device
ggml_cgraph
ggml_context
ggml_opt_context
ggml_opt_dataset
ggml_opt_optimizer_params
Parameters that control which optimizer is used and how that optimizer tries to find the minimal loss.
ggml_opt_result
ggml_tensor
An n-dimensional tensor.
ggml_threadpool
llama_adapter_lora
LoRA adapter.
llama_batch
Input data for llama_encode/llama_decode. A llama_batch object can contain input about one or many sequences. The provided arrays (i.e. token, embd, pos, etc.) must have a size of n_tokens.
llama_chat_message
Used in chat templates.
llama_context
llama_context_params
NOTE: changing the default values of parameters marked as EXPERIMENTAL may cause crashes or incorrect results in certain configurations. See https://github.com/ggml-org/llama.cpp/pull/7544.
llama_logit_bias
llama_memory_i
llama_model
llama_model_kv_override
llama_model_params
llama_model_quantize_params
Model quantization parameters.
llama_model_tensor_buft_override
llama_opt_params
llama_perf_context_data
Performance utils
llama_perf_sampler_data
llama_sampler
llama_sampler_chain_params
llama_sampler_data
llama_sampler_i
User code can implement this interface to create a custom llama_sampler.
llama_sampler_seq_config
llama_token_data
TODO: simplify (https://github.com/ggml-org/llama.cpp/pull/9294#pullrequestreview-2286561979)
llama_token_data_array
llama_vocab
C interface
LlamaChatMessage
A message in a chat conversation.
LlamaCpp
Bindings to llama.cpp
LlamaService
Stub implementation that throws if used on unsupported platforms.
LlamaServiceBase
Platform-agnostic interface for LLM inference (see the usage sketch after this class list).
ModelParams
Configuration parameters for loading the model.
UnnamedStruct
UnnamedStruct$1
UnnamedUnion
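
The Dart-side classes above (ModelParams, GenerationParams, LlamaChatMessage, and the LlamaServiceBase interface) suggest a load-then-generate flow. The sketch below is only illustrative: the method names (loadModel, chat, dispose) and parameter names (modelPath, maxTokens, temperature, role, content) are assumptions, not documented members; consult the individual class pages for the real API.

```dart
import 'package:llamadart/llamadart.dart'; // assumed import path

// Hypothetical flow; every method and parameter name below is an assumption.
Future<String> askModel(LlamaServiceBase service) async {
  // Assumed: ModelParams carries the model path. Note that the plain
  // LlamaService stub throws on unsupported platforms, so the caller is
  // expected to supply a working platform implementation here.
  await service.loadModel(ModelParams(modelPath: 'model.gguf'));

  final reply = await service.chat(
    [LlamaChatMessage(role: 'user', content: 'Hello!')], // assumed fields
    GenerationParams(maxTokens: 128, temperature: 0.7),  // assumed fields
  );

  await service.dispose(); // assumed cleanup method
  return reply;
}
```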

Enums

ggml_log_level
ggml_numa_strategy
NUMA strategies.
ggml_op
Available tensor operations.
ggml_opt_optimizer_type
ggml_type
NOTE: always add types at the end of the enum to keep backward compatibility
GpuBackend
GPU backend selection for runtime device preference (see the sketch after this enum list).
llama_attention_type
llama_flash_attn_type
llama_ftype
Model file types.
llama_model_kv_override_type
llama_model_meta_key
llama_params_fit_status
llama_pooling_type
llama_rope_scaling_type
llama_rope_type
llama_split_mode
llama_token_attr
llama_token_type
llama_vocab_type
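
Most of the enums above mirror the llama.cpp/ggml C enums one-to-one; GpuBackend is the Dart-side addition for picking a device preference at runtime. A minimal sketch of passing such a preference when building ModelParams follows; the enum value (GpuBackend.auto) and the gpuBackend parameter are assumptions for illustration only.

```dart
import 'package:llamadart/llamadart.dart'; // assumed import path

ModelParams paramsFor(String modelPath) {
  // Hypothetical members: check the GpuBackend and ModelParams pages for the
  // actual enum values and constructor parameters.
  return ModelParams(
    modelPath: modelPath,        // assumed parameter
    gpuBackend: GpuBackend.auto, // assumed value
  );
}
```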

Typedefs

Dartggml_abort_callbackFunction = bool Function(Pointer<Void> data)
Dartggml_backend_sched_eval_callbackFunction = bool Function(Pointer<ggml_tensor> t, bool ask, Pointer<Void> user_data)
Dartggml_log_callbackFunction = void Function(ggml_log_level level, Pointer<Char> text, Pointer<Void> user_data)
Dartggml_opt_epoch_callbackFunction = void Function(bool train, ggml_opt_context_t opt_ctx, ggml_opt_dataset_t dataset, ggml_opt_result_t result, int ibatch, int ibatch_max, int t_start_us)
Dartllama_opt_param_filterFunction = bool Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
Dartllama_pos = int
Dartllama_progress_callbackFunction = bool Function(double progress, Pointer<Void> user_data)
Dartllama_seq_id = int
Dartllama_state_seq_flags = int
Dartllama_token = int
ggml_abort_callback = Pointer<NativeFunction<ggml_abort_callbackFunction>>
Abort callback. If not NULL, it is called before ggml computation; if it returns true, the computation is aborted (see the sketch after this typedef list).
ggml_abort_callbackFunction = Bool Function(Pointer<Void> data)
ggml_backend_buffer_type_t = Pointer<ggml_backend_buffer_type>
ggml_backend_dev_t = Pointer<ggml_backend_device>
ggml_backend_sched_eval_callback = Pointer<NativeFunction<ggml_backend_sched_eval_callbackFunction>>
Evaluation callback for each node in the graph (set with ggml_backend_sched_set_eval_callback). When ask == true, the scheduler wants to know whether the user wants to observe this node; this allows the scheduler to batch nodes together in order to evaluate them in a single call.
ggml_backend_sched_eval_callbackFunction = Bool Function(Pointer<ggml_tensor> t, Bool ask, Pointer<Void> user_data)
ggml_log_callback = Pointer<NativeFunction<ggml_log_callbackFunction>>
TODO these functions were sandwiched in the old optimization interface, is there a better place for them?
ggml_log_callbackFunction = Void Function(UnsignedInt level, Pointer<Char> text, Pointer<Void> user_data)
ggml_opt_context_t = Pointer<ggml_opt_context>
ggml_opt_dataset_t = Pointer<ggml_opt_dataset>
ggml_opt_epoch_callback = Pointer<NativeFunction<ggml_opt_epoch_callbackFunction>>
Signature for a callback used while evaluating opt_ctx on dataset; called after an evaluation.
ggml_opt_epoch_callbackFunction = Void Function(Bool train, ggml_opt_context_t opt_ctx, ggml_opt_dataset_t dataset, ggml_opt_result_t result, Int64 ibatch, Int64 ibatch_max, Int64 t_start_us)
ggml_opt_get_optimizer_params = Pointer<NativeFunction<ggml_opt_get_optimizer_paramsFunction>>
Callback to calculate optimizer parameters prior to a backward pass. userdata can be used to pass arbitrary data.
ggml_opt_get_optimizer_paramsFunction = ggml_opt_optimizer_params Function(Pointer<Void> userdata)
ggml_opt_result_t = Pointer<ggml_opt_result>
ggml_threadpool_t = Pointer<ggml_threadpool>
llama_memory_t = Pointer<llama_memory_i>
llama_opt_param_filter = Pointer<NativeFunction<llama_opt_param_filterFunction>>
Function that returns whether or not a given tensor contains trainable parameters.
llama_opt_param_filterFunction = Bool Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
llama_pos = Int32
llama_progress_callback = Pointer<NativeFunction<llama_progress_callbackFunction>>
llama_progress_callbackFunction = Bool Function(Float progress, Pointer<Void> user_data)
llama_sampler_context_t = Pointer<Void>
Sampling API
llama_seq_id = Int32
llama_state_seq_flags = Uint32
llama_token = Int32
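
The *callbackFunction typedefs above give the native callback signatures, and the Dart-prefixed typedefs give their Dart-side shapes, so a callback can be built with dart:ffi's Pointer.fromFunction. The sketch below wires up ggml_abort_callback using the documented Bool Function(Pointer<Void> data) signature; the package import path is assumed.

```dart
import 'dart:ffi';

import 'package:llamadart/llamadart.dart'; // assumed import path

// The callback must be a top-level or static function to be usable with
// Pointer.fromFunction.
bool _shouldAbort(Pointer<Void> data) {
  // Return true to abort the ggml computation, false to let it continue.
  return false;
}

// `false` is the exceptional return value used if the Dart callback throws.
final ggml_abort_callback abortCallback =
    Pointer.fromFunction<ggml_abort_callbackFunction>(_shouldAbort, false);
```

The same pattern applies to the other function-pointer typedefs, e.g. llama_progress_callback with its Bool Function(Float progress, Pointer<Void> user_data) signature, adjusting the exceptional return value to the callback's return type.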