llamadart library

Classes

GenerationParams
Parameters for text generation.
ggml_backend_buffer
ggml_backend_buffer_type
ggml_backend_device
ggml_cgraph
ggml_context
ggml_opt_context
ggml_opt_dataset
ggml_opt_optimizer_params
Parameters that control which optimizer is used and how that optimizer tries to find the minimal loss.
ggml_opt_result
ggml_tensor
An n-dimensional tensor.
ggml_threadpool
llama_adapter_lora
LoRA adapter.
llama_batch
Input data for llama_encode/llama_decode. A llama_batch object can contain input about one or many sequences. The provided arrays (i.e. token, embd, pos, etc.) must have a size of n_tokens.
llama_chat_message
Used in chat templates.
llama_context
llama_context_params
NOTE: changing the default values of parameters marked as EXPERIMENTAL may cause crashes or incorrect results in certain configurations. See https://github.com/ggml-org/llama.cpp/pull/7544.
llama_logit_bias
llama_memory_i
llama_model
llama_model_kv_override
llama_model_params
llama_model_quantize_params
Model quantization parameters.
llama_model_tensor_buft_override
llama_opt_params
llama_perf_context_data
Performance utils
llama_perf_sampler_data
llama_sampler
llama_sampler_chain_params
llama_sampler_data
llama_sampler_i
User code can implement this interface to create a custom llama_sampler.
llama_sampler_seq_config
llama_token_data
TODO: simplify (https://github.com/ggml-org/llama.cpp/pull/9294#pullrequestreview-2286561979)
llama_token_data_array
llama_vocab
C interface
LlamaChatMessage
A message in a chat conversation.
LlamaCpp
Bindings to llama.cpp
LlamaService
Stub implementation that throws if used on unsupported platforms.
LlamaServiceBase
Platform-agnostic interface for LLM inference (see the usage sketch after this class list).
ModelParams
Configuration parameters for loading the model.
UnnamedStruct
UnnamedStruct$1
UnnamedUnion
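
The Dart-side classes above (ModelParams, GenerationParams, LlamaChatMessage, and the LlamaServiceBase interface) suggest a load-then-generate flow. The sketch below is only illustrative: the method names (loadModel, chat, dispose) and parameter names (modelPath, maxTokens, temperature, role, content) are assumptions, not documented members; consult the individual class pages for the real API.

```dart
import 'package:llamadart/llamadart.dart'; // assumed import path

// Hypothetical flow; every method and parameter name below is an assumption.
Future<String> askModel(LlamaServiceBase service) async {
  // Assumed: ModelParams carries the model path. Note that the plain
  // LlamaService stub throws on unsupported platforms, so the caller is
  // expected to supply a working platform implementation here.
  await service.loadModel(ModelParams(modelPath: 'model.gguf'));

  final reply = await service.chat(
    [LlamaChatMessage(role: 'user', content: 'Hello!')], // assumed fields
    GenerationParams(maxTokens: 128, temperature: 0.7),  // assumed fields
  );

  await service.dispose(); // assumed cleanup method
  return reply;
}
```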

Enums

ggml_log_level
ggml_numa_strategy
NUMA strategies.
ggml_op
Available tensor operations.
ggml_opt_optimizer_type
ggml_type
NOTE: always add types at the end of the enum to keep backward compatibility
GpuBackend
GPU backend selection for runtime device preference (see the sketch after this enum list).
llama_attention_type
llama_flash_attn_type
llama_ftype
Model file types.
llama_model_kv_override_type
llama_model_meta_key
llama_params_fit_status
llama_pooling_type
llama_rope_scaling_type
llama_rope_type
llama_split_mode
llama_token_attr
llama_token_type
llama_vocab_type
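
Most of the enums above mirror the llama.cpp/ggml C enums one-to-one; GpuBackend is the Dart-side addition for picking a device preference at runtime. A minimal sketch of passing such a preference when building ModelParams follows; the enum value (GpuBackend.auto) and the gpuBackend parameter are assumptions for illustration only.

```dart
import 'package:llamadart/llamadart.dart'; // assumed import path

ModelParams paramsFor(String modelPath) {
  // Hypothetical members: check the GpuBackend and ModelParams pages for the
  // actual enum values and constructor parameters.
  return ModelParams(
    modelPath: modelPath,        // assumed parameter
    gpuBackend: GpuBackend.auto, // assumed value
  );
}
```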

Typedefs

Dartggml_abort_callbackFunction = bool Function(Pointer<Void> data)
Dartggml_backend_sched_eval_callbackFunction = bool Function(Pointer<ggml_tensor> t, bool ask, Pointer<Void> user_data)
Dartggml_log_callbackFunction = void Function(ggml_log_level level, Pointer<Char> text, Pointer<Void> user_data)
Dartggml_opt_epoch_callbackFunction = void Function(bool train, ggml_opt_context_t opt_ctx, ggml_opt_dataset_t dataset, ggml_opt_result_t result, int ibatch, int ibatch_max, int t_start_us)
Dartllama_opt_param_filterFunction = bool Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
Dartllama_pos = int
Dartllama_progress_callbackFunction = bool Function(double progress, Pointer<Void> user_data)
Dartllama_seq_id = int
Dartllama_state_seq_flags = int
Dartllama_token = int
ggml_abort_callback = Pointer<NativeFunction<ggml_abort_callbackFunction>>
Abort callback. If not NULL, it is called before ggml computation; if it returns true, the computation is aborted (see the sketch after this typedef list).
ggml_abort_callbackFunction = Bool Function(Pointer<Void> data)
ggml_backend_buffer_type_t = Pointer<ggml_backend_buffer_type>
ggml_backend_dev_t = Pointer<ggml_backend_device>
ggml_backend_sched_eval_callback = Pointer<NativeFunction<ggml_backend_sched_eval_callbackFunction>>
Evaluation callback for each node in the graph (set with ggml_backend_sched_set_eval_callback). When ask == true, the scheduler wants to know whether the user wants to observe this node; this allows the scheduler to batch nodes together in order to evaluate them in a single call.
ggml_backend_sched_eval_callbackFunction = Bool Function(Pointer<ggml_tensor> t, Bool ask, Pointer<Void> user_data)
ggml_log_callback = Pointer<NativeFunction<ggml_log_callbackFunction>>
TODO these functions were sandwiched in the old optimization interface, is there a better place for them?
ggml_log_callbackFunction = Void Function(UnsignedInt level, Pointer<Char> text, Pointer<Void> user_data)
ggml_opt_context_t = Pointer<ggml_opt_context>
ggml_opt_dataset_t = Pointer<ggml_opt_dataset>
ggml_opt_epoch_callback = Pointer<NativeFunction<ggml_opt_epoch_callbackFunction>>
Signature for a callback used while evaluating opt_ctx on dataset; called after an evaluation.
ggml_opt_epoch_callbackFunction = Void Function(Bool train, ggml_opt_context_t opt_ctx, ggml_opt_dataset_t dataset, ggml_opt_result_t result, Int64 ibatch, Int64 ibatch_max, Int64 t_start_us)
ggml_opt_get_optimizer_params = Pointer<NativeFunction<ggml_opt_get_optimizer_paramsFunction>>
Callback to calculate optimizer parameters prior to a backward pass. userdata can be used to pass arbitrary data.
ggml_opt_get_optimizer_paramsFunction = ggml_opt_optimizer_params Function(Pointer<Void> userdata)
ggml_opt_result_t = Pointer<ggml_opt_result>
ggml_threadpool_t = Pointer<ggml_threadpool>
llama_memory_t = Pointer<llama_memory_i>
llama_opt_param_filter = Pointer<NativeFunction<llama_opt_param_filterFunction>>
Function that returns whether or not a given tensor contains trainable parameters.
llama_opt_param_filterFunction = Bool Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
llama_pos = Int32
llama_progress_callback = Pointer<NativeFunction<llama_progress_callbackFunction>>
llama_progress_callbackFunction = Bool Function(Float progress, Pointer<Void> user_data)
llama_sampler_context_t = Pointer<Void>
Sampling API
llama_seq_id = Int32
llama_state_seq_flags = Uint32
llama_token = Int32
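
The *callbackFunction typedefs above give the native callback signatures, and the Dart-prefixed typedefs give their Dart-side shapes, so a callback can be built with dart:ffi's Pointer.fromFunction. The sketch below wires up ggml_abort_callback using the documented Bool Function(Pointer<Void> data) signature; the package import path is assumed.

```dart
import 'dart:ffi';

import 'package:llamadart/llamadart.dart'; // assumed import path

// The callback must be a top-level or static function to be usable with
// Pointer.fromFunction.
bool _shouldAbort(Pointer<Void> data) {
  // Return true to abort the ggml computation, false to let it continue.
  return false;
}

// `false` is the exceptional return value used if the Dart callback throws.
final ggml_abort_callback abortCallback =
    Pointer.fromFunction<ggml_abort_callbackFunction>(_shouldAbort, false);
```

The same pattern applies to the other function-pointer typedefs, e.g. llama_progress_callback with its Bool Function(Float progress, Pointer<Void> user_data) signature, adjusting the exceptional return value to the callback's return type.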