llamadart library

High-performance Dart and Flutter plugin for llama.cpp.

llamadart allows you to run Large Language Models (LLMs) locally using GGUF models across all major platforms (Android, iOS, macOS, Linux, Windows, Web).

Core Components

  • LlamaEngine: The low-level orchestrator for model loading, tokenization, and raw inference.
  • ChatSession: A high-level, stateful interface for chat-based interactions. It automatically manages conversation history and context window limits.
  • LlamaBackend: The platform-agnostic interface for inference.

Simple Example

import 'dart:io';

// Load the model once, then reuse the engine across sessions.
final engine = LlamaEngine(LlamaBackend());
await engine.loadModel('path/to/model.gguf');

// ChatSession tracks conversation history automatically.
final session = ChatSession(engine);
await for (final token in session.create([LlamaTextContent('Hello!')])) {
  stdout.write(token); // stream tokens as they are generated
}

await engine.dispose(); // release native resources
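Because ChatSession manages history and context-window limits for you, follow-up turns only need the new user message. A minimal sketch reusing the API shown above (the prompts are illustrative; this assumes `session` and `engine` from the previous example):

```dart
// First turn: history starts empty; the exchange is recorded in the session.
await for (final token in session.create([LlamaTextContent('Name one prime number.')])) {
  stdout.write(token);
}

// Second turn: no need to resend earlier messages. ChatSession replays the
// accumulated history (trimmed to the context window) on our behalf.
await for (final token in session.create([LlamaTextContent('Name another one.')])) {
  stdout.write(token);
}
```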

Classes

BackendAvailability
Optional backend capability for exposing selectable backend options.
BackendBatchEmbeddings
Optional backend capability for batching embedding requests.
BackendEmbeddings
Optional backend capability for generating text embeddings.
BackendPerfContextData
Native performance timings reported by llama.cpp for the active context.
BackendPerformanceDiagnostics
Optional backend capability for exposing llama.cpp perf timings.
BackendRuntimeDiagnostics
Optional backend capability for exposing resolved runtime diagnostics.
ChatParseResult
The result of parsing raw LLM output into structured components.
ChatSession
Convenience wrapper for multi-turn chat with automatic history management.
ChatTemplateEngine
Orchestrates chat template detection, rendering, and output parsing.
ChatTemplateHandler
Abstract base class for per-format chat template handlers.
GenerationGrammarTrigger
A trigger that activates grammar constraints during generation.
GenerationParams
Parameters controlling the token sampling and generation process.
ggml_backend
ggml_backend_buffer
ggml_backend_buffer_type
ggml_backend_dev_caps
ggml_backend_dev_props
ggml_backend_device
ggml_backend_event
ggml_backend_feature
ggml_backend_graph_copy$1
ggml_backend_reg
ggml_backend_sched
ggml_bf16_t
ggml_cgraph
ggml_context
ggml_init_params
ggml_object
ggml_opt_context
ggml_opt_dataset
ggml_opt_optimizer_params
ggml_opt_result
ggml_tensor
ggml_threadpool
ggml_threadpool_params
ggml_type_traits
gguf_context
GrammarTrigger
A trigger that activates grammar constraints.
llama_adapter_lora
llama_batch
llama_chat_message
llama_context
llama_context_params
llama_logit_bias
llama_memory_i
llama_model
llama_model_imatrix_data
llama_model_kv_override
llama_model_params
llama_model_quantize_params
llama_model_tensor_buft_override
llama_model_tensor_override
llama_opt_params
llama_perf_context_data
llama_perf_sampler_data
llama_sampler
llama_sampler_chain_params
llama_sampler_data
llama_sampler_i
llama_sampler_seq_config
llama_token_data
llama_token_data_array
llama_vocab
LlamaAudioContent
A part of a message containing audio data for speech-to-text models.
LlamaBackend
Platform-agnostic interface for Llama model inference.
LlamaChatMessage
A message in a chat conversation history.
LlamaChatTemplateResult
The result of applying a chat template to a conversation history.
LlamaCompletionChunk
Represents a streaming chunk of a chat completion. Aligns with OpenAI's ChatCompletionChunk.
LlamaCompletionChunkChoice
Represents a choice in a completion chunk.
LlamaCompletionChunkDelta
Represents a delta in a completion choice.
LlamaCompletionChunkFunction
Represents a function call within a tool call.
LlamaCompletionChunkToolCall
Represents a tool call within a completion chunk. Aligns with OpenAI's ToolCall in streaming chunks.
LlamaContentPart
Base class for all content types in a message.
LlamaEngine
Stateless chat completions engine (like OpenAI's Chat Completions API).
LlamaImageContent
A part of a message containing image data for vision models.
LlamaLogger
A lightweight singleton logger for the llama_dart library.
LlamaLogRecord
A log record containing level, message, time, and optional error info.
LlamaTextContent
A part of a message containing plain text.
LlamaThinkingContent
A part of a message containing the model's Chain-of-Thought (reasoning).
LlamaToolCallContent
A part of a message containing a tool call solicitation from the model.
LlamaToolResultContent
A part of a message containing the result of a tool execution.
LoraAdapterConfig
Configuration for a LoRA (Low-Rank Adaptation) adapter.
ModelParams
Configuration parameters for loading a Llama model.
mtmd_bitmap
mtmd_context
mtmd_context_params
mtmd_image_tokens
mtmd_input_chunk
mtmd_input_chunks
mtmd_input_text
ToolDefinition
Defines a tool that the LLM can invoke.
ToolParam
Represents a single parameter in a tool's input schema.
ToolParams
Provides type-safe access to tool call arguments.
UnnamedStruct
UnnamedStruct$1
UnnamedUnion
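For vision models, content parts can be mixed within a single turn, pairing LlamaImageContent with LlamaTextContent. A hedged sketch, assuming an image-capable GGUF model is already loaded into `session` and that LlamaImageContent accepts raw image bytes (the exact constructor signature is an assumption here — check the class page):

```dart
import 'dart:io';

// Hypothetical constructor usage: LlamaImageContent(bytes) is assumed.
final imageBytes = await File('photo.png').readAsBytes();

await for (final token in session.create([
  LlamaImageContent(imageBytes),
  LlamaTextContent('Describe this image.'),
])) {
  stdout.write(token);
}
```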

Properties

GGML_TENSOR_SIZE → int
final

Functions

ggml_abort(Pointer<Char> file, int line, Pointer<Char> fmt) → void
ggml_abs(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_abs_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_acc(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int nb2, int nb3, int offset) → Pointer<ggml_tensor>
ggml_acc_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int nb2, int nb3, int offset) → Pointer<ggml_tensor>
ggml_add(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_add1(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_add1_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_add_cast(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, ggml_type type) → Pointer<ggml_tensor>
ggml_add_id(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> ids) → Pointer<ggml_tensor>
ggml_add_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_add_rel_pos(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> pw, Pointer<ggml_tensor> ph) → Pointer<ggml_tensor>
ggml_add_rel_pos_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> pw, Pointer<ggml_tensor> ph) → Pointer<ggml_tensor>
ggml_arange(Pointer<ggml_context> ctx, double start, double stop, double step) → Pointer<ggml_tensor>
ggml_are_same_shape(Pointer<ggml_tensor> t0, Pointer<ggml_tensor> t1) → bool
ggml_are_same_stride(Pointer<ggml_tensor> t0, Pointer<ggml_tensor> t1) → bool
ggml_argmax(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_argsort(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_sort_order order) → Pointer<ggml_tensor>
ggml_argsort_top_k(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int k) → Pointer<ggml_tensor>
ggml_backend_alloc_buffer(ggml_backend_t backend, int size) → ggml_backend_buffer_t
ggml_backend_buffer_clear(ggml_backend_buffer_t buffer, int value) → void
ggml_backend_buffer_free(ggml_backend_buffer_t buffer) → void
ggml_backend_buffer_get_alignment(ggml_backend_buffer_t buffer) → int
ggml_backend_buffer_get_alloc_size(ggml_backend_buffer_t buffer, Pointer<ggml_tensor> tensor) → int
ggml_backend_buffer_get_base(ggml_backend_buffer_t buffer) → Pointer<Void>
ggml_backend_buffer_get_max_size(ggml_backend_buffer_t buffer) → int
ggml_backend_buffer_get_size(ggml_backend_buffer_t buffer) → int
ggml_backend_buffer_get_type(ggml_backend_buffer_t buffer) → ggml_backend_buffer_type_t
ggml_backend_buffer_get_usage(ggml_backend_buffer_t buffer) → ggml_backend_buffer_usage
ggml_backend_buffer_init_tensor(ggml_backend_buffer_t buffer, Pointer<ggml_tensor> tensor) → ggml_status
ggml_backend_buffer_is_host(ggml_backend_buffer_t buffer) → bool
ggml_backend_buffer_name(ggml_backend_buffer_t buffer) → Pointer<Char>
ggml_backend_buffer_reset(ggml_backend_buffer_t buffer) → void
ggml_backend_buffer_set_usage(ggml_backend_buffer_t buffer, ggml_backend_buffer_usage usage) → void
ggml_backend_buft_alloc_buffer(ggml_backend_buffer_type_t buft, int size) → ggml_backend_buffer_t
ggml_backend_buft_get_alignment(ggml_backend_buffer_type_t buft) → int
ggml_backend_buft_get_alloc_size(ggml_backend_buffer_type_t buft, Pointer<ggml_tensor> tensor) → int
ggml_backend_buft_get_device(ggml_backend_buffer_type_t buft) → ggml_backend_dev_t
ggml_backend_buft_get_max_size(ggml_backend_buffer_type_t buft) → int
ggml_backend_buft_is_host(ggml_backend_buffer_type_t buft) → bool
ggml_backend_buft_name(ggml_backend_buffer_type_t buft) → Pointer<Char>
ggml_backend_compare_graph_backend(ggml_backend_t backend1, ggml_backend_t backend2, Pointer<ggml_cgraph> graph, ggml_backend_eval_callback callback, Pointer<Void> user_data, Pointer<Pointer<ggml_tensor>> test_nodes, int num_test_nodes) → bool
ggml_backend_cpu_buffer_from_ptr(Pointer<Void> ptr, int size) → ggml_backend_buffer_t
ggml_backend_cpu_buffer_type() → ggml_backend_buffer_type_t
ggml_backend_dev_backend_reg(ggml_backend_dev_t device) → ggml_backend_reg_t
ggml_backend_dev_buffer_from_host_ptr(ggml_backend_dev_t device, Pointer<Void> ptr, int size, int max_tensor_size) → ggml_backend_buffer_t
ggml_backend_dev_buffer_type(ggml_backend_dev_t device) → ggml_backend_buffer_type_t
ggml_backend_dev_by_name(Pointer<Char> name) → ggml_backend_dev_t
ggml_backend_dev_by_type(ggml_backend_dev_type type) → ggml_backend_dev_t
ggml_backend_dev_count() → int
ggml_backend_dev_description(ggml_backend_dev_t device) → Pointer<Char>
ggml_backend_dev_get(int index) → ggml_backend_dev_t
ggml_backend_dev_get_props(ggml_backend_dev_t device, Pointer<ggml_backend_dev_props> props) → void
ggml_backend_dev_host_buffer_type(ggml_backend_dev_t device) → ggml_backend_buffer_type_t
ggml_backend_dev_init(ggml_backend_dev_t device, Pointer<Char> params) → ggml_backend_t
ggml_backend_dev_memory(ggml_backend_dev_t device, Pointer<Size> free, Pointer<Size> total) → void
ggml_backend_dev_name(ggml_backend_dev_t device) → Pointer<Char>
ggml_backend_dev_offload_op(ggml_backend_dev_t device, Pointer<ggml_tensor> op) → bool
ggml_backend_dev_supports_buft(ggml_backend_dev_t device, ggml_backend_buffer_type_t buft) → bool
ggml_backend_dev_supports_op(ggml_backend_dev_t device, Pointer<ggml_tensor> op) → bool
ggml_backend_dev_type$1(ggml_backend_dev_t device) → ggml_backend_dev_type
ggml_backend_device_register(ggml_backend_dev_t device) → void
ggml_backend_event_free(ggml_backend_event_t event) → void
ggml_backend_event_new(ggml_backend_dev_t device) → ggml_backend_event_t
ggml_backend_event_record(ggml_backend_event_t event, ggml_backend_t backend) → void
ggml_backend_event_synchronize(ggml_backend_event_t event) → void
ggml_backend_event_wait(ggml_backend_t backend, ggml_backend_event_t event) → void
ggml_backend_free(ggml_backend_t backend) → void
ggml_backend_get_alignment(ggml_backend_t backend) → int
ggml_backend_get_default_buffer_type(ggml_backend_t backend) → ggml_backend_buffer_type_t
ggml_backend_get_device(ggml_backend_t backend) → ggml_backend_dev_t
ggml_backend_get_max_size(ggml_backend_t backend) → int
ggml_backend_graph_compute(ggml_backend_t backend, Pointer<ggml_cgraph> cgraph) → ggml_status
ggml_backend_graph_compute_async(ggml_backend_t backend, Pointer<ggml_cgraph> cgraph) → ggml_status
ggml_backend_graph_copy(ggml_backend_t backend, Pointer<ggml_cgraph> graph) → ggml_backend_graph_copy$1
ggml_backend_graph_copy_free(ggml_backend_graph_copy$1 copy) → void
ggml_backend_graph_plan_compute(ggml_backend_t backend, ggml_backend_graph_plan_t plan) → ggml_status
ggml_backend_graph_plan_create(ggml_backend_t backend, Pointer<ggml_cgraph> cgraph) → ggml_backend_graph_plan_t
ggml_backend_graph_plan_free(ggml_backend_t backend, ggml_backend_graph_plan_t plan) → void
ggml_backend_guid(ggml_backend_t backend) → ggml_guid_t
ggml_backend_init_best() → ggml_backend_t
ggml_backend_init_by_name(Pointer<Char> name, Pointer<Char> params) → ggml_backend_t
ggml_backend_init_by_type(ggml_backend_dev_type type, Pointer<Char> params) → ggml_backend_t
ggml_backend_load(Pointer<Char> path) → ggml_backend_reg_t
ggml_backend_load_all() → void
ggml_backend_load_all_from_path(Pointer<Char> dir_path) → void
ggml_backend_name(ggml_backend_t backend) → Pointer<Char>
ggml_backend_offload_op(ggml_backend_t backend, Pointer<ggml_tensor> op) → bool
ggml_backend_reg_by_name(Pointer<Char> name) → ggml_backend_reg_t
ggml_backend_reg_count() → int
ggml_backend_reg_dev_count(ggml_backend_reg_t reg) → int
ggml_backend_reg_dev_get(ggml_backend_reg_t reg, int index) → ggml_backend_dev_t
ggml_backend_reg_get(int index) → ggml_backend_reg_t
ggml_backend_reg_get_proc_address(ggml_backend_reg_t reg, Pointer<Char> name) → Pointer<Void>
ggml_backend_reg_name(ggml_backend_reg_t reg) → Pointer<Char>
ggml_backend_register(ggml_backend_reg_t reg) → void
ggml_backend_sched_alloc_graph(ggml_backend_sched_t sched, Pointer<ggml_cgraph> graph) → bool
ggml_backend_sched_free(ggml_backend_sched_t sched) → void
ggml_backend_sched_get_backend(ggml_backend_sched_t sched, int i) → ggml_backend_t
ggml_backend_sched_get_buffer_size(ggml_backend_sched_t sched, ggml_backend_t backend) → int
ggml_backend_sched_get_buffer_type(ggml_backend_sched_t sched, ggml_backend_t backend) → ggml_backend_buffer_type_t
ggml_backend_sched_get_n_backends(ggml_backend_sched_t sched) → int
ggml_backend_sched_get_n_copies(ggml_backend_sched_t sched) → int
ggml_backend_sched_get_n_splits(ggml_backend_sched_t sched) → int
ggml_backend_sched_get_tensor_backend(ggml_backend_sched_t sched, Pointer<ggml_tensor> node) → ggml_backend_t
ggml_backend_sched_graph_compute(ggml_backend_sched_t sched, Pointer<ggml_cgraph> graph) → ggml_status
ggml_backend_sched_graph_compute_async(ggml_backend_sched_t sched, Pointer<ggml_cgraph> graph) → ggml_status
ggml_backend_sched_new(Pointer<ggml_backend_t> backends, Pointer<ggml_backend_buffer_type_t> bufts, int n_backends, int graph_size, bool parallel, bool op_offload) → ggml_backend_sched_t
ggml_backend_sched_reserve(ggml_backend_sched_t sched, Pointer<ggml_cgraph> measure_graph) → bool
ggml_backend_sched_reserve_size(ggml_backend_sched_t sched, Pointer<ggml_cgraph> measure_graph, Pointer<Size> sizes) → void
ggml_backend_sched_reset(ggml_backend_sched_t sched) → void
ggml_backend_sched_set_eval_callback(ggml_backend_sched_t sched, ggml_backend_sched_eval_callback callback, Pointer<Void> user_data) → void
ggml_backend_sched_set_tensor_backend(ggml_backend_sched_t sched, Pointer<ggml_tensor> node, ggml_backend_t backend) → void
ggml_backend_sched_split_graph(ggml_backend_sched_t sched, Pointer<ggml_cgraph> graph) → void
ggml_backend_sched_synchronize(ggml_backend_sched_t sched) → void
ggml_backend_supports_buft(ggml_backend_t backend, ggml_backend_buffer_type_t buft) → bool
ggml_backend_supports_op(ggml_backend_t backend, Pointer<ggml_tensor> op) → bool
ggml_backend_synchronize(ggml_backend_t backend) → void
ggml_backend_tensor_alloc(ggml_backend_buffer_t buffer, Pointer<ggml_tensor> tensor, Pointer<Void> addr) → ggml_status
ggml_backend_tensor_copy(Pointer<ggml_tensor> src, Pointer<ggml_tensor> dst) → void
ggml_backend_tensor_copy_async(ggml_backend_t backend_src, ggml_backend_t backend_dst, Pointer<ggml_tensor> src, Pointer<ggml_tensor> dst) → void
ggml_backend_tensor_get(Pointer<ggml_tensor> tensor, Pointer<Void> data, int offset, int size) → void
ggml_backend_tensor_get_async(ggml_backend_t backend, Pointer<ggml_tensor> tensor, Pointer<Void> data, int offset, int size) → void
ggml_backend_tensor_memset(Pointer<ggml_tensor> tensor, int value, int offset, int size) → void
ggml_backend_tensor_set(Pointer<ggml_tensor> tensor, Pointer<Void> data, int offset, int size) → void
ggml_backend_tensor_set_async(ggml_backend_t backend, Pointer<ggml_tensor> tensor, Pointer<Void> data, int offset, int size) → void
ggml_backend_unload(ggml_backend_reg_t reg) → void
ggml_backend_view_init(Pointer<ggml_tensor> tensor) → ggml_status
ggml_bf16_to_fp32(ggml_bf16_t arg0) → double
ggml_bf16_to_fp32_row(Pointer<ggml_bf16_t> arg0, Pointer<Float> arg1, int arg2) → void
ggml_blck_size(ggml_type type) → int
ggml_build_backward_expand(Pointer<ggml_context> ctx, Pointer<ggml_cgraph> cgraph, Pointer<Pointer<ggml_tensor>> grad_accs) → void
ggml_build_forward_expand(Pointer<ggml_cgraph> cgraph, Pointer<ggml_tensor> tensor) → void
ggml_build_forward_select(Pointer<ggml_cgraph> cgraph, Pointer<Pointer<ggml_tensor>> tensors, int n_tensors, int idx) → Pointer<ggml_tensor>
ggml_can_repeat(Pointer<ggml_tensor> t0, Pointer<ggml_tensor> t1) → bool
ggml_cast(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_type type) → Pointer<ggml_tensor>
ggml_ceil(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_ceil_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_clamp(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double min, double max) → Pointer<ggml_tensor>
ggml_commit() → Pointer<Char>
ggml_concat(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int dim) → Pointer<ggml_tensor>
ggml_cont(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_cont_1d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0) → Pointer<ggml_tensor>
ggml_cont_2d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1) → Pointer<ggml_tensor>
ggml_cont_3d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2) → Pointer<ggml_tensor>
ggml_cont_4d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3) → Pointer<ggml_tensor>
ggml_conv_1d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int p0, int d0) → Pointer<ggml_tensor>
ggml_conv_1d_dw(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int p0, int d0) → Pointer<ggml_tensor>
ggml_conv_1d_dw_ph(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int d0) → Pointer<ggml_tensor>
ggml_conv_1d_ph(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s, int d) → Pointer<ggml_tensor>
ggml_conv_2d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int s1, int p0, int p1, int d0, int d1) → Pointer<ggml_tensor>
ggml_conv_2d_direct(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int s1, int p0, int p1, int d0, int d1) → Pointer<ggml_tensor>
ggml_conv_2d_dw(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int s1, int p0, int p1, int d0, int d1) → Pointer<ggml_tensor>
ggml_conv_2d_dw_direct(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int stride0, int stride1, int pad0, int pad1, int dilation0, int dilation1) → Pointer<ggml_tensor>
ggml_conv_2d_s1_ph(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_conv_2d_sk_p0(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_conv_3d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int IC, int s0, int s1, int s2, int p0, int p1, int p2, int d0, int d1, int d2) → Pointer<ggml_tensor>
ggml_conv_3d_direct(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int s1, int s2, int p0, int p1, int p2, int d0, int d1, int d2, int n_channels, int n_batch, int n_channels_out) → Pointer<ggml_tensor>
ggml_conv_transpose_1d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int p0, int d0) → Pointer<ggml_tensor>
ggml_conv_transpose_2d_p0(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int stride) → Pointer<ggml_tensor>
ggml_cos(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_cos_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_count_equal(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_cpy(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_cross_entropy_loss(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_cross_entropy_loss_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c) → Pointer<ggml_tensor>
ggml_cumsum(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_custom_4d(Pointer<ggml_context> ctx, ggml_type type, int ne0, int ne1, int ne2, int ne3, Pointer<Pointer<ggml_tensor>> args, int n_args, ggml_custom_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor>
ggml_custom_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<Pointer<ggml_tensor>> args, int n_args, ggml_custom_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor>
ggml_cycles() → int
ggml_cycles_per_ms() → int
ggml_diag(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_diag_mask_inf(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_past) → Pointer<ggml_tensor>
ggml_diag_mask_inf_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_past) → Pointer<ggml_tensor>
ggml_diag_mask_zero(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_past) → Pointer<ggml_tensor>
ggml_diag_mask_zero_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_past) → Pointer<ggml_tensor>
ggml_div(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_div_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_dup(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_dup_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_dup_tensor(Pointer<ggml_context> ctx, Pointer<ggml_tensor> src) → Pointer<ggml_tensor>
ggml_element_size(Pointer<ggml_tensor> tensor) → int
ggml_elu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_elu_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_exp(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_exp_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_expm1(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_expm1_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_fill(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double c) → Pointer<ggml_tensor>
ggml_fill_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double c) → Pointer<ggml_tensor>
ggml_flash_attn_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> q, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> d, bool masked) → Pointer<ggml_tensor>
ggml_flash_attn_ext(Pointer<ggml_context> ctx, Pointer<ggml_tensor> q, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> mask, double scale, double max_bias, double logit_softcap) → Pointer<ggml_tensor>
ggml_flash_attn_ext_add_sinks(Pointer<ggml_tensor> a, Pointer<ggml_tensor> sinks) → void
ggml_flash_attn_ext_get_prec(Pointer<ggml_tensor> a) → ggml_prec
ggml_flash_attn_ext_set_prec(Pointer<ggml_tensor> a, ggml_prec prec) → void
ggml_floor(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_floor_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_fopen(Pointer<Char> fname, Pointer<Char> mode) → Pointer<FILE>
ggml_format_name(Pointer<ggml_tensor> tensor, Pointer<Char> fmt) → Pointer<ggml_tensor>
ggml_fp16_to_fp32(int arg0) → double
ggml_fp16_to_fp32_row(Pointer<ggml_fp16_t> arg0, Pointer<Float> arg1, int arg2) → void
ggml_fp32_to_bf16(double arg0) → ggml_bf16_t
ggml_fp32_to_bf16_row(Pointer<Float> arg0, Pointer<ggml_bf16_t> arg1, int arg2) → void
ggml_fp32_to_bf16_row_ref(Pointer<Float> arg0, Pointer<ggml_bf16_t> arg1, int arg2) → void
ggml_fp32_to_fp16(double arg0) → int
ggml_fp32_to_fp16_row(Pointer<Float> arg0, Pointer<ggml_fp16_t> arg1, int arg2) → void
ggml_free(Pointer<ggml_context> ctx) → void
ggml_ftype_to_ggml_type(ggml_ftype ftype) → ggml_type
ggml_gated_delta_net(Pointer<ggml_context> ctx, Pointer<ggml_tensor> q, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> g, Pointer<ggml_tensor> beta, Pointer<ggml_tensor> state) → Pointer<ggml_tensor>
ggml_gated_linear_attn(Pointer<ggml_context> ctx, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> q, Pointer<ggml_tensor> g, Pointer<ggml_tensor> state, double scale) → Pointer<ggml_tensor>
ggml_geglu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_geglu_erf(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_geglu_erf_split(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_geglu_erf_swapped(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_geglu_quick(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_geglu_quick_split(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_geglu_quick_swapped(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_geglu_split(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_geglu_swapped(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_gelu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_gelu_erf(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_gelu_erf_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_gelu_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_gelu_quick(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_gelu_quick_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_get_data(Pointer<ggml_tensor> tensor) → Pointer<Void>
ggml_get_data_f32(Pointer<ggml_tensor> tensor) → Pointer<Float>
ggml_get_first_tensor(Pointer<ggml_context> ctx) → Pointer<ggml_tensor>
ggml_get_glu_op(Pointer<ggml_tensor> tensor) → ggml_glu_op
ggml_get_max_tensor_size(Pointer<ggml_context> ctx) → int
ggml_get_mem_buffer(Pointer<ggml_context> ctx) → Pointer<Void>
ggml_get_mem_size(Pointer<ggml_context> ctx) → int
ggml_get_name(Pointer<ggml_tensor> tensor) → Pointer<Char>
ggml_get_next_tensor(Pointer<ggml_context> ctx, Pointer<ggml_tensor> tensor) → Pointer<ggml_tensor>
ggml_get_no_alloc(Pointer<ggml_context> ctx) → bool
ggml_get_rel_pos(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int qh, int kh) → Pointer<ggml_tensor>
ggml_get_rows(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_get_rows_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c) → Pointer<ggml_tensor>
ggml_get_tensor(Pointer<ggml_context> ctx, Pointer<Char> name) → Pointer<ggml_tensor>
ggml_get_type_traits(ggml_type type) → Pointer<ggml_type_traits>
ggml_get_unary_op(Pointer<ggml_tensor> tensor) → ggml_unary_op
ggml_glu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_glu_op op, bool swapped) → Pointer<ggml_tensor>
ggml_glu_op_name(ggml_glu_op op) → Pointer<Char>
ggml_glu_split(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, ggml_glu_op op) → Pointer<ggml_tensor>
ggml_graph_add_node(Pointer<ggml_cgraph> cgraph, Pointer<ggml_tensor> tensor) → void
ggml_graph_clear(Pointer<ggml_cgraph> cgraph) → void
ggml_graph_cpy(Pointer<ggml_cgraph> src, Pointer<ggml_cgraph> dst) → void
ggml_graph_dump_dot(Pointer<ggml_cgraph> gb, Pointer<ggml_cgraph> cgraph, Pointer<Char> filename) → void
ggml_graph_dup(Pointer<ggml_context> ctx, Pointer<ggml_cgraph> cgraph, bool force_grads) → Pointer<ggml_cgraph>
ggml_graph_get_grad(Pointer<ggml_cgraph> cgraph, Pointer<ggml_tensor> node) → Pointer<ggml_tensor>
ggml_graph_get_grad_acc(Pointer<ggml_cgraph> cgraph, Pointer<ggml_tensor> node) → Pointer<ggml_tensor>
ggml_graph_get_tensor(Pointer<ggml_cgraph> cgraph, Pointer<Char> name) → Pointer<ggml_tensor>
ggml_graph_n_nodes(Pointer<ggml_cgraph> cgraph) → int
ggml_graph_node(Pointer<ggml_cgraph> cgraph, int i) → Pointer<ggml_tensor>
ggml_graph_nodes(Pointer<ggml_cgraph> cgraph) → Pointer<Pointer<ggml_tensor>>
ggml_graph_overhead() → int
ggml_graph_overhead_custom(int size, bool grads) → int
ggml_graph_print(Pointer<ggml_cgraph> cgraph) → void
ggml_graph_reset(Pointer<ggml_cgraph> cgraph) → void
ggml_graph_size(Pointer<ggml_cgraph> cgraph) → int
ggml_group_norm(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_groups, double eps) → Pointer<ggml_tensor>
ggml_group_norm_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_groups, double eps) → Pointer<ggml_tensor>
ggml_guid_matches(ggml_guid_t guid_a, ggml_guid_t guid_b) → bool
ggml_hardsigmoid(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_hardswish(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_im2col(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int s1, int p0, int p1, int d0, int d1, bool is_2D, ggml_type dst_type) → Pointer<ggml_tensor>
ggml_im2col_3d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int IC, int s0, int s1, int s2, int p0, int p1, int p2, int d0, int d1, int d2, ggml_type dst_type) → Pointer<ggml_tensor>
ggml_im2col_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<Int64> ne, int s0, int s1, int p0, int p1, int d0, int d1, bool is_2D) → Pointer<ggml_tensor>
ggml_init(ggml_init_params params) → Pointer<ggml_context>
ggml_interpolate(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3, int mode) → Pointer<ggml_tensor>
ggml_is_3d(Pointer<ggml_tensor> tensor) → bool
ggml_is_contiguous(Pointer<ggml_tensor> tensor) → bool
ggml_is_contiguous_0(Pointer<ggml_tensor> tensor) → bool
ggml_is_contiguous_1(Pointer<ggml_tensor> tensor) → bool
ggml_is_contiguous_2(Pointer<ggml_tensor> tensor) → bool
ggml_is_contiguous_channels(Pointer<ggml_tensor> tensor) → bool
ggml_is_contiguous_rows(Pointer<ggml_tensor> tensor) → bool
ggml_is_contiguously_allocated(Pointer<ggml_tensor> tensor) → bool
ggml_is_empty(Pointer<ggml_tensor> tensor) → bool
ggml_is_matrix(Pointer<ggml_tensor> tensor) → bool
ggml_is_permuted(Pointer<ggml_tensor> tensor) → bool
ggml_is_quantized(ggml_type type) → bool
ggml_is_scalar(Pointer<ggml_tensor> tensor) → bool
ggml_is_transposed(Pointer<ggml_tensor> tensor) → bool
ggml_is_vector(Pointer<ggml_tensor> tensor) → bool
ggml_is_view(Pointer<ggml_tensor> tensor) → bool
ggml_l2_norm(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double eps) → Pointer<ggml_tensor>
ggml_l2_norm_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double eps) → Pointer<ggml_tensor>
ggml_leaky_relu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double negative_slope, bool inplace) → Pointer<ggml_tensor>
ggml_log(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_log_get(Pointer<ggml_log_callback> log_callback, Pointer<Pointer<Void>> user_data) → void
ggml_log_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_log_set(ggml_log_callback log_callback, Pointer<Void> user_data) → void
ggml_map_custom1(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_custom1_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor>
ggml_map_custom1_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_custom1_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor>
ggml_map_custom2(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, ggml_custom2_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor>
ggml_map_custom2_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, ggml_custom2_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor>
ggml_map_custom3(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, ggml_custom3_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor>
ggml_map_custom3_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, ggml_custom3_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor>
ggml_mean(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_mul(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_mul_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_mul_mat(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
ggml_mul_mat_id(Pointer<ggml_context> ctx, Pointer<ggml_tensor> as, Pointer<ggml_tensor> b, Pointer<ggml_tensor> ids) → Pointer<ggml_tensor>
ggml_mul_mat_set_prec(Pointer<ggml_tensor> a, ggml_prec prec) → void
ggml_n_dims(Pointer<ggml_tensor> tensor) → int
ggml_nbytes(Pointer<ggml_tensor> tensor) → int
ggml_nbytes_pad(Pointer<ggml_tensor> tensor) → int
ggml_neg(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_neg_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
ggml_nelements(Pointer<ggml_tensor> tensor) → int
ggml_new_buffer(Pointer<ggml_context> ctx, int nbytes) → Pointer<Void>
ggml_new_graph(Pointer<ggml_context> ctx) → Pointer<ggml_cgraph>
ggml_new_graph_custom(Pointer<ggml_context> ctx, int size, bool grads) → Pointer<ggml_cgraph>
ggml_new_tensor(Pointer<ggml_context> ctx, ggml_type type, int n_dims, Pointer<Int64> ne) → Pointer<ggml_tensor>
ggml_new_tensor_1d(Pointer<ggml_context> ctx, ggml_type type, int ne0) → Pointer<ggml_tensor>
ggml_new_tensor_2d(Pointer<ggml_context> ctx, ggml_type type, int ne0, int ne1) → Pointer<ggml_tensor>
ggml_new_tensor_3d(Pointer<ggml_context> ctx, ggml_type type, int ne0, int ne1, int ne2) → Pointer<ggml_tensor>
ggml_new_tensor_4d(Pointer<ggml_context> ctx, ggml_type type, int ne0, int ne1, int ne2, int ne3) Pointer<ggml_tensor>
ggml_norm(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double eps) Pointer<ggml_tensor>
ggml_norm_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double eps) Pointer<ggml_tensor>
ggml_nrows(Pointer<ggml_tensor> tensor) int
ggml_op_desc(Pointer<ggml_tensor> t) Pointer<Char>
ggml_op_name(ggml_op op) Pointer<Char>
ggml_op_symbol(ggml_op op) Pointer<Char>
ggml_opt_step_adamw(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> grad, Pointer<ggml_tensor> m, Pointer<ggml_tensor> v, Pointer<ggml_tensor> adamw_params) Pointer<ggml_tensor>
ggml_opt_step_sgd(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> grad, Pointer<ggml_tensor> sgd_params) Pointer<ggml_tensor>
ggml_out_prod(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) Pointer<ggml_tensor>
ggml_pad(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int p0, int p1, int p2, int p3) Pointer<ggml_tensor>
ggml_pad_circular(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int p0, int p1, int p2, int p3) Pointer<ggml_tensor>
ggml_pad_ext(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int lp0, int rp0, int lp1, int rp1, int lp2, int rp2, int lp3, int rp3) Pointer<ggml_tensor>
ggml_pad_ext_circular(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int lp0, int rp0, int lp1, int rp1, int lp2, int rp2, int lp3, int rp3) Pointer<ggml_tensor>
ggml_pad_reflect_1d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int p0, int p1) Pointer<ggml_tensor>
ggml_permute(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int axis0, int axis1, int axis2, int axis3) Pointer<ggml_tensor>
ggml_pool_1d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_op_pool op, int k0, int s0, int p0) Pointer<ggml_tensor>
ggml_pool_2d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_op_pool op, int k0, int k1, int s0, int s1, double p0, double p1) Pointer<ggml_tensor>
ggml_pool_2d_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> af, ggml_op_pool op, int k0, int k1, int s0, int s1, double p0, double p1) Pointer<ggml_tensor>
ggml_print_object(Pointer<ggml_object> obj) → void
ggml_print_objects(Pointer<ggml_context> ctx) → void
ggml_quantize_chunk(ggml_type type, Pointer<Float> src, Pointer<Void> dst, int start, int nrows, int n_per_row, Pointer<Float> imatrix) int
ggml_quantize_free() → void
ggml_quantize_init(ggml_type type) → void
ggml_quantize_requires_imatrix(ggml_type type) bool
ggml_reglu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_reglu_split(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) Pointer<ggml_tensor>
ggml_reglu_swapped(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_relu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_relu_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_repeat(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) Pointer<ggml_tensor>
ggml_repeat_4d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3) Pointer<ggml_tensor>
ggml_repeat_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) Pointer<ggml_tensor>
ggml_reset(Pointer<ggml_context> ctx) → void
ggml_reshape(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) Pointer<ggml_tensor>
ggml_reshape_1d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0) Pointer<ggml_tensor>
ggml_reshape_2d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1) Pointer<ggml_tensor>
ggml_reshape_3d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2) Pointer<ggml_tensor>
ggml_reshape_4d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3) Pointer<ggml_tensor>
ggml_rms_norm(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double eps) Pointer<ggml_tensor>
ggml_rms_norm_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, double eps) Pointer<ggml_tensor>
ggml_rms_norm_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double eps) Pointer<ggml_tensor>
ggml_roll(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int shift0, int shift1, int shift2, int shift3) Pointer<ggml_tensor>
ggml_rope(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int n_dims, int mode) Pointer<ggml_tensor>
ggml_rope_custom(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int n_dims, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) Pointer<ggml_tensor>
ggml_rope_custom_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int n_dims, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) Pointer<ggml_tensor>
ggml_rope_ext(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) Pointer<ggml_tensor>
ggml_rope_ext_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) Pointer<ggml_tensor>
ggml_rope_ext_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) Pointer<ggml_tensor>
ggml_rope_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int n_dims, int mode) Pointer<ggml_tensor>
ggml_rope_multi(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, Pointer<Int> sections, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) Pointer<ggml_tensor>
ggml_rope_multi_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, Pointer<Int> sections, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) Pointer<ggml_tensor>
ggml_rope_multi_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, Pointer<Int> sections, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) Pointer<ggml_tensor>
ggml_rope_yarn_corr_dims(int n_dims, int n_ctx_orig, double freq_base, double beta_fast, double beta_slow, Pointer<Float> dims) → void
ggml_round(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_round_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_row_size(ggml_type type, int ne) int
ggml_rwkv_wkv6(Pointer<ggml_context> ctx, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> r, Pointer<ggml_tensor> tf, Pointer<ggml_tensor> td, Pointer<ggml_tensor> state) Pointer<ggml_tensor>
ggml_rwkv_wkv7(Pointer<ggml_context> ctx, Pointer<ggml_tensor> r, Pointer<ggml_tensor> w, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> state) Pointer<ggml_tensor>
ggml_scale(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double s) Pointer<ggml_tensor>
ggml_scale_bias(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double s, double b) Pointer<ggml_tensor>
ggml_scale_bias_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double s, double b) Pointer<ggml_tensor>
ggml_scale_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double s) Pointer<ggml_tensor>
ggml_set(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int nb2, int nb3, int offset) Pointer<ggml_tensor>
ggml_set_1d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int offset) Pointer<ggml_tensor>
ggml_set_1d_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int offset) Pointer<ggml_tensor>
ggml_set_2d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int offset) Pointer<ggml_tensor>
ggml_set_2d_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int offset) Pointer<ggml_tensor>
ggml_set_abort_callback(ggml_abort_callback_t callback) ggml_abort_callback_t
ggml_set_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int nb2, int nb3, int offset) Pointer<ggml_tensor>
ggml_set_input(Pointer<ggml_tensor> tensor) → void
ggml_set_loss(Pointer<ggml_tensor> tensor) → void
ggml_set_name(Pointer<ggml_tensor> tensor, Pointer<Char> name) Pointer<ggml_tensor>
ggml_set_no_alloc(Pointer<ggml_context> ctx, bool no_alloc) → void
ggml_set_output(Pointer<ggml_tensor> tensor) → void
ggml_set_param(Pointer<ggml_tensor> tensor) → void
ggml_set_rows(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c) Pointer<ggml_tensor>
ggml_set_zero(Pointer<ggml_tensor> tensor) Pointer<ggml_tensor>
ggml_sgn(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_sgn_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_sigmoid(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_sigmoid_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_silu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_silu_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) Pointer<ggml_tensor>
ggml_silu_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_sin(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_sin_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_soft_max(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_soft_max_add_sinks(Pointer<ggml_tensor> a, Pointer<ggml_tensor> sinks) → void
ggml_soft_max_ext(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> mask, double scale, double max_bias) Pointer<ggml_tensor>
ggml_soft_max_ext_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, double scale, double max_bias) Pointer<ggml_tensor>
ggml_soft_max_ext_back_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, double scale, double max_bias) Pointer<ggml_tensor>
ggml_soft_max_ext_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> mask, double scale, double max_bias) Pointer<ggml_tensor>
ggml_soft_max_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_softplus(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_softplus_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_solve_tri(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, bool left, bool lower, bool uni) Pointer<ggml_tensor>
ggml_sqr(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_sqr_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_sqrt(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_sqrt_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_ssm_conv(Pointer<ggml_context> ctx, Pointer<ggml_tensor> sx, Pointer<ggml_tensor> c) Pointer<ggml_tensor>
ggml_ssm_scan(Pointer<ggml_context> ctx, Pointer<ggml_tensor> s, Pointer<ggml_tensor> x, Pointer<ggml_tensor> dt, Pointer<ggml_tensor> A, Pointer<ggml_tensor> B, Pointer<ggml_tensor> C, Pointer<ggml_tensor> ids) Pointer<ggml_tensor>
ggml_status_to_string(ggml_status status) Pointer<Char>
ggml_step(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_step_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_sub(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) Pointer<ggml_tensor>
ggml_sub_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) Pointer<ggml_tensor>
ggml_sum(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_sum_rows(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_swiglu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_swiglu_oai(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, double alpha, double limit) Pointer<ggml_tensor>
ggml_swiglu_split(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) Pointer<ggml_tensor>
ggml_swiglu_swapped(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_tanh(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_tanh_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_tensor_overhead() int
ggml_threadpool_params_default(int n_threads) ggml_threadpool_params
ggml_threadpool_params_init(Pointer<ggml_threadpool_params> p, int n_threads) → void
ggml_threadpool_params_match(Pointer<ggml_threadpool_params> p0, Pointer<ggml_threadpool_params> p1) bool
ggml_time_init() → void
ggml_time_ms() int
ggml_time_us() int
ggml_timestep_embedding(Pointer<ggml_context> ctx, Pointer<ggml_tensor> timesteps, int dim, int max_period) Pointer<ggml_tensor>
ggml_top_k(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int k) Pointer<ggml_tensor>
ggml_transpose(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_tri(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_tri_type type) Pointer<ggml_tensor>
ggml_trunc(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
Truncates the fractional part of each element in the tensor (towards zero); for example, trunc(3.7) = 3.0 and trunc(-2.9) = -2.0. Similar to std::trunc in C/C++.
ggml_trunc_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) Pointer<ggml_tensor>
ggml_type_name(ggml_type type) Pointer<Char>
ggml_type_size(ggml_type type) int
ggml_type_sizef(ggml_type type) double
ggml_unary(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_unary_op op) Pointer<ggml_tensor>
ggml_unary_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_unary_op op) Pointer<ggml_tensor>
ggml_unary_op_name(ggml_unary_op op) Pointer<Char>
ggml_unravel_index(Pointer<ggml_tensor> tensor, int i, Pointer<Int64> i0, Pointer<Int64> i1, Pointer<Int64> i2, Pointer<Int64> i3) → void
ggml_upscale(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int scale_factor, ggml_scale_mode mode) Pointer<ggml_tensor>
ggml_upscale_ext(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3, ggml_scale_mode mode) Pointer<ggml_tensor>
ggml_used_mem(Pointer<ggml_context> ctx) int
ggml_validate_row_data(ggml_type type, Pointer<Void> data, int nbytes) bool
ggml_version() Pointer<Char>
ggml_view_1d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int offset) Pointer<ggml_tensor>
ggml_view_2d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int nb1, int offset) Pointer<ggml_tensor>
ggml_view_3d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int nb1, int nb2, int offset) Pointer<ggml_tensor>
ggml_view_4d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3, int nb1, int nb2, int nb3, int offset) Pointer<ggml_tensor>
ggml_view_tensor(Pointer<ggml_context> ctx, Pointer<ggml_tensor> src) Pointer<ggml_tensor>
ggml_win_part(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int w) Pointer<ggml_tensor>
ggml_win_unpart(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int w0, int h0, int w) Pointer<ggml_tensor>
ggml_xielu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double alpha_n, double alpha_p, double beta, double eps) Pointer<ggml_tensor>
llama_adapter_get_alora_invocation_tokens(Pointer<llama_adapter_lora> adapter) Pointer<llama_token>
llama_adapter_get_alora_n_invocation_tokens(Pointer<llama_adapter_lora> adapter) int
llama_adapter_lora_free(Pointer<llama_adapter_lora> adapter) → void
llama_adapter_lora_init(Pointer<llama_model> model, Pointer<Char> path_lora) Pointer<llama_adapter_lora>
llama_adapter_meta_count(Pointer<llama_adapter_lora> adapter) int
llama_adapter_meta_key_by_index(Pointer<llama_adapter_lora> adapter, int i, Pointer<Char> buf, int buf_size) int
llama_adapter_meta_val_str(Pointer<llama_adapter_lora> adapter, Pointer<Char> key, Pointer<Char> buf, int buf_size) int
llama_adapter_meta_val_str_by_index(Pointer<llama_adapter_lora> adapter, int i, Pointer<Char> buf, int buf_size) int
llama_add_bos_token(Pointer<llama_vocab> vocab) bool
llama_add_eos_token(Pointer<llama_vocab> vocab) bool
llama_attach_threadpool(Pointer<llama_context> ctx, ggml_threadpool_t threadpool, ggml_threadpool_t threadpool_batch) → void
llama_backend_free() → void
llama_backend_init() → void
llama_batch_free(llama_batch batch) → void
llama_batch_get_one(Pointer<llama_token> tokens, int n_tokens) llama_batch
llama_batch_init(int n_tokens, int embd, int n_seq_max) llama_batch
llama_chat_apply_template(Pointer<Char> tmpl, Pointer<llama_chat_message> chat, int n_msg, bool add_ass, Pointer<Char> buf, int length) int
Applies a chat template. Inspired by Hugging Face's apply_chat_template() in Python.
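The buffer-sizing contract (a return value larger than the buffer means you must grow the buffer to that size and retry) can be sketched from Dart as follows. The struct field names (`role`, `content`) and the `bindings` accessor are assumptions about the generated FFI bindings, not part of this listing:

```dart
import 'dart:ffi';
import 'package:ffi/ffi.dart';

// Sketch: render a chat template through the raw binding, growing the
// output buffer on demand. `bindings` and the llama_chat_message field
// names are assumed from the generated FFI bindings.
String applyTemplate(String tmpl, List<(String, String)> messages) {
  final tmplPtr = tmpl.toNativeUtf8().cast<Char>();
  final chat = calloc<llama_chat_message>(messages.length);
  for (var i = 0; i < messages.length; i++) {
    chat[i].role = messages[i].$1.toNativeUtf8().cast<Char>();
    chat[i].content = messages[i].$2.toNativeUtf8().cast<Char>();
  }
  var bufSize = 1024;
  var buf = calloc<Char>(bufSize);
  var n = bindings.llama_chat_apply_template(
      tmplPtr, chat, messages.length, true, buf, bufSize);
  if (n > bufSize) {
    // The return value is the required length; reallocate and retry.
    calloc.free(buf);
    bufSize = n;
    buf = calloc<Char>(bufSize);
    n = bindings.llama_chat_apply_template(
        tmplPtr, chat, messages.length, true, buf, bufSize);
  }
  final out = buf.cast<Utf8>().toDartString(length: n);
  calloc.free(buf);
  calloc.free(chat); // role/content strings omitted from cleanup for brevity
  malloc.free(tmplPtr);
  return out;
}
```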
llama_chat_builtin_templates(Pointer<Pointer<Char>> output, int len) int
llama_context_default_params() llama_context_params
llama_copy_state_data(Pointer<llama_context> ctx, Pointer<Uint8> dst) int
llama_dart_set_log_level(int level) → void
llama_decode(Pointer<llama_context> ctx, llama_batch batch) int
llama_detach_threadpool(Pointer<llama_context> ctx) → void
llama_detokenize(Pointer<llama_vocab> vocab, Pointer<llama_token> tokens, int n_tokens, Pointer<Char> text, int text_len_max, bool remove_special, bool unparse_special) int
@details Converts the provided tokens into text (inverse of llama_tokenize()). @param text The char pointer must be large enough to hold the resulting text. @param remove_special Allows removing BOS and EOS tokens if the model is configured to do so. @param unparse_special If true, special tokens are rendered in the output. @return The number of chars/bytes on success, no more than text_len_max; a negative number on failure, whose absolute value is the number of chars/bytes that would have been written.
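The negative-return contract lends itself to a resize-and-retry wrapper. A minimal sketch, assuming the generated bindings object is available as `bindings` and `llama_token` aliases `Int32`:

```dart
import 'dart:ffi';
import 'package:ffi/ffi.dart';

// Sketch: detokenize with a small initial buffer, retrying once if
// llama_detokenize reports (as a negative count) how many bytes it needed.
// `bindings` and `vocab` are assumed from the generated bindings / a loaded model.
String detokenize(Pointer<llama_vocab> vocab, List<int> tokens) {
  final toks = calloc<llama_token>(tokens.length);
  for (var i = 0; i < tokens.length; i++) {
    toks[i] = tokens[i];
  }
  var cap = 256;
  var buf = calloc<Char>(cap);
  var n = bindings.llama_detokenize(
      vocab, toks, tokens.length, buf, cap, false, false);
  if (n < 0) {
    // Negative return: -n is the number of bytes that would have been written.
    calloc.free(buf);
    cap = -n;
    buf = calloc<Char>(cap);
    n = bindings.llama_detokenize(
        vocab, toks, tokens.length, buf, cap, false, false);
  }
  final text = buf.cast<Utf8>().toDartString(length: n);
  calloc.free(buf);
  calloc.free(toks);
  return text;
}
```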
llama_encode(Pointer<llama_context> ctx, llama_batch batch) int
llama_flash_attn_type_name(llama_flash_attn_type flash_attn_type) Pointer<Char>
llama_free(Pointer<llama_context> ctx) → void
llama_free_model(Pointer<llama_model> model) → void
llama_get_embeddings(Pointer<llama_context> ctx) Pointer<Float>
llama_get_embeddings_ith(Pointer<llama_context> ctx, int i) Pointer<Float>
llama_get_embeddings_seq(Pointer<llama_context> ctx, int seq_id) Pointer<Float>
llama_get_logits(Pointer<llama_context> ctx) Pointer<Float>
llama_get_logits_ith(Pointer<llama_context> ctx, int i) Pointer<Float>
llama_get_memory(Pointer<llama_context> ctx) llama_memory_t
llama_get_model(Pointer<llama_context> ctx) Pointer<llama_model>
llama_get_sampled_candidates_count_ith(Pointer<llama_context> ctx, int i) int
llama_get_sampled_candidates_ith(Pointer<llama_context> ctx, int i) Pointer<llama_token>
llama_get_sampled_logits_count_ith(Pointer<llama_context> ctx, int i) int
llama_get_sampled_logits_ith(Pointer<llama_context> ctx, int i) Pointer<Float>
llama_get_sampled_probs_count_ith(Pointer<llama_context> ctx, int i) int
llama_get_sampled_probs_ith(Pointer<llama_context> ctx, int i) Pointer<Float>
llama_get_sampled_token_ith(Pointer<llama_context> ctx, int i) int
llama_get_state_size(Pointer<llama_context> ctx) int
llama_init_from_model(Pointer<llama_model> model, llama_context_params params) Pointer<llama_context>
llama_load_model_from_file(Pointer<Char> path_model, llama_model_params params) Pointer<llama_model>
llama_load_session_file(Pointer<llama_context> ctx, Pointer<Char> path_session, Pointer<llama_token> tokens_out, int n_token_capacity, Pointer<Size> n_token_count_out) bool
llama_log_get(Pointer<ggml_log_callback> log_callback, Pointer<Pointer<Void>> user_data) → void
llama_log_set(ggml_log_callback log_callback, Pointer<Void> user_data) → void
llama_max_devices() int
llama_max_parallel_sequences() int
llama_max_tensor_buft_overrides() int
llama_memory_breakdown_print(Pointer<llama_context> ctx) → void
llama_memory_can_shift(llama_memory_t mem) bool
llama_memory_clear(llama_memory_t mem, bool data) → void
llama_memory_seq_add(llama_memory_t mem, int seq_id, int p0, int p1, int delta) → void
llama_memory_seq_cp(llama_memory_t mem, int seq_id_src, int seq_id_dst, int p0, int p1) → void
llama_memory_seq_div(llama_memory_t mem, int seq_id, int p0, int p1, int d) → void
llama_memory_seq_keep(llama_memory_t mem, int seq_id) → void
llama_memory_seq_pos_max(llama_memory_t mem, int seq_id) int
llama_memory_seq_pos_min(llama_memory_t mem, int seq_id) int
llama_memory_seq_rm(llama_memory_t mem, int seq_id, int p0, int p1) bool
llama_model_chat_template(Pointer<llama_model> model, Pointer<Char> name) Pointer<Char>
llama_model_cls_label(Pointer<llama_model> model, int i) Pointer<Char>
llama_model_decoder_start_token(Pointer<llama_model> model) int
llama_model_default_params() llama_model_params
llama_model_desc(Pointer<llama_model> model, Pointer<Char> buf, int buf_size) int
llama_model_free(Pointer<llama_model> model) → void
llama_model_get_vocab(Pointer<llama_model> model) Pointer<llama_vocab>
llama_model_has_decoder(Pointer<llama_model> model) bool
llama_model_has_encoder(Pointer<llama_model> model) bool
llama_model_init_from_user(Pointer<gguf_context> metadata, llama_model_set_tensor_data_t set_tensor_data, Pointer<Void> set_tensor_data_ud, llama_model_params params) Pointer<llama_model>
llama_model_is_diffusion(Pointer<llama_model> model) bool
llama_model_is_hybrid(Pointer<llama_model> model) bool
llama_model_is_recurrent(Pointer<llama_model> model) bool
llama_model_load_from_file(Pointer<Char> path_model, llama_model_params params) Pointer<llama_model>
llama_model_load_from_file_ptr(Pointer<FILE> file, llama_model_params params) Pointer<llama_model>
llama_model_load_from_splits(Pointer<Pointer<Char>> paths, int n_paths, llama_model_params params) Pointer<llama_model>
llama_model_meta_count(Pointer<llama_model> model) int
llama_model_meta_key_by_index(Pointer<llama_model> model, int i, Pointer<Char> buf, int buf_size) int
llama_model_meta_key_str(llama_model_meta_key key) Pointer<Char>
llama_model_meta_val_str(Pointer<llama_model> model, Pointer<Char> key, Pointer<Char> buf, int buf_size) int
llama_model_meta_val_str_by_index(Pointer<llama_model> model, int i, Pointer<Char> buf, int buf_size) int
llama_model_n_cls_out(Pointer<llama_model> model) int
llama_model_n_ctx_train(Pointer<llama_model> model) int
llama_model_n_embd(Pointer<llama_model> model) int
llama_model_n_embd_inp(Pointer<llama_model> model) int
llama_model_n_embd_out(Pointer<llama_model> model) int
llama_model_n_head(Pointer<llama_model> model) int
llama_model_n_head_kv(Pointer<llama_model> model) int
llama_model_n_layer(Pointer<llama_model> model) int
llama_model_n_params(Pointer<llama_model> model) int
llama_model_n_swa(Pointer<llama_model> model) int
llama_model_quantize(Pointer<Char> fname_inp, Pointer<Char> fname_out, Pointer<llama_model_quantize_params> params) int
llama_model_quantize_default_params() llama_model_quantize_params
llama_model_rope_freq_scale_train(Pointer<llama_model> model) double
llama_model_rope_type(Pointer<llama_model> model) llama_rope_type
llama_model_save_to_file(Pointer<llama_model> model, Pointer<Char> path_model) → void
llama_model_size(Pointer<llama_model> model) int
llama_n_batch(Pointer<llama_context> ctx) int
llama_n_ctx(Pointer<llama_context> ctx) int
llama_n_ctx_seq(Pointer<llama_context> ctx) int
llama_n_ctx_train(Pointer<llama_model> model) int
llama_n_embd(Pointer<llama_model> model) int
llama_n_head(Pointer<llama_model> model) int
llama_n_layer(Pointer<llama_model> model) int
llama_n_seq_max(Pointer<llama_context> ctx) int
llama_n_threads(Pointer<llama_context> ctx) int
llama_n_threads_batch(Pointer<llama_context> ctx) int
llama_n_ubatch(Pointer<llama_context> ctx) int
llama_n_vocab(Pointer<llama_vocab> vocab) int
llama_new_context_with_model(Pointer<llama_model> model, llama_context_params params) Pointer<llama_context>
llama_numa_init(ggml_numa_strategy numa) → void
llama_opt_epoch(Pointer<llama_context> lctx, ggml_opt_dataset_t dataset, ggml_opt_result_t result_train, ggml_opt_result_t result_eval, int idata_split, ggml_opt_epoch_callback callback_train, ggml_opt_epoch_callback callback_eval) → void
llama_opt_init(Pointer<llama_context> lctx, Pointer<llama_model> model, llama_opt_params lopt_params) → void
llama_opt_param_filter_all(Pointer<ggml_tensor> tensor, Pointer<Void> userdata) bool
llama_params_fit(Pointer<Char> path_model, Pointer<llama_model_params> mparams, Pointer<llama_context_params> cparams, Pointer<Float> tensor_split, Pointer<llama_model_tensor_buft_override> tensor_buft_overrides, Pointer<Size> margins, int n_ctx_min, ggml_log_level log_level) llama_params_fit_status
llama_perf_context(Pointer<llama_context> ctx) llama_perf_context_data
llama_perf_context_print(Pointer<llama_context> ctx) → void
llama_perf_context_reset(Pointer<llama_context> ctx) → void
llama_perf_sampler(Pointer<llama_sampler> chain) llama_perf_sampler_data
llama_perf_sampler_print(Pointer<llama_sampler> chain) → void
llama_perf_sampler_reset(Pointer<llama_sampler> chain) → void
llama_pooling_type$1(Pointer<llama_context> ctx) llama_pooling_type
llama_print_system_info() Pointer<Char>
llama_sampler_accept(Pointer<llama_sampler> smpl, int token) → void
llama_sampler_apply(Pointer<llama_sampler> smpl, Pointer<llama_token_data_array> cur_p) → void
llama_sampler_chain_add(Pointer<llama_sampler> chain, Pointer<llama_sampler> smpl) → void
llama_sampler_chain_default_params() llama_sampler_chain_params
llama_sampler_chain_get(Pointer<llama_sampler> chain, int i) Pointer<llama_sampler>
llama_sampler_chain_init(llama_sampler_chain_params params) Pointer<llama_sampler>
llama_sampler_chain_n(Pointer<llama_sampler> chain) int
llama_sampler_chain_remove(Pointer<llama_sampler> chain, int i) Pointer<llama_sampler>
llama_sampler_clone(Pointer<llama_sampler> smpl) Pointer<llama_sampler>
llama_sampler_free(Pointer<llama_sampler> smpl) → void
llama_sampler_get_seed(Pointer<llama_sampler> smpl) int
llama_sampler_init(Pointer<llama_sampler_i> iface, llama_sampler_context_t ctx) Pointer<llama_sampler>
llama_sampler_init_adaptive_p(double target, double decay, int seed) Pointer<llama_sampler>
Adaptive-p sampling: selects tokens near a configurable target probability over time.
llama_sampler_init_dist(int seed) Pointer<llama_sampler>
Pass seed == LLAMA_DEFAULT_SEED to use a random seed.
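As a sketch of how these sampler constructors compose with the chain functions listed above (llama_sampler_chain_init / llama_sampler_chain_add), assuming a `bindings` handle to the generated FFI bindings:

```dart
import 'dart:ffi';

// Sketch: min-p filtering followed by seeded sampling from the remaining
// distribution. `bindings` is an assumed handle to the generated bindings.
Pointer<llama_sampler> buildChain(int seed) {
  final chain = bindings.llama_sampler_chain_init(
      bindings.llama_sampler_chain_default_params());
  bindings.llama_sampler_chain_add(
      chain, bindings.llama_sampler_init_min_p(0.05, 1));
  // Pass LLAMA_DEFAULT_SEED here instead to let llama.cpp pick a random seed.
  bindings.llama_sampler_chain_add(
      chain, bindings.llama_sampler_init_dist(seed));
  return chain; // Release with llama_sampler_free when done.
}
```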
llama_sampler_init_dry(Pointer<llama_vocab> vocab, int n_ctx_train, double dry_multiplier, double dry_base, int dry_allowed_length, int dry_penalty_last_n, Pointer<Pointer<Char>> seq_breakers, int num_breakers) Pointer<llama_sampler>
@details DRY sampler, designed by p-e-w, as described in https://github.com/oobabooga/text-generation-webui/pull/5677, porting the Koboldcpp implementation authored by pi6am: https://github.com/LostRuins/koboldcpp/pull/982
llama_sampler_init_grammar(Pointer<llama_vocab> vocab, Pointer<Char> grammar_str, Pointer<Char> grammar_root) Pointer<llama_sampler>
@details Initializes a GBNF grammar; see grammars/README.md for details. @param vocab The vocabulary that this grammar will be used with. @param grammar_str The production rules for the grammar, encoded as a string; if empty, an empty grammar is returned. @param grammar_root The name of the start symbol for the grammar. @return NULL if parsing of grammar_str fails.
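A minimal sketch of constraining output with a small grammar. The two-word grammar is a hypothetical example; `bindings` and `vocab` are assumed from the generated bindings and a loaded model:

```dart
import 'dart:ffi';
import 'package:ffi/ffi.dart';

// Sketch: constrain generation to "yes" or "no" with a GBNF grammar.
// The grammar string is a hypothetical example; `bindings` is assumed.
Pointer<llama_sampler> yesNoSampler(Pointer<llama_vocab> vocab) {
  final grammarStr = 'root ::= "yes" | "no"'.toNativeUtf8().cast<Char>();
  final rootStr = 'root'.toNativeUtf8().cast<Char>();
  final smpl = bindings.llama_sampler_init_grammar(vocab, grammarStr, rootStr);
  malloc.free(grammarStr);
  malloc.free(rootStr);
  if (smpl == nullptr) {
    // NULL means grammar_str failed to parse.
    throw StateError('grammar_str failed to parse');
  }
  return smpl;
}
```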
llama_sampler_init_grammar_lazy(Pointer<llama_vocab> vocab, Pointer<Char> grammar_str, Pointer<Char> grammar_root, Pointer<Pointer<Char>> trigger_words, int num_trigger_words, Pointer<llama_token> trigger_tokens, int num_trigger_tokens) Pointer<llama_sampler>
llama_sampler_init_grammar_lazy_patterns(Pointer<llama_vocab> vocab, Pointer<Char> grammar_str, Pointer<Char> grammar_root, Pointer<Pointer<Char>> trigger_patterns, int num_trigger_patterns, Pointer<llama_token> trigger_tokens, int num_trigger_tokens) Pointer<llama_sampler>
@details Lazy grammar sampler, introduced in https://github.com/ggml-org/llama.cpp/pull/9639 @param trigger_patterns A list of patterns that will trigger the grammar sampler. Patterns are matched from the start of the generation output, and the grammar sampler is fed content starting from its first match group. @param trigger_tokens A list of tokens that will trigger the grammar sampler. The grammar sampler is fed content starting from the trigger token, inclusive.
llama_sampler_init_greedy() Pointer<llama_sampler>
llama_sampler_init_infill(Pointer<llama_vocab> vocab) Pointer<llama_sampler>
llama_sampler_init_logit_bias(int n_vocab, int n_logit_bias, Pointer<llama_logit_bias> logit_bias) Pointer<llama_sampler>
llama_sampler_init_min_p(double p, int min_keep) Pointer<llama_sampler>
@details Minimum P sampling as described in https://github.com/ggml-org/llama.cpp/pull/3841
llama_sampler_init_mirostat(int n_vocab, int seed, double tau, double eta, int m) Pointer<llama_sampler>
@details Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words. @param candidates A vector of llama_token_data containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. @param tau The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. @param eta The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates. @param m The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm. @param mu Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
llama_sampler_init_mirostat_v2(int seed, double tau, double eta) Pointer<llama_sampler>
@details Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words. @param candidates A vector of llama_token_data containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. @param tau The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. @param eta The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates. @param mu Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
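The tau/eta/mu interplay described above can be sketched as a single Mirostat 2.0 step: drop candidates whose surprise exceeds mu, sample from the rest, then nudge mu toward the target surprise tau. This is a paper-level sketch, not the library's implementation; the function name and its (index, mu) return shape are invented for the illustration.

```python
import math
import random

def mirostat_v2_step(probs, mu, tau, eta, rng):
    """One illustrative Mirostat 2.0 step over normalized probabilities."""
    # Surprise of a token is -log2(prob); drop tokens more surprising than mu.
    candidates = [(i, p) for i, p in enumerate(probs) if -math.log2(p) <= mu]
    if not candidates:  # always keep at least the most probable token
        i = max(range(len(probs)), key=lambda j: probs[j])
        candidates = [(i, probs[i])]
    # Sample from the truncated, renormalized distribution.
    total = sum(p for _, p in candidates)
    r = rng.random() * total
    for i, p in candidates:
        r -= p
        if r <= 0:
            break
    observed = -math.log2(probs[i])
    mu -= eta * (observed - tau)  # move mu toward the target surprise tau
    return i, mu
```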
llama_sampler_init_penalties(int penalty_last_n, double penalty_repeat, double penalty_freq, double penalty_present) Pointer<llama_sampler>
NOTE: Avoid using on the full vocabulary as searching for repeated tokens can become slow. For example, apply top-k or top-p sampling first.
llama_sampler_init_temp(double t) Pointer<llama_sampler>
@details Updates the logits: l_i' = l_i / t. When t <= 0.0f, the maximum logit keeps its original value and the rest are set to -inf.
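In plain Python, that scaling rule looks like the sketch below (an illustration of the formula, not the library code; `temp_scale` is an invented name). With t <= 0 the result is effectively greedy sampling, since only the maximum logit survives.

```python
def temp_scale(logits, t):
    """Illustrative temperature scaling: l_i' = l_i / t.
    For t <= 0, keep the max logit and set all others to -inf."""
    if t <= 0.0:
        m = max(logits)
        return [l if l == m else float("-inf") for l in logits]
    return [l / t for l in logits]
```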
llama_sampler_init_temp_ext(double t, double delta, double exponent) Pointer<llama_sampler>
@details Dynamic temperature implementation (a.k.a. entropy) described in the paper https://arxiv.org/abs/2309.02772.
llama_sampler_init_top_k(int k) Pointer<llama_sampler>
@details Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751. Setting k <= 0 makes this a no-op.
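Top-K simply keeps the k highest logits, and the k <= 0 no-op case falls out naturally; a minimal sketch (illustrative only, returning kept token indices):

```python
def top_k_filter(logits, k):
    """Illustrative top-K: keep the k highest logits; k <= 0 is a no-op."""
    if k <= 0 or k >= len(logits):
        return list(range(len(logits)))
    order = sorted(range(len(logits)), key=lambda i: -logits[i])
    return sorted(order[:k])
```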
llama_sampler_init_top_n_sigma(double n) Pointer<llama_sampler>
@details Top-nσ sampling as described in academic paper "Top-nσ: Not All Logits Are You Need" https://arxiv.org/pdf/2411.07641
llama_sampler_init_top_p(double p, int min_keep) Pointer<llama_sampler>
@details Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
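Nucleus (top-p) sampling keeps the smallest set of most-probable tokens whose cumulative probability reaches p. A sketch of the idea (not the library implementation; names and the index-list return are invented):

```python
import math

def top_p_filter(logits, p, min_keep=1):
    """Illustrative nucleus filter: accumulate tokens in descending
    probability order until the cumulative mass reaches p."""
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]
    total = sum(probs)
    probs = [q / total for q in probs]
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= p and len(keep) >= min_keep:
            break
    return sorted(keep)
```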
llama_sampler_init_typical(double p, int min_keep) Pointer<llama_sampler>
@details Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
llama_sampler_init_xtc(double p, double t, int min_keep, int seed) Pointer<llama_sampler>
@details XTC sampler as described in https://github.com/oobabooga/text-generation-webui/pull/6335
llama_sampler_name(Pointer<llama_sampler> smpl) Pointer<Char>
llama_sampler_reset(Pointer<llama_sampler> smpl) → void
llama_sampler_sample(Pointer<llama_sampler> smpl, Pointer<llama_context> ctx, int idx) int
llama_save_session_file(Pointer<llama_context> ctx, Pointer<Char> path_session, Pointer<llama_token> tokens, int n_token_count) bool
llama_set_abort_callback(Pointer<llama_context> ctx, ggml_abort_callback abort_callback, Pointer<Void> abort_callback_data) → void
llama_set_adapter_cvec(Pointer<llama_context> ctx, Pointer<Float> data, int len, int n_embd, int il_start, int il_end) int
llama_set_adapters_lora(Pointer<llama_context> ctx, Pointer<Pointer<llama_adapter_lora>> adapters, int n_adapters, Pointer<Float> scales) int
llama_set_causal_attn(Pointer<llama_context> ctx, bool causal_attn) → void
llama_set_embeddings(Pointer<llama_context> ctx, bool embeddings) → void
llama_set_n_threads(Pointer<llama_context> ctx, int n_threads, int n_threads_batch) → void
llama_set_sampler(Pointer<llama_context> ctx, int seq_id, Pointer<llama_sampler> smpl) bool
llama_set_state_data(Pointer<llama_context> ctx, Pointer<Uint8> src) int
llama_set_warmup(Pointer<llama_context> ctx, bool warmup) → void
llama_split_path(Pointer<Char> split_path, int maxlen, Pointer<Char> path_prefix, int split_no, int split_count) int
@details Build a split GGUF final path for this chunk. llama_split_path(split_path, sizeof(split_path), "/models/ggml-model-q4_0", 2, 4) => split_path = "/models/ggml-model-q4_0-00002-of-00004.gguf"
llama_split_prefix(Pointer<Char> split_prefix, int maxlen, Pointer<Char> split_path, int split_no, int split_count) int
@details Extract the path prefix from the split_path if and only if the split_no and split_count match. llama_split_prefix(split_prefix, 64, "/models/ggml-model-q4_0-00002-of-00004.gguf", 2, 4) => split_prefix = "/models/ggml-model-q4_0"
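The split-file naming scheme documented above (zero-padded five-digit chunk number and count) can be reproduced and inverted in a few lines. This is a sketch of the naming convention shown in the examples, not a binding to the FFI calls:

```python
def split_path(path_prefix, split_no, split_count):
    """Illustrative reconstruction of the split-GGUF file name."""
    return f"{path_prefix}-{split_no:05d}-of-{split_count:05d}.gguf"

def split_prefix(split_path_str, split_no, split_count):
    """Inverse: return the prefix iff split_no/split_count match the name."""
    suffix = f"-{split_no:05d}-of-{split_count:05d}.gguf"
    if split_path_str.endswith(suffix):
        return split_path_str[: -len(suffix)]
    return None
```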
llama_state_get_data(Pointer<llama_context> ctx, Pointer<Uint8> dst, int size) int
llama_state_get_size(Pointer<llama_context> ctx) int
llama_state_load_file(Pointer<llama_context> ctx, Pointer<Char> path_session, Pointer<llama_token> tokens_out, int n_token_capacity, Pointer<Size> n_token_count_out) bool
llama_state_save_file(Pointer<llama_context> ctx, Pointer<Char> path_session, Pointer<llama_token> tokens, int n_token_count) bool
llama_state_seq_get_data(Pointer<llama_context> ctx, Pointer<Uint8> dst, int size, int seq_id) int
llama_state_seq_get_data_ext(Pointer<llama_context> ctx, Pointer<Uint8> dst, int size, int seq_id, int flags) int
llama_state_seq_get_size(Pointer<llama_context> ctx, int seq_id) int
llama_state_seq_get_size_ext(Pointer<llama_context> ctx, int seq_id, int flags) int
llama_state_seq_load_file(Pointer<llama_context> ctx, Pointer<Char> filepath, int dest_seq_id, Pointer<llama_token> tokens_out, int n_token_capacity, Pointer<Size> n_token_count_out) int
llama_state_seq_save_file(Pointer<llama_context> ctx, Pointer<Char> filepath, int seq_id, Pointer<llama_token> tokens, int n_token_count) int
llama_state_seq_set_data(Pointer<llama_context> ctx, Pointer<Uint8> src, int size, int dest_seq_id) int
llama_state_seq_set_data_ext(Pointer<llama_context> ctx, Pointer<Uint8> src, int size, int dest_seq_id, int flags) int
llama_state_set_data(Pointer<llama_context> ctx, Pointer<Uint8> src, int size) int
llama_supports_gpu_offload() bool
llama_supports_mlock() bool
llama_supports_mmap() bool
llama_supports_rpc() bool
llama_synchronize(Pointer<llama_context> ctx) → void
llama_time_us() int
llama_token_bos(Pointer<llama_vocab> vocab) int
llama_token_cls(Pointer<llama_vocab> vocab) int
llama_token_eos(Pointer<llama_vocab> vocab) int
llama_token_eot(Pointer<llama_vocab> vocab) int
llama_token_fim_mid(Pointer<llama_vocab> vocab) int
llama_token_fim_pad(Pointer<llama_vocab> vocab) int
llama_token_fim_pre(Pointer<llama_vocab> vocab) int
llama_token_fim_rep(Pointer<llama_vocab> vocab) int
llama_token_fim_sep(Pointer<llama_vocab> vocab) int
llama_token_fim_suf(Pointer<llama_vocab> vocab) int
llama_token_get_attr(Pointer<llama_vocab> vocab, Dartllama_token token) llama_token_attr
llama_token_get_score(Pointer<llama_vocab> vocab, int token) double
llama_token_get_text(Pointer<llama_vocab> vocab, int token) Pointer<Char>
llama_token_is_control(Pointer<llama_vocab> vocab, int token) bool
llama_token_is_eog(Pointer<llama_vocab> vocab, int token) bool
llama_token_nl(Pointer<llama_vocab> vocab) int
llama_token_pad(Pointer<llama_vocab> vocab) int
llama_token_sep(Pointer<llama_vocab> vocab) int
llama_token_to_piece(Pointer<llama_vocab> vocab, int token, Pointer<Char> buf, int length, int lstrip, bool special) int
llama_tokenize(Pointer<llama_vocab> vocab, Pointer<Char> text, int text_len, Pointer<llama_token> tokens, int n_tokens_max, bool add_special, bool parse_special) int
@details Convert the provided text into tokens. @param tokens The tokens pointer must be large enough to hold the resulting tokens. @return Returns the number of tokens on success, no more than n_tokens_max. @return Returns a negative number on failure, whose magnitude is the number of tokens that would have been returned. @return Returns INT32_MIN on overflow (e.g., the tokenization result size exceeds the int32_t limit). @param add_special Allows adding BOS and EOS tokens if the model is configured to do so. @param parse_special Allows tokenizing special and/or control tokens, which are otherwise not exposed and treated as plaintext. Does not insert a leading space.
llama_vocab_bos(Pointer<llama_vocab> vocab) int
llama_vocab_cls(Pointer<llama_vocab> vocab) int
llama_vocab_eos(Pointer<llama_vocab> vocab) int
llama_vocab_eot(Pointer<llama_vocab> vocab) int
llama_vocab_fim_mid(Pointer<llama_vocab> vocab) int
llama_vocab_fim_pad(Pointer<llama_vocab> vocab) int
llama_vocab_fim_pre(Pointer<llama_vocab> vocab) int
llama_vocab_fim_rep(Pointer<llama_vocab> vocab) int
llama_vocab_fim_sep(Pointer<llama_vocab> vocab) int
llama_vocab_fim_suf(Pointer<llama_vocab> vocab) int
llama_vocab_get_add_bos(Pointer<llama_vocab> vocab) bool
llama_vocab_get_add_eos(Pointer<llama_vocab> vocab) bool
llama_vocab_get_add_sep(Pointer<llama_vocab> vocab) bool
llama_vocab_get_attr(Pointer<llama_vocab> vocab, Dartllama_token token) llama_token_attr
llama_vocab_get_score(Pointer<llama_vocab> vocab, int token) double
llama_vocab_get_text(Pointer<llama_vocab> vocab, int token) Pointer<Char>
llama_vocab_is_control(Pointer<llama_vocab> vocab, int token) bool
llama_vocab_is_eog(Pointer<llama_vocab> vocab, int token) bool
llama_vocab_mask(Pointer<llama_vocab> vocab) int
llama_vocab_n_tokens(Pointer<llama_vocab> vocab) int
llama_vocab_nl(Pointer<llama_vocab> vocab) int
llama_vocab_pad(Pointer<llama_vocab> vocab) int
llama_vocab_sep(Pointer<llama_vocab> vocab) int
llama_vocab_type$1(Pointer<llama_vocab> vocab) llama_vocab_type
mtmd_bitmap_free(Pointer<mtmd_bitmap> bitmap) → void
mtmd_bitmap_get_data(Pointer<mtmd_bitmap> bitmap) Pointer<UnsignedChar>
mtmd_bitmap_get_id(Pointer<mtmd_bitmap> bitmap) Pointer<Char>
mtmd_bitmap_get_n_bytes(Pointer<mtmd_bitmap> bitmap) int
mtmd_bitmap_get_nx(Pointer<mtmd_bitmap> bitmap) int
mtmd_bitmap_get_ny(Pointer<mtmd_bitmap> bitmap) int
mtmd_bitmap_init(int nx, int ny, Pointer<UnsignedChar> data) Pointer<mtmd_bitmap>
mtmd_bitmap_init_from_audio(int n_samples, Pointer<Float> data) Pointer<mtmd_bitmap>
mtmd_bitmap_is_audio(Pointer<mtmd_bitmap> bitmap) bool
mtmd_bitmap_set_id(Pointer<mtmd_bitmap> bitmap, Pointer<Char> id) → void
mtmd_context_params_default() mtmd_context_params
mtmd_decode_use_mrope(Pointer<mtmd_context> ctx) bool
mtmd_decode_use_non_causal(Pointer<mtmd_context> ctx) bool
mtmd_default_marker() Pointer<Char>
mtmd_encode(Pointer<mtmd_context> ctx, Pointer<mtmd_image_tokens> image_tokens) int
mtmd_encode_chunk(Pointer<mtmd_context> ctx, Pointer<mtmd_input_chunk> chunk) int
mtmd_free(Pointer<mtmd_context> ctx) → void
mtmd_get_audio_sample_rate(Pointer<mtmd_context> ctx) int
mtmd_get_output_embd(Pointer<mtmd_context> ctx) Pointer<Float>
mtmd_helper_bitmap_init_from_buf(Pointer<mtmd_context> ctx, Pointer<UnsignedChar> buf, int len) Pointer<mtmd_bitmap>
mtmd_helper_bitmap_init_from_file(Pointer<mtmd_context> ctx, Pointer<Char> fname) Pointer<mtmd_bitmap>
mtmd_helper_decode_image_chunk(Pointer<mtmd_context> ctx, Pointer<llama_context> lctx, Pointer<mtmd_input_chunk> chunk, Pointer<Float> encoded_embd, int n_past, int seq_id, int n_batch, Pointer<llama_pos> new_n_past) int
mtmd_helper_eval_chunk_single(Pointer<mtmd_context> ctx, Pointer<llama_context> lctx, Pointer<mtmd_input_chunk> chunk, int n_past, int seq_id, int n_batch, bool logits_last, Pointer<llama_pos> new_n_past) int
mtmd_helper_eval_chunks(Pointer<mtmd_context> ctx, Pointer<llama_context> lctx, Pointer<mtmd_input_chunks> chunks, int n_past, int seq_id, int n_batch, bool logits_last, Pointer<llama_pos> new_n_past) int
mtmd_helper_get_n_pos(Pointer<mtmd_input_chunks> chunks) int
mtmd_helper_get_n_tokens(Pointer<mtmd_input_chunks> chunks) int
mtmd_helper_log_set(ggml_log_callback log_callback, Pointer<Void> user_data) → void
mtmd_image_tokens_get_id(Pointer<mtmd_image_tokens> image_tokens) Pointer<Char>
mtmd_image_tokens_get_n_pos(Pointer<mtmd_image_tokens> image_tokens) int
mtmd_image_tokens_get_n_tokens(Pointer<mtmd_image_tokens> image_tokens) int
mtmd_image_tokens_get_nx(Pointer<mtmd_image_tokens> image_tokens) int
mtmd_image_tokens_get_ny(Pointer<mtmd_image_tokens> image_tokens) int
mtmd_init_from_file(Pointer<Char> mmproj_fname, Pointer<llama_model> text_model, mtmd_context_params ctx_params) Pointer<mtmd_context>
mtmd_input_chunk_copy(Pointer<mtmd_input_chunk> chunk) Pointer<mtmd_input_chunk>
mtmd_input_chunk_free(Pointer<mtmd_input_chunk> chunk) → void
mtmd_input_chunk_get_id(Pointer<mtmd_input_chunk> chunk) Pointer<Char>
mtmd_input_chunk_get_n_pos(Pointer<mtmd_input_chunk> chunk) int
mtmd_input_chunk_get_n_tokens(Pointer<mtmd_input_chunk> chunk) int
mtmd_input_chunk_get_tokens_image(Pointer<mtmd_input_chunk> chunk) Pointer<mtmd_image_tokens>
mtmd_input_chunk_get_tokens_text(Pointer<mtmd_input_chunk> chunk, Pointer<Size> n_tokens_output) Pointer<llama_token>
mtmd_input_chunk_get_type(Pointer<mtmd_input_chunk> chunk) mtmd_input_chunk_type
mtmd_input_chunks_free(Pointer<mtmd_input_chunks> chunks) → void
mtmd_input_chunks_get(Pointer<mtmd_input_chunks> chunks, int idx) Pointer<mtmd_input_chunk>
mtmd_input_chunks_init() Pointer<mtmd_input_chunks>
mtmd_input_chunks_size(Pointer<mtmd_input_chunks> chunks) int
mtmd_log_set(ggml_log_callback log_callback, Pointer<Void> user_data) → void
mtmd_support_audio(Pointer<mtmd_context> ctx) bool
mtmd_support_vision(Pointer<mtmd_context> ctx) bool
mtmd_test_create_input_chunks() Pointer<mtmd_input_chunks>
mtmd_tokenize(Pointer<mtmd_context> ctx, Pointer<mtmd_input_chunks> output, Pointer<mtmd_input_text> text, Pointer<Pointer<mtmd_bitmap>> bitmaps, int n_bitmaps) int

Typedefs

Dart__off64_t = int
Dart__off_t = int
Dart_IO_lock_t = void
Dartggml_abort_callback_tFunction = void Function(Pointer<Char> error_message)
Dartggml_abort_callbackFunction = bool Function(Pointer<Void> data)
Dartggml_backend_eval_callbackFunction = bool Function(int node_index, Pointer<ggml_tensor> t1, Pointer<ggml_tensor> t2, Pointer<Void> user_data)
Dartggml_backend_sched_eval_callbackFunction = bool Function(Pointer<ggml_tensor> t, bool ask, Pointer<Void> user_data)
Dartggml_backend_set_abort_callback_tFunction = void Function(ggml_backend_t backend, ggml_abort_callback abort_callback, Pointer<Void> abort_callback_data)
Dartggml_backend_set_n_threads_tFunction = void Function(ggml_backend_t backend, int n_threads)
Dartggml_backend_split_buffer_type_tFunction = ggml_backend_buffer_type_t Function(int main_device, Pointer<Float> tensor_split)
Dartggml_custom1_op_tFunction = void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, int ith, int nth, Pointer<Void> userdata)
Dartggml_custom2_op_tFunction = void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int ith, int nth, Pointer<Void> userdata)
Dartggml_custom3_op_tFunction = void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int ith, int nth, Pointer<Void> userdata)
Dartggml_custom_op_tFunction = void Function(Pointer<ggml_tensor> dst, int ith, int nth, Pointer<Void> userdata)
Dartggml_fp16_t = int
Dartggml_from_float_tFunction = void Function(Pointer<Float> x, Pointer<Void> y, int k)
Dartggml_log_callbackFunction = void Function(ggml_log_level level, Pointer<Char> text, Pointer<Void> user_data)
Dartggml_opt_epoch_callbackFunction = void Function(bool train, ggml_opt_context_t opt_ctx, ggml_opt_dataset_t dataset, ggml_opt_result_t result, int ibatch, int ibatch_max, int t_start_us)
Dartggml_to_float_tFunction = void Function(Pointer<Void> x, Pointer<Float> y, int k)
Dartllama_model_set_tensor_data_tFunction = void Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
Dartllama_opt_param_filterFunction = bool Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
Dartllama_pos = int
Dartllama_progress_callbackFunction = bool Function(double progress, Pointer<Void> user_data)
Dartllama_seq_id = int
Dartllama_state_seq_flags = int
Dartllama_token = int
FILE = _IO_FILE
ggml_abort_callback = Pointer<NativeFunction<ggml_abort_callbackFunction>>
ggml_abort_callback_t = Pointer<NativeFunction<ggml_abort_callback_tFunction>>
ggml_abort_callback_tFunction = Void Function(Pointer<Char> error_message)
ggml_abort_callbackFunction = Bool Function(Pointer<Void> data)
ggml_backend_buffer_t = Pointer<ggml_backend_buffer>
ggml_backend_buffer_type_t = Pointer<ggml_backend_buffer_type>
ggml_backend_dev_get_extra_bufts_t = Pointer<NativeFunction<ggml_backend_dev_get_extra_bufts_tFunction>>
ggml_backend_dev_get_extra_bufts_tFunction = Pointer<ggml_backend_buffer_type_t> Function(ggml_backend_dev_t device)
ggml_backend_dev_t = Pointer<ggml_backend_device>
ggml_backend_eval_callback = Pointer<NativeFunction<ggml_backend_eval_callbackFunction>>
ggml_backend_eval_callbackFunction = Bool Function(Int node_index, Pointer<ggml_tensor> t1, Pointer<ggml_tensor> t2, Pointer<Void> user_data)
ggml_backend_event_t = Pointer<ggml_backend_event>
ggml_backend_get_features_t = Pointer<NativeFunction<ggml_backend_get_features_tFunction>>
ggml_backend_get_features_tFunction = Pointer<ggml_backend_feature> Function(ggml_backend_reg_t reg)
ggml_backend_graph_plan_t = Pointer<Void>
ggml_backend_reg_t = Pointer<ggml_backend_reg>
ggml_backend_sched_eval_callback = Pointer<NativeFunction<ggml_backend_sched_eval_callbackFunction>>
ggml_backend_sched_eval_callbackFunction = Bool Function(Pointer<ggml_tensor> t, Bool ask, Pointer<Void> user_data)
ggml_backend_sched_t = Pointer<ggml_backend_sched>
ggml_backend_set_abort_callback_t = Pointer<NativeFunction<ggml_backend_set_abort_callback_tFunction>>
ggml_backend_set_abort_callback_tFunction = Void Function(ggml_backend_t backend, ggml_abort_callback abort_callback, Pointer<Void> abort_callback_data)
ggml_backend_set_n_threads_t = Pointer<NativeFunction<ggml_backend_set_n_threads_tFunction>>
ggml_backend_set_n_threads_tFunction = Void Function(ggml_backend_t backend, Int n_threads)
ggml_backend_split_buffer_type_t = Pointer<NativeFunction<ggml_backend_split_buffer_type_tFunction>>
ggml_backend_split_buffer_type_tFunction = ggml_backend_buffer_type_t Function(Int main_device, Pointer<Float> tensor_split)
ggml_backend_t = Pointer<ggml_backend>
ggml_custom1_op_t = Pointer<NativeFunction<ggml_custom1_op_tFunction>>
ggml_custom1_op_tFunction = Void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, Int ith, Int nth, Pointer<Void> userdata)
ggml_custom2_op_t = Pointer<NativeFunction<ggml_custom2_op_tFunction>>
ggml_custom2_op_tFunction = Void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Int ith, Int nth, Pointer<Void> userdata)
ggml_custom3_op_t = Pointer<NativeFunction<ggml_custom3_op_tFunction>>
ggml_custom3_op_tFunction = Void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, Int ith, Int nth, Pointer<Void> userdata)
ggml_custom_op_t = Pointer<NativeFunction<ggml_custom_op_tFunction>>
ggml_custom_op_tFunction = Void Function(Pointer<ggml_tensor> dst, Int ith, Int nth, Pointer<Void> userdata)
ggml_fp16_t = Uint16
ggml_from_float_t = Pointer<NativeFunction<ggml_from_float_tFunction>>
ggml_from_float_tFunction = Void Function(Pointer<Float> x, Pointer<Void> y, Int64 k)
ggml_guid_t = Pointer<Pointer<Uint8>>
ggml_log_callback = Pointer<NativeFunction<ggml_log_callbackFunction>>
ggml_log_callbackFunction = Void Function(UnsignedInt level, Pointer<Char> text, Pointer<Void> user_data)
ggml_opt_context_t = Pointer<ggml_opt_context>
ggml_opt_dataset_t = Pointer<ggml_opt_dataset>
ggml_opt_epoch_callback = Pointer<NativeFunction<ggml_opt_epoch_callbackFunction>>
ggml_opt_epoch_callbackFunction = Void Function(Bool train, ggml_opt_context_t opt_ctx, ggml_opt_dataset_t dataset, ggml_opt_result_t result, Int64 ibatch, Int64 ibatch_max, Int64 t_start_us)
ggml_opt_get_optimizer_params = Pointer<NativeFunction<ggml_opt_get_optimizer_paramsFunction>>
ggml_opt_get_optimizer_paramsFunction = ggml_opt_optimizer_params Function(Pointer<Void> userdata)
ggml_opt_result_t = Pointer<ggml_opt_result>
ggml_threadpool_t = Pointer<ggml_threadpool>
ggml_to_float_t = Pointer<NativeFunction<ggml_to_float_tFunction>>
ggml_to_float_tFunction = Void Function(Pointer<Void> x, Pointer<Float> y, Int64 k)
llama_memory_t = Pointer<llama_memory_i>
llama_model_set_tensor_data_t = Pointer<NativeFunction<llama_model_set_tensor_data_tFunction>>
llama_model_set_tensor_data_tFunction = Void Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
llama_opt_param_filter = Pointer<NativeFunction<llama_opt_param_filterFunction>>
llama_opt_param_filterFunction = Bool Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
llama_pos = Int32
llama_progress_callback = Pointer<NativeFunction<llama_progress_callbackFunction>>
llama_progress_callbackFunction = Bool Function(Float progress, Pointer<Void> user_data)
llama_sampler_context_t = Pointer<Void>
llama_seq_id = Int32
llama_state_seq_flags = Uint32
llama_token = Int32
LlamaLogHandler = void Function(LlamaLogRecord record)
Type definition for custom log handlers.
ToolHandler = Future<Object?> Function(ToolParams params)
Signature for a tool handler function.

Exceptions / Errors

LlamaContextException
Exception thrown when a context operation fails.
LlamaException
Base class for all Llama-related exceptions.
LlamaInferenceException
Exception thrown during text generation or tokenization.
LlamaModelException
Exception thrown when a model fails to load.
LlamaStateException
Exception thrown when the engine is in an invalid state.
LlamaUnsupportedException
Exception thrown when an operation is not supported on the current platform.