llamadart library
High-performance Dart and Flutter plugin for llama.cpp.
llamadart allows you to run Large Language Models (LLMs) locally using GGUF models across all major platforms (Android, iOS, macOS, Linux, Windows, Web).
Core Components
- LlamaEngine: The low-level orchestrator for model loading, tokenization, and raw inference.
- ChatSession: A high-level, stateful interface for chat-based interactions. It automatically manages conversation history and context window limits.
- LlamaBackend: The platform-agnostic interface for inference.
Simple Example
import 'dart:io';

Future<void> main() async {
  final engine = LlamaEngine(LlamaBackend());
  await engine.loadModel('path/to/model.gguf');

  final session = ChatSession(engine);
  // Stream the response token by token.
  await for (final token in session.create([LlamaTextContent('Hello!')])) {
    stdout.write(token);
  }

  await engine.dispose();
}
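Because ChatSession manages conversation history automatically, a follow-up turn simply reuses the same session. A minimal multi-turn sketch using only the types shown above (the model path and prompts are placeholders):

```dart
import 'dart:io';

Future<void> main() async {
  final engine = LlamaEngine(LlamaBackend());
  await engine.loadModel('path/to/model.gguf');

  // ChatSession tracks history across create() calls, so the second
  // prompt can refer back to the first answer without resending it.
  final session = ChatSession(engine);

  await for (final token
      in session.create([LlamaTextContent('Name three quantization types.')])) {
    stdout.write(token);
  }
  await for (final token
      in session.create([LlamaTextContent('Which of those is smallest?')])) {
    stdout.write(token);
  }

  await engine.dispose();
}
```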
Classes
- BackendAvailability
- Optional backend capability for exposing selectable backend options.
- BackendBatchEmbeddings
- Optional backend capability for batching embedding requests.
- BackendEmbeddings
- Optional backend capability for generating text embeddings.
- BackendPerfContextData
- Native performance timings reported by llama.cpp for the active context.
- BackendPerformanceDiagnostics
- Optional backend capability for exposing llama.cpp perf timings.
- BackendRuntimeDiagnostics
- Optional backend capability for exposing resolved runtime diagnostics.
- ChatParseResult
- The result of parsing raw LLM output into structured components.
- ChatSession
- Convenience wrapper for multi-turn chat with automatic history management.
- ChatTemplateEngine
- Orchestrates chat template detection, rendering, and output parsing.
- ChatTemplateHandler
- Abstract base class for per-format chat template handlers.
- GenerationGrammarTrigger
- A trigger that activates grammar constraints during generation.
- GenerationParams
- Parameters controlling the token sampling and generation process.
- ggml_backend
- ggml_backend_buffer
- ggml_backend_buffer_type
- ggml_backend_dev_caps
- ggml_backend_dev_props
- ggml_backend_device
- ggml_backend_event
- ggml_backend_feature
- ggml_backend_graph_copy$1
- ggml_backend_reg
- ggml_backend_sched
- ggml_bf16_t
- ggml_cgraph
- ggml_context
- ggml_init_params
- ggml_object
- ggml_opt_context
- ggml_opt_dataset
- ggml_opt_optimizer_params
- ggml_opt_result
- ggml_tensor
- ggml_threadpool
- ggml_threadpool_params
- ggml_type_traits
- gguf_context
- GrammarTrigger
- A trigger that activates grammar constraints.
- llama_adapter_lora
- llama_batch
- llama_chat_message
- llama_context
- llama_context_params
- llama_logit_bias
- llama_memory_i
- llama_model
- llama_model_imatrix_data
- llama_model_kv_override
- llama_model_params
- llama_model_quantize_params
- llama_model_tensor_buft_override
- llama_model_tensor_override
- llama_opt_params
- llama_perf_context_data
- llama_perf_sampler_data
- llama_sampler
- llama_sampler_chain_params
- llama_sampler_data
- llama_sampler_i
- llama_sampler_seq_config
- llama_token_data
- llama_token_data_array
- llama_vocab
- LlamaAudioContent
- A part of a message containing audio data for speech-to-text models.
- LlamaBackend
- Platform-agnostic interface for Llama model inference.
- LlamaChatMessage
- A message in a chat conversation history.
- LlamaChatTemplateResult
- The result of applying a chat template to a conversation history.
- LlamaCompletionChunk
- Represents a streaming chunk of a chat completion. Aligns with OpenAI's ChatCompletionChunk.
- LlamaCompletionChunkChoice
- Represents a choice in a completion chunk.
- LlamaCompletionChunkDelta
- Represents a delta in a completion choice.
- LlamaCompletionChunkFunction
- Represents a function call within a tool call.
- LlamaCompletionChunkToolCall
- Represents a tool call within a completion chunk. Aligns with OpenAI's ToolCall in streaming chunks.
- LlamaContentPart
- Base class for all content types in a message.
- LlamaEngine
- Stateless chat completions engine (like OpenAI's Chat Completions API).
- LlamaImageContent
- A part of a message containing image data for vision models.
- LlamaLogger
- A lightweight singleton logger for the llama_dart library.
- LlamaLogRecord
- A log record containing level, message, time, and optional error info.
- LlamaTextContent
- A part of a message containing plain text.
- LlamaThinkingContent
- A part of a message containing the model's Chain-of-Thought (reasoning).
- LlamaToolCallContent
- A part of a message containing a tool call solicitation from the model.
- LlamaToolResultContent
- A part of a message containing the result of a tool execution.
- LoraAdapterConfig
- Configuration for a LoRA (Low-Rank Adaptation) adapter.
- ModelParams
- Configuration parameters for loading a Llama model.
- mtmd_bitmap
- mtmd_context
- mtmd_context_params
- mtmd_image_tokens
- mtmd_input_chunk
- mtmd_input_chunks
- mtmd_input_text
- ToolDefinition
- Defines a tool that the LLM can invoke.
- ToolParam
- Represents a single parameter in a tool's input schema.
- ToolParams
- Provides type-safe access to tool call arguments.
- UnnamedStruct
- UnnamedStruct$1
- UnnamedUnion
Enums
- ChatFormat
- Identifies the chat format detected from a model's template source.
- ggml_backend_buffer_usage
- ggml_backend_dev_type
- ggml_ftype
- ggml_glu_op
- ggml_log_level
- ggml_numa_strategy
- ggml_object_type
- ggml_op
- ggml_op_pool
- ggml_opt_optimizer_type
- ggml_prec
- ggml_scale_flag
- ggml_scale_mode
- ggml_sched_priority
- ggml_sort_order
- ggml_status
- ggml_tensor_flag
- ggml_tri_type
- ggml_type
- ggml_unary_op
- GpuBackend
- GPU backend selection for runtime device preference.
- llama_attention_type
- llama_flash_attn_type
- llama_ftype
- llama_model_kv_override_type
- llama_model_meta_key
- llama_params_fit_status
- llama_pooling_type
- llama_rope_scaling_type
- llama_rope_type
- llama_split_mode
- llama_token_attr
- llama_token_type
- llama_vocab_type
- LlamaChatRole
- Role of a message sender in a chat conversation.
- LlamaLogLevel
- Log level for the underlying llama.cpp engine.
- mtmd_input_chunk_type
- ToolChoice
- Controls which tool(s) the model should call.
Constants
- GGML_DEFAULT_GRAPH_SIZE → const int
- GGML_DEFAULT_N_THREADS → const int
- GGML_EXIT_ABORTED → const int
- GGML_EXIT_SUCCESS → const int
- GGML_FILE_MAGIC → const int
- GGML_FILE_VERSION → const int
- GGML_MAX_DIMS → const int
- GGML_MAX_N_THREADS → const int
- GGML_MAX_NAME → const int
- GGML_MAX_OP_PARAMS → const int
- GGML_MAX_PARAMS → const int
- GGML_MAX_SRC → const int
- GGML_MEM_ALIGN → const int
- GGML_MROPE_SECTIONS → const int
- GGML_N_TASKS_MAX → const int
- GGML_QNT_VERSION → const int
- GGML_QNT_VERSION_FACTOR → const int
- GGML_ROPE_TYPE_IMROPE → const int
- GGML_ROPE_TYPE_MROPE → const int
- GGML_ROPE_TYPE_NEOX → const int
- GGML_ROPE_TYPE_NORMAL → const int
- GGML_ROPE_TYPE_VISION → const int
- LLAMA_DEFAULT_SEED → const int
- LLAMA_FILE_MAGIC_GGLA → const int
- LLAMA_FILE_MAGIC_GGSN → const int
- LLAMA_FILE_MAGIC_GGSQ → const int
- LLAMA_SESSION_MAGIC → const int
- LLAMA_SESSION_VERSION → const int
- LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY → const int
- LLAMA_STATE_SEQ_FLAGS_SWA_ONLY → const int
- LLAMA_STATE_SEQ_MAGIC → const int
- LLAMA_STATE_SEQ_VERSION → const int
- LLAMA_TOKEN_NULL → const int
- MTMD_DEFAULT_IMAGE_MARKER → const String
Properties
- GGML_TENSOR_SIZE → int
- final
Functions
- ggml_abort(Pointer<Char> file, int line, Pointer<Char> fmt) → void
- ggml_abs(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_abs_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_acc(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int nb2, int nb3, int offset) → Pointer<ggml_tensor>
- ggml_acc_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int nb2, int nb3, int offset) → Pointer<ggml_tensor>
- ggml_add(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_add1(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_add1_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_add_cast(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, ggml_type type) → Pointer<ggml_tensor>
- ggml_add_id(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> ids) → Pointer<ggml_tensor>
- ggml_add_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_add_rel_pos(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> pw, Pointer<ggml_tensor> ph) → Pointer<ggml_tensor>
- ggml_add_rel_pos_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> pw, Pointer<ggml_tensor> ph) → Pointer<ggml_tensor>
- ggml_arange(Pointer<ggml_context> ctx, double start, double stop, double step) → Pointer<ggml_tensor>
- ggml_are_same_shape(Pointer<ggml_tensor> t0, Pointer<ggml_tensor> t1) → bool
- ggml_are_same_stride(Pointer<ggml_tensor> t0, Pointer<ggml_tensor> t1) → bool
- ggml_argmax(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_argsort(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_sort_order order) → Pointer<ggml_tensor>
- ggml_argsort_top_k(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int k) → Pointer<ggml_tensor>
- ggml_backend_alloc_buffer(ggml_backend_t backend, int size) → ggml_backend_buffer_t
- ggml_backend_buffer_clear(ggml_backend_buffer_t buffer, int value) → void
- ggml_backend_buffer_free(ggml_backend_buffer_t buffer) → void
- ggml_backend_buffer_get_alignment(ggml_backend_buffer_t buffer) → int
- ggml_backend_buffer_get_alloc_size(ggml_backend_buffer_t buffer, Pointer<ggml_tensor> tensor) → int
- ggml_backend_buffer_get_base(ggml_backend_buffer_t buffer) → Pointer<Void>
- ggml_backend_buffer_get_max_size(ggml_backend_buffer_t buffer) → int
- ggml_backend_buffer_get_size(ggml_backend_buffer_t buffer) → int
- ggml_backend_buffer_get_type(ggml_backend_buffer_t buffer) → ggml_backend_buffer_type_t
- ggml_backend_buffer_get_usage(ggml_backend_buffer_t buffer) → ggml_backend_buffer_usage
- ggml_backend_buffer_init_tensor(ggml_backend_buffer_t buffer, Pointer<ggml_tensor> tensor) → ggml_status
- ggml_backend_buffer_is_host(ggml_backend_buffer_t buffer) → bool
- ggml_backend_buffer_name(ggml_backend_buffer_t buffer) → Pointer<Char>
- ggml_backend_buffer_reset(ggml_backend_buffer_t buffer) → void
- ggml_backend_buffer_set_usage(ggml_backend_buffer_t buffer, ggml_backend_buffer_usage usage) → void
- ggml_backend_buft_alloc_buffer(ggml_backend_buffer_type_t buft, int size) → ggml_backend_buffer_t
- ggml_backend_buft_get_alignment(ggml_backend_buffer_type_t buft) → int
- ggml_backend_buft_get_alloc_size(ggml_backend_buffer_type_t buft, Pointer<ggml_tensor> tensor) → int
- ggml_backend_buft_get_device(ggml_backend_buffer_type_t buft) → ggml_backend_dev_t
- ggml_backend_buft_get_max_size(ggml_backend_buffer_type_t buft) → int
- ggml_backend_buft_is_host(ggml_backend_buffer_type_t buft) → bool
- ggml_backend_buft_name(ggml_backend_buffer_type_t buft) → Pointer<Char>
- ggml_backend_compare_graph_backend(ggml_backend_t backend1, ggml_backend_t backend2, Pointer<ggml_cgraph> graph, ggml_backend_eval_callback callback, Pointer<Void> user_data, Pointer<Pointer<ggml_tensor>> test_nodes, int num_test_nodes) → bool
- ggml_backend_cpu_buffer_from_ptr(Pointer<Void> ptr, int size) → ggml_backend_buffer_t
- ggml_backend_cpu_buffer_type() → ggml_backend_buffer_type_t
- ggml_backend_dev_backend_reg(ggml_backend_dev_t device) → ggml_backend_reg_t
- ggml_backend_dev_buffer_from_host_ptr(ggml_backend_dev_t device, Pointer<Void> ptr, int size, int max_tensor_size) → ggml_backend_buffer_t
- ggml_backend_dev_buffer_type(ggml_backend_dev_t device) → ggml_backend_buffer_type_t
- ggml_backend_dev_by_name(Pointer<Char> name) → ggml_backend_dev_t
- ggml_backend_dev_by_type(ggml_backend_dev_type type) → ggml_backend_dev_t
- ggml_backend_dev_count() → int
- ggml_backend_dev_description(ggml_backend_dev_t device) → Pointer<Char>
- ggml_backend_dev_get(int index) → ggml_backend_dev_t
- ggml_backend_dev_get_props(ggml_backend_dev_t device, Pointer<ggml_backend_dev_props> props) → void
- ggml_backend_dev_host_buffer_type(ggml_backend_dev_t device) → ggml_backend_buffer_type_t
- ggml_backend_dev_init(ggml_backend_dev_t device, Pointer<Char> params) → ggml_backend_t
- ggml_backend_dev_memory(ggml_backend_dev_t device, Pointer<Size> free, Pointer<Size> total) → void
- ggml_backend_dev_name(ggml_backend_dev_t device) → Pointer<Char>
- ggml_backend_dev_offload_op(ggml_backend_dev_t device, Pointer<ggml_tensor> op) → bool
- ggml_backend_dev_supports_buft(ggml_backend_dev_t device, ggml_backend_buffer_type_t buft) → bool
- ggml_backend_dev_supports_op(ggml_backend_dev_t device, Pointer<ggml_tensor> op) → bool
- ggml_backend_dev_type$1(ggml_backend_dev_t device) → ggml_backend_dev_type
- ggml_backend_device_register(ggml_backend_dev_t device) → void
- ggml_backend_event_free(ggml_backend_event_t event) → void
- ggml_backend_event_new(ggml_backend_dev_t device) → ggml_backend_event_t
- ggml_backend_event_record(ggml_backend_event_t event, ggml_backend_t backend) → void
- ggml_backend_event_synchronize(ggml_backend_event_t event) → void
- ggml_backend_event_wait(ggml_backend_t backend, ggml_backend_event_t event) → void
- ggml_backend_free(ggml_backend_t backend) → void
- ggml_backend_get_alignment(ggml_backend_t backend) → int
- ggml_backend_get_default_buffer_type(ggml_backend_t backend) → ggml_backend_buffer_type_t
- ggml_backend_get_device(ggml_backend_t backend) → ggml_backend_dev_t
- ggml_backend_get_max_size(ggml_backend_t backend) → int
- ggml_backend_graph_compute(ggml_backend_t backend, Pointer<ggml_cgraph> cgraph) → ggml_status
- ggml_backend_graph_compute_async(ggml_backend_t backend, Pointer<ggml_cgraph> cgraph) → ggml_status
- ggml_backend_graph_copy(ggml_backend_t backend, Pointer<ggml_cgraph> graph) → ggml_backend_graph_copy$1
- ggml_backend_graph_copy_free(ggml_backend_graph_copy$1 copy) → void
- ggml_backend_graph_plan_compute(ggml_backend_t backend, ggml_backend_graph_plan_t plan) → ggml_status
- ggml_backend_graph_plan_create(ggml_backend_t backend, Pointer<ggml_cgraph> cgraph) → ggml_backend_graph_plan_t
- ggml_backend_graph_plan_free(ggml_backend_t backend, ggml_backend_graph_plan_t plan) → void
- ggml_backend_guid(ggml_backend_t backend) → ggml_guid_t
- ggml_backend_init_best() → ggml_backend_t
- ggml_backend_init_by_name(Pointer<Char> name, Pointer<Char> params) → ggml_backend_t
- ggml_backend_init_by_type(ggml_backend_dev_type type, Pointer<Char> params) → ggml_backend_t
- ggml_backend_load(Pointer<Char> path) → ggml_backend_reg_t
- ggml_backend_load_all() → void
- ggml_backend_load_all_from_path(Pointer<Char> dir_path) → void
- ggml_backend_name(ggml_backend_t backend) → Pointer<Char>
- ggml_backend_offload_op(ggml_backend_t backend, Pointer<ggml_tensor> op) → bool
- ggml_backend_reg_by_name(Pointer<Char> name) → ggml_backend_reg_t
- ggml_backend_reg_count() → int
- ggml_backend_reg_dev_count(ggml_backend_reg_t reg) → int
- ggml_backend_reg_dev_get(ggml_backend_reg_t reg, int index) → ggml_backend_dev_t
- ggml_backend_reg_get(int index) → ggml_backend_reg_t
- ggml_backend_reg_get_proc_address(ggml_backend_reg_t reg, Pointer<Char> name) → Pointer<Void>
- ggml_backend_reg_name(ggml_backend_reg_t reg) → Pointer<Char>
- ggml_backend_register(ggml_backend_reg_t reg) → void
- ggml_backend_sched_alloc_graph(ggml_backend_sched_t sched, Pointer<ggml_cgraph> graph) → bool
- ggml_backend_sched_free(ggml_backend_sched_t sched) → void
- ggml_backend_sched_get_backend(ggml_backend_sched_t sched, int i) → ggml_backend_t
- ggml_backend_sched_get_buffer_size(ggml_backend_sched_t sched, ggml_backend_t backend) → int
- ggml_backend_sched_get_buffer_type(ggml_backend_sched_t sched, ggml_backend_t backend) → ggml_backend_buffer_type_t
- ggml_backend_sched_get_n_backends(ggml_backend_sched_t sched) → int
- ggml_backend_sched_get_n_copies(ggml_backend_sched_t sched) → int
- ggml_backend_sched_get_n_splits(ggml_backend_sched_t sched) → int
- ggml_backend_sched_get_tensor_backend(ggml_backend_sched_t sched, Pointer<ggml_tensor> node) → ggml_backend_t
- ggml_backend_sched_graph_compute(ggml_backend_sched_t sched, Pointer<ggml_cgraph> graph) → ggml_status
- ggml_backend_sched_graph_compute_async(ggml_backend_sched_t sched, Pointer<ggml_cgraph> graph) → ggml_status
- ggml_backend_sched_new(Pointer<ggml_backend_t> backends, Pointer<ggml_backend_buffer_type_t> bufts, int n_backends, int graph_size, bool parallel, bool op_offload) → ggml_backend_sched_t
- ggml_backend_sched_reserve(ggml_backend_sched_t sched, Pointer<ggml_cgraph> measure_graph) → bool
- ggml_backend_sched_reserve_size(ggml_backend_sched_t sched, Pointer<ggml_cgraph> measure_graph, Pointer<Size> sizes) → void
- ggml_backend_sched_reset(ggml_backend_sched_t sched) → void
- ggml_backend_sched_set_eval_callback(ggml_backend_sched_t sched, ggml_backend_sched_eval_callback callback, Pointer<Void> user_data) → void
- ggml_backend_sched_set_tensor_backend(ggml_backend_sched_t sched, Pointer<ggml_tensor> node, ggml_backend_t backend) → void
- ggml_backend_sched_split_graph(ggml_backend_sched_t sched, Pointer<ggml_cgraph> graph) → void
- ggml_backend_sched_synchronize(ggml_backend_sched_t sched) → void
- ggml_backend_supports_buft(ggml_backend_t backend, ggml_backend_buffer_type_t buft) → bool
- ggml_backend_supports_op(ggml_backend_t backend, Pointer<ggml_tensor> op) → bool
- ggml_backend_synchronize(ggml_backend_t backend) → void
- ggml_backend_tensor_alloc(ggml_backend_buffer_t buffer, Pointer<ggml_tensor> tensor, Pointer<Void> addr) → ggml_status
- ggml_backend_tensor_copy(Pointer<ggml_tensor> src, Pointer<ggml_tensor> dst) → void
- ggml_backend_tensor_copy_async(ggml_backend_t backend_src, ggml_backend_t backend_dst, Pointer<ggml_tensor> src, Pointer<ggml_tensor> dst) → void
- ggml_backend_tensor_get(Pointer<ggml_tensor> tensor, Pointer<Void> data, int offset, int size) → void
- ggml_backend_tensor_get_async(ggml_backend_t backend, Pointer<ggml_tensor> tensor, Pointer<Void> data, int offset, int size) → void
- ggml_backend_tensor_memset(Pointer<ggml_tensor> tensor, int value, int offset, int size) → void
- ggml_backend_tensor_set(Pointer<ggml_tensor> tensor, Pointer<Void> data, int offset, int size) → void
- ggml_backend_tensor_set_async(ggml_backend_t backend, Pointer<ggml_tensor> tensor, Pointer<Void> data, int offset, int size) → void
- ggml_backend_unload(ggml_backend_reg_t reg) → void
- ggml_backend_view_init(Pointer<ggml_tensor> tensor) → ggml_status
- ggml_bf16_to_fp32(ggml_bf16_t arg0) → double
- ggml_bf16_to_fp32_row(Pointer<ggml_bf16_t> arg0, Pointer<Float> arg1, int arg2) → void
- ggml_blck_size(ggml_type type) → int
- ggml_build_backward_expand(Pointer<ggml_context> ctx, Pointer<ggml_cgraph> cgraph, Pointer<Pointer<ggml_tensor>> grad_accs) → void
- ggml_build_forward_expand(Pointer<ggml_cgraph> cgraph, Pointer<ggml_tensor> tensor) → void
- ggml_build_forward_select(Pointer<ggml_cgraph> cgraph, Pointer<Pointer<ggml_tensor>> tensors, int n_tensors, int idx) → Pointer<ggml_tensor>
- ggml_can_repeat(Pointer<ggml_tensor> t0, Pointer<ggml_tensor> t1) → bool
- ggml_cast(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_type type) → Pointer<ggml_tensor>
- ggml_ceil(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_ceil_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_clamp(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double min, double max) → Pointer<ggml_tensor>
- ggml_commit() → Pointer<Char>
- ggml_concat(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int dim) → Pointer<ggml_tensor>
- ggml_cont(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_cont_1d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0) → Pointer<ggml_tensor>
- ggml_cont_2d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1) → Pointer<ggml_tensor>
- ggml_cont_3d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2) → Pointer<ggml_tensor>
- ggml_cont_4d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3) → Pointer<ggml_tensor>
- ggml_conv_1d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int p0, int d0) → Pointer<ggml_tensor>
- ggml_conv_1d_dw(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int p0, int d0) → Pointer<ggml_tensor>
- ggml_conv_1d_dw_ph(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int d0) → Pointer<ggml_tensor>
- ggml_conv_1d_ph(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s, int d) → Pointer<ggml_tensor>
- ggml_conv_2d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int s1, int p0, int p1, int d0, int d1) → Pointer<ggml_tensor>
- ggml_conv_2d_direct(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int s1, int p0, int p1, int d0, int d1) → Pointer<ggml_tensor>
- ggml_conv_2d_dw(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int s1, int p0, int p1, int d0, int d1) → Pointer<ggml_tensor>
- ggml_conv_2d_dw_direct(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int stride0, int stride1, int pad0, int pad1, int dilation0, int dilation1) → Pointer<ggml_tensor>
- ggml_conv_2d_s1_ph(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_conv_2d_sk_p0(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_conv_3d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int IC, int s0, int s1, int s2, int p0, int p1, int p2, int d0, int d1, int d2) → Pointer<ggml_tensor>
- ggml_conv_3d_direct(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int s1, int s2, int p0, int p1, int p2, int d0, int d1, int d2, int n_channels, int n_batch, int n_channels_out) → Pointer<ggml_tensor>
- ggml_conv_transpose_1d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int p0, int d0) → Pointer<ggml_tensor>
- ggml_conv_transpose_2d_p0(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int stride) → Pointer<ggml_tensor>
- ggml_cos(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_cos_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_count_equal(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_cpy(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_cross_entropy_loss(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_cross_entropy_loss_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c) → Pointer<ggml_tensor>
- ggml_cumsum(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_custom_4d(Pointer<ggml_context> ctx, ggml_type type, int ne0, int ne1, int ne2, int ne3, Pointer<Pointer<ggml_tensor>> args, int n_args, ggml_custom_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor>
- ggml_custom_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<Pointer<ggml_tensor>> args, int n_args, ggml_custom_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor>
- ggml_cycles() → int
- ggml_cycles_per_ms() → int
- ggml_diag(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_diag_mask_inf(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_past) → Pointer<ggml_tensor>
- ggml_diag_mask_inf_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_past) → Pointer<ggml_tensor>
- ggml_diag_mask_zero(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_past) → Pointer<ggml_tensor>
- ggml_diag_mask_zero_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_past) → Pointer<ggml_tensor>
- ggml_div(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_div_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_dup(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_dup_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_dup_tensor(Pointer<ggml_context> ctx, Pointer<ggml_tensor> src) → Pointer<ggml_tensor>
- ggml_element_size(Pointer<ggml_tensor> tensor) → int
- ggml_elu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_elu_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_exp(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_exp_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_expm1(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_expm1_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_fill(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double c) → Pointer<ggml_tensor>
- ggml_fill_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, double c) → Pointer<ggml_tensor>
- ggml_flash_attn_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> q, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> d, bool masked) → Pointer<ggml_tensor>
- ggml_flash_attn_ext(Pointer<ggml_context> ctx, Pointer<ggml_tensor> q, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> mask, double scale, double max_bias, double logit_softcap) → Pointer<ggml_tensor>
- ggml_flash_attn_ext_add_sinks(Pointer<ggml_tensor> a, Pointer<ggml_tensor> sinks) → void
- ggml_flash_attn_ext_get_prec(Pointer<ggml_tensor> a) → ggml_prec
- ggml_flash_attn_ext_set_prec(Pointer<ggml_tensor> a, ggml_prec prec) → void
- ggml_floor(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_floor_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_fopen(Pointer<Char> fname, Pointer<Char> mode) → Pointer<FILE>
- ggml_format_name(Pointer<ggml_tensor> tensor, Pointer<Char> fmt) → Pointer<ggml_tensor>
- ggml_fp16_to_fp32(int arg0) → double
- ggml_fp16_to_fp32_row(Pointer<ggml_fp16_t> arg0, Pointer<Float> arg1, int arg2) → void
- ggml_fp32_to_bf16(double arg0) → ggml_bf16_t
- ggml_fp32_to_bf16_row(Pointer<Float> arg0, Pointer<ggml_bf16_t> arg1, int arg2) → void
- ggml_fp32_to_bf16_row_ref(Pointer<Float> arg0, Pointer<ggml_bf16_t> arg1, int arg2) → void
- ggml_fp32_to_fp16(double arg0) → int
- ggml_fp32_to_fp16_row(Pointer<Float> arg0, Pointer<ggml_fp16_t> arg1, int arg2) → void
- ggml_free(Pointer<ggml_context> ctx) → void
- ggml_ftype_to_ggml_type(ggml_ftype ftype) → ggml_type
- ggml_gated_delta_net(Pointer<ggml_context> ctx, Pointer<ggml_tensor> q, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> g, Pointer<ggml_tensor> beta, Pointer<ggml_tensor> state) → Pointer<ggml_tensor>
- ggml_gated_linear_attn(Pointer<ggml_context> ctx, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> q, Pointer<ggml_tensor> g, Pointer<ggml_tensor> state, double scale) → Pointer<ggml_tensor>
- ggml_geglu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_geglu_erf(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_geglu_erf_split(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_geglu_erf_swapped(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_geglu_quick(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_geglu_quick_split(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_geglu_quick_swapped(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_geglu_split(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_geglu_swapped(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_gelu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_gelu_erf(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_gelu_erf_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_gelu_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_gelu_quick(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_gelu_quick_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_get_data(Pointer<ggml_tensor> tensor) → Pointer<Void>
- ggml_get_data_f32(Pointer<ggml_tensor> tensor) → Pointer<Float>
- ggml_get_first_tensor(Pointer<ggml_context> ctx) → Pointer<ggml_tensor>
- ggml_get_glu_op(Pointer<ggml_tensor> tensor) → ggml_glu_op
- ggml_get_max_tensor_size(Pointer<ggml_context> ctx) → int
- ggml_get_mem_buffer(Pointer<ggml_context> ctx) → Pointer<Void>
- ggml_get_mem_size(Pointer<ggml_context> ctx) → int
- ggml_get_name(Pointer<ggml_tensor> tensor) → Pointer<Char>
- ggml_get_next_tensor(Pointer<ggml_context> ctx, Pointer<ggml_tensor> tensor) → Pointer<ggml_tensor>
- ggml_get_no_alloc(Pointer<ggml_context> ctx) → bool
- ggml_get_rel_pos(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int qh, int kh) → Pointer<ggml_tensor>
- ggml_get_rows(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor>
- ggml_get_rows_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c) → Pointer<ggml_tensor>
- ggml_get_tensor(Pointer<ggml_context> ctx, Pointer<Char> name) → Pointer<ggml_tensor>
- ggml_get_type_traits(ggml_type type) → Pointer<ggml_type_traits>
- ggml_get_unary_op(Pointer<ggml_tensor> tensor) → ggml_unary_op
- ggml_glu(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, ggml_glu_op op, bool swapped) → Pointer<ggml_tensor>
- ggml_glu_op_name(ggml_glu_op op) → Pointer<Char>
- ggml_glu_split(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, ggml_glu_op op) → Pointer<ggml_tensor>
- ggml_graph_add_node(Pointer<ggml_cgraph> cgraph, Pointer<ggml_tensor> tensor) → void
- ggml_graph_clear(Pointer<ggml_cgraph> cgraph) → void
- ggml_graph_cpy(Pointer<ggml_cgraph> src, Pointer<ggml_cgraph> dst) → void
- ggml_graph_dump_dot(Pointer<ggml_cgraph> gb, Pointer<ggml_cgraph> cgraph, Pointer<Char> filename) → void
- ggml_graph_dup(Pointer<ggml_context> ctx, Pointer<ggml_cgraph> cgraph, bool force_grads) → Pointer<ggml_cgraph>
- ggml_graph_get_grad(Pointer<ggml_cgraph> cgraph, Pointer<ggml_tensor> node) → Pointer<ggml_tensor>
- ggml_graph_get_grad_acc(Pointer<ggml_cgraph> cgraph, Pointer<ggml_tensor> node) → Pointer<ggml_tensor>
- ggml_graph_get_tensor(Pointer<ggml_cgraph> cgraph, Pointer<Char> name) → Pointer<ggml_tensor>
- ggml_graph_n_nodes(Pointer<ggml_cgraph> cgraph) → int
- ggml_graph_node(Pointer<ggml_cgraph> cgraph, int i) → Pointer<ggml_tensor>
- ggml_graph_nodes(Pointer<ggml_cgraph> cgraph) → Pointer<Pointer<ggml_tensor>>
- ggml_graph_overhead() → int
- ggml_graph_overhead_custom(int size, bool grads) → int
- ggml_graph_print(Pointer<ggml_cgraph> cgraph) → void
- ggml_graph_reset(Pointer<ggml_cgraph> cgraph) → void
- ggml_graph_size(Pointer<ggml_cgraph> cgraph) → int
- ggml_group_norm(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_groups, double eps) → Pointer<ggml_tensor>
- ggml_group_norm_inplace(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int n_groups, double eps) → Pointer<ggml_tensor>
- ggml_guid_matches(ggml_guid_t guid_a, ggml_guid_t guid_b) → bool
- ggml_hardsigmoid(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_hardswish(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor>
- ggml_im2col(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int s0, int s1, int p0, int p1, int d0, int d1, bool is_2D, ggml_type dst_type) → Pointer<ggml_tensor>
- ggml_im2col_3d(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int IC, int s0, int s1, int s2, int p0, int p1, int p2, int d0, int d1, int d2, ggml_type dst_type) → Pointer<ggml_tensor>
- ggml_im2col_back(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<Int64> ne, int s0, int s1, int p0, int p1, int d0, int d1, bool is_2D) → Pointer<ggml_tensor>
- ggml_init(ggml_init_params params) → Pointer<ggml_context>
- ggml_interpolate(Pointer<ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3, int mode) → Pointer<ggml_tensor>
- ggml_is_3d(Pointer<ggml_tensor> tensor) → bool
- ggml_is_contiguous(Pointer<ggml_tensor> tensor) → bool
- ggml_is_contiguous_0(Pointer<ggml_tensor> tensor) → bool
- ggml_is_contiguous_1(Pointer<ggml_tensor> tensor) → bool
ggml_is_contiguous_2(
Pointer< ggml_tensor> tensor) → bool -
ggml_is_contiguous_channels(
Pointer< ggml_tensor> tensor) → bool -
ggml_is_contiguous_rows(
Pointer< ggml_tensor> tensor) → bool -
ggml_is_contiguously_allocated(
Pointer< ggml_tensor> tensor) → bool -
ggml_is_empty(
Pointer< ggml_tensor> tensor) → bool -
ggml_is_matrix(
Pointer< ggml_tensor> tensor) → bool -
ggml_is_permuted(
Pointer< ggml_tensor> tensor) → bool -
ggml_is_quantized(
ggml_type type) → bool -
ggml_is_scalar(
Pointer< ggml_tensor> tensor) → bool -
ggml_is_transposed(
Pointer< ggml_tensor> tensor) → bool -
ggml_is_vector(
Pointer< ggml_tensor> tensor) → bool -
ggml_is_view(
Pointer< ggml_tensor> tensor) → bool -
ggml_l2_norm(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double eps) → Pointer<ggml_tensor> -
ggml_l2_norm_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double eps) → Pointer<ggml_tensor> -
ggml_leaky_relu(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double negative_slope, bool inplace) → Pointer<ggml_tensor> -
ggml_log(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_log_get(
Pointer< ggml_log_callback> log_callback, Pointer<Pointer<Void> > user_data) → void -
ggml_log_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_log_set(
ggml_log_callback log_callback, Pointer< Void> user_data) → void -
ggml_map_custom1(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, ggml_custom1_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor> -
ggml_map_custom1_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, ggml_custom1_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor> -
ggml_map_custom2(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, ggml_custom2_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor> -
ggml_map_custom2_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, ggml_custom2_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor> -
ggml_map_custom3(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, ggml_custom3_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor> -
ggml_map_custom3_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, ggml_custom3_op_t fun, int n_tasks, Pointer<Void> userdata) → Pointer<ggml_tensor> -
ggml_mean(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_mul(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_mul_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_mul_mat(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_mul_mat_id(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> as, Pointer<ggml_tensor> b, Pointer<ggml_tensor> ids) → Pointer<ggml_tensor> -
ggml_mul_mat_set_prec(
Pointer< ggml_tensor> a, ggml_prec prec) → void -
ggml_n_dims(
Pointer< ggml_tensor> tensor) → int -
ggml_nbytes(
Pointer< ggml_tensor> tensor) → int -
ggml_nbytes_pad(
Pointer< ggml_tensor> tensor) → int -
ggml_neg(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_neg_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_nelements(
Pointer< ggml_tensor> tensor) → int -
ggml_new_buffer(
Pointer< ggml_context> ctx, int nbytes) → Pointer<Void> -
ggml_new_graph(
Pointer< ggml_context> ctx) → Pointer<ggml_cgraph> -
ggml_new_graph_custom(
Pointer< ggml_context> ctx, int size, bool grads) → Pointer<ggml_cgraph> -
ggml_new_tensor(
Pointer< ggml_context> ctx, ggml_type type, int n_dims, Pointer<Int64> ne) → Pointer<ggml_tensor> -
ggml_new_tensor_1d(
Pointer< ggml_context> ctx, ggml_type type, int ne0) → Pointer<ggml_tensor> -
ggml_new_tensor_2d(
Pointer< ggml_context> ctx, ggml_type type, int ne0, int ne1) → Pointer<ggml_tensor> -
ggml_new_tensor_3d(
Pointer< ggml_context> ctx, ggml_type type, int ne0, int ne1, int ne2) → Pointer<ggml_tensor> -
ggml_new_tensor_4d(
Pointer< ggml_context> ctx, ggml_type type, int ne0, int ne1, int ne2, int ne3) → Pointer<ggml_tensor> -
ggml_norm(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double eps) → Pointer<ggml_tensor> -
ggml_norm_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double eps) → Pointer<ggml_tensor> -
ggml_nrows(
Pointer< ggml_tensor> tensor) → int -
ggml_op_desc(
Pointer< ggml_tensor> t) → Pointer<Char> -
ggml_op_name(
ggml_op op) → Pointer< Char> -
ggml_op_symbol(
ggml_op op) → Pointer< Char> -
ggml_opt_step_adamw(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> grad, Pointer<ggml_tensor> m, Pointer<ggml_tensor> v, Pointer<ggml_tensor> adamw_params) → Pointer<ggml_tensor> -
ggml_opt_step_sgd(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> grad, Pointer<ggml_tensor> sgd_params) → Pointer<ggml_tensor> -
ggml_out_prod(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_pad(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int p0, int p1, int p2, int p3) → Pointer<ggml_tensor> -
ggml_pad_circular(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int p0, int p1, int p2, int p3) → Pointer<ggml_tensor> -
ggml_pad_ext(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int lp0, int rp0, int lp1, int rp1, int lp2, int rp2, int lp3, int rp3) → Pointer<ggml_tensor> -
ggml_pad_ext_circular(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int lp0, int rp0, int lp1, int rp1, int lp2, int rp2, int lp3, int rp3) → Pointer<ggml_tensor> -
ggml_pad_reflect_1d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int p0, int p1) → Pointer<ggml_tensor> -
ggml_permute(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int axis0, int axis1, int axis2, int axis3) → Pointer<ggml_tensor> -
ggml_pool_1d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, ggml_op_pool op, int k0, int s0, int p0) → Pointer<ggml_tensor> -
ggml_pool_2d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, ggml_op_pool op, int k0, int k1, int s0, int s1, double p0, double p1) → Pointer<ggml_tensor> -
ggml_pool_2d_back(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> af, ggml_op_pool op, int k0, int k1, int s0, int s1, double p0, double p1) → Pointer<ggml_tensor> -
ggml_print_object(
Pointer< ggml_object> obj) → void -
ggml_print_objects(
Pointer< ggml_context> ctx) → void -
ggml_quantize_chunk(
ggml_type type, Pointer< Float> src, Pointer<Void> dst, int start, int nrows, int n_per_row, Pointer<Float> imatrix) → int -
ggml_quantize_free(
) → void -
ggml_quantize_init(
ggml_type type) → void -
ggml_quantize_requires_imatrix(
ggml_type type) → bool -
ggml_reglu(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_reglu_split(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_reglu_swapped(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_relu(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_relu_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_repeat(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_repeat_4d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3) → Pointer<ggml_tensor> -
ggml_repeat_back(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_reset(
Pointer< ggml_context> ctx) → void -
ggml_reshape(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_reshape_1d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int ne0) → Pointer<ggml_tensor> -
ggml_reshape_2d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1) → Pointer<ggml_tensor> -
ggml_reshape_3d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2) → Pointer<ggml_tensor> -
ggml_reshape_4d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3) → Pointer<ggml_tensor> -
ggml_rms_norm(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double eps) → Pointer<ggml_tensor> -
ggml_rms_norm_back(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, double eps) → Pointer<ggml_tensor> -
ggml_rms_norm_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double eps) → Pointer<ggml_tensor> -
ggml_roll(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int shift0, int shift1, int shift2, int shift3) → Pointer<ggml_tensor> -
ggml_rope(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int n_dims, int mode) → Pointer<ggml_tensor> -
ggml_rope_custom(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int n_dims, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) → Pointer<ggml_tensor> -
ggml_rope_custom_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int n_dims, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) → Pointer<ggml_tensor> -
ggml_rope_ext(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) → Pointer<ggml_tensor> -
ggml_rope_ext_back(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) → Pointer<ggml_tensor> -
ggml_rope_ext_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) → Pointer<ggml_tensor> -
ggml_rope_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int n_dims, int mode) → Pointer<ggml_tensor> -
ggml_rope_multi(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, Pointer<Int> sections, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) → Pointer<ggml_tensor> -
ggml_rope_multi_back(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, Pointer<Int> sections, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) → Pointer<ggml_tensor> -
ggml_rope_multi_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int n_dims, Pointer<Int> sections, int mode, int n_ctx_orig, double freq_base, double freq_scale, double ext_factor, double attn_factor, double beta_fast, double beta_slow) → Pointer<ggml_tensor> -
ggml_rope_yarn_corr_dims(
int n_dims, int n_ctx_orig, double freq_base, double beta_fast, double beta_slow, Pointer< Float> dims) → void -
ggml_round(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_round_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_row_size(
ggml_type type, int ne) → int -
ggml_rwkv_wkv6(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> r, Pointer<ggml_tensor> tf, Pointer<ggml_tensor> td, Pointer<ggml_tensor> state) → Pointer<ggml_tensor> -
ggml_rwkv_wkv7(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> r, Pointer<ggml_tensor> w, Pointer<ggml_tensor> k, Pointer<ggml_tensor> v, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> state) → Pointer<ggml_tensor> -
ggml_scale(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double s) → Pointer<ggml_tensor> -
ggml_scale_bias(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double s, double b) → Pointer<ggml_tensor> -
ggml_scale_bias_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double s, double b) → Pointer<ggml_tensor> -
ggml_scale_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double s) → Pointer<ggml_tensor> -
ggml_set(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int nb2, int nb3, int offset) → Pointer<ggml_tensor> -
ggml_set_1d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int offset) → Pointer<ggml_tensor> -
ggml_set_1d_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int offset) → Pointer<ggml_tensor> -
ggml_set_2d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int offset) → Pointer<ggml_tensor> -
ggml_set_2d_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int offset) → Pointer<ggml_tensor> -
ggml_set_abort_callback(
ggml_abort_callback_t callback) → ggml_abort_callback_t -
ggml_set_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int nb1, int nb2, int nb3, int offset) → Pointer<ggml_tensor> -
ggml_set_input(
Pointer< ggml_tensor> tensor) → void -
ggml_set_loss(
Pointer< ggml_tensor> tensor) → void -
ggml_set_name(
Pointer< ggml_tensor> tensor, Pointer<Char> name) → Pointer<ggml_tensor> -
ggml_set_no_alloc(
Pointer< ggml_context> ctx, bool no_alloc) → void -
ggml_set_output(
Pointer< ggml_tensor> tensor) → void -
ggml_set_param(
Pointer< ggml_tensor> tensor) → void -
ggml_set_rows(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c) → Pointer<ggml_tensor> -
ggml_set_zero(
Pointer< ggml_tensor> tensor) → Pointer<ggml_tensor> -
ggml_sgn(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_sgn_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_sigmoid(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_sigmoid_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_silu(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_silu_back(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_silu_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_sin(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_sin_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_soft_max(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_soft_max_add_sinks(
Pointer< ggml_tensor> a, Pointer<ggml_tensor> sinks) → void -
ggml_soft_max_ext(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> mask, double scale, double max_bias) → Pointer<ggml_tensor> -
ggml_soft_max_ext_back(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, double scale, double max_bias) → Pointer<ggml_tensor> -
ggml_soft_max_ext_back_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, double scale, double max_bias) → Pointer<ggml_tensor> -
ggml_soft_max_ext_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> mask, double scale, double max_bias) → Pointer<ggml_tensor> -
ggml_soft_max_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_softplus(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_softplus_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_solve_tri(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, bool left, bool lower, bool uni) → Pointer<ggml_tensor> -
ggml_sqr(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_sqr_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_sqrt(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_sqrt_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_ssm_conv(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> sx, Pointer<ggml_tensor> c) → Pointer<ggml_tensor> -
ggml_ssm_scan(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> s, Pointer<ggml_tensor> x, Pointer<ggml_tensor> dt, Pointer<ggml_tensor> A, Pointer<ggml_tensor> B, Pointer<ggml_tensor> C, Pointer<ggml_tensor> ids) → Pointer<ggml_tensor> -
ggml_status_to_string(
ggml_status status) → Pointer< Char> -
ggml_step(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_step_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_sub(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_sub_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_sum(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_sum_rows(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_swiglu(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_swiglu_oai(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, double alpha, double limit) → Pointer<ggml_tensor> -
ggml_swiglu_split(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b) → Pointer<ggml_tensor> -
ggml_swiglu_swapped(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_tanh(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_tanh_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_tensor_overhead(
) → int -
ggml_threadpool_params_default(
int n_threads) → ggml_threadpool_params -
ggml_threadpool_params_init(
Pointer< ggml_threadpool_params> p, int n_threads) → void -
ggml_threadpool_params_match(
Pointer< ggml_threadpool_params> p0, Pointer<ggml_threadpool_params> p1) → bool -
ggml_time_init(
) → void -
ggml_time_ms(
) → int -
ggml_time_us(
) → int -
ggml_timestep_embedding(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> timesteps, int dim, int max_period) → Pointer<ggml_tensor> -
ggml_top_k(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int k) → Pointer<ggml_tensor> -
ggml_transpose(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_tri(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, ggml_tri_type type) → Pointer<ggml_tensor> -
ggml_trunc(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> - Truncates the fractional part of each element in the tensor (rounding towards zero). For example: trunc(3.7) = 3.0, trunc(-2.9) = -2.0. Similar to std::trunc in C/C++.
-
ggml_trunc_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a) → Pointer<ggml_tensor> -
ggml_type_name(
ggml_type type) → Pointer< Char> -
ggml_type_size(
ggml_type type) → int -
ggml_type_sizef(
ggml_type type) → double -
ggml_unary(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, ggml_unary_op op) → Pointer<ggml_tensor> -
ggml_unary_inplace(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, ggml_unary_op op) → Pointer<ggml_tensor> -
ggml_unary_op_name(
ggml_unary_op op) → Pointer< Char> -
ggml_unravel_index(
Pointer< ggml_tensor> tensor, int i, Pointer<Int64> i0, Pointer<Int64> i1, Pointer<Int64> i2, Pointer<Int64> i3) → void -
ggml_upscale(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int scale_factor, ggml_scale_mode mode) → Pointer<ggml_tensor> -
ggml_upscale_ext(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3, ggml_scale_mode mode) → Pointer<ggml_tensor> -
ggml_used_mem(
Pointer< ggml_context> ctx) → int -
ggml_validate_row_data(
ggml_type type, Pointer< Void> data, int nbytes) → bool -
ggml_version(
) → Pointer< Char> -
ggml_view_1d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int offset) → Pointer<ggml_tensor> -
ggml_view_2d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int nb1, int offset) → Pointer<ggml_tensor> -
ggml_view_3d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int nb1, int nb2, int offset) → Pointer<ggml_tensor> -
ggml_view_4d(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int ne0, int ne1, int ne2, int ne3, int nb1, int nb2, int nb3, int offset) → Pointer<ggml_tensor> -
ggml_view_tensor(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> src) → Pointer<ggml_tensor> -
ggml_win_part(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int w) → Pointer<ggml_tensor> -
ggml_win_unpart(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, int w0, int h0, int w) → Pointer<ggml_tensor> -
ggml_xielu(
Pointer< ggml_context> ctx, Pointer<ggml_tensor> a, double alpha_n, double alpha_p, double beta, double eps) → Pointer<ggml_tensor> -
llama_adapter_get_alora_invocation_tokens(
Pointer< llama_adapter_lora> adapter) → Pointer<llama_token> -
llama_adapter_get_alora_n_invocation_tokens(
Pointer< llama_adapter_lora> adapter) → int -
llama_adapter_lora_free(
Pointer< llama_adapter_lora> adapter) → void -
llama_adapter_lora_init(
Pointer< llama_model> model, Pointer<Char> path_lora) → Pointer<llama_adapter_lora> -
llama_adapter_meta_count(
Pointer< llama_adapter_lora> adapter) → int -
llama_adapter_meta_key_by_index(
Pointer< llama_adapter_lora> adapter, int i, Pointer<Char> buf, int buf_size) → int -
llama_adapter_meta_val_str(
Pointer< llama_adapter_lora> adapter, Pointer<Char> key, Pointer<Char> buf, int buf_size) → int -
llama_adapter_meta_val_str_by_index(
Pointer< llama_adapter_lora> adapter, int i, Pointer<Char> buf, int buf_size) → int -
llama_add_bos_token(
Pointer< llama_vocab> vocab) → bool -
llama_add_eos_token(
Pointer< llama_vocab> vocab) → bool -
llama_attach_threadpool(
Pointer< llama_context> ctx, ggml_threadpool_t threadpool, ggml_threadpool_t threadpool_batch) → void -
llama_backend_free(
) → void -
llama_backend_init(
) → void -
llama_batch_free(
llama_batch batch) → void -
llama_batch_get_one(
Pointer< llama_token> tokens, int n_tokens) → llama_batch -
llama_batch_init(
int n_tokens, int embd, int n_seq_max) → llama_batch -
llama_chat_apply_template(
Pointer< Char> tmpl, Pointer<llama_chat_message> chat, int n_msg, bool add_ass, Pointer<Char> buf, int length) → int - Apply a chat template. Inspired by Hugging Face's apply_chat_template() in Python.
-
llama_chat_builtin_templates(
Pointer< Pointer<Char> > output, int len) → int -
llama_context_default_params(
) → llama_context_params -
llama_copy_state_data(
Pointer< llama_context> ctx, Pointer<Uint8> dst) → int -
llama_dart_set_log_level(
int level) → void -
llama_decode(
Pointer< llama_context> ctx, llama_batch batch) → int -
llama_detach_threadpool(
Pointer< llama_context> ctx) → void -
llama_detokenize(
Pointer< llama_vocab> vocab, Pointer<llama_token> tokens, int n_tokens, Pointer<Char> text, int text_len_max, bool remove_special, bool unparse_special) → int - @details Converts the provided tokens into text (the inverse of llama_tokenize()). @param text The char buffer must be large enough to hold the resulting text. @param remove_special Allows removing the BOS and EOS tokens if the model is configured to do so. @param unparse_special If true, special tokens are rendered in the output. @return The number of chars/bytes on success, no more than text_len_max; a negative number on failure, whose magnitude is the number of chars/bytes that would have been returned.
-
llama_encode(
Pointer< llama_context> ctx, llama_batch batch) → int -
llama_flash_attn_type_name(
llama_flash_attn_type flash_attn_type) → Pointer< Char> -
llama_free(
Pointer< llama_context> ctx) → void -
llama_free_model(
Pointer< llama_model> model) → void -
llama_get_embeddings(
Pointer< llama_context> ctx) → Pointer<Float> -
llama_get_embeddings_ith(
Pointer< llama_context> ctx, int i) → Pointer<Float> -
llama_get_embeddings_seq(
Pointer< llama_context> ctx, int seq_id) → Pointer<Float> -
llama_get_logits(
Pointer< llama_context> ctx) → Pointer<Float> -
llama_get_logits_ith(
Pointer< llama_context> ctx, int i) → Pointer<Float> -
llama_get_memory(
Pointer< llama_context> ctx) → llama_memory_t -
llama_get_model(
Pointer< llama_context> ctx) → Pointer<llama_model> -
llama_get_sampled_candidates_count_ith(
Pointer< llama_context> ctx, int i) → int -
llama_get_sampled_candidates_ith(
Pointer< llama_context> ctx, int i) → Pointer<llama_token> -
llama_get_sampled_logits_count_ith(
Pointer< llama_context> ctx, int i) → int -
llama_get_sampled_logits_ith(
Pointer< llama_context> ctx, int i) → Pointer<Float> -
llama_get_sampled_probs_count_ith(
Pointer< llama_context> ctx, int i) → int -
llama_get_sampled_probs_ith(
Pointer< llama_context> ctx, int i) → Pointer<Float> -
llama_get_sampled_token_ith(
Pointer< llama_context> ctx, int i) → int -
llama_get_state_size(
Pointer< llama_context> ctx) → int -
llama_init_from_model(
Pointer< llama_model> model, llama_context_params params) → Pointer<llama_context> -
llama_load_model_from_file(
Pointer< Char> path_model, llama_model_params params) → Pointer<llama_model> -
llama_load_session_file(
Pointer< llama_context> ctx, Pointer<Char> path_session, Pointer<llama_token> tokens_out, int n_token_capacity, Pointer<Size> n_token_count_out) → bool -
llama_log_get(
Pointer< ggml_log_callback> log_callback, Pointer<Pointer<Void> > user_data) → void -
llama_log_set(
ggml_log_callback log_callback, Pointer< Void> user_data) → void -
llama_max_devices(
) → int -
llama_max_parallel_sequences(
) → int -
llama_max_tensor_buft_overrides(
) → int -
llama_memory_breakdown_print(
Pointer< llama_context> ctx) → void -
llama_memory_can_shift(
llama_memory_t mem) → bool -
llama_memory_clear(
llama_memory_t mem, bool data) → void -
llama_memory_seq_add(
llama_memory_t mem, int seq_id, int p0, int p1, int delta) → void -
llama_memory_seq_cp(
llama_memory_t mem, int seq_id_src, int seq_id_dst, int p0, int p1) → void -
llama_memory_seq_div(
llama_memory_t mem, int seq_id, int p0, int p1, int d) → void -
llama_memory_seq_keep(
llama_memory_t mem, int seq_id) → void -
llama_memory_seq_pos_max(
llama_memory_t mem, int seq_id) → int -
llama_memory_seq_pos_min(
llama_memory_t mem, int seq_id) → int -
llama_memory_seq_rm(
llama_memory_t mem, int seq_id, int p0, int p1) → bool -
llama_model_chat_template(
Pointer< llama_model> model, Pointer<Char> name) → Pointer<Char> -
llama_model_cls_label(
Pointer< llama_model> model, int i) → Pointer<Char> -
llama_model_decoder_start_token(
Pointer< llama_model> model) → int -
llama_model_default_params(
) → llama_model_params -
llama_model_desc(
Pointer< llama_model> model, Pointer<Char> buf, int buf_size) → int -
llama_model_free(
Pointer< llama_model> model) → void -
llama_model_get_vocab(
Pointer< llama_model> model) → Pointer<llama_vocab> -
llama_model_has_decoder(
Pointer< llama_model> model) → bool -
llama_model_has_encoder(
Pointer< llama_model> model) → bool -
llama_model_init_from_user(
Pointer< gguf_context> metadata, llama_model_set_tensor_data_t set_tensor_data, Pointer<Void> set_tensor_data_ud, llama_model_params params) → Pointer<llama_model> -
llama_model_is_diffusion(
Pointer< llama_model> model) → bool -
llama_model_is_hybrid(
Pointer< llama_model> model) → bool -
llama_model_is_recurrent(
Pointer< llama_model> model) → bool -
llama_model_load_from_file(
Pointer< Char> path_model, llama_model_params params) → Pointer<llama_model> -
llama_model_load_from_file_ptr(
Pointer< FILE> file, llama_model_params params) → Pointer<llama_model> -
llama_model_load_from_splits(
Pointer< Pointer<Char> > paths, int n_paths, llama_model_params params) → Pointer<llama_model> -
llama_model_meta_count(
Pointer< llama_model> model) → int -
llama_model_meta_key_by_index(
Pointer< llama_model> model, int i, Pointer<Char> buf, int buf_size) → int -
llama_model_meta_key_str(
llama_model_meta_key key) → Pointer< Char> -
llama_model_meta_val_str(
Pointer< llama_model> model, Pointer<Char> key, Pointer<Char> buf, int buf_size) → int -
llama_model_meta_val_str_by_index(
Pointer< llama_model> model, int i, Pointer<Char> buf, int buf_size) → int -
llama_model_n_cls_out(
Pointer< llama_model> model) → int -
llama_model_n_ctx_train(
Pointer< llama_model> model) → int -
llama_model_n_embd(
Pointer< llama_model> model) → int -
llama_model_n_embd_inp(
Pointer< llama_model> model) → int -
llama_model_n_embd_out(
Pointer< llama_model> model) → int -
llama_model_n_head(
Pointer< llama_model> model) → int -
llama_model_n_head_kv(
Pointer< llama_model> model) → int -
llama_model_n_layer(
Pointer< llama_model> model) → int -
llama_model_n_params(
Pointer< llama_model> model) → int -
llama_model_n_swa(
Pointer< llama_model> model) → int -
llama_model_quantize(
Pointer< Char> fname_inp, Pointer<Char> fname_out, Pointer<llama_model_quantize_params> params) → int -
llama_model_quantize_default_params(
) → llama_model_quantize_params -
llama_model_rope_freq_scale_train(
Pointer< llama_model> model) → double -
llama_model_rope_type(
Pointer< llama_model> model) → llama_rope_type -
llama_model_save_to_file(
Pointer< llama_model> model, Pointer<Char> path_model) → void -
llama_model_size(
Pointer< llama_model> model) → int -
llama_n_batch(
Pointer< llama_context> ctx) → int -
llama_n_ctx(
Pointer< llama_context> ctx) → int -
llama_n_ctx_seq(
Pointer< llama_context> ctx) → int -
llama_n_ctx_train(
Pointer< llama_model> model) → int -
llama_n_embd(
Pointer< llama_model> model) → int -
llama_n_head(
Pointer< llama_model> model) → int -
llama_n_layer(
Pointer< llama_model> model) → int -
llama_n_seq_max(
Pointer< llama_context> ctx) → int -
llama_n_threads(
Pointer< llama_context> ctx) → int -
llama_n_threads_batch(
Pointer< llama_context> ctx) → int -
llama_n_ubatch(
Pointer< llama_context> ctx) → int -
llama_n_vocab(
Pointer< llama_vocab> vocab) → int -
llama_new_context_with_model(
Pointer< llama_model> model, llama_context_params params) → Pointer<llama_context> -
llama_numa_init(
ggml_numa_strategy numa) → void -
llama_opt_epoch(
Pointer< llama_context> lctx, ggml_opt_dataset_t dataset, ggml_opt_result_t result_train, ggml_opt_result_t result_eval, int idata_split, ggml_opt_epoch_callback callback_train, ggml_opt_epoch_callback callback_eval) → void -
llama_opt_init(
Pointer< llama_context> lctx, Pointer<llama_model> model, llama_opt_params lopt_params) → void -
llama_opt_param_filter_all(
Pointer< ggml_tensor> tensor, Pointer<Void> userdata) → bool -
llama_params_fit(
Pointer< Char> path_model, Pointer<llama_model_params> mparams, Pointer<llama_context_params> cparams, Pointer<Float> tensor_split, Pointer<llama_model_tensor_buft_override> tensor_buft_overrides, Pointer<Size> margins, int n_ctx_min, ggml_log_level log_level) → llama_params_fit_status -
llama_perf_context(
Pointer< llama_context> ctx) → llama_perf_context_data -
llama_perf_context_print(
Pointer< llama_context> ctx) → void -
llama_perf_context_reset(
Pointer< llama_context> ctx) → void -
llama_perf_sampler(
Pointer< llama_sampler> chain) → llama_perf_sampler_data -
llama_perf_sampler_print(
Pointer< llama_sampler> chain) → void -
llama_perf_sampler_reset(
Pointer< llama_sampler> chain) → void -
llama_pooling_type$1(
Pointer< llama_context> ctx) → llama_pooling_type -
llama_print_system_info(
) → Pointer< Char> -
llama_sampler_accept(
Pointer< llama_sampler> smpl, int token) → void -
llama_sampler_apply(
Pointer< llama_sampler> smpl, Pointer<llama_token_data_array> cur_p) → void -
llama_sampler_chain_add(
Pointer< llama_sampler> chain, Pointer<llama_sampler> smpl) → void -
llama_sampler_chain_default_params(
) → llama_sampler_chain_params -
llama_sampler_chain_get(
Pointer< llama_sampler> chain, int i) → Pointer<llama_sampler> -
llama_sampler_chain_init(
llama_sampler_chain_params params) → Pointer< llama_sampler> -
llama_sampler_chain_n(
Pointer< llama_sampler> chain) → int -
llama_sampler_chain_remove(
Pointer< llama_sampler> chain, int i) → Pointer<llama_sampler> -
llama_sampler_clone(
Pointer< llama_sampler> smpl) → Pointer<llama_sampler> -
llama_sampler_free(
Pointer< llama_sampler> smpl) → void -
llama_sampler_get_seed(
Pointer< llama_sampler> smpl) → int -
llama_sampler_init(
Pointer< llama_sampler_i> iface, llama_sampler_context_t ctx) → Pointer<llama_sampler> -
llama_sampler_init_adaptive_p(
double target, double decay, int seed) → Pointer< llama_sampler> - adaptive-p: select tokens near a configurable target probability over time.
-
llama_sampler_init_dist(
int seed) → Pointer< llama_sampler> - seed == LLAMA_DEFAULT_SEED to use a random seed.
-
llama_sampler_init_dry(
Pointer< llama_vocab> vocab, int n_ctx_train, double dry_multiplier, double dry_base, int dry_allowed_length, int dry_penalty_last_n, Pointer<Pointer< seq_breakers, int num_breakers) → Pointer<Char> >llama_sampler> - @details DRY sampler, designed by p-e-w, as described in: https://github.com/oobabooga/text-generation-webui/pull/5677, porting Koboldcpp implementation authored by pi6am: https://github.com/LostRuins/koboldcpp/pull/982
-
llama_sampler_init_grammar(
Pointer< llama_vocab> vocab, Pointer<Char> grammar_str, Pointer<Char> grammar_root) → Pointer<llama_sampler> - @details Initializes a GBNF grammar, see grammars/README.md for details. @param vocab The vocabulary that this grammar will be used with. @param grammar_str The production rules for the grammar, encoded as a string. Returns an empty grammar if empty. Returns NULL if parsing of grammar_str fails. @param grammar_root The name of the start symbol for the grammar.
-
llama_sampler_init_grammar_lazy(
Pointer< llama_vocab> vocab, Pointer<Char> grammar_str, Pointer<Char> grammar_root, Pointer<Pointer< trigger_words, int num_trigger_words, Pointer<Char> >llama_token> trigger_tokens, int num_trigger_tokens) → Pointer<llama_sampler> -
llama_sampler_init_grammar_lazy_patterns(
Pointer< llama_vocab> vocab, Pointer<Char> grammar_str, Pointer<Char> grammar_root, Pointer<Pointer< trigger_patterns, int num_trigger_patterns, Pointer<Char> >llama_token> trigger_tokens, int num_trigger_tokens) → Pointer<llama_sampler> - @details Lazy grammar sampler, introduced in https://github.com/ggml-org/llama.cpp/pull/9639 @param trigger_patterns A list of patterns that will trigger the grammar sampler. Pattern will be matched from the start of the generation output, and grammar sampler will be fed content starting from its first match group. @param trigger_tokens A list of tokens that will trigger the grammar sampler. Grammar sampler will be fed content starting from the trigger token included.
-
llama_sampler_init_greedy(
) → Pointer< llama_sampler> -
llama_sampler_init_infill(
Pointer< llama_vocab> vocab) → Pointer<llama_sampler> -
llama_sampler_init_logit_bias(
int n_vocab, int n_logit_bias, Pointer< llama_logit_bias> logit_bias) → Pointer<llama_sampler> -
llama_sampler_init_min_p(
double p, int min_keep) → Pointer< llama_sampler> - @details Minimum P sampling as described in https://github.com/ggml-org/llama.cpp/pull/3841
-
llama_sampler_init_mirostat(
int n_vocab, int seed, double tau, double eta, int m) → Pointer< llama_sampler> -
@details Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
@param candidates A vector of
llama_token_datacontaining the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. @param tau The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. @param eta The learning rate used to updatemubased on the error between the target and observed surprisal of the sampled word. A larger learning rate will causemuto be updated more quickly, while a smaller learning rate will result in slower updates. @param m The number of tokens considered in the estimation ofs_hat. This is an arbitrary value that is used to calculates_hat, which in turn helps to calculate the value ofk. In the paper, they usem = 100, but you can experiment with different values to see how it affects the performance of the algorithm. @param mu Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal. -
llama_sampler_init_mirostat_v2(
int seed, double tau, double eta) → Pointer< llama_sampler> -
@details Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
@param candidates A vector of
llama_token_datacontaining the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. @param tau The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. @param eta The learning rate used to updatemubased on the error between the target and observed surprisal of the sampled word. A larger learning rate will causemuto be updated more quickly, while a smaller learning rate will result in slower updates. @param mu Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal. -
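The tau/eta interplay in the Mirostat samplers is easiest to see as the update rule itself. A minimal, illustrative sketch (the native sampler maintains mu internally; this Dart function only restates the correction step described above):

```dart
import 'dart:math';

/// After sampling a token with probability [pSampled], the observed
/// surprisal is -log2(p). Mirostat nudges mu so the running surprisal
/// tracks the target [tau]; [eta] is the learning rate.
double updateMu(double mu, double pSampled, double tau, double eta) {
  final surprise = -log(pSampled) / ln2; // observed surprisal in bits
  final error = surprise - tau;          // deviation from the target
  return mu - eta * error;               // mu shrinks when output is too surprising
}
```

With a high tau the threshold mu stays loose and less probable tokens survive; with a low tau the loop clamps generation toward high-probability tokens.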
- llama_sampler_init_penalties(int penalty_last_n, double penalty_repeat, double penalty_freq, double penalty_present) → Pointer<llama_sampler> - NOTE: Avoid using on the full vocabulary, as searching for repeated tokens can become slow. For example, apply top-k or top-p sampling first.
- llama_sampler_init_temp(double t) → Pointer<llama_sampler> - @details Updates the logits l_i' = l_i/t. When t <= 0.0f, the maximum logit is kept at its original value and the rest are set to -inf.
- llama_sampler_init_temp_ext(double t, double delta, double exponent) → Pointer<llama_sampler> - @details Dynamic temperature implementation (a.k.a. entropy sampling) described in the paper https://arxiv.org/abs/2309.02772.
- llama_sampler_init_top_k(int k) → Pointer<llama_sampler> - @details Top-K sampling described in the paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751. Setting k <= 0 makes this a no-op.
- llama_sampler_init_top_n_sigma(double n) → Pointer<llama_sampler> - @details Top-nσ sampling as described in the paper "Top-nσ: Not All Logits Are You Need" https://arxiv.org/pdf/2411.07641
- llama_sampler_init_top_p(double p, int min_keep) → Pointer<llama_sampler> - @details Nucleus sampling described in the paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
- llama_sampler_init_typical(double p, int min_keep) → Pointer<llama_sampler> - @details Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
- llama_sampler_init_xtc(double p, double t, int min_keep, int seed) → Pointer<llama_sampler> - @details XTC sampler as described in https://github.com/oobabooga/text-generation-webui/pull/6335
- llama_sampler_name(Pointer<llama_sampler> smpl) → Pointer<Char>
- llama_sampler_reset(Pointer<llama_sampler> smpl) → void
- llama_sampler_sample(Pointer<llama_sampler> smpl, Pointer<llama_context> ctx, int idx) → int
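The llama_sampler_chain_* functions above are normally used together: create a chain, push individual samplers into it, draw tokens with llama_sampler_sample, and report each choice back via llama_sampler_accept. A hedged Dart sketch, assuming `lib` is the generated bindings instance, `ctx` a live Pointer<llama_context>, and `llamaDefaultSeed` the Dart-side name for LLAMA_DEFAULT_SEED (names are illustrative, not this package's public API):

```dart
// Sketch only: sampler order matters. Filters (top-k, min-p, temp)
// come first; the terminal sampler (dist or greedy) comes last.
final params = lib.llama_sampler_chain_default_params();
final chain = lib.llama_sampler_chain_init(params);
lib.llama_sampler_chain_add(chain, lib.llama_sampler_init_top_k(40));
lib.llama_sampler_chain_add(chain, lib.llama_sampler_init_min_p(0.05, 1));
lib.llama_sampler_chain_add(chain, lib.llama_sampler_init_temp(0.8));
lib.llama_sampler_chain_add(chain, lib.llama_sampler_init_dist(llamaDefaultSeed));

final token = lib.llama_sampler_sample(chain, ctx, -1); // -1 = logits of last token
lib.llama_sampler_accept(chain, token);                 // update sampler state
// ... decode, sample again ...
lib.llama_sampler_free(chain); // also frees the samplers the chain owns
```

Note that llama_sampler_free on the chain releases the added samplers too, so they must not be freed individually.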
- llama_save_session_file(Pointer<llama_context> ctx, Pointer<Char> path_session, Pointer<llama_token> tokens, int n_token_count) → bool
- llama_set_abort_callback(Pointer<llama_context> ctx, ggml_abort_callback abort_callback, Pointer<Void> abort_callback_data) → void
- llama_set_adapter_cvec(Pointer<llama_context> ctx, Pointer<Float> data, int len, int n_embd, int il_start, int il_end) → int
- llama_set_adapters_lora(Pointer<llama_context> ctx, Pointer<Pointer<llama_adapter_lora>> adapters, int n_adapters, Pointer<Float> scales) → int
- llama_set_causal_attn(Pointer<llama_context> ctx, bool causal_attn) → void
- llama_set_embeddings(Pointer<llama_context> ctx, bool embeddings) → void
- llama_set_n_threads(Pointer<llama_context> ctx, int n_threads, int n_threads_batch) → void
- llama_set_sampler(Pointer<llama_context> ctx, int seq_id, Pointer<llama_sampler> smpl) → bool
- llama_set_state_data(Pointer<llama_context> ctx, Pointer<Uint8> src) → int
- llama_set_warmup(Pointer<llama_context> ctx, bool warmup) → void
- llama_split_path(Pointer<Char> split_path, int maxlen, Pointer<Char> path_prefix, int split_no, int split_count) → int - @details Build a split GGUF final path for this chunk: llama_split_path(split_path, sizeof(split_path), "/models/ggml-model-q4_0", 2, 4) => split_path = "/models/ggml-model-q4_0-00002-of-00004.gguf"
- llama_split_prefix(Pointer<Char> split_prefix, int maxlen, Pointer<Char> split_path, int split_no, int split_count) → int - @details Extract the path prefix from split_path if and only if split_no and split_count match: llama_split_prefix(split_prefix, 64, "/models/ggml-model-q4_0-00002-of-00004.gguf", 2, 4) => split_prefix = "/models/ggml-model-q4_0"
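The naming scheme these two helpers encode and decode is fixed: `<prefix>-<split_no>-of-<split_count>.gguf`, with both counters zero-padded to five digits. A small Dart illustration of the scheme based on the doc comments above (plain string logic, not the native calls):

```dart
/// Builds a split GGUF path the way the llama_split_path example shows:
/// splitPath('/models/ggml-model-q4_0', 2, 4)
///   => '/models/ggml-model-q4_0-00002-of-00004.gguf'
String splitPath(String prefix, int splitNo, int splitCount) {
  final no = splitNo.toString().padLeft(5, '0');
  final of = splitCount.toString().padLeft(5, '0');
  return '$prefix-$no-of-$of.gguf';
}

void main() {
  print(splitPath('/models/ggml-model-q4_0', 2, 4));
  // /models/ggml-model-q4_0-00002-of-00004.gguf
}
```

Prefer the native helpers in real code; they also validate split_no/split_count when extracting a prefix.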
- llama_state_get_data(Pointer<llama_context> ctx, Pointer<Uint8> dst, int size) → int
- llama_state_get_size(Pointer<llama_context> ctx) → int
- llama_state_load_file(Pointer<llama_context> ctx, Pointer<Char> path_session, Pointer<llama_token> tokens_out, int n_token_capacity, Pointer<Size> n_token_count_out) → bool
- llama_state_save_file(Pointer<llama_context> ctx, Pointer<Char> path_session, Pointer<llama_token> tokens, int n_token_count) → bool
- llama_state_seq_get_data(Pointer<llama_context> ctx, Pointer<Uint8> dst, int size, int seq_id) → int
- llama_state_seq_get_data_ext(Pointer<llama_context> ctx, Pointer<Uint8> dst, int size, int seq_id, int flags) → int
- llama_state_seq_get_size(Pointer<llama_context> ctx, int seq_id) → int
- llama_state_seq_get_size_ext(Pointer<llama_context> ctx, int seq_id, int flags) → int
- llama_state_seq_load_file(Pointer<llama_context> ctx, Pointer<Char> filepath, int dest_seq_id, Pointer<llama_token> tokens_out, int n_token_capacity, Pointer<Size> n_token_count_out) → int
- llama_state_seq_save_file(Pointer<llama_context> ctx, Pointer<Char> filepath, int seq_id, Pointer<llama_token> tokens, int n_token_count) → int
- llama_state_seq_set_data(Pointer<llama_context> ctx, Pointer<Uint8> src, int size, int dest_seq_id) → int
- llama_state_seq_set_data_ext(Pointer<llama_context> ctx, Pointer<Uint8> src, int size, int dest_seq_id, int flags) → int
- llama_state_set_data(Pointer<llama_context> ctx, Pointer<Uint8> src, int size) → int
- llama_supports_gpu_offload() → bool
- llama_supports_mlock() → bool
- llama_supports_mmap() → bool
- llama_supports_rpc() → bool
- llama_synchronize(Pointer<llama_context> ctx) → void
- llama_time_us() → int
- llama_token_bos(Pointer<llama_vocab> vocab) → int
- llama_token_cls(Pointer<llama_vocab> vocab) → int
- llama_token_eos(Pointer<llama_vocab> vocab) → int
- llama_token_eot(Pointer<llama_vocab> vocab) → int
- llama_token_fim_mid(Pointer<llama_vocab> vocab) → int
- llama_token_fim_pad(Pointer<llama_vocab> vocab) → int
- llama_token_fim_pre(Pointer<llama_vocab> vocab) → int
- llama_token_fim_rep(Pointer<llama_vocab> vocab) → int
- llama_token_fim_sep(Pointer<llama_vocab> vocab) → int
- llama_token_fim_suf(Pointer<llama_vocab> vocab) → int
- llama_token_get_attr(Pointer<llama_vocab> vocab, Dartllama_token token) → llama_token_attr
- llama_token_get_score(Pointer<llama_vocab> vocab, int token) → double
- llama_token_get_text(Pointer<llama_vocab> vocab, int token) → Pointer<Char>
- llama_token_is_control(Pointer<llama_vocab> vocab, int token) → bool
- llama_token_is_eog(Pointer<llama_vocab> vocab, int token) → bool
- llama_token_nl(Pointer<llama_vocab> vocab) → int
- llama_token_pad(Pointer<llama_vocab> vocab) → int
- llama_token_sep(Pointer<llama_vocab> vocab) → int
- llama_token_to_piece(Pointer<llama_vocab> vocab, int token, Pointer<Char> buf, int length, int lstrip, bool special) → int
- llama_tokenize(Pointer<llama_vocab> vocab, Pointer<Char> text, int text_len, Pointer<llama_token> tokens, int n_tokens_max, bool add_special, bool parse_special) → int - @details Convert the provided text into tokens. @param tokens The tokens pointer must be large enough to hold the resulting tokens. @return The number of tokens on success, no more than n_tokens_max; a negative number on failure (the negated number of tokens that would have been returned); INT32_MIN on overflow (e.g. the tokenization result size exceeds the int32_t limit). @param add_special Allow adding BOS and EOS tokens if the model is configured to do so. @param parse_special Allow tokenizing special and/or control tokens which are otherwise not exposed and treated as plaintext. Does not insert a leading space.
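The return-value contract of llama_tokenize suggests a two-pass pattern: call once with a guessed buffer, and if the result is negative, retry with exactly the required capacity. A hedged Dart sketch, assuming `lib` is the generated bindings instance and `vocab` a Pointer<llama_vocab> from a loaded model (calloc and toNativeUtf8 come from package:ffi):

```dart
import 'dart:ffi';
import 'package:ffi/ffi.dart';

List<int> tokenize(dynamic lib, Pointer<llama_vocab> vocab, String text) {
  final nativeText = text.toNativeUtf8();
  var cap = text.length + 8; // rough initial guess
  var buf = calloc<Int32>(cap); // llama_token is Int32
  var n = lib.llama_tokenize(
      vocab, nativeText.cast<Char>(), nativeText.length, buf.cast(), cap,
      true /* add_special */, false /* parse_special */);
  if (n < 0) {
    // Buffer too small: -n is the exact token count needed.
    calloc.free(buf);
    cap = -n;
    buf = calloc<Int32>(cap);
    n = lib.llama_tokenize(vocab, nativeText.cast<Char>(), nativeText.length,
        buf.cast(), cap, true, false);
  }
  final tokens = List<int>.generate(n, (i) => buf[i]);
  calloc.free(buf);
  calloc.free(nativeText);
  return tokens;
}
```

In practice LlamaEngine wraps this for you; the sketch only clarifies the negative-return convention documented above.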
- llama_vocab_bos(Pointer<llama_vocab> vocab) → int
- llama_vocab_cls(Pointer<llama_vocab> vocab) → int
- llama_vocab_eos(Pointer<llama_vocab> vocab) → int
- llama_vocab_eot(Pointer<llama_vocab> vocab) → int
- llama_vocab_fim_mid(Pointer<llama_vocab> vocab) → int
- llama_vocab_fim_pad(Pointer<llama_vocab> vocab) → int
- llama_vocab_fim_pre(Pointer<llama_vocab> vocab) → int
- llama_vocab_fim_rep(Pointer<llama_vocab> vocab) → int
- llama_vocab_fim_sep(Pointer<llama_vocab> vocab) → int
- llama_vocab_fim_suf(Pointer<llama_vocab> vocab) → int
- llama_vocab_get_add_bos(Pointer<llama_vocab> vocab) → bool
- llama_vocab_get_add_eos(Pointer<llama_vocab> vocab) → bool
- llama_vocab_get_add_sep(Pointer<llama_vocab> vocab) → bool
- llama_vocab_get_attr(Pointer<llama_vocab> vocab, Dartllama_token token) → llama_token_attr
- llama_vocab_get_score(Pointer<llama_vocab> vocab, int token) → double
- llama_vocab_get_text(Pointer<llama_vocab> vocab, int token) → Pointer<Char>
- llama_vocab_is_control(Pointer<llama_vocab> vocab, int token) → bool
- llama_vocab_is_eog(Pointer<llama_vocab> vocab, int token) → bool
- llama_vocab_mask(Pointer<llama_vocab> vocab) → int
- llama_vocab_n_tokens(Pointer<llama_vocab> vocab) → int
- llama_vocab_nl(Pointer<llama_vocab> vocab) → int
- llama_vocab_pad(Pointer<llama_vocab> vocab) → int
- llama_vocab_sep(Pointer<llama_vocab> vocab) → int
- llama_vocab_type$1(Pointer<llama_vocab> vocab) → llama_vocab_type
- mtmd_bitmap_free(Pointer<mtmd_bitmap> bitmap) → void
- mtmd_bitmap_get_data(Pointer<mtmd_bitmap> bitmap) → Pointer<UnsignedChar>
- mtmd_bitmap_get_id(Pointer<mtmd_bitmap> bitmap) → Pointer<Char>
- mtmd_bitmap_get_n_bytes(Pointer<mtmd_bitmap> bitmap) → int
- mtmd_bitmap_get_nx(Pointer<mtmd_bitmap> bitmap) → int
- mtmd_bitmap_get_ny(Pointer<mtmd_bitmap> bitmap) → int
- mtmd_bitmap_init(int nx, int ny, Pointer<UnsignedChar> data) → Pointer<mtmd_bitmap>
- mtmd_bitmap_init_from_audio(int n_samples, Pointer<Float> data) → Pointer<mtmd_bitmap>
- mtmd_bitmap_is_audio(Pointer<mtmd_bitmap> bitmap) → bool
- mtmd_bitmap_set_id(Pointer<mtmd_bitmap> bitmap, Pointer<Char> id) → void
- mtmd_context_params_default() → mtmd_context_params
- mtmd_decode_use_mrope(Pointer<mtmd_context> ctx) → bool
- mtmd_decode_use_non_causal(Pointer<mtmd_context> ctx) → bool
- mtmd_default_marker() → Pointer<Char>
- mtmd_encode(Pointer<mtmd_context> ctx, Pointer<mtmd_image_tokens> image_tokens) → int
- mtmd_encode_chunk(Pointer<mtmd_context> ctx, Pointer<mtmd_input_chunk> chunk) → int
- mtmd_free(Pointer<mtmd_context> ctx) → void
- mtmd_get_audio_sample_rate(Pointer<mtmd_context> ctx) → int
- mtmd_get_output_embd(Pointer<mtmd_context> ctx) → Pointer<Float>
- mtmd_helper_bitmap_init_from_buf(Pointer<mtmd_context> ctx, Pointer<UnsignedChar> buf, int len) → Pointer<mtmd_bitmap>
- mtmd_helper_bitmap_init_from_file(Pointer<mtmd_context> ctx, Pointer<Char> fname) → Pointer<mtmd_bitmap>
- mtmd_helper_decode_image_chunk(Pointer<mtmd_context> ctx, Pointer<llama_context> lctx, Pointer<mtmd_input_chunk> chunk, Pointer<Float> encoded_embd, int n_past, int seq_id, int n_batch, Pointer<llama_pos> new_n_past) → int
- mtmd_helper_eval_chunk_single(Pointer<mtmd_context> ctx, Pointer<llama_context> lctx, Pointer<mtmd_input_chunk> chunk, int n_past, int seq_id, int n_batch, bool logits_last, Pointer<llama_pos> new_n_past) → int
- mtmd_helper_eval_chunks(Pointer<mtmd_context> ctx, Pointer<llama_context> lctx, Pointer<mtmd_input_chunks> chunks, int n_past, int seq_id, int n_batch, bool logits_last, Pointer<llama_pos> new_n_past) → int
- mtmd_helper_get_n_pos(Pointer<mtmd_input_chunks> chunks) → int
- mtmd_helper_get_n_tokens(Pointer<mtmd_input_chunks> chunks) → int
- mtmd_helper_log_set(ggml_log_callback log_callback, Pointer<Void> user_data) → void
- mtmd_image_tokens_get_id(Pointer<mtmd_image_tokens> image_tokens) → Pointer<Char>
- mtmd_image_tokens_get_n_pos(Pointer<mtmd_image_tokens> image_tokens) → int
- mtmd_image_tokens_get_n_tokens(Pointer<mtmd_image_tokens> image_tokens) → int
- mtmd_image_tokens_get_nx(Pointer<mtmd_image_tokens> image_tokens) → int
- mtmd_image_tokens_get_ny(Pointer<mtmd_image_tokens> image_tokens) → int
- mtmd_init_from_file(Pointer<Char> mmproj_fname, Pointer<llama_model> text_model, mtmd_context_params ctx_params) → Pointer<mtmd_context>
- mtmd_input_chunk_copy(Pointer<mtmd_input_chunk> chunk) → Pointer<mtmd_input_chunk>
- mtmd_input_chunk_free(Pointer<mtmd_input_chunk> chunk) → void
- mtmd_input_chunk_get_id(Pointer<mtmd_input_chunk> chunk) → Pointer<Char>
- mtmd_input_chunk_get_n_pos(Pointer<mtmd_input_chunk> chunk) → int
- mtmd_input_chunk_get_n_tokens(Pointer<mtmd_input_chunk> chunk) → int
- mtmd_input_chunk_get_tokens_image(Pointer<mtmd_input_chunk> chunk) → Pointer<mtmd_image_tokens>
- mtmd_input_chunk_get_tokens_text(Pointer<mtmd_input_chunk> chunk, Pointer<Size> n_tokens_output) → Pointer<llama_token>
- mtmd_input_chunk_get_type(Pointer<mtmd_input_chunk> chunk) → mtmd_input_chunk_type
- mtmd_input_chunks_free(Pointer<mtmd_input_chunks> chunks) → void
- mtmd_input_chunks_get(Pointer<mtmd_input_chunks> chunks, int idx) → Pointer<mtmd_input_chunk>
- mtmd_input_chunks_init() → Pointer<mtmd_input_chunks>
- mtmd_input_chunks_size(Pointer<mtmd_input_chunks> chunks) → int
- mtmd_log_set(ggml_log_callback log_callback, Pointer<Void> user_data) → void
- mtmd_support_audio(Pointer<mtmd_context> ctx) → bool
- mtmd_support_vision(Pointer<mtmd_context> ctx) → bool
- mtmd_test_create_input_chunks() → Pointer<mtmd_input_chunks>
- mtmd_tokenize(Pointer<mtmd_context> ctx, Pointer<mtmd_input_chunks> output, Pointer<mtmd_input_text> text, Pointer<Pointer<mtmd_bitmap>> bitmaps, int n_bitmaps) → int
Typedefs
- Dart__off64_t = int
- Dart__off_t = int
- Dart_IO_lock_t = void
- Dartggml_abort_callback_tFunction = void Function(Pointer<Char> error_message)
- Dartggml_abort_callbackFunction = bool Function(Pointer<Void> data)
- Dartggml_backend_eval_callbackFunction = bool Function(int node_index, Pointer<ggml_tensor> t1, Pointer<ggml_tensor> t2, Pointer<Void> user_data)
- Dartggml_backend_sched_eval_callbackFunction = bool Function(Pointer<ggml_tensor> t, bool ask, Pointer<Void> user_data)
- Dartggml_backend_set_abort_callback_tFunction = void Function(ggml_backend_t backend, ggml_abort_callback abort_callback, Pointer<Void> abort_callback_data)
- Dartggml_backend_set_n_threads_tFunction = void Function(ggml_backend_t backend, int n_threads)
- Dartggml_backend_split_buffer_type_tFunction = ggml_backend_buffer_type_t Function(int main_device, Pointer<Float> tensor_split)
- Dartggml_custom1_op_tFunction = void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, int ith, int nth, Pointer<Void> userdata)
- Dartggml_custom2_op_tFunction = void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, int ith, int nth, Pointer<Void> userdata)
- Dartggml_custom3_op_tFunction = void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, int ith, int nth, Pointer<Void> userdata)
- Dartggml_custom_op_tFunction = void Function(Pointer<ggml_tensor> dst, int ith, int nth, Pointer<Void> userdata)
- Dartggml_fp16_t = int
- Dartggml_from_float_tFunction = void Function(Pointer<Float> x, Pointer<Void> y, int k)
- Dartggml_log_callbackFunction = void Function(ggml_log_level level, Pointer<Char> text, Pointer<Void> user_data)
- Dartggml_opt_epoch_callbackFunction = void Function(bool train, ggml_opt_context_t opt_ctx, ggml_opt_dataset_t dataset, ggml_opt_result_t result, int ibatch, int ibatch_max, int t_start_us)
- Dartggml_to_float_tFunction = void Function(Pointer<Void> x, Pointer<Float> y, int k)
- Dartllama_model_set_tensor_data_tFunction = void Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
- Dartllama_opt_param_filterFunction = bool Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
- Dartllama_pos = int
- Dartllama_progress_callbackFunction = bool Function(double progress, Pointer<Void> user_data)
- Dartllama_seq_id = int
- Dartllama_state_seq_flags = int
- Dartllama_token = int
- FILE = _IO_FILE
- ggml_abort_callback = Pointer<NativeFunction<ggml_abort_callbackFunction>>
- ggml_abort_callback_t = Pointer<NativeFunction<ggml_abort_callback_tFunction>>
- ggml_abort_callback_tFunction = Void Function(Pointer<Char> error_message)
- ggml_abort_callbackFunction = Bool Function(Pointer<Void> data)
- ggml_backend_buffer_t = Pointer<ggml_backend_buffer>
- ggml_backend_buffer_type_t = Pointer<ggml_backend_buffer_type>
- ggml_backend_dev_get_extra_bufts_t = Pointer<NativeFunction<ggml_backend_dev_get_extra_bufts_tFunction>>
- ggml_backend_dev_get_extra_bufts_tFunction = Pointer<ggml_backend_buffer_type_t> Function(ggml_backend_dev_t device)
- ggml_backend_dev_t = Pointer<ggml_backend_device>
- ggml_backend_eval_callback = Pointer<NativeFunction<ggml_backend_eval_callbackFunction>>
- ggml_backend_eval_callbackFunction = Bool Function(Int node_index, Pointer<ggml_tensor> t1, Pointer<ggml_tensor> t2, Pointer<Void> user_data)
- ggml_backend_event_t = Pointer<ggml_backend_event>
- ggml_backend_get_features_t = Pointer<NativeFunction<ggml_backend_get_features_tFunction>>
- ggml_backend_get_features_tFunction = Pointer<ggml_backend_feature> Function(ggml_backend_reg_t reg)
- ggml_backend_graph_plan_t = Pointer<Void>
- ggml_backend_reg_t = Pointer<ggml_backend_reg>
- ggml_backend_sched_eval_callback = Pointer<NativeFunction<ggml_backend_sched_eval_callbackFunction>>
- ggml_backend_sched_eval_callbackFunction = Bool Function(Pointer<ggml_tensor> t, Bool ask, Pointer<Void> user_data)
- ggml_backend_sched_t = Pointer<ggml_backend_sched>
- ggml_backend_set_abort_callback_t = Pointer<NativeFunction<ggml_backend_set_abort_callback_tFunction>>
- ggml_backend_set_abort_callback_tFunction = Void Function(ggml_backend_t backend, ggml_abort_callback abort_callback, Pointer<Void> abort_callback_data)
- ggml_backend_set_n_threads_t = Pointer<NativeFunction<ggml_backend_set_n_threads_tFunction>>
- ggml_backend_set_n_threads_tFunction = Void Function(ggml_backend_t backend, Int n_threads)
- ggml_backend_split_buffer_type_t = Pointer<NativeFunction<ggml_backend_split_buffer_type_tFunction>>
- ggml_backend_split_buffer_type_tFunction = ggml_backend_buffer_type_t Function(Int main_device, Pointer<Float> tensor_split)
- ggml_backend_t = Pointer<ggml_backend>
- ggml_custom1_op_t = Pointer<NativeFunction<ggml_custom1_op_tFunction>>
- ggml_custom1_op_tFunction = Void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, Int ith, Int nth, Pointer<Void> userdata)
- ggml_custom2_op_t = Pointer<NativeFunction<ggml_custom2_op_tFunction>>
- ggml_custom2_op_tFunction = Void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Int ith, Int nth, Pointer<Void> userdata)
- ggml_custom3_op_t = Pointer<NativeFunction<ggml_custom3_op_tFunction>>
- ggml_custom3_op_tFunction = Void Function(Pointer<ggml_tensor> dst, Pointer<ggml_tensor> a, Pointer<ggml_tensor> b, Pointer<ggml_tensor> c, Int ith, Int nth, Pointer<Void> userdata)
- ggml_custom_op_t = Pointer<NativeFunction<ggml_custom_op_tFunction>>
- ggml_custom_op_tFunction = Void Function(Pointer<ggml_tensor> dst, Int ith, Int nth, Pointer<Void> userdata)
- ggml_fp16_t = Uint16
- ggml_from_float_t = Pointer<NativeFunction<ggml_from_float_tFunction>>
- ggml_from_float_tFunction = Void Function(Pointer<Float> x, Pointer<Void> y, Int64 k)
- ggml_guid_t = Pointer<Pointer<Uint8>>
- ggml_log_callback = Pointer<NativeFunction<ggml_log_callbackFunction>>
- ggml_log_callbackFunction = Void Function(UnsignedInt level, Pointer<Char> text, Pointer<Void> user_data)
- ggml_opt_context_t = Pointer<ggml_opt_context>
- ggml_opt_dataset_t = Pointer<ggml_opt_dataset>
- ggml_opt_epoch_callback = Pointer<NativeFunction<ggml_opt_epoch_callbackFunction>>
- ggml_opt_epoch_callbackFunction = Void Function(Bool train, ggml_opt_context_t opt_ctx, ggml_opt_dataset_t dataset, ggml_opt_result_t result, Int64 ibatch, Int64 ibatch_max, Int64 t_start_us)
- ggml_opt_get_optimizer_params = Pointer<NativeFunction<ggml_opt_get_optimizer_paramsFunction>>
- ggml_opt_get_optimizer_paramsFunction = ggml_opt_optimizer_params Function(Pointer<Void> userdata)
- ggml_opt_result_t = Pointer<ggml_opt_result>
- ggml_threadpool_t = Pointer<ggml_threadpool>
- ggml_to_float_t = Pointer<NativeFunction<ggml_to_float_tFunction>>
- ggml_to_float_tFunction = Void Function(Pointer<Void> x, Pointer<Float> y, Int64 k)
- llama_memory_t = Pointer<llama_memory_i>
- llama_model_set_tensor_data_t = Pointer<NativeFunction<llama_model_set_tensor_data_tFunction>>
- llama_model_set_tensor_data_tFunction = Void Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
- llama_opt_param_filter = Pointer<NativeFunction<llama_opt_param_filterFunction>>
- llama_opt_param_filterFunction = Bool Function(Pointer<ggml_tensor> tensor, Pointer<Void> userdata)
- llama_pos = Int32
- llama_progress_callback = Pointer<NativeFunction<llama_progress_callbackFunction>>
- llama_progress_callbackFunction = Bool Function(Float progress, Pointer<Void> user_data)
- llama_sampler_context_t = Pointer<Void>
- llama_seq_id = Int32
- llama_state_seq_flags = Uint32
- llama_token = Int32
- LlamaLogHandler = void Function(LlamaLogRecord record) - Type definition for custom log handlers.
- ToolHandler = Future<Object?> Function(ToolParams params) - Signature for a tool handler function.
Exceptions / Errors
- LlamaContextException
- Exception thrown when a context operation fails.
- LlamaException
- Base class for all Llama-related exceptions.
- LlamaInferenceException
- Exception thrown during text generation or tokenization.
- LlamaModelException
- Exception thrown when a model fails to load.
- LlamaStateException
- Exception thrown when the engine is in an invalid state.
- LlamaUnsupportedException
- Exception thrown when an operation is not supported on the current platform.