lcpp_ngin
library
Functions
clearerr (Pointer <FILE > _Stream )
→ void
clearerr_s (Pointer <FILE > _Stream )
→ int
fclose (Pointer <FILE > _Stream )
→ int
fcloseall ()
→ int
fdopen (int _FileHandle , Pointer <Char > _Format )
→ Pointer <FILE >
feof (Pointer <FILE > _Stream )
→ int
ferror (Pointer <FILE > _Stream )
→ int
fflush (Pointer <FILE > _Stream )
→ int
fgetc (Pointer <FILE > _Stream )
→ int
fgetchar ()
→ int
fgetpos (Pointer <FILE > _Stream , Pointer <fpos_t > _Position )
→ int
fgets (Pointer <Char > _Buffer , int _MaxCount , Pointer <FILE > _Stream )
→ Pointer <Char >
fgetwc (Pointer <FILE > _Stream )
→ int
fgetws (Pointer <WChar > _Buffer , int _BufferCount , Pointer <FILE > _Stream )
→ Pointer <WChar >
fileno (Pointer <FILE > _Stream )
→ int
flushall ()
→ int
fopen (Pointer <Char > _FileName , Pointer <Char > _Mode )
→ Pointer <FILE >
fopen_s (Pointer <Pointer <FILE > > _Stream , Pointer <Char > _FileName , Pointer <Char > _Mode )
→ int
fputc (int _Character , Pointer <FILE > _Stream )
→ int
fputchar (int _Ch )
→ int
fputs (Pointer <Char > _Buffer , Pointer <FILE > _Stream )
→ int
fputwc (int _Character , Pointer <FILE > _Stream )
→ int
fputws (Pointer <WChar > _Buffer , Pointer <FILE > _Stream )
→ int
fread (Pointer <Void > _Buffer , int _ElementSize , int _ElementCount , Pointer <FILE > _Stream )
→ int
fread_s (Pointer <Void > _Buffer , int _BufferSize , int _ElementSize , int _ElementCount , Pointer <FILE > _Stream )
→ int
freopen (Pointer <Char > _FileName , Pointer <Char > _Mode , Pointer <FILE > _Stream )
→ Pointer <FILE >
freopen_s (Pointer <Pointer <FILE > > _Stream , Pointer <Char > _FileName , Pointer <Char > _Mode , Pointer <FILE > _OldStream )
→ int
fseek (Pointer <FILE > _Stream , int _Offset , int _Origin )
→ int
fsetpos (Pointer <FILE > _Stream , Pointer <fpos_t > _Position )
→ int
ftell (Pointer <FILE > _Stream )
→ int
fwrite (Pointer <Void > _Buffer , int _ElementSize , int _ElementCount , Pointer <FILE > _Stream )
→ int
getc (Pointer <FILE > _Stream )
→ int
getchar ()
→ int
gets_s (Pointer <Char > _Buffer , int _Size )
→ Pointer <Char >
getw (Pointer <FILE > _Stream )
→ int
getwc (Pointer <FILE > _Stream )
→ int
getwchar ()
→ int
ggml_abort (Pointer <Char > file , int line , Pointer <Char > fmt )
→ void
ggml_abs (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_abs_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_acc (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int nb1 , int nb2 , int nb3 , int offset )
→ Pointer <ggml_tensor >
dst = a
view(dst, nb1, nb2, nb3, offset) += b
return dst
ggml_acc_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int nb1 , int nb2 , int nb3 , int offset )
→ Pointer <ggml_tensor >
ggml_add (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_add1 (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_add1_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_add_cast (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , ggml_type type )
→ Pointer <ggml_tensor >
ggml_add_id (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > ids )
→ Pointer <ggml_tensor >
dst[i0, i1, i2] = a[i0, i1, i2] + b[i0, ids[i1, i2]]
ggml_add_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_add_rel_pos (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > pw , Pointer <ggml_tensor > ph )
→ Pointer <ggml_tensor >
used in sam
ggml_add_rel_pos_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > pw , Pointer <ggml_tensor > ph )
→ Pointer <ggml_tensor >
ggml_arange (Pointer <ggml_context > ctx , double start , double stop , double step )
→ Pointer <ggml_tensor >
ggml_are_same_shape (Pointer <ggml_tensor > t0 , Pointer <ggml_tensor > t1 )
→ bool
ggml_are_same_stride (Pointer <ggml_tensor > t0 , Pointer <ggml_tensor > t1 )
→ bool
ggml_argmax (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
argmax along rows
ggml_argsort (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , ggml_sort_order order )
→ Pointer <ggml_tensor >
ggml_argsort_top_k (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int k )
→ Pointer <ggml_tensor >
similar to ggml_top_k but implemented as argsort + view
ggml_backend_alloc_buffer (ggml_backend_t backend , int size )
→ ggml_backend_buffer_t
ggml_backend_alloc_ctx_tensors (Pointer <ggml_context > ctx , ggml_backend_t backend )
→ Pointer <ggml_backend_buffer >
ggml_backend_alloc_ctx_tensors_from_buft (Pointer <ggml_context > ctx , ggml_backend_buffer_type_t buft )
→ Pointer <ggml_backend_buffer >
ggml_backend_alloc_ctx_tensors_from_buft_size (Pointer <ggml_context > ctx , ggml_backend_buffer_type_t buft )
→ int
Utils
Create a buffer and allocate all the tensors in a ggml_context
ggml_backend_alloc_ctx_tensors_from_buft_size returns the size of the buffer that would be allocated by ggml_backend_alloc_ctx_tensors_from_buft
ggml_backend_buffer_clear (ggml_backend_buffer_t buffer , int value )
→ void
ggml_backend_buffer_free (ggml_backend_buffer_t buffer )
→ void
ggml_backend_buffer_get_alignment (ggml_backend_buffer_t buffer )
→ int
ggml_backend_buffer_get_alloc_size (ggml_backend_buffer_t buffer , Pointer <ggml_tensor > tensor )
→ int
ggml_backend_buffer_get_base (ggml_backend_buffer_t buffer )
→ Pointer <Void >
ggml_backend_buffer_get_max_size (ggml_backend_buffer_t buffer )
→ int
ggml_backend_buffer_get_size (ggml_backend_buffer_t buffer )
→ int
ggml_backend_buffer_get_type (ggml_backend_buffer_t buffer )
→ ggml_backend_buffer_type_t
ggml_backend_buffer_get_usage (ggml_backend_buffer_t buffer )
→ ggml_backend_buffer_usage
ggml_backend_buffer_init_tensor (ggml_backend_buffer_t buffer , Pointer <ggml_tensor > tensor )
→ ggml_status
ggml_backend_buffer_is_host (ggml_backend_buffer_t buffer )
→ bool
ggml_backend_buffer_name (ggml_backend_buffer_t buffer )
→ Pointer <Char >
ggml_backend_buffer_reset (ggml_backend_buffer_t buffer )
→ void
ggml_backend_buffer_set_usage (ggml_backend_buffer_t buffer , ggml_backend_buffer_usage usage )
→ void
ggml_backend_buft_alloc_buffer (ggml_backend_buffer_type_t buft , int size )
→ ggml_backend_buffer_t
ggml_backend_buft_get_alignment (ggml_backend_buffer_type_t buft )
→ int
ggml_backend_buft_get_alloc_size (ggml_backend_buffer_type_t buft , Pointer <ggml_tensor > tensor )
→ int
ggml_backend_buft_get_device (ggml_backend_buffer_type_t buft )
→ ggml_backend_dev_t
ggml_backend_buft_get_max_size (ggml_backend_buffer_type_t buft )
→ int
ggml_backend_buft_is_host (ggml_backend_buffer_type_t buft )
→ bool
ggml_backend_buft_name (ggml_backend_buffer_type_t buft )
→ Pointer <Char >
Backend buffer type
ggml_backend_compare_graph_backend (ggml_backend_t backend1 , ggml_backend_t backend2 , Pointer <ggml_cgraph > graph , ggml_backend_eval_callback callback , Pointer <Void > user_data , Pointer <ggml_tensor > test_node )
→ bool
Compare the output of two backends
ggml_backend_cpu_buffer_from_ptr (Pointer <Void > ptr , int size )
→ ggml_backend_buffer_t
CPU buffer types are always available
ggml_backend_cpu_buffer_type ()
→ ggml_backend_buffer_type_t
ggml_backend_cpu_init ()
→ ggml_backend_t
CPU backend
ggml_backend_cpu_reg ()
→ ggml_backend_reg_t
ggml_backend_cpu_set_abort_callback (ggml_backend_t backend_cpu , ggml_abort_callback abort_callback , Pointer <Void > abort_callback_data )
→ void
ggml_backend_cpu_set_n_threads (ggml_backend_t backend_cpu , int n_threads )
→ void
ggml_backend_cpu_set_threadpool (ggml_backend_t backend_cpu , ggml_threadpool_t threadpool )
→ void
ggml_backend_dev_backend_reg (ggml_backend_dev_t device )
→ ggml_backend_reg_t
ggml_backend_dev_buffer_from_host_ptr (ggml_backend_dev_t device , Pointer <Void > ptr , int size , int max_tensor_size )
→ ggml_backend_buffer_t
ggml_backend_dev_buffer_type (ggml_backend_dev_t device )
→ ggml_backend_buffer_type_t
ggml_backend_dev_by_name (Pointer <Char > name )
→ ggml_backend_dev_t
ggml_backend_dev_by_type (ggml_backend_dev_type type )
→ ggml_backend_dev_t
ggml_backend_dev_count ()
→ int
Device enumeration
ggml_backend_dev_description (ggml_backend_dev_t device )
→ Pointer <Char >
ggml_backend_dev_get (int index )
→ ggml_backend_dev_t
ggml_backend_dev_get_props (ggml_backend_dev_t device , Pointer <ggml_backend_dev_props > props )
→ void
ggml_backend_dev_host_buffer_type (ggml_backend_dev_t device )
→ ggml_backend_buffer_type_t
ggml_backend_dev_init (ggml_backend_dev_t device , Pointer <Char > params )
→ ggml_backend_t
ggml_backend_dev_memory (ggml_backend_dev_t device , Pointer <Size > free , Pointer <Size > total )
→ void
ggml_backend_dev_name (ggml_backend_dev_t device )
→ Pointer <Char >
ggml_backend_dev_offload_op (ggml_backend_dev_t device , Pointer <ggml_tensor > op )
→ bool
ggml_backend_dev_supports_buft (ggml_backend_dev_t device , ggml_backend_buffer_type_t buft )
→ bool
ggml_backend_dev_supports_op (ggml_backend_dev_t device , Pointer <ggml_tensor > op )
→ bool
ggml_backend_dev_type$1 (ggml_backend_dev_t device )
→ ggml_backend_dev_type
ggml_backend_device_register (ggml_backend_dev_t device )
→ void
ggml_backend_event_free (ggml_backend_event_t event )
→ void
ggml_backend_event_new (ggml_backend_dev_t device )
→ ggml_backend_event_t
Events
ggml_backend_event_record (ggml_backend_event_t event , ggml_backend_t backend )
→ void
ggml_backend_event_synchronize (ggml_backend_event_t event )
→ void
ggml_backend_event_wait (ggml_backend_t backend , ggml_backend_event_t event )
→ void
ggml_backend_free (ggml_backend_t backend )
→ void
ggml_backend_get_alignment (ggml_backend_t backend )
→ int
ggml_backend_get_default_buffer_type (ggml_backend_t backend )
→ ggml_backend_buffer_type_t
ggml_backend_get_device (ggml_backend_t backend )
→ ggml_backend_dev_t
ggml_backend_get_max_size (ggml_backend_t backend )
→ int
ggml_backend_graph_compute (ggml_backend_t backend , Pointer <ggml_cgraph > cgraph )
→ ggml_status
ggml_backend_graph_compute_async (ggml_backend_t backend , Pointer <ggml_cgraph > cgraph )
→ ggml_status
ggml_backend_graph_copy (ggml_backend_t backend , Pointer <ggml_cgraph > graph )
→ ggml_backend_graph_copy$1
Copy a graph to a different backend
ggml_backend_graph_copy_free (ggml_backend_graph_copy$1 copy )
→ void
ggml_backend_graph_plan_compute (ggml_backend_t backend , ggml_backend_graph_plan_t plan )
→ ggml_status
ggml_backend_graph_plan_create (ggml_backend_t backend , Pointer <ggml_cgraph > cgraph )
→ ggml_backend_graph_plan_t
ggml_backend_graph_plan_free (ggml_backend_t backend , ggml_backend_graph_plan_t plan )
→ void
ggml_backend_guid (ggml_backend_t backend )
→ ggml_guid_t
Backend (stream)
ggml_backend_init_best ()
→ ggml_backend_t
= ggml_backend_dev_init(ggml_backend_dev_by_type(GPU) OR ggml_backend_dev_by_type(CPU), NULL)
ggml_backend_init_by_name (Pointer <Char > name , Pointer <Char > params )
→ ggml_backend_t
Direct backend (stream) initialization
= ggml_backend_dev_init(ggml_backend_dev_by_name(name), params)
ggml_backend_init_by_type (ggml_backend_dev_type type , Pointer <Char > params )
→ ggml_backend_t
ggml_backend_is_cpu (ggml_backend_t backend )
→ bool
ggml_backend_load (Pointer <Char > path )
→ ggml_backend_reg_t
Load a backend from a dynamic library and register it
ggml_backend_load_all ()
→ void
Load all known backends from dynamic libraries
ggml_backend_load_all_from_path (Pointer <Char > dir_path )
→ void
ggml_backend_name (ggml_backend_t backend )
→ Pointer <Char >
ggml_backend_offload_op (ggml_backend_t backend , Pointer <ggml_tensor > op )
→ bool
ggml_backend_reg_by_name (Pointer <Char > name )
→ ggml_backend_reg_t
ggml_backend_reg_count ()
→ int
Backend (reg) enumeration
ggml_backend_reg_dev_count (ggml_backend_reg_t reg )
→ int
ggml_backend_reg_dev_get (ggml_backend_reg_t reg , int index )
→ ggml_backend_dev_t
ggml_backend_reg_get (int index )
→ ggml_backend_reg_t
ggml_backend_reg_get_proc_address (ggml_backend_reg_t reg , Pointer <Char > name )
→ Pointer <Void >
ggml_backend_reg_name (ggml_backend_reg_t reg )
→ Pointer <Char >
Backend (reg)
ggml_backend_register (ggml_backend_reg_t reg )
→ void
Backend registry
ggml_backend_sched_alloc_graph (ggml_backend_sched_t sched , Pointer <ggml_cgraph > graph )
→ bool
Allocate and compute graph on the backend scheduler
ggml_backend_sched_free (ggml_backend_sched_t sched )
→ void
ggml_backend_sched_get_backend (ggml_backend_sched_t sched , int i )
→ ggml_backend_t
ggml_backend_sched_get_buffer_size (ggml_backend_sched_t sched , ggml_backend_t backend )
→ int
ggml_backend_sched_get_buffer_type (ggml_backend_sched_t sched , ggml_backend_t backend )
→ ggml_backend_buffer_type_t
ggml_backend_sched_get_n_backends (ggml_backend_sched_t sched )
→ int
ggml_backend_sched_get_n_copies (ggml_backend_sched_t sched )
→ int
ggml_backend_sched_get_n_splits (ggml_backend_sched_t sched )
→ int
Get the number of splits of the last graph
ggml_backend_sched_get_tensor_backend (ggml_backend_sched_t sched , Pointer <ggml_tensor > node )
→ ggml_backend_t
ggml_backend_sched_graph_compute (ggml_backend_sched_t sched , Pointer <ggml_cgraph > graph )
→ ggml_status
ggml_backend_sched_graph_compute_async (ggml_backend_sched_t sched , Pointer <ggml_cgraph > graph )
→ ggml_status
ggml_backend_sched_new (Pointer <ggml_backend_t > backends , Pointer <ggml_backend_buffer_type_t > bufts , int n_backends , int graph_size , bool parallel , bool op_offload )
→ ggml_backend_sched_t
Initialize a backend scheduler, backends with low index are given priority over backends with high index
ggml_backend_sched_reserve (ggml_backend_sched_t sched , Pointer <ggml_cgraph > measure_graph )
→ bool
ggml_backend_sched_reserve_size (ggml_backend_sched_t sched , Pointer <ggml_cgraph > measure_graph , Pointer <Size > sizes )
→ void
Initialize backend buffers from a measure graph
ggml_backend_sched_reset (ggml_backend_sched_t sched )
→ void
Reset all assignments and allocators - must be called before changing the node backends or allocating a new graph.
This in effect deallocates all tensors that were previously allocated and leaves them with dangling pointers.
The correct way to use this API is to discard the deallocated tensors and create new ones.
ggml_backend_sched_set_eval_callback (ggml_backend_sched_t sched , ggml_backend_sched_eval_callback callback , Pointer <Void > user_data )
→ void
Set a callback to be called for each resulting node during graph compute
ggml_backend_sched_set_tensor_backend (ggml_backend_sched_t sched , Pointer <ggml_tensor > node , ggml_backend_t backend )
→ void
ggml_backend_sched_split_graph (ggml_backend_sched_t sched , Pointer <ggml_cgraph > graph )
→ void
Split graph without allocating it
ggml_backend_sched_synchronize (ggml_backend_sched_t sched )
→ void
ggml_backend_supports_buft (ggml_backend_t backend , ggml_backend_buffer_type_t buft )
→ bool
ggml_backend_supports_op (ggml_backend_t backend , Pointer <ggml_tensor > op )
→ bool
NOTE: will be removed, use device version instead
ggml_backend_synchronize (ggml_backend_t backend )
→ void
ggml_backend_tensor_alloc (ggml_backend_buffer_t buffer , Pointer <ggml_tensor > tensor , Pointer <Void > addr )
→ ggml_status
ggml_backend_tensor_copy (Pointer <ggml_tensor > src , Pointer <ggml_tensor > dst )
→ void
tensor copy between different backends
ggml_backend_tensor_copy_async (ggml_backend_t backend_src , ggml_backend_t backend_dst , Pointer <ggml_tensor > src , Pointer <ggml_tensor > dst )
→ void
asynchronous copy
the copy is performed after all the currently queued operations in backend_src
backend_dst will wait for the copy to complete before performing other operations
automatic fallback to sync copy if async is not supported
ggml_backend_tensor_get (Pointer <ggml_tensor > tensor , Pointer <Void > data , int offset , int size )
→ void
ggml_backend_tensor_get_async (ggml_backend_t backend , Pointer <ggml_tensor > tensor , Pointer <Void > data , int offset , int size )
→ void
ggml_backend_tensor_memset (Pointer <ggml_tensor > tensor , int value , int offset , int size )
→ void
ggml_backend_tensor_set (Pointer <ggml_tensor > tensor , Pointer <Void > data , int offset , int size )
→ void
"offset" refers to the offset in tensor->data for setting/getting data
ggml_backend_tensor_set_async (ggml_backend_t backend , Pointer <ggml_tensor > tensor , Pointer <Void > data , int offset , int size )
→ void
ggml_backend_unload (ggml_backend_reg_t reg )
→ void
Unload a backend if loaded dynamically and unregister it
ggml_backend_view_init (Pointer <ggml_tensor > tensor )
→ ggml_status
ggml_bf16_to_fp32 (ggml_bf16_t arg0 )
→ double
ggml_bf16_to_fp32_row (Pointer <ggml_bf16_t > arg0 , Pointer <Float > arg1 , int arg2 )
→ void
ggml_blck_size (ggml_type type )
→ int
ggml_build_backward_expand (Pointer <ggml_context > ctx , Pointer <ggml_cgraph > cgraph , Pointer <Pointer <ggml_tensor > > grad_accs )
→ void
ggml_build_forward_expand (Pointer <ggml_cgraph > cgraph , Pointer <ggml_tensor > tensor )
→ void
automatic differentiation
ggml_can_repeat (Pointer <ggml_tensor > t0 , Pointer <ggml_tensor > t1 )
→ bool
ggml_cast (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , ggml_type type )
→ Pointer <ggml_tensor >
ggml_ceil (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_ceil_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_clamp (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double min , double max )
→ Pointer <ggml_tensor >
clamp
in-place, returns view(a)
ggml_commit ()
→ Pointer <Char >
ggml_concat (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int dim )
→ Pointer <ggml_tensor >
concat a and b along dim
used in stable-diffusion
ggml_cont (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
make contiguous
ggml_cont_1d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 )
→ Pointer <ggml_tensor >
make contiguous, with new shape
ggml_cont_2d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 )
→ Pointer <ggml_tensor >
ggml_cont_3d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 , int ne2 )
→ Pointer <ggml_tensor >
ggml_cont_4d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 , int ne2 , int ne3 )
→ Pointer <ggml_tensor >
ggml_conv_1d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int s0 , int p0 , int d0 )
→ Pointer <ggml_tensor >
ggml_conv_1d_dw (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int s0 , int p0 , int d0 )
→ Pointer <ggml_tensor >
depthwise
TODO: this is very likely wrong for some cases! - needs more testing
ggml_conv_1d_dw_ph (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int s0 , int d0 )
→ Pointer <ggml_tensor >
ggml_conv_1d_ph (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int s , int d )
→ Pointer <ggml_tensor >
conv_1d with padding = half
alias for ggml_conv_1d(a, b, s, a->ne[0]/2, d)
ggml_conv_2d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int s0 , int s1 , int p0 , int p1 , int d0 , int d1 )
→ Pointer <ggml_tensor >
ggml_conv_2d_direct (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int s0 , int s1 , int p0 , int p1 , int d0 , int d1 )
→ Pointer <ggml_tensor >
ggml_conv_2d_dw (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int s0 , int s1 , int p0 , int p1 , int d0 , int d1 )
→ Pointer <ggml_tensor >
depthwise (via im2col and mul_mat)
ggml_conv_2d_dw_direct (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int stride0 , int stride1 , int pad0 , int pad1 , int dilation0 , int dilation1 )
→ Pointer <ggml_tensor >
Depthwise 2D convolution
may be faster than ggml_conv_2d_dw, but not available in all backends
a: KW KH 1 C convolution kernel
b: W H C N input data
res: W_out H_out C N
ggml_conv_2d_s1_ph (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
kernel size is a->ne[0] x a->ne[1]
stride is 1
padding is half
example:
a: 3 3 256 256
b: 64 64 256 1
res: 64 64 256 1
used in sam
ggml_conv_2d_sk_p0 (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
kernel size is a->ne[0] x a->ne[1]
stride is equal to kernel size
padding is zero
example:
a: 16 16 3 768
b: 1024 1024 3 1
res: 64 64 768 1
used in sam
ggml_conv_3d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int IC , int s0 , int s1 , int s2 , int p0 , int p1 , int p2 , int d0 , int d1 , int d2 )
→ Pointer <ggml_tensor >
a: OC*IC, KD, KH, KW
b: N*IC, ID, IH, IW
result: N*OC, OD, OH, OW
ggml_conv_3d_direct (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int s0 , int s1 , int s2 , int p0 , int p1 , int p2 , int d0 , int d1 , int d2 , int n_channels , int n_batch , int n_channels_out )
→ Pointer <ggml_tensor >
ggml_conv_transpose_1d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int s0 , int p0 , int d0 )
→ Pointer <ggml_tensor >
ggml_conv_transpose_2d_p0 (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int stride )
→ Pointer <ggml_tensor >
ggml_cos (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_cos_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_count_equal (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
count number of equal elements in a and b
ggml_cpu_bf16_to_fp32 (Pointer <ggml_bf16_t > arg0 , Pointer <Float > arg1 , int arg2 )
→ void
ggml_cpu_fp16_to_fp32 (Pointer <ggml_fp16_t > arg0 , Pointer <Float > arg1 , int arg2 )
→ void
ggml_cpu_fp32_to_bf16 (Pointer <Float > arg0 , Pointer <ggml_bf16_t > arg1 , int arg2 )
→ void
ggml_cpu_fp32_to_fp16 (Pointer <Float > arg0 , Pointer <ggml_fp16_t > arg1 , int arg2 )
→ void
ggml_cpu_fp32_to_fp32 (Pointer <Float > arg0 , Pointer <Float > arg1 , int arg2 )
→ void
ggml_cpu_fp32_to_i32 (Pointer <Float > arg0 , Pointer <Int32 > arg1 , int arg2 )
→ void
ggml_cpu_get_rvv_vlen ()
→ int
ggml_cpu_get_sve_cnt ()
→ int
ggml_cpu_has_amx_int8 ()
→ int
ggml_cpu_has_arm_fma ()
→ int
ggml_cpu_has_avx ()
→ int
ggml_cpu_has_avx2 ()
→ int
ggml_cpu_has_avx512 ()
→ int
ggml_cpu_has_avx512_bf16 ()
→ int
ggml_cpu_has_avx512_vbmi ()
→ int
ggml_cpu_has_avx512_vnni ()
→ int
ggml_cpu_has_avx_vnni ()
→ int
ggml_cpu_has_bmi2 ()
→ int
ggml_cpu_has_dotprod ()
→ int
ggml_cpu_has_f16c ()
→ int
ggml_cpu_has_fma ()
→ int
ggml_cpu_has_fp16_va ()
→ int
ggml_cpu_has_llamafile ()
→ int
ggml_cpu_has_matmul_int8 ()
→ int
ggml_cpu_has_neon ()
→ int
ARM
ggml_cpu_has_riscv_v ()
→ int
other
ggml_cpu_has_sme ()
→ int
ggml_cpu_has_sse3 ()
→ int
x86
ggml_cpu_has_ssse3 ()
→ int
ggml_cpu_has_sve ()
→ int
ggml_cpu_has_vsx ()
→ int
ggml_cpu_has_vxe ()
→ int
ggml_cpu_has_wasm_simd ()
→ int
ggml_cpu_init ()
→ void
ggml_cpy (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
a -> b, return view(b)
ggml_cross_entropy_loss (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
loss function
ggml_cross_entropy_loss_back (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c )
→ Pointer <ggml_tensor >
ggml_cumsum (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_custom_4d (Pointer <ggml_context > ctx , ggml_type type , int ne0 , int ne1 , int ne2 , int ne3 , Pointer <Pointer <ggml_tensor > > args , int n_args , ggml_custom_op_t fun , int n_tasks , Pointer <Void > userdata )
→ Pointer <ggml_tensor >
ggml_custom_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <Pointer <ggml_tensor > > args , int n_args , ggml_custom_op_t fun , int n_tasks , Pointer <Void > userdata )
→ Pointer <ggml_tensor >
ggml_cycles ()
→ int
ggml_cycles_per_ms ()
→ int
ggml_diag (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_diag_mask_inf (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int n_past )
→ Pointer <ggml_tensor >
set elements above the diagonal to -INF
ggml_diag_mask_inf_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int n_past )
→ Pointer <ggml_tensor >
in-place, returns view(a)
ggml_diag_mask_zero (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int n_past )
→ Pointer <ggml_tensor >
set elements above the diagonal to 0
ggml_diag_mask_zero_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int n_past )
→ Pointer <ggml_tensor >
in-place, returns view(a)
ggml_div (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_div_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_dup (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
operations on tensors with backpropagation
ggml_dup_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
in-place, returns view(a)
ggml_dup_tensor (Pointer <ggml_context > ctx , Pointer <ggml_tensor > src )
→ Pointer <ggml_tensor >
ggml_element_size (Pointer <ggml_tensor > tensor )
→ int
ggml_elu (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_elu_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_exp (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_exp_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_expm1 (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_expm1_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_fill (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double c )
→ Pointer <ggml_tensor >
Fill tensor a with constant c
ggml_fill_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double c )
→ Pointer <ggml_tensor >
ggml_flash_attn_back (Pointer <ggml_context > ctx , Pointer <ggml_tensor > q , Pointer <ggml_tensor > k , Pointer <ggml_tensor > v , Pointer <ggml_tensor > d , bool masked )
→ Pointer <ggml_tensor >
TODO: needs to be adapted to ggml_flash_attn_ext
ggml_flash_attn_ext (Pointer <ggml_context > ctx , Pointer <ggml_tensor > q , Pointer <ggml_tensor > k , Pointer <ggml_tensor > v , Pointer <ggml_tensor > mask , double scale , double max_bias , double logit_softcap )
→ Pointer <ggml_tensor >
q: n_embd_k, n_batch, n_head, ne3
k: n_embd_k, n_kv, n_head_kv, ne3
v: n_embd_v, n_kv, n_head_kv, ne3 !! not transposed !!
mask: n_kv, n_batch, ne32, ne33
res: n_embd_v, n_head, n_batch, ne3 !! permuted !!
ggml_flash_attn_ext_add_sinks (Pointer <ggml_tensor > a , Pointer <ggml_tensor > sinks )
→ void
ggml_flash_attn_ext_get_prec (Pointer <ggml_tensor > a )
→ ggml_prec
ggml_flash_attn_ext_set_prec (Pointer <ggml_tensor > a , ggml_prec prec )
→ void
ggml_floor (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_floor_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_fopen (Pointer <Char > fname , Pointer <Char > mode )
→ Pointer <FILE >
accepts a UTF-8 path, even on Windows
ggml_format_name (Pointer <ggml_tensor > tensor , Pointer <Char > fmt )
→ Pointer <ggml_tensor >
ggml_fp16_to_fp32 (int arg0 )
→ double
ggml_fp16_to_fp32_row (Pointer <ggml_fp16_t > arg0 , Pointer <Float > arg1 , int arg2 )
→ void
ggml_fp32_to_bf16 (double arg0 )
→ ggml_bf16_t
ggml_fp32_to_bf16_row (Pointer <Float > arg0 , Pointer <ggml_bf16_t > arg1 , int arg2 )
→ void
ggml_fp32_to_bf16_row_ref (Pointer <Float > arg0 , Pointer <ggml_bf16_t > arg1 , int arg2 )
→ void
ggml_fp32_to_fp16 (double arg0 )
→ int
ggml_fp32_to_fp16_row (Pointer <Float > arg0 , Pointer <ggml_fp16_t > arg1 , int arg2 )
→ void
ggml_free (Pointer <ggml_context > ctx )
→ void
ggml_ftype_to_ggml_type (ggml_ftype ftype )
→ ggml_type
ggml_gallocr_alloc_graph (ggml_gallocr_t galloc , Pointer <ggml_cgraph > graph )
→ bool
automatic reallocation if the topology changes when using a single buffer
returns false if using multiple buffers and a re-allocation is needed (call ggml_gallocr_reserve_n first to set the node buffers)
ggml_gallocr_free (ggml_gallocr_t galloc )
→ void
ggml_gallocr_get_buffer_size (ggml_gallocr_t galloc , int buffer_id )
→ int
ggml_gallocr_new (ggml_backend_buffer_type_t buft )
→ ggml_gallocr_t
ggml_gallocr_new_n (Pointer <ggml_backend_buffer_type_t > bufts , int n_bufs )
→ ggml_gallocr_t
ggml_gallocr_reserve (ggml_gallocr_t galloc , Pointer <ggml_cgraph > graph )
→ bool
pre-allocate buffers from a measure graph - does not allocate or modify the graph
call with a worst-case graph to avoid buffer reallocations
not strictly required for single buffer usage: ggml_gallocr_alloc_graph will reallocate the buffers automatically if needed
returns false if the buffer allocation failed
ggml_gallocr_reserve_n_size writes the buffer sizes per galloc buffer that would be allocated by ggml_gallocr_reserve_n to sizes
ggml_gallocr_reserve_n (ggml_gallocr_t galloc , Pointer <ggml_cgraph > graph , Pointer <Int > node_buffer_ids , Pointer <Int > leaf_buffer_ids )
→ bool
ggml_gallocr_reserve_n_size (ggml_gallocr_t galloc , Pointer <ggml_cgraph > graph , Pointer <Int > node_buffer_ids , Pointer <Int > leaf_buffer_ids , Pointer <Size > sizes )
→ void
ggml_gated_linear_attn (Pointer <ggml_context > ctx , Pointer <ggml_tensor > k , Pointer <ggml_tensor > v , Pointer <ggml_tensor > q , Pointer <ggml_tensor > g , Pointer <ggml_tensor > state , double scale )
→ Pointer <ggml_tensor >
ggml_geglu (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_geglu_erf (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_geglu_erf_split (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_geglu_erf_swapped (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_geglu_quick (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_geglu_quick_split (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_geglu_quick_swapped (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_geglu_split (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_geglu_swapped (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_gelu (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_gelu_erf (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
GELU using erf (error function) when possible
some backends may fallback to approximation based on Abramowitz and Stegun formula
ggml_gelu_erf_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_gelu_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_gelu_quick (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_gelu_quick_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_get_data (Pointer <ggml_tensor > tensor )
→ Pointer <Void >
ggml_get_data_f32 (Pointer <ggml_tensor > tensor )
→ Pointer <Float >
ggml_get_f32_1d (Pointer <ggml_tensor > tensor , int i )
→ double
ggml_get_f32_nd (Pointer <ggml_tensor > tensor , int i0 , int i1 , int i2 , int i3 )
→ double
ggml_get_first_tensor (Pointer <ggml_context > ctx )
→ Pointer <ggml_tensor >
Context tensor enumeration and lookup
ggml_get_glu_op (Pointer <ggml_tensor > tensor )
→ ggml_glu_op
ggml_get_i32_1d (Pointer <ggml_tensor > tensor , int i )
→ int
ggml_get_i32_nd (Pointer <ggml_tensor > tensor , int i0 , int i1 , int i2 , int i3 )
→ int
ggml_get_max_tensor_size (Pointer <ggml_context > ctx )
→ int
ggml_get_mem_buffer (Pointer <ggml_context > ctx )
→ Pointer <Void >
ggml_get_mem_size (Pointer <ggml_context > ctx )
→ int
ggml_get_name (Pointer <ggml_tensor > tensor )
→ Pointer <Char >
ggml_get_next_tensor (Pointer <ggml_context > ctx , Pointer <ggml_tensor > tensor )
→ Pointer <ggml_tensor >
ggml_get_no_alloc (Pointer <ggml_context > ctx )
→ bool
ggml_get_rel_pos (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int qh , int kh )
→ Pointer <ggml_tensor >
used in sam
ggml_get_rows (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
supports 4D a:
a n_embd, ne1, ne2, ne3
b I32 n_rows, ne2, ne3, 1
ggml_get_rows_back (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c )
→ Pointer <ggml_tensor >
ggml_get_tensor (Pointer <ggml_context > ctx , Pointer <Char > name )
→ Pointer <ggml_tensor >
ggml_get_type_traits (ggml_type type )
→ Pointer <ggml_type_traits >
ggml_get_type_traits_cpu (ggml_type type )
→ Pointer <ggml_type_traits_cpu >
ggml_get_unary_op (Pointer <ggml_tensor > tensor )
→ ggml_unary_op
ggml_glu (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , ggml_glu_op op , bool swapped )
→ Pointer <ggml_tensor >
ggml_glu_op_name (ggml_glu_op op )
→ Pointer <Char >
ggml_glu_split (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , ggml_glu_op op )
→ Pointer <ggml_tensor >
ggml_graph_add_node (Pointer <ggml_cgraph > cgraph , Pointer <ggml_tensor > tensor )
→ void
ggml_graph_clear (Pointer <ggml_cgraph > cgraph )
→ void
ggml_graph_compute (Pointer <ggml_cgraph > cgraph , Pointer <ggml_cplan > cplan )
→ ggml_status
ggml_graph_compute_with_ctx (Pointer <ggml_context > ctx , Pointer <ggml_cgraph > cgraph , int n_threads )
→ ggml_status
ggml_graph_cpy (Pointer <ggml_cgraph > src , Pointer <ggml_cgraph > dst )
→ void
ggml_graph_dump_dot (Pointer <ggml_cgraph > gb , Pointer <ggml_cgraph > gf , Pointer <Char > filename )
→ void
dump the graph into a file using the dot format
ggml_graph_dup (Pointer <ggml_context > ctx , Pointer <ggml_cgraph > cgraph , bool force_grads )
→ Pointer <ggml_cgraph >
ggml_graph_get_grad (Pointer <ggml_cgraph > cgraph , Pointer <ggml_tensor > node )
→ Pointer <ggml_tensor >
ggml_graph_get_grad_acc (Pointer <ggml_cgraph > cgraph , Pointer <ggml_tensor > node )
→ Pointer <ggml_tensor >
ggml_graph_get_tensor (Pointer <ggml_cgraph > cgraph , Pointer <Char > name )
→ Pointer <ggml_tensor >
ggml_graph_n_nodes (Pointer <ggml_cgraph > cgraph )
→ int
ggml_graph_node (Pointer <ggml_cgraph > cgraph , int i )
→ Pointer <ggml_tensor >
ggml_graph_nodes (Pointer <ggml_cgraph > cgraph )
→ Pointer <Pointer <ggml_tensor > >
ggml_graph_overhead ()
→ int
ggml_graph_overhead_custom (int size , bool grads )
→ int
ggml_graph_plan (Pointer <ggml_cgraph > cgraph , int n_threads , Pointer <ggml_threadpool > threadpool )
→ ggml_cplan
ggml_graph_plan() has to be called before ggml_graph_compute()
when plan.work_size > 0, caller must allocate memory for plan.work_data
ggml_graph_print (Pointer <ggml_cgraph > cgraph )
→ void
print info and performance information for the graph
ggml_graph_reset (Pointer <ggml_cgraph > cgraph )
→ void
ggml_graph_size (Pointer <ggml_cgraph > cgraph )
→ int
ggml_group_norm (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int n_groups , double eps )
→ Pointer <ggml_tensor >
group normalize along ne0*ne1*n_groups
used in stable-diffusion
ggml_group_norm_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int n_groups , double eps )
→ Pointer <ggml_tensor >
ggml_guid_matches (ggml_guid_t guid_a , ggml_guid_t guid_b )
→ bool
ggml_hardsigmoid (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
hardsigmoid(x) = relu6(x + 3) / 6
ggml_hardswish (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
hardswish(x) = x * relu6(x + 3) / 6
ggml_im2col (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int s0 , int s1 , int p0 , int p1 , int d0 , int d1 , bool is_2D , ggml_type dst_type )
→ Pointer <ggml_tensor >
ggml_im2col_3d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int IC , int s0 , int s1 , int s2 , int p0 , int p1 , int p2 , int d0 , int d1 , int d2 , ggml_type dst_type )
→ Pointer <ggml_tensor >
ggml_im2col_back (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <Int64 > ne , int s0 , int s1 , int p0 , int p1 , int d0 , int d1 , bool is_2D )
→ Pointer <ggml_tensor >
ggml_init (ggml_init_params params )
→ Pointer <ggml_context >
main
ggml_interpolate (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 , int ne2 , int ne3 , int mode )
→ Pointer <ggml_tensor >
Up- or downsamples the input to the specified size.
2D scale modes (eg. bilinear) are applied to the first two dimensions.
ggml_is_3d (Pointer <ggml_tensor > tensor )
→ bool
ggml_is_contiguous (Pointer <ggml_tensor > tensor )
→ bool
returns whether the tensor elements can be iterated over with a flattened index (no gaps, no permutation)
ggml_is_contiguous_0 (Pointer <ggml_tensor > tensor )
→ bool
ggml_is_contiguous_1 (Pointer <ggml_tensor > tensor )
→ bool
ggml_is_contiguous_2 (Pointer <ggml_tensor > tensor )
→ bool
ggml_is_contiguous_channels (Pointer <ggml_tensor > tensor )
→ bool
true for tensor that is stored in memory as CxWxHxN and has been permuted to WxHxCxN
ggml_is_contiguous_rows (Pointer <ggml_tensor > tensor )
→ bool
true if the elements in dimension 0 are contiguous, or there is just 1 block of elements
ggml_is_contiguously_allocated (Pointer <ggml_tensor > tensor )
→ bool
returns whether the tensor elements are allocated as one contiguous block of memory (no gaps, but permutation ok)
ggml_is_empty (Pointer <ggml_tensor > tensor )
→ bool
ggml_is_matrix (Pointer <ggml_tensor > tensor )
→ bool
ggml_is_numa ()
→ bool
ggml_is_permuted (Pointer <ggml_tensor > tensor )
→ bool
ggml_is_quantized (ggml_type type )
→ bool
ggml_is_scalar (Pointer <ggml_tensor > tensor )
→ bool
ggml_is_transposed (Pointer <ggml_tensor > tensor )
→ bool
ggml_is_vector (Pointer <ggml_tensor > tensor )
→ bool
ggml_l2_norm (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double eps )
→ Pointer <ggml_tensor >
l2 normalize along rows
used in rwkv v7
ggml_l2_norm_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double eps )
→ Pointer <ggml_tensor >
ggml_leaky_relu (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double negative_slope , bool inplace )
→ Pointer <ggml_tensor >
ggml_log (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_log_get (Pointer <ggml_log_callback > log_callback , Pointer <Pointer <Void > > user_data )
→ void
Set callback for all future logging events.
If this is not called, or NULL is supplied, everything is output on stderr.
ggml_log_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_log_set (ggml_log_callback log_callback , Pointer <Void > user_data )
→ void
ggml_map_custom1 (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , ggml_custom1_op_t fun , int n_tasks , Pointer <Void > userdata )
→ Pointer <ggml_tensor >
n_tasks == GGML_N_TASKS_MAX means to use max number of tasks
ggml_map_custom1_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , ggml_custom1_op_t fun , int n_tasks , Pointer <Void > userdata )
→ Pointer <ggml_tensor >
ggml_map_custom2 (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , ggml_custom2_op_t fun , int n_tasks , Pointer <Void > userdata )
→ Pointer <ggml_tensor >
ggml_map_custom2_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , ggml_custom2_op_t fun , int n_tasks , Pointer <Void > userdata )
→ Pointer <ggml_tensor >
ggml_map_custom3 (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c , ggml_custom3_op_t fun , int n_tasks , Pointer <Void > userdata )
→ Pointer <ggml_tensor >
ggml_map_custom3_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c , ggml_custom3_op_t fun , int n_tasks , Pointer <Void > userdata )
→ Pointer <ggml_tensor >
ggml_mean (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
mean along rows
ggml_mul (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_mul_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_mul_mat (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
A: k columns, n rows => ne03, ne02, n, k
B: k columns, m rows (i.e. we transpose it internally) => ne03 * x, ne02 * y, m, k
result is n columns, m rows => ne03 * x, ne02 * y, m, n
ggml_mul_mat_id (Pointer <ggml_context > ctx , Pointer <ggml_tensor > as , Pointer <ggml_tensor > b , Pointer <ggml_tensor > ids )
→ Pointer <ggml_tensor >
indirect matrix multiplication
ggml_mul_mat_set_prec (Pointer <ggml_tensor > a , ggml_prec prec )
→ void
ggml_n_dims (Pointer <ggml_tensor > tensor )
→ int
ggml_nbytes (Pointer <ggml_tensor > tensor )
→ int
ggml_nbytes_pad (Pointer <ggml_tensor > tensor )
→ int
ggml_neg (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_neg_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_nelements (Pointer <ggml_tensor > tensor )
→ int
ggml_new_buffer (Pointer <ggml_context > ctx , int nbytes )
→ Pointer <Void >
ggml_new_f32 (Pointer <ggml_context > ctx , double value )
→ Pointer <ggml_tensor >
ggml_new_graph (Pointer <ggml_context > ctx )
→ Pointer <ggml_cgraph >
graph allocation in a context
ggml_new_graph_custom (Pointer <ggml_context > ctx , int size , bool grads )
→ Pointer <ggml_cgraph >
ggml_new_i32 (Pointer <ggml_context > ctx , int value )
→ Pointer <ggml_tensor >
ggml_new_tensor (Pointer <ggml_context > ctx , ggml_type type , int n_dims , Pointer <Int64 > ne )
→ Pointer <ggml_tensor >
ggml_new_tensor_1d (Pointer <ggml_context > ctx , ggml_type type , int ne0 )
→ Pointer <ggml_tensor >
ggml_new_tensor_2d (Pointer <ggml_context > ctx , ggml_type type , int ne0 , int ne1 )
→ Pointer <ggml_tensor >
ggml_new_tensor_3d (Pointer <ggml_context > ctx , ggml_type type , int ne0 , int ne1 , int ne2 )
→ Pointer <ggml_tensor >
ggml_new_tensor_4d (Pointer <ggml_context > ctx , ggml_type type , int ne0 , int ne1 , int ne2 , int ne3 )
→ Pointer <ggml_tensor >
ggml_norm (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double eps )
→ Pointer <ggml_tensor >
normalize along rows
ggml_norm_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double eps )
→ Pointer <ggml_tensor >
ggml_nrows (Pointer <ggml_tensor > tensor )
→ int
ggml_numa_init (ggml_numa_strategy numa )
→ void
ggml_op_desc (Pointer <ggml_tensor > t )
→ Pointer <Char >
ggml_op_name (ggml_op op )
→ Pointer <Char >
ggml_op_symbol (ggml_op op )
→ Pointer <Char >
ggml_opt_alloc (ggml_opt_context_t opt_ctx , bool backward )
→ void
allocate the next graph for evaluation, either forward or forward + backward
must be called exactly once prior to calling ggml_opt_eval
ggml_opt_context_optimizer_type (ggml_opt_context_t arg0 )
→ ggml_opt_optimizer_type
ggml_opt_dataset_data (ggml_opt_dataset_t dataset )
→ Pointer <ggml_tensor >
ggml_opt_dataset_free (ggml_opt_dataset_t dataset )
→ void
ggml_opt_dataset_get_batch (ggml_opt_dataset_t dataset , Pointer <ggml_tensor > data_batch , Pointer <ggml_tensor > labels_batch , int ibatch )
→ void
get batch at position ibatch from dataset and copy the data to data_batch and labels_batch
ggml_opt_dataset_get_batch_host (ggml_opt_dataset_t dataset , Pointer <Void > data_batch , int nb_data_batch , Pointer <Void > labels_batch , int ibatch )
→ void
ggml_opt_dataset_init (ggml_type type_data , ggml_type type_label , int ne_datapoint , int ne_label , int ndata , int ndata_shard )
→ ggml_opt_dataset_t
ggml_opt_dataset_labels (ggml_opt_dataset_t dataset )
→ Pointer <ggml_tensor >
ggml_opt_dataset_ndata (ggml_opt_dataset_t dataset )
→ int
get underlying tensors that store the data
ggml_opt_dataset_shuffle (ggml_opt_context_t opt_ctx , ggml_opt_dataset_t dataset , int idata )
→ void
shuffle idata first datapoints from dataset with RNG from opt_ctx, shuffle all datapoints if idata is negative
ggml_opt_default_params (ggml_backend_sched_t backend_sched , ggml_opt_loss_type loss_type )
→ ggml_opt_params
ggml_opt_epoch (ggml_opt_context_t opt_ctx , ggml_opt_dataset_t dataset , ggml_opt_result_t result_train , ggml_opt_result_t result_eval , int idata_split , ggml_opt_epoch_callback callback_train , ggml_opt_epoch_callback callback_eval )
→ void
do training on front of dataset, do evaluation only on back of dataset
ggml_opt_epoch_callback_progress_bar (bool train , ggml_opt_context_t opt_ctx , ggml_opt_dataset_t dataset , ggml_opt_result_t result , int ibatch , int ibatch_max , int t_start_us )
→ void
callback that prints a progress bar on stderr
ggml_opt_eval (ggml_opt_context_t opt_ctx , ggml_opt_result_t result )
→ void
do forward pass, increment result if not NULL, do backward pass if allocated
ggml_opt_fit (ggml_backend_sched_t backend_sched , Pointer <ggml_context > ctx_compute , Pointer <ggml_tensor > inputs , Pointer <ggml_tensor > outputs , ggml_opt_dataset_t dataset , ggml_opt_loss_type loss_type , ggml_opt_optimizer_type optimizer , ggml_opt_get_optimizer_params get_opt_pars , int nepoch , int nbatch_logical , double val_split , bool silent )
→ void
ggml_opt_free (ggml_opt_context_t opt_ctx )
→ void
ggml_opt_get_constant_optimizer_params (Pointer <Void > userdata )
→ ggml_opt_optimizer_params
casts userdata to ggml_opt_optimizer_params and returns it
ggml_opt_get_default_optimizer_params (Pointer <Void > userdata )
→ ggml_opt_optimizer_params
returns the default optimizer params (constant, hard-coded values)
userdata is not used
ggml_opt_grad_acc (ggml_opt_context_t opt_ctx , Pointer <ggml_tensor > node )
→ Pointer <ggml_tensor >
get the gradient accumulator for a node from the forward graph
ggml_opt_init (ggml_opt_params params )
→ ggml_opt_context_t
ggml_opt_inputs (ggml_opt_context_t opt_ctx )
→ Pointer <ggml_tensor >
get underlying tensors that store data
if not using static graphs these pointers become invalid with the next call to ggml_opt_alloc
ggml_opt_labels (ggml_opt_context_t opt_ctx )
→ Pointer <ggml_tensor >
ggml_opt_loss (ggml_opt_context_t opt_ctx )
→ Pointer <ggml_tensor >
ggml_opt_ncorrect (ggml_opt_context_t opt_ctx )
→ Pointer <ggml_tensor >
ggml_opt_optimizer_name (ggml_opt_optimizer_type arg0 )
→ Pointer <Char >
ggml_opt_outputs (ggml_opt_context_t opt_ctx )
→ Pointer <ggml_tensor >
ggml_opt_pred (ggml_opt_context_t opt_ctx )
→ Pointer <ggml_tensor >
ggml_opt_prepare_alloc (ggml_opt_context_t opt_ctx , Pointer <ggml_context > ctx_compute , Pointer <ggml_cgraph > gf , Pointer <ggml_tensor > inputs , Pointer <ggml_tensor > outputs )
→ void
if not using static graphs, this function must be called prior to ggml_opt_alloc
ggml_opt_reset (ggml_opt_context_t opt_ctx , bool optimizer )
→ void
set gradients to zero, initialize loss, and optionally reset the optimizer
ggml_opt_result_accuracy (ggml_opt_result_t result , Pointer <Double > accuracy , Pointer <Double > unc )
→ void
ggml_opt_result_free (ggml_opt_result_t result )
→ void
ggml_opt_result_init ()
→ ggml_opt_result_t
====== Optimization Result ======
ggml_opt_result_loss (ggml_opt_result_t result , Pointer <Double > loss , Pointer <Double > unc )
→ void
ggml_opt_result_ndata (ggml_opt_result_t result , Pointer <Int64 > ndata )
→ void
get data from result, uncertainties are optional and can be ignored by passing NULL
ggml_opt_result_pred (ggml_opt_result_t result , Pointer <Int32 > pred )
→ void
ggml_opt_result_reset (ggml_opt_result_t result )
→ void
ggml_opt_static_graphs (ggml_opt_context_t opt_ctx )
→ bool
ggml_opt_step_adamw (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > grad , Pointer <ggml_tensor > m , Pointer <ggml_tensor > v , Pointer <ggml_tensor > adamw_params )
→ Pointer <ggml_tensor >
AdamW optimizer step
Paper: https://arxiv.org/pdf/1711.05101v3.pdf
PyTorch: https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html
ggml_opt_step_sgd (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > grad , Pointer <ggml_tensor > sgd_params )
→ Pointer <ggml_tensor >
stochastic gradient descent step (with weight decay)
ggml_out_prod (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
A: m columns, n rows,
B: p columns, n rows,
result is m columns, p rows
ggml_pad (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int p0 , int p1 , int p2 , int p3 )
→ Pointer <ggml_tensor >
pad each dimension with zeros: x, ..., x -> x, ..., x, 0, ..., 0
ggml_pad_circular (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int p0 , int p1 , int p2 , int p3 )
→ Pointer <ggml_tensor >
pad each dimension with values on the other side of the torus (looping around)
ggml_pad_ext (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int lp0 , int rp0 , int lp1 , int rp1 , int lp2 , int rp2 , int lp3 , int rp3 )
→ Pointer <ggml_tensor >
ggml_pad_ext_circular (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int lp0 , int rp0 , int lp1 , int rp1 , int lp2 , int rp2 , int lp3 , int rp3 )
→ Pointer <ggml_tensor >
pad each dimension with values on the other side of the torus (looping around)
ggml_pad_reflect_1d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int p0 , int p1 )
→ Pointer <ggml_tensor >
pad each dimension with reflection: a, b, c, d -> b, a, b, c, d, c
ggml_permute (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int axis0 , int axis1 , int axis2 , int axis3 )
→ Pointer <ggml_tensor >
ggml_pool_1d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , ggml_op_pool op , int k0 , int s0 , int p0 )
→ Pointer <ggml_tensor >
ggml_pool_2d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , ggml_op_pool op , int k0 , int k1 , int s0 , int s1 , double p0 , double p1 )
→ Pointer <ggml_tensor >
ggml_pool_2d_back (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > af , ggml_op_pool op , int k0 , int k1 , int s0 , int s1 , double p0 , double p1 )
→ Pointer <ggml_tensor >
ggml_print_object (Pointer <ggml_object > obj )
→ void
ggml_print_objects (Pointer <ggml_context > ctx )
→ void
ggml_quantize_chunk (ggml_type type , Pointer <Float > src , Pointer <Void > dst , int start , int nrows , int n_per_row , Pointer <Float > imatrix )
→ int
ggml_quantize_free ()
→ void
ggml_quantize_init (ggml_type type )
→ void
ggml_quantize_requires_imatrix (ggml_type type )
→ bool
ggml_reglu (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_reglu_split (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_reglu_swapped (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_relu (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_relu_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_repeat (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
if a is the same shape as b, and a is not parameter, return a
otherwise, return a new tensor: repeat(a) to fit in b
ggml_repeat_4d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 , int ne2 , int ne3 )
→ Pointer <ggml_tensor >
repeat a to the specified shape
ggml_repeat_back (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
sums repetitions in a into shape of b
ggml_reset (Pointer <ggml_context > ctx )
→ void
ggml_reshape (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
return view(a), b specifies the new shape
TODO: when we start computing gradient, make a copy instead of view
ggml_reshape_1d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 )
→ Pointer <ggml_tensor >
return view(a)
TODO: when we start computing gradient, make a copy instead of view
ggml_reshape_2d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 )
→ Pointer <ggml_tensor >
ggml_reshape_3d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 , int ne2 )
→ Pointer <ggml_tensor >
return view(a)
TODO: when we start computing gradient, make a copy instead of view
ggml_reshape_4d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 , int ne2 , int ne3 )
→ Pointer <ggml_tensor >
ggml_rms_norm (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double eps )
→ Pointer <ggml_tensor >
ggml_rms_norm_back (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , double eps )
→ Pointer <ggml_tensor >
a - x
b - dy
ggml_rms_norm_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double eps )
→ Pointer <ggml_tensor >
ggml_roll (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int shift0 , int shift1 , int shift2 , int shift3 )
→ Pointer <ggml_tensor >
Move tensor elements by an offset given for each dimension. Elements that
are shifted beyond the last position are wrapped around to the beginning.
ggml_rope (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int n_dims , int mode )
→ Pointer <ggml_tensor >
rotary position embedding
if (mode & 1) - skip n_past elements (NOT SUPPORTED)
if (mode & GGML_ROPE_TYPE_NEOX) - GPT-NeoX style
ggml_rope_custom (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int n_dims , int mode , int n_ctx_orig , double freq_base , double freq_scale , double ext_factor , double attn_factor , double beta_fast , double beta_slow )
→ Pointer <ggml_tensor >
ggml_rope_custom_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int n_dims , int mode , int n_ctx_orig , double freq_base , double freq_scale , double ext_factor , double attn_factor , double beta_fast , double beta_slow )
→ Pointer <ggml_tensor >
ggml_rope_ext (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c , int n_dims , int mode , int n_ctx_orig , double freq_base , double freq_scale , double ext_factor , double attn_factor , double beta_fast , double beta_slow )
→ Pointer <ggml_tensor >
custom RoPE
c is freq factors (e.g. phi3-128k), (optional)
ggml_rope_ext_back (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c , int n_dims , int mode , int n_ctx_orig , double freq_base , double freq_scale , double ext_factor , double attn_factor , double beta_fast , double beta_slow )
→ Pointer <ggml_tensor >
rotary position embedding backward, i.e. compute dx from dy
a - dy
ggml_rope_ext_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c , int n_dims , int mode , int n_ctx_orig , double freq_base , double freq_scale , double ext_factor , double attn_factor , double beta_fast , double beta_slow )
→ Pointer <ggml_tensor >
in-place, returns view(a)
ggml_rope_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int n_dims , int mode )
→ Pointer <ggml_tensor >
in-place, returns view(a)
ggml_rope_multi (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c , int n_dims , Pointer <Int > sections , int mode , int n_ctx_orig , double freq_base , double freq_scale , double ext_factor , double attn_factor , double beta_fast , double beta_slow )
→ Pointer <ggml_tensor >
ggml_rope_multi_back (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c , int n_dims , Pointer <Int > sections , int mode , int n_ctx_orig , double freq_base , double freq_scale , double ext_factor , double attn_factor , double beta_fast , double beta_slow )
→ Pointer <ggml_tensor >
ggml_rope_multi_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c , int n_dims , Pointer <Int > sections , int mode , int n_ctx_orig , double freq_base , double freq_scale , double ext_factor , double attn_factor , double beta_fast , double beta_slow )
→ Pointer <ggml_tensor >
ggml_rope_yarn_corr_dims (int n_dims , int n_ctx_orig , double freq_base , double beta_fast , double beta_slow , Pointer <Float > dims )
→ void
compute correction dims for YaRN RoPE scaling
ggml_round (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_round_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_row_size (ggml_type type , int ne )
→ int
ggml_rwkv_wkv6 (Pointer <ggml_context > ctx , Pointer <ggml_tensor > k , Pointer <ggml_tensor > v , Pointer <ggml_tensor > r , Pointer <ggml_tensor > tf , Pointer <ggml_tensor > td , Pointer <ggml_tensor > state )
→ Pointer <ggml_tensor >
ggml_rwkv_wkv7 (Pointer <ggml_context > ctx , Pointer <ggml_tensor > r , Pointer <ggml_tensor > w , Pointer <ggml_tensor > k , Pointer <ggml_tensor > v , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > state )
→ Pointer <ggml_tensor >
ggml_scale (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double s )
→ Pointer <ggml_tensor >
operations on tensors without backpropagation
ggml_scale_bias (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double s , double b )
→ Pointer <ggml_tensor >
x = s * a + b
ggml_scale_bias_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double s , double b )
→ Pointer <ggml_tensor >
ggml_scale_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double s )
→ Pointer <ggml_tensor >
in-place, returns view(a)
ggml_set (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int nb1 , int nb2 , int nb3 , int offset )
→ Pointer <ggml_tensor >
b -> view(a,offset,nb1,nb2,nb3), return modified a
ggml_set_1d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int offset )
→ Pointer <ggml_tensor >
ggml_set_1d_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int offset )
→ Pointer <ggml_tensor >
ggml_set_2d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int nb1 , int offset )
→ Pointer <ggml_tensor >
b -> view(a,offset,nb1,nb2,nb3), return modified a
ggml_set_2d_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int nb1 , int offset )
→ Pointer <ggml_tensor >
b -> view(a,offset,nb1,nb2,nb3), return view(a)
ggml_set_abort_callback (ggml_abort_callback_t callback )
→ ggml_abort_callback_t
Set the abort callback (passing null will restore original abort functionality: printing a message to stdout)
Returns the old callback for chaining
ggml_set_f32 (Pointer <ggml_tensor > tensor , double value )
→ Pointer <ggml_tensor >
ggml_set_f32_1d (Pointer <ggml_tensor > tensor , int i , double value )
→ void
ggml_set_f32_nd (Pointer <ggml_tensor > tensor , int i0 , int i1 , int i2 , int i3 , double value )
→ void
ggml_set_i32 (Pointer <ggml_tensor > tensor , int value )
→ Pointer <ggml_tensor >
ggml_set_i32_1d (Pointer <ggml_tensor > tensor , int i , int value )
→ void
ggml_set_i32_nd (Pointer <ggml_tensor > tensor , int i0 , int i1 , int i2 , int i3 , int value )
→ void
ggml_set_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int nb1 , int nb2 , int nb3 , int offset )
→ Pointer <ggml_tensor >
b -> view(a,offset,nb1,nb2,nb3), return view(a)
ggml_set_input (Pointer <ggml_tensor > tensor )
→ void
Tensor flags
ggml_set_loss (Pointer <ggml_tensor > tensor )
→ void
ggml_set_name (Pointer <ggml_tensor > tensor , Pointer <Char > name )
→ Pointer <ggml_tensor >
ggml_set_no_alloc (Pointer <ggml_context > ctx , bool no_alloc )
→ void
ggml_set_output (Pointer <ggml_tensor > tensor )
→ void
ggml_set_param (Pointer <ggml_tensor > tensor )
→ void
ggml_set_rows (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c )
→ Pointer <ggml_tensor >
a TD n_embd, ne1, ne2, ne3
b TS n_embd, n_rows, ne02, ne03 | ne02 == ne2, ne03 == ne3
c I64 n_rows, ne11, ne12, 1 | ci in [0, ne1)
ggml_set_zero (Pointer <ggml_tensor > tensor )
→ Pointer <ggml_tensor >
ggml_sgn (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_sgn_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_sigmoid (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_sigmoid_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_silu (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_silu_back (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
a - x
b - dy
ggml_silu_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_sin (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_sin_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_soft_max (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_soft_max_add_sinks (Pointer <ggml_tensor > a , Pointer <ggml_tensor > sinks )
→ void
ggml_soft_max_ext (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > mask , double scale , double max_bias )
→ Pointer <ggml_tensor >
a ne0, ne01, ne02, ne03
mask ne0, ne11, ne12, ne13 | ne11 >= ne01, F16 or F32, optional
ggml_soft_max_ext_back (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , double scale , double max_bias )
→ Pointer <ggml_tensor >
ggml_soft_max_ext_back_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , double scale , double max_bias )
→ Pointer <ggml_tensor >
in-place, returns view(a)
ggml_soft_max_ext_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > mask , double scale , double max_bias )
→ Pointer <ggml_tensor >
ggml_soft_max_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
in-place, returns view(a)
ggml_softplus (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_softplus_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_solve_tri (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , bool left , bool lower , bool uni )
→ Pointer <ggml_tensor >
Solves a specific equation of the form Ax=B, where A is a triangular matrix
without zeroes on the diagonal (i.e. invertible).
B can have any number of columns, but must have the same number of rows as A
If A is n, n and B is n, m, then the result will be n, m as well
Has O(n^3) complexity (unlike most matrix ops out there), so use on cases
where n > 100 sparingly, pre-chunk if necessary.
ggml_sqr (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_sqr_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_sqrt (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_sqrt_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_ssm_conv (Pointer <ggml_context > ctx , Pointer <ggml_tensor > sx , Pointer <ggml_tensor > c )
→ Pointer <ggml_tensor >
ggml_ssm_scan (Pointer <ggml_context > ctx , Pointer <ggml_tensor > s , Pointer <ggml_tensor > x , Pointer <ggml_tensor > dt , Pointer <ggml_tensor > A , Pointer <ggml_tensor > B , Pointer <ggml_tensor > C , Pointer <ggml_tensor > ids )
→ Pointer <ggml_tensor >
ggml_status_to_string (ggml_status status )
→ Pointer <Char >
ggml_step (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_step_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_sub (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_sub_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_sum (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
return scalar
ggml_sum_rows (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
sums along rows, with input shape a,b,c,d return shape 1,b,c,d
ggml_swiglu (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_swiglu_oai (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , double alpha , double limit )
→ Pointer <ggml_tensor >
ggml_swiglu_split (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b )
→ Pointer <ggml_tensor >
ggml_swiglu_swapped (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_tallocr_alloc (Pointer <ggml_tallocr > talloc , Pointer <ggml_tensor > tensor )
→ ggml_status
ggml_tallocr_new (ggml_backend_buffer_t buffer )
→ ggml_tallocr
ggml_tanh (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_tanh_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_tensor_overhead ()
→ int
use this to compute the memory overhead of a tensor
ggml_threadpool_free (Pointer <ggml_threadpool > threadpool )
→ void
ggml_threadpool_get_n_threads (Pointer <ggml_threadpool > threadpool )
→ int
ggml_threadpool_new (Pointer <ggml_threadpool_params > params )
→ Pointer <ggml_threadpool >
ggml_threadpool_params_default (int n_threads )
→ ggml_threadpool_params
ggml_threadpool_params_init (Pointer <ggml_threadpool_params > p , int n_threads )
→ void
ggml_threadpool_params_match (Pointer <ggml_threadpool_params > p0 , Pointer <ggml_threadpool_params > p1 )
→ bool
ggml_threadpool_pause (Pointer <ggml_threadpool > threadpool )
→ void
ggml_threadpool_resume (Pointer <ggml_threadpool > threadpool )
→ void
ggml_time_init ()
→ void
ggml_time_ms ()
→ int
ggml_time_us ()
→ int
ggml_timestep_embedding (Pointer <ggml_context > ctx , Pointer <ggml_tensor > timesteps , int dim , int max_period )
→ Pointer <ggml_tensor >
Ref: https://github.com/CompVis/stable-diffusion/blob/main/ldm/modules/diffusionmodules/util.py#L151
timesteps: [N,]
return: [N, dim]
ggml_top_k (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int k )
→ Pointer <ggml_tensor >
top k elements per row
note: the resulting top k indices are in no particular order
ggml_transpose (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
alias for ggml_permute(ctx, a, 1, 0, 2, 3)
ggml_tri (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , ggml_tri_type type )
→ Pointer <ggml_tensor >
ggml_trunc (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
Truncates the fractional part of each element in the tensor (towards zero).
For example: trunc(3.7) = 3.0, trunc(-2.9) = -2.0
Similar to std::trunc in C/C++.
ggml_trunc_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a )
→ Pointer <ggml_tensor >
ggml_type_name (ggml_type type )
→ Pointer <Char >
ggml_type_size (ggml_type type )
→ int
ggml_type_sizef (ggml_type type )
→ double
ggml_unary (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , ggml_unary_op op )
→ Pointer <ggml_tensor >
ggml_unary_inplace (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , ggml_unary_op op )
→ Pointer <ggml_tensor >
ggml_unary_op_name (ggml_unary_op op )
→ Pointer <Char >
ggml_unravel_index (Pointer <ggml_tensor > tensor , int i , Pointer <Int64 > i0 , Pointer <Int64 > i1 , Pointer <Int64 > i2 , Pointer <Int64 > i3 )
→ void
Converts a flat index into coordinates
ggml_upscale (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int scale_factor , ggml_scale_mode mode )
→ Pointer <ggml_tensor >
ggml_upscale_ext (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 , int ne2 , int ne3 , ggml_scale_mode mode )
→ Pointer <ggml_tensor >
ggml_used_mem (Pointer <ggml_context > ctx )
→ int
ggml_validate_row_data (ggml_type type , Pointer <Void > data , int nbytes )
→ bool
ggml_version ()
→ Pointer <Char >
misc
ggml_view_1d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int offset )
→ Pointer <ggml_tensor >
offset in bytes
ggml_view_2d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 , int nb1 , int offset )
→ Pointer <ggml_tensor >
ggml_view_3d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 , int ne2 , int nb1 , int nb2 , int offset )
→ Pointer <ggml_tensor >
ggml_view_4d (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int ne0 , int ne1 , int ne2 , int ne3 , int nb1 , int nb2 , int nb3 , int offset )
→ Pointer <ggml_tensor >
ggml_view_tensor (Pointer <ggml_context > ctx , Pointer <ggml_tensor > src )
→ Pointer <ggml_tensor >
ggml_win_part (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int w )
→ Pointer <ggml_tensor >
partition into non-overlapping windows with padding if needed
example:
a: 768 64 64 1
w: 14
res: 768 14 14 25
used in sam
ggml_win_unpart (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , int w0 , int h0 , int w )
→ Pointer <ggml_tensor >
reverse of ggml_win_part
used in sam
ggml_xielu (Pointer <ggml_context > ctx , Pointer <ggml_tensor > a , double alpha_n , double alpha_p , double beta , double eps )
→ Pointer <ggml_tensor >
xIELU activation function
x = x * (c_a(alpha_n) + c_b(alpha_p, beta) * sigmoid(beta * x)) + eps * (x > 0)
where c_a = softplus and c_b(a, b) = softplus(a) + b are constraining functions
that constrain the positive and negative source alpha values respectively
gguf_add_tensor (Pointer <gguf_context > ctx , Pointer <ggml_tensor > tensor )
→ void
add tensor to GGUF context, tensor name must be unique
gguf_find_key (Pointer <gguf_context > ctx , Pointer <Char > key )
→ int
gguf_find_tensor (Pointer <gguf_context > ctx , Pointer <Char > name )
→ int
gguf_free (Pointer <gguf_context > ctx )
→ void
GGML_API struct gguf_context * gguf_init_from_buffer(..);
gguf_get_alignment (Pointer <gguf_context > ctx )
→ int
gguf_get_arr_data (Pointer <gguf_context > ctx , int key_id )
→ Pointer <Void >
get raw pointer to the first element of the array with the given key_id
for bool arrays, note that they are always stored as int8 on all platforms (usually this makes no difference)
gguf_get_arr_n (Pointer <gguf_context > ctx , int key_id )
→ int
gguf_get_arr_str (Pointer <gguf_context > ctx , int key_id , int i )
→ Pointer <Char >
get ith C string from array with given key_id
gguf_get_arr_type (Pointer <gguf_context > ctx , int key_id )
→ gguf_type
gguf_get_data_offset (Pointer <gguf_context > ctx )
→ int
gguf_get_key (Pointer <gguf_context > ctx , int key_id )
→ Pointer <Char >
gguf_get_kv_type (Pointer <gguf_context > ctx , int key_id )
→ gguf_type
gguf_get_meta_data (Pointer <gguf_context > ctx , Pointer <Void > data )
→ void
writes the meta data to pointer "data"
gguf_get_meta_size (Pointer <gguf_context > ctx )
→ int
get the size in bytes of the meta data (header, kv pairs, tensor info) including padding
gguf_get_n_kv (Pointer <gguf_context > ctx )
→ int
gguf_get_n_tensors (Pointer <gguf_context > ctx )
→ int
gguf_get_tensor_name (Pointer <gguf_context > ctx , int tensor_id )
→ Pointer <Char >
gguf_get_tensor_offset (Pointer <gguf_context > ctx , int tensor_id )
→ int
gguf_get_tensor_size (Pointer <gguf_context > ctx , int tensor_id )
→ int
gguf_get_tensor_type (Pointer <gguf_context > ctx , int tensor_id )
→ ggml_type
gguf_get_val_bool (Pointer <gguf_context > ctx , int key_id )
→ bool
gguf_get_val_data (Pointer <gguf_context > ctx , int key_id )
→ Pointer <Void >
gguf_get_val_f32 (Pointer <gguf_context > ctx , int key_id )
→ double
gguf_get_val_f64 (Pointer <gguf_context > ctx , int key_id )
→ double
gguf_get_val_i16 (Pointer <gguf_context > ctx , int key_id )
→ int
gguf_get_val_i32 (Pointer <gguf_context > ctx , int key_id )
→ int
gguf_get_val_i64 (Pointer <gguf_context > ctx , int key_id )
→ int
gguf_get_val_i8 (Pointer <gguf_context > ctx , int key_id )
→ int
gguf_get_val_str (Pointer <gguf_context > ctx , int key_id )
→ Pointer <Char >
gguf_get_val_u16 (Pointer <gguf_context > ctx , int key_id )
→ int
gguf_get_val_u32 (Pointer <gguf_context > ctx , int key_id )
→ int
gguf_get_val_u64 (Pointer <gguf_context > ctx , int key_id )
→ int
gguf_get_val_u8 (Pointer <gguf_context > ctx , int key_id )
→ int
will abort if the wrong type is used for the key
gguf_get_version (Pointer <gguf_context > ctx )
→ int
gguf_init_empty ()
→ Pointer <gguf_context >
gguf_init_from_file (Pointer <Char > fname , gguf_init_params params )
→ Pointer <gguf_context >
gguf_remove_key (Pointer <gguf_context > ctx , Pointer <Char > key )
→ int
removes key if it exists, returns id that the key had prior to removal (-1 if it didn't exist)
gguf_set_arr_data (Pointer <gguf_context > ctx , Pointer <Char > key , gguf_type type , Pointer <Void > data , int n )
→ void
gguf_set_arr_str (Pointer <gguf_context > ctx , Pointer <Char > key , Pointer <Pointer <Char > > data , int n )
→ void
creates a new array with n strings and copies the corresponding strings from data
gguf_set_kv (Pointer <gguf_context > ctx , Pointer <gguf_context > src )
→ void
set or add KV pairs from another context
gguf_set_tensor_data (Pointer <gguf_context > ctx , Pointer <Char > name , Pointer <Void > data )
→ void
assumes that at least gguf_get_tensor_size bytes can be read from data
gguf_set_tensor_type (Pointer <gguf_context > ctx , Pointer <Char > name , ggml_type type )
→ void
gguf_set_val_bool (Pointer <gguf_context > ctx , Pointer <Char > key , bool val )
→ void
gguf_set_val_f32 (Pointer <gguf_context > ctx , Pointer <Char > key , double val )
→ void
gguf_set_val_f64 (Pointer <gguf_context > ctx , Pointer <Char > key , double val )
→ void
gguf_set_val_i16 (Pointer <gguf_context > ctx , Pointer <Char > key , int val )
→ void
gguf_set_val_i32 (Pointer <gguf_context > ctx , Pointer <Char > key , int val )
→ void
gguf_set_val_i64 (Pointer <gguf_context > ctx , Pointer <Char > key , int val )
→ void
gguf_set_val_i8 (Pointer <gguf_context > ctx , Pointer <Char > key , int val )
→ void
gguf_set_val_str (Pointer <gguf_context > ctx , Pointer <Char > key , Pointer <Char > val )
→ void
gguf_set_val_u16 (Pointer <gguf_context > ctx , Pointer <Char > key , int val )
→ void
gguf_set_val_u32 (Pointer <gguf_context > ctx , Pointer <Char > key , int val )
→ void
gguf_set_val_u64 (Pointer <gguf_context > ctx , Pointer <Char > key , int val )
→ void
gguf_set_val_u8 (Pointer <gguf_context > ctx , Pointer <Char > key , int val )
→ void
overrides an existing KV pair or adds a new one, the new KV pair is always at the back
gguf_type_name (gguf_type type )
→ Pointer <Char >
gguf_write_to_file (Pointer <gguf_context > ctx , Pointer <Char > fname , bool only_meta )
→ bool
write the entire context to a binary file
lcpp_destroy ()
→ void
lcpp_free_common_chat_msg (Pointer <lcpp_common_chat_msg_t > msg )
→ void
lcpp_free_machine_info (Pointer <lcpp_machine_info_t > mach_info )
→ void
lcpp_free_model_info (Pointer <lcpp_model_info_t > model_info )
→ void
lcpp_free_model_mem (Pointer <lcpp_model_mem_t > model_mem )
→ void
lcpp_free_model_rt (Pointer <lcpp_model_rt_t > model_rt )
→ void
lcpp_free_text (Pointer <LcppTextStruct_t > ptr )
→ void
lcpp_get_machine_info ()
→ Pointer <lcpp_machine_info_t >
lcpp_get_model_info (Pointer <Char > model_file )
→ Pointer <lcpp_model_info_t >
lcpp_initialize ()
→ void
lcpp_model_details (Pointer <Char > model_path )
→ Pointer <lcpp_model_rt_t >
lcpp_params_defaults ()
→ lcpp_params_t
lcpp_prompt (lcpp_sampling_params_t sampling_params , Pointer <Pointer <lcpp_common_chat_msg_t > > messages , int n_messages , Pointer <Pointer <lcpp_common_chat_tool_t > > tools , int n_tools )
→ int
lcpp_reconfigure (llama_context_params_t context_params , lcpp_params_t lcpp_params$1 )
→ void
lcpp_reset ()
→ void
lcpp_sampling_params_defaults ()
→ lcpp_sampling_params_t
lcpp_send_abort_signal (bool abort )
→ void
lcpp_send_cancel_signal (bool cancel )
→ void
lcpp_set_chat_message_callback (LppChatMessageCallback chat_msg_callback )
→ void
lcpp_set_model_load_progress_callback (LppProgressCallback model_loading_callback )
→ void
lcpp_set_on_abort_callback (LcppOnAbortCallback on_abort_callback )
→ void
lcpp_set_on_cancel_callback (LcppOnCancelCallback on_cancel_callback )
→ void
lcpp_set_token_stream_callback (LppTokenStreamCallback newtoken_callback )
→ void
lcpp_tokenize (Pointer <Char > text , int n_text , bool add_special , bool parse_special , Pointer <Pointer <llama_token > > tokens )
→ int
lcpp_unload ()
→ void
lcpp_unset_chat_message_callback ()
→ void
lcpp_unset_model_load_progress_callback ()
→ void
lcpp_unset_on_abort_callback ()
→ void
lcpp_unset_on_cancel_callback ()
→ void
lcpp_unset_token_stream_callback ()
→ void
listOfMessagesToNative (List <ChatMessage > messages )
→ Pointer <Pointer <lcpp_common_chat_msg_t > >
listOfToolsToNative (List <Tool<Object , ToolOptions , Object > > tools )
→ Pointer <Pointer <lcpp_common_chat_tool_t > >
llama_adapter_get_alora_invocation_tokens (Pointer <llama_adapter_lora > adapter )
→ Pointer <llama_token >
llama_adapter_get_alora_n_invocation_tokens (Pointer <llama_adapter_lora > adapter )
→ int
Get the invocation tokens if the current lora is an alora
llama_adapter_lora_free (Pointer <llama_adapter_lora > adapter )
→ void
Manually free a LoRA adapter
NOTE: loaded adapters will be freed when the associated model is deleted
llama_adapter_lora_init (Pointer <llama_model > model , Pointer <Char > path_lora )
→ Pointer <llama_adapter_lora >
Load a LoRA adapter from file
llama_adapter_meta_count (Pointer <llama_adapter_lora > adapter )
→ int
Get the number of metadata key/value pairs
llama_adapter_meta_key_by_index (Pointer <llama_adapter_lora > adapter , int i , Pointer <Char > buf , int buf_size )
→ int
Get metadata key name by index
llama_adapter_meta_val_str (Pointer <llama_adapter_lora > adapter , Pointer <Char > key , Pointer <Char > buf , int buf_size )
→ int
Get metadata value as a string by key name
llama_adapter_meta_val_str_by_index (Pointer <llama_adapter_lora > adapter , int i , Pointer <Char > buf , int buf_size )
→ int
Get metadata value as a string by index
llama_add_bos_token (Pointer <llama_vocab > vocab )
→ bool
llama_add_eos_token (Pointer <llama_vocab > vocab )
→ bool
llama_apply_adapter_cvec (Pointer <llama_context > ctx , Pointer <Float > data , int len , int n_embd , int il_start , int il_end )
→ int
Apply a loaded control vector to a llama_context, or if data is NULL, clear
the currently loaded vector.
n_embd should be the size of a single layer's control, and data should point
to an n_embd x n_layers buffer starting from layer 1.
il_start and il_end are the layer range the vector should apply to (both inclusive)
See llama_control_vector_load in common to load a control vector.
llama_attach_threadpool (Pointer <llama_context > ctx , ggml_threadpool_t threadpool , ggml_threadpool_t threadpool_batch )
→ void
Optional: an auto threadpool gets created in ggml if not passed explicitly
llama_backend_free ()
→ void
Call once at the end of the program - currently only used for MPI
llama_backend_init ()
→ void
Initialize the llama + ggml backend
If numa is true, use NUMA optimizations
Call once at the start of the program
llama_batch_free (llama_batch batch )
→ void
Frees a batch of tokens allocated with llama_batch_init()
llama_batch_get_one (Pointer <llama_token > tokens , int n_tokens )
→ llama_batch
Return batch for single sequence of tokens
The sequence ID will be fixed to 0
The position of the tokens will be tracked automatically by llama_decode
llama_batch_init (int n_tokens , int embd , int n_seq_max )
→ llama_batch
Allocates a batch of tokens on the heap that can hold a maximum of n_tokens
Each token can be assigned up to n_seq_max sequence ids
The batch has to be freed with llama_batch_free()
If embd != 0, llama_batch.embd will be allocated with size of n_tokens * embd * sizeof(float)
Otherwise, llama_batch.token will be allocated to store n_tokens llama_token
The rest of the llama_batch members are allocated with size n_tokens
All members are left uninitialized
llama_chat_apply_template (Pointer <Char > tmpl , Pointer <llama_chat_message > chat , int n_msg , bool add_ass , Pointer <Char > buf , int length )
→ int
Apply chat template. Inspired by hf apply_chat_template() on python.
Both "model" and "custom_template" are optional, but at least one is required. "custom_template" has higher precedence than "model"
NOTE: This function does not use a jinja parser. It only supports a pre-defined list of templates. See more: https://github.com/ggml-org/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
@param tmpl A Jinja template to use for this chat. If this is nullptr, the model’s default chat template will be used instead.
@param chat Pointer to a list of multiple llama_chat_message
@param n_msg Number of llama_chat_message in this chat
@param add_ass Whether to end the prompt with the token(s) that indicate the start of an assistant message.
@param buf A buffer to hold the output formatted prompt. The recommended alloc size is 2 * (total number of characters of all messages)
@param length The size of the allocated buffer
@return The total number of bytes of the formatted prompt. If it is larger than the size of the buffer, you may need to re-alloc it and then re-apply the template.
llama_chat_builtin_templates (Pointer <Pointer <Char > > output , int len )
→ int
Get list of built-in chat templates
llama_clear_adapter_lora (Pointer <llama_context > ctx )
→ void
Remove all LoRA adapters from given context
llama_context_default_params ()
→ llama_context_params
llama_copy_state_data (Pointer <llama_context > ctx , Pointer <Uint8 > dst )
→ int
llama_decode (Pointer <llama_context > ctx , llama_batch batch )
→ int
Process a batch of tokens.
Requires the context to have a memory.
For encoder-decoder contexts, processes the batch using the decoder.
Positive return values do not mean a fatal error, but rather a warning.
Upon fatal-error or abort, the ubatches that managed to be processed will remain in the memory state of the context
To handle this correctly, query the memory state using llama_memory_seq_pos_min() and llama_memory_seq_pos_max()
Upon other return values, the memory state is restored to the state before this call
0 - success
1 - could not find a KV slot for the batch (try reducing the size of the batch or increase the context)
2 - aborted (processed ubatches will remain in the context's memory)
-1 - invalid input batch
< -1 - fatal error (processed ubatches will remain in the context's memory)
llama_detach_threadpool (Pointer <llama_context > ctx )
→ void
llama_detokenize (Pointer <llama_vocab > vocab , Pointer <llama_token > tokens , int n_tokens , Pointer <Char > text , int text_len_max , bool remove_special , bool unparse_special )
→ int
@details Convert the provided tokens into text (inverse of llama_tokenize()).
@param text The char pointer must be large enough to hold the resulting text.
@return Returns the number of chars/bytes on success, no more than text_len_max.
@return Returns a negative number on failure - the number of chars/bytes that would have been returned.
@param remove_special Allow to remove BOS and EOS tokens if model is configured to do so.
@param unparse_special If true, special tokens are rendered in the output.
llama_encode (Pointer <llama_context > ctx , llama_batch batch )
→ int
Process a batch of tokens.
In contrast to llama_decode() - this call does not use KV cache.
For encoder-decoder contexts, processes the batch using the encoder.
Can store the encoder output internally for later use by the decoder's cross-attention layers.
0 - success
< 0 - error. the memory state is restored to the state before this call
llama_flash_attn_type_name (llama_flash_attn_type flash_attn_type )
→ Pointer <Char >
llama_free (Pointer <llama_context > ctx )
→ void
Frees all allocated memory
llama_free_model (Pointer <llama_model > model )
→ void
llama_get_embeddings (Pointer <llama_context > ctx )
→ Pointer <Float >
Get all output token embeddings.
when pooling_type == LLAMA_POOLING_TYPE_NONE or when using a generative model,
the embeddings for which llama_batch.logits[i] != 0 are stored contiguously
in the order they have appeared in the batch.
shape: n_outputs*n_embd
Otherwise, returns NULL.
TODO: deprecate in favor of llama_get_embeddings_ith() (ref: https://github.com/ggml-org/llama.cpp/pull/14853#issuecomment-3113143522 )
llama_get_embeddings_ith (Pointer <llama_context > ctx , int i )
→ Pointer <Float >
Get the embeddings for the ith token. For positive indices, equivalent to:
llama_get_embeddings(ctx) + ctx->output_ids[i]*n_embd
Negative indices can be used to access embeddings in reverse order, -1 is the last embedding.
shape: n_embd (1-dimensional)
returns NULL for invalid ids.
llama_get_embeddings_seq (Pointer <llama_context > ctx , int seq_id )
→ Pointer <Float >
Get the embeddings for a sequence id
Returns NULL if pooling_type is LLAMA_POOLING_TYPE_NONE
when pooling_type == LLAMA_POOLING_TYPE_RANK, returns float[n_cls_out] with the rank(s) of the sequence
otherwise: float[n_embd] (1-dimensional)
llama_get_logits (Pointer <llama_context > ctx )
→ Pointer <Float >
Token logits obtained from the last call to llama_decode()
The logits for which llama_batch.logits[i] != 0 are stored contiguously
in the order they have appeared in the batch.
Rows: number of tokens for which llama_batch.logits[i] != 0
Cols: n_vocab
TODO: deprecate in favor of llama_get_logits_ith() (ref: https://github.com/ggml-org/llama.cpp/pull/14853#issuecomment-3113143522 )
llama_get_logits_ith (Pointer <llama_context > ctx , int i )
→ Pointer <Float >
Logits for the ith token. For positive indices, equivalent to:
llama_get_logits(ctx) + ctx->output_ids[i]*n_vocab
Negative indices can be used to access logits in reverse order, -1 is the last logit.
returns NULL for invalid ids.
llama_get_memory (Pointer <llama_context > ctx )
→ llama_memory_t
llama_get_model (Pointer <llama_context > ctx )
→ Pointer <llama_model >
llama_get_state_size (Pointer <llama_context > ctx )
→ int
llama_init_from_model (Pointer <llama_model > model , llama_context_params params )
→ Pointer <llama_context >
llama_load_model_from_file (Pointer <Char > path_model , llama_model_params params )
→ Pointer <llama_model >
llama_load_session_file (Pointer <llama_context > ctx , Pointer <Char > path_session , Pointer <llama_token > tokens_out , int n_token_capacity , Pointer <Size > n_token_count_out )
→ bool
llama_log_get (Pointer <ggml_log_callback > log_callback , Pointer <Pointer <Void > > user_data )
→ void
Set callback for all future logging events.
If this is not called, or NULL is supplied, everything is output on stderr.
The logger state is global so these functions are NOT thread safe.
llama_log_set (ggml_log_callback log_callback , Pointer <Void > user_data )
→ void
llama_max_devices ()
→ int
llama_max_parallel_sequences ()
→ int
llama_max_tensor_buft_overrides ()
→ int
llama_memory_breakdown_print (Pointer <llama_context > ctx )
→ void
print a breakdown of per-device memory use via LLAMA_LOG:
llama_memory_can_shift (llama_memory_t mem )
→ bool
Check if the memory supports shifting
llama_memory_clear (llama_memory_t mem , bool data )
→ void
Clear the memory contents
If data == true, the data buffers will also be cleared together with the metadata
llama_memory_seq_add (llama_memory_t mem , int seq_id , int p0 , int p1 , int delta )
→ void
Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1)
p0 < 0 : [0, p1]
p1 < 0 : [p0, inf)
llama_memory_seq_cp (llama_memory_t mem , int seq_id_src , int seq_id_dst , int p0 , int p1 )
→ void
Copy all tokens that belong to the specified sequence to another sequence
p0 < 0 : [0, p1]
p1 < 0 : [p0, inf)
llama_memory_seq_div (llama_memory_t mem , int seq_id , int p0 , int p1 , int d )
→ void
Integer division of the positions by factor of d > 1
p0 < 0 : [0, p1]
p1 < 0 : [p0, inf)
llama_memory_seq_keep (llama_memory_t mem , int seq_id )
→ void
Removes all tokens that do not belong to the specified sequence
llama_memory_seq_pos_max (llama_memory_t mem , int seq_id )
→ int
Returns the largest position present in the memory for the specified sequence
Note that all positions in the range [pos_min, pos_max] are guaranteed to be present in the memory
Return -1 if the sequence is empty
llama_memory_seq_pos_min (llama_memory_t mem , int seq_id )
→ int
Returns the smallest position present in the memory for the specified sequence
This is typically non-zero only for SWA caches
Note that all positions in the range [pos_min, pos_max] are guaranteed to be present in the memory
Return -1 if the sequence is empty
llama_memory_seq_rm (llama_memory_t mem , int seq_id , int p0 , int p1 )
→ bool
Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
Returns false if a partial sequence cannot be removed. Removing a whole sequence never fails
seq_id < 0 : match any sequence
p0 < 0 : [0, p1]
p1 < 0 : [p0, inf)
llama_model_chat_template (Pointer <llama_model > model , Pointer <Char > name )
→ Pointer <Char >
Get the default chat template. Returns nullptr if not available
If name is NULL, returns the default chat template
llama_model_cls_label (Pointer <llama_model > model , int i )
→ Pointer <Char >
Returns label of classifier output by index (<n_cls_out). Returns nullptr if no label provided
llama_model_decoder_start_token (Pointer <llama_model > model )
→ int
For encoder-decoder models, this function returns id of the token that must be provided
to the decoder to start generating output sequence. For other models, it returns -1.
llama_model_default_params ()
→ llama_model_params
Helpers for getting default parameters
TODO: update API to start accepting pointers to params structs (https://github.com/ggml-org/llama.cpp/discussions/9172 )
llama_model_desc (Pointer <llama_model > model , Pointer <Char > buf , int buf_size )
→ int
Get a string describing the model type
llama_model_free (Pointer <llama_model > model )
→ void
llama_model_get_vocab (Pointer <llama_model > model )
→ Pointer <llama_vocab >
llama_model_has_decoder (Pointer <llama_model > model )
→ bool
Returns true if the model contains a decoder that requires llama_decode() call
llama_model_has_encoder (Pointer <llama_model > model )
→ bool
Returns true if the model contains an encoder that requires llama_encode() call
llama_model_is_diffusion (Pointer <llama_model > model )
→ bool
Returns true if the model is diffusion-based (like LLaDA, Dream, etc.)
llama_model_is_hybrid (Pointer <llama_model > model )
→ bool
Returns true if the model is hybrid (like Jamba, Granite, etc.)
llama_model_is_recurrent (Pointer <llama_model > model )
→ bool
Returns true if the model is recurrent (like Mamba, RWKV, etc.)
llama_model_load_from_file (Pointer <Char > path_model , llama_model_params params )
→ Pointer <llama_model >
Load the model from a file
If the file is split into multiple parts, the file name must follow this pattern:
llama_model_load_from_splits (Pointer <Pointer <Char > > paths , int n_paths , llama_model_params params )
→ Pointer <llama_model >
Load the model from multiple splits (support custom naming scheme)
The paths must be in the correct order
llama_model_meta_count (Pointer <llama_model > model )
→ int
Get the number of metadata key/value pairs
llama_model_meta_key_by_index (Pointer <llama_model > model , int i , Pointer <Char > buf , int buf_size )
→ int
Get metadata key name by index
llama_model_meta_key_str (llama_model_meta_key key )
→ Pointer <Char >
llama_model_meta_val_str (Pointer <llama_model > model , Pointer <Char > key , Pointer <Char > buf , int buf_size )
→ int
Get metadata value as a string by key name
llama_model_meta_val_str_by_index (Pointer <llama_model > model , int i , Pointer <Char > buf , int buf_size )
→ int
Get metadata value as a string by index
llama_model_n_cls_out (Pointer <llama_model > model )
→ int
Returns the number of classifier outputs (only valid for classifier models)
Undefined behavior for non-classifier models
llama_model_n_ctx_train (Pointer <llama_model > model )
→ int
llama_model_n_embd (Pointer <llama_model > model )
→ int
llama_model_n_embd_inp (Pointer <llama_model > model )
→ int
llama_model_n_head (Pointer <llama_model > model )
→ int
llama_model_n_head_kv (Pointer <llama_model > model )
→ int
llama_model_n_layer (Pointer <llama_model > model )
→ int
llama_model_n_params (Pointer <llama_model > model )
→ int
Returns the total number of parameters in the model
llama_model_n_swa (Pointer <llama_model > model )
→ int
llama_model_quantize (Pointer <Char > fname_inp , Pointer <Char > fname_out , Pointer <llama_model_quantize_params > params )
→ int
Returns 0 on success
llama_model_quantize_default_params ()
→ llama_model_quantize_params
llama_model_rope_freq_scale_train (Pointer <llama_model > model )
→ double
Get the model's RoPE frequency scaling factor
llama_model_rope_type (Pointer <llama_model > model )
→ llama_rope_type
llama_model_save_to_file (Pointer <llama_model > model , Pointer <Char > path_model )
→ void
llama_model_size (Pointer <llama_model > model )
→ int
Returns the total size of all the tensors in the model in bytes
llama_n_batch (Pointer <llama_context > ctx )
→ int
llama_n_ctx (Pointer <llama_context > ctx )
→ int
NOTE: After creating a llama_context, it is recommended to query the actual values using these functions
In some cases the requested values via llama_context_params may differ from the actual values used by the context
ref: https://github.com/ggml-org/llama.cpp/pull/17046#discussion_r2503085732
llama_n_ctx_seq (Pointer <llama_context > ctx )
→ int
llama_n_ctx_train (Pointer <llama_model > model )
→ int
llama_n_embd (Pointer <llama_model > model )
→ int
llama_n_head (Pointer <llama_model > model )
→ int
llama_n_layer (Pointer <llama_model > model )
→ int
llama_n_seq_max (Pointer <llama_context > ctx )
→ int
llama_n_threads (Pointer <llama_context > ctx )
→ int
Get the number of threads used for generation of a single token.
llama_n_threads_batch (Pointer <llama_context > ctx )
→ int
Get the number of threads used for prompt and batch processing (multiple token).
llama_n_ubatch (Pointer <llama_context > ctx )
→ int
llama_n_vocab (Pointer <llama_vocab > vocab )
→ int
llama_new_context_with_model (Pointer <llama_model > model , llama_context_params params )
→ Pointer <llama_context >
llama_numa_init (ggml_numa_strategy numa )
→ void
llama_opt_epoch (Pointer <llama_context > lctx , ggml_opt_dataset_t dataset , ggml_opt_result_t result_train , ggml_opt_result_t result_eval , int idata_split , ggml_opt_epoch_callback callback_train , ggml_opt_epoch_callback callback_eval )
→ void
llama_opt_init (Pointer <llama_context > lctx , Pointer <llama_model > model , llama_opt_params lopt_params )
→ void
llama_opt_param_filter_all (Pointer <ggml_tensor > tensor , Pointer <Void > userdata )
→ bool
always returns true
llama_params_fit (Pointer <Char > path_model , Pointer <llama_model_params > mparams , Pointer <llama_context_params > cparams , Pointer <Float > tensor_split , Pointer <llama_model_tensor_buft_override > tensor_buft_overrides , int margin , int n_ctx_min , ggml_log_level log_level )
→ bool
llama_perf_context (Pointer <llama_context > ctx )
→ llama_perf_context_data
llama_perf_context_print (Pointer <llama_context > ctx )
→ void
llama_perf_context_reset (Pointer <llama_context > ctx )
→ void
llama_perf_sampler (Pointer <llama_sampler > chain )
→ llama_perf_sampler_data
NOTE: the following work only with samplers constructed via llama_sampler_chain_init
llama_perf_sampler_print (Pointer <llama_sampler > chain )
→ void
llama_perf_sampler_reset (Pointer <llama_sampler > chain )
→ void
llama_pooling_type$1 (Pointer <llama_context > ctx )
→ llama_pooling_type
llama_print_system_info ()
→ Pointer <Char >
Print system information
llama_rm_adapter_lora (Pointer <llama_context > ctx , Pointer <llama_adapter_lora > adapter )
→ int
Remove a specific LoRA adapter from given context
Return -1 if the adapter is not present in the context
llama_sampler_accept (Pointer <llama_sampler > smpl , int token )
→ void
llama_sampler_apply (Pointer <llama_sampler > smpl , Pointer <llama_token_data_array > cur_p )
→ void
llama_sampler_chain_add (Pointer <llama_sampler > chain , Pointer <llama_sampler > smpl )
→ void
important: takes ownership of the sampler object and will free it when llama_sampler_free is called
llama_sampler_chain_default_params ()
→ llama_sampler_chain_params
llama_sampler_chain_get (Pointer <llama_sampler > chain , int i )
→ Pointer <llama_sampler >
llama_sampler_chain_init (llama_sampler_chain_params params )
→ Pointer <llama_sampler >
llama_sampler_chain
a type of llama_sampler that can chain multiple samplers one after another
llama_sampler_chain_n (Pointer <llama_sampler > chain )
→ int
llama_sampler_chain_remove (Pointer <llama_sampler > chain , int i )
→ Pointer <llama_sampler >
after removing a sampler, the chain will no longer own it, and it will not be freed when the chain is freed
llama_sampler_clone (Pointer <llama_sampler > smpl )
→ Pointer <llama_sampler >
llama_sampler_free (Pointer <llama_sampler > smpl )
→ void
important: do not free if the sampler has been added to a llama_sampler_chain (via llama_sampler_chain_add)
llama_sampler_get_seed (Pointer <llama_sampler > smpl )
→ int
Returns the seed used by the sampler if applicable, LLAMA_DEFAULT_SEED otherwise
llama_sampler_init (Pointer <llama_sampler_i > iface , llama_sampler_context_t ctx )
→ Pointer <llama_sampler >
mirror of llama_sampler_i:
llama_sampler_init_dist (int seed )
→ Pointer <llama_sampler >
llama_sampler_init_dry (Pointer <llama_vocab > vocab , int n_ctx_train , double dry_multiplier , double dry_base , int dry_allowed_length , int dry_penalty_last_n , Pointer <Pointer <Char > > seq_breakers , int num_breakers )
→ Pointer <llama_sampler >
@details DRY sampler, designed by p-e-w, as described in: https://github.com/oobabooga/text-generation-webui/pull/5677 , porting Koboldcpp implementation authored by pi6am: https://github.com/LostRuins/koboldcpp/pull/982
llama_sampler_init_grammar (Pointer <llama_vocab > vocab , Pointer <Char > grammar_str , Pointer <Char > grammar_root )
→ Pointer <llama_sampler >
@details Initializes a GBNF grammar, see grammars/README.md for details.
@param vocab The vocabulary that this grammar will be used with.
@param grammar_str The production rules for the grammar, encoded as a string. Returns an empty grammar if empty. Returns NULL if parsing of grammar_str fails.
@param grammar_root The name of the start symbol for the grammar.
llama_sampler_init_grammar_lazy (Pointer <llama_vocab > vocab , Pointer <Char > grammar_str , Pointer <Char > grammar_root , Pointer <Pointer <Char > > trigger_words , int num_trigger_words , Pointer <llama_token > trigger_tokens , int num_trigger_tokens )
→ Pointer <llama_sampler >
llama_sampler_init_grammar_lazy_patterns (Pointer <llama_vocab > vocab , Pointer <Char > grammar_str , Pointer <Char > grammar_root , Pointer <Pointer <Char > > trigger_patterns , int num_trigger_patterns , Pointer <llama_token > trigger_tokens , int num_trigger_tokens )
→ Pointer <llama_sampler >
@details Lazy grammar sampler, introduced in https://github.com/ggml-org/llama.cpp/pull/9639
@param trigger_patterns A list of patterns that will trigger the grammar sampler. Pattern will be matched from the start of the generation output, and grammar sampler will be fed content starting from its first match group.
@param trigger_tokens A list of tokens that will trigger the grammar sampler. Grammar sampler will be fed content starting from the trigger token included.
llama_sampler_init_greedy ()
→ Pointer <llama_sampler >
available samplers:
llama_sampler_init_infill (Pointer <llama_vocab > vocab )
→ Pointer <llama_sampler >
this sampler is meant to be used for fill-in-the-middle infilling
it's supposed to be used after top_k + top_p sampling
llama_sampler_init_logit_bias (int n_vocab , int n_logit_bias , Pointer <llama_logit_bias > logit_bias )
→ Pointer <llama_sampler >
llama_sampler_init_min_p (double p , int min_keep )
→ Pointer <llama_sampler >
@details Minimum P sampling as described in https://github.com/ggml-org/llama.cpp/pull/3841
llama_sampler_init_mirostat (int n_vocab , int seed , double tau , double eta , int m )
→ Pointer <llama_sampler >
@details Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966 . Uses tokens instead of words.
@param candidates A vector of llama_token_data containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
@param tau The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
@param eta The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
@param m The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm.
@param mu Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
llama_sampler_init_mirostat_v2 (int seed , double tau , double eta )
→ Pointer <llama_sampler >
@details Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966 . Uses tokens instead of words.
@param candidates A vector of llama_token_data containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
@param tau The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
@param eta The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
@param mu Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
llama_sampler_init_penalties (int penalty_last_n , double penalty_repeat , double penalty_freq , double penalty_present )
→ Pointer <llama_sampler >
NOTE: Avoid using on the full vocabulary as searching for repeated tokens can become slow. For example, apply top-k or top-p sampling first.
llama_sampler_init_temp (double t )
→ Pointer <llama_sampler >
@details Updates the logits: l_i' = l_i / t. When t <= 0.0f, the maximum logit is kept at its original value, the rest are set to -inf
llama_sampler_init_temp_ext (double t , double delta , double exponent )
→ Pointer <llama_sampler >
@details Dynamic temperature implementation (a.k.a. entropy) described in the paper https://arxiv.org/abs/2309.02772 .
llama_sampler_init_top_k (int k )
→ Pointer <llama_sampler >
@details Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
Setting k <= 0 makes this a noop
llama_sampler_init_top_n_sigma (double n )
→ Pointer <llama_sampler >
@details Top n sigma sampling as described in academic paper "Top-nσ: Not All Logits Are You Need" https://arxiv.org/pdf/2411.07641
llama_sampler_init_top_p (double p , int min_keep )
→ Pointer <llama_sampler >
@details Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
llama_sampler_init_typical (double p , int min_keep )
→ Pointer <llama_sampler >
@details Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666 .
llama_sampler_init_xtc (double p , double t , int min_keep , int seed )
→ Pointer <llama_sampler >
@details XTC sampler as described in https://github.com/oobabooga/text-generation-webui/pull/6335
llama_sampler_name (Pointer <llama_sampler > smpl )
→ Pointer <Char >
llama_sampler_reset (Pointer <llama_sampler > smpl )
→ void
llama_sampler_sample (Pointer <llama_sampler > smpl , Pointer <llama_context > ctx , int idx )
→ int
@details Sample and accept a token from the idx-th output of the last evaluation
llama_save_session_file (Pointer <llama_context > ctx , Pointer <Char > path_session , Pointer <llama_token > tokens , int n_token_count )
→ bool
llama_set_abort_callback (Pointer <llama_context > ctx , ggml_abort_callback abort_callback , Pointer <Void > abort_callback_data )
→ void
Set abort callback
llama_set_adapter_lora (Pointer <llama_context > ctx , Pointer <llama_adapter_lora > adapter , double scale )
→ int
Add a loaded LoRA adapter to given context
This will not modify model's weight
llama_set_causal_attn (Pointer <llama_context > ctx , bool causal_attn )
→ void
Set whether to use causal attention or not
If set to true, the model will only attend to the past tokens
llama_set_embeddings (Pointer <llama_context > ctx , bool embeddings )
→ void
Set whether the context outputs embeddings or not
TODO: rename to avoid confusion with llama_get_embeddings()
llama_set_n_threads (Pointer <llama_context > ctx , int n_threads , int n_threads_batch )
→ void
Set the number of threads used for decoding
n_threads is the number of threads used for generation (single token)
n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
llama_set_state_data (Pointer <llama_context > ctx , Pointer <Uint8 > src )
→ int
llama_set_warmup (Pointer <llama_context > ctx , bool warmup )
→ void
Set whether the model is in warmup mode or not
If true, all model tensors are activated during llama_decode() to load and cache their weights.
llama_split_path (Pointer <Char > split_path , int maxlen , Pointer <Char > path_prefix , int split_no , int split_count )
→ int
@details Build a split GGUF final path for this chunk.
llama_split_path(split_path, sizeof(split_path), "/models/ggml-model-q4_0", 2, 4) => split_path = "/models/ggml-model-q4_0-00002-of-00004.gguf"
Returns the split_path length.
llama_split_prefix (Pointer <Char > split_prefix , int maxlen , Pointer <Char > split_path , int split_no , int split_count )
→ int
@details Extract the path prefix from the split_path if and only if the split_no and split_count match.
llama_split_prefix(split_prefix, 64, "/models/ggml-model-q4_0-00002-of-00004.gguf", 2, 4) => split_prefix = "/models/ggml-model-q4_0"
Returns the split_prefix length.
llama_state_get_data (Pointer <llama_context > ctx , Pointer <Uint8 > dst , int size )
→ int
Copies the state to the specified destination address.
Destination needs to have allocated enough memory.
Returns the number of bytes copied
llama_state_get_size (Pointer <llama_context > ctx )
→ int
Returns the actual size in bytes of the state
(logits, embedding and memory)
Only use when saving the state, not when restoring it, otherwise the size may be too small.
llama_state_load_file (Pointer <llama_context > ctx , Pointer <Char > path_session , Pointer <llama_token > tokens_out , int n_token_capacity , Pointer <Size > n_token_count_out )
→ bool
Save/load session file
llama_state_save_file (Pointer <llama_context > ctx , Pointer <Char > path_session , Pointer <llama_token > tokens , int n_token_count )
→ bool
llama_state_seq_get_data (Pointer <llama_context > ctx , Pointer <Uint8 > dst , int size , int seq_id )
→ int
Copy the state of a single sequence into the specified buffer
llama_state_seq_get_data_ext (Pointer <llama_context > ctx , Pointer <Uint8 > dst , int size , int seq_id , int flags )
→ int
llama_state_seq_get_size (Pointer <llama_context > ctx , int seq_id )
→ int
Get the exact size needed to copy the state of a single sequence
llama_state_seq_get_size_ext (Pointer <llama_context > ctx , int seq_id , int flags )
→ int
llama_state_seq_load_file (Pointer <llama_context > ctx , Pointer <Char > filepath , int dest_seq_id , Pointer <llama_token > tokens_out , int n_token_capacity , Pointer <Size > n_token_count_out )
→ int
llama_state_seq_save_file (Pointer <llama_context > ctx , Pointer <Char > filepath , int seq_id , Pointer <llama_token > tokens , int n_token_count )
→ int
llama_state_seq_set_data (Pointer <llama_context > ctx , Pointer <Uint8 > src , int size , int dest_seq_id )
→ int
Copy the sequence data (originally copied with llama_state_seq_get_data) into the specified sequence
Returns:
llama_state_seq_set_data_ext (Pointer <llama_context > ctx , Pointer <Uint8 > src , int size , int dest_seq_id , int flags )
→ int
llama_state_set_data (Pointer <llama_context > ctx , Pointer <Uint8 > src , int size )
→ int
Set the state reading from the specified address
Returns the number of bytes read
llama_supports_gpu_offload ()
→ bool
llama_supports_mlock ()
→ bool
llama_supports_mmap ()
→ bool
llama_supports_rpc ()
→ bool
llama_synchronize (Pointer <llama_context > ctx )
→ void
Wait until all computations are finished
This is done automatically when using one of the functions below to obtain the computation results,
so it is not necessary to call it explicitly in most cases
llama_time_us ()
→ int
llama_token_bos (Pointer <llama_vocab > vocab )
→ int
llama_token_cls (Pointer <llama_vocab > vocab )
→ int
llama_token_eos (Pointer <llama_vocab > vocab )
→ int
llama_token_eot (Pointer <llama_vocab > vocab )
→ int
llama_token_fim_mid (Pointer <llama_vocab > vocab )
→ int
llama_token_fim_pad (Pointer <llama_vocab > vocab )
→ int
llama_token_fim_pre (Pointer <llama_vocab > vocab )
→ int
llama_token_fim_rep (Pointer <llama_vocab > vocab )
→ int
llama_token_fim_sep (Pointer <llama_vocab > vocab )
→ int
llama_token_fim_suf (Pointer <llama_vocab > vocab )
→ int
llama_token_get_attr (Pointer <llama_vocab > vocab , Dartllama_token token )
→ llama_token_attr
llama_token_get_score (Pointer <llama_vocab > vocab , int token )
→ double
llama_token_get_text (Pointer <llama_vocab > vocab , int token )
→ Pointer <Char >
llama_token_is_control (Pointer <llama_vocab > vocab , int token )
→ bool
llama_token_is_eog (Pointer <llama_vocab > vocab , int token )
→ bool
llama_token_nl (Pointer <llama_vocab > vocab )
→ int
llama_token_pad (Pointer <llama_vocab > vocab )
→ int
llama_token_sep (Pointer <llama_vocab > vocab )
→ int
llama_token_to_piece (Pointer <llama_vocab > vocab , int token , Pointer <Char > buf , int length , int lstrip , bool special )
→ int
Token Id -> Piece.
Uses the vocabulary in the provided context.
Does not write null terminator to the buffer.
User can skip up to 'lstrip' leading spaces before copying (useful when encoding/decoding multiple tokens with 'add_space_prefix')
@param special If true, special tokens are rendered in the output.
llama_tokenize (Pointer <llama_vocab > vocab , Pointer <Char > text , int text_len , Pointer <llama_token > tokens , int n_tokens_max , bool add_special , bool parse_special )
→ int
@details Convert the provided text into tokens.
@param tokens The tokens pointer must be large enough to hold the resulting tokens.
@return Returns the number of tokens on success, no more than n_tokens_max
@return Returns a negative number on failure - the number of tokens that would have been returned
@return Returns INT32_MIN on overflow (e.g., tokenization result size exceeds int32_t limit)
@param add_special Allow to add BOS and EOS tokens if model is configured to do so.
@param parse_special Allow tokenizing special and/or control tokens which otherwise are not exposed and treated
as plaintext. Does not insert a leading space.
llama_vocab_bos (Pointer <llama_vocab > vocab )
→ int
Special tokens
llama_vocab_cls (Pointer <llama_vocab > vocab )
→ int
CLS is equivalent to BOS
llama_vocab_eos (Pointer <llama_vocab > vocab )
→ int
llama_vocab_eot (Pointer <llama_vocab > vocab )
→ int
llama_vocab_fim_mid (Pointer <llama_vocab > vocab )
→ int
llama_vocab_fim_pad (Pointer <llama_vocab > vocab )
→ int
llama_vocab_fim_pre (Pointer <llama_vocab > vocab )
→ int
llama_vocab_fim_rep (Pointer <llama_vocab > vocab )
→ int
llama_vocab_fim_sep (Pointer <llama_vocab > vocab )
→ int
llama_vocab_fim_suf (Pointer <llama_vocab > vocab )
→ int
llama_vocab_get_add_bos (Pointer <llama_vocab > vocab )
→ bool
llama_vocab_get_add_eos (Pointer <llama_vocab > vocab )
→ bool
llama_vocab_get_add_sep (Pointer <llama_vocab > vocab )
→ bool
llama_vocab_get_attr (Pointer <llama_vocab > vocab , Dartllama_token token )
→ llama_token_attr
llama_vocab_get_score (Pointer <llama_vocab > vocab , int token )
→ double
llama_vocab_get_text (Pointer <llama_vocab > vocab , int token )
→ Pointer <Char >
Vocab
llama_vocab_is_control (Pointer <llama_vocab > vocab , int token )
→ bool
Identify if Token Id is a control token or a renderable token
llama_vocab_is_eog (Pointer <llama_vocab > vocab , int token )
→ bool
Check if the token is supposed to end generation (end-of-generation, eg. EOS, EOT, etc.)
llama_vocab_mask (Pointer <llama_vocab > vocab )
→ int
llama_vocab_n_tokens (Pointer <llama_vocab > vocab )
→ int
llama_vocab_nl (Pointer <llama_vocab > vocab )
→ int
llama_vocab_pad (Pointer <llama_vocab > vocab )
→ int
llama_vocab_sep (Pointer <llama_vocab > vocab )
→ int
llama_vocab_type$1 (Pointer <llama_vocab > vocab )
→ llama_vocab_type
perror (Pointer <Char > _ErrorMessage )
→ void
putc (int _Character , Pointer <FILE > _Stream )
→ int
putchar (int _Character )
→ int
puts (Pointer <Char > _Buffer )
→ int
putw (int _Ch , Pointer <FILE > _Stream )
→ int
putwc (int _Character , Pointer <FILE > _Stream )
→ int
putwchar (int _Character )
→ int
remove (Pointer <Char > _FileName )
→ int
rename (Pointer <Char > _OldFileName , Pointer <Char > _NewFileName )
→ int
rewind (Pointer <FILE > _Stream )
→ void
rmtmp ()
→ int
setbuf (Pointer <FILE > _Stream , Pointer <Char > _Buffer )
→ void
setvbuf (Pointer <FILE > _Stream , Pointer <Char > _Buffer , int _Mode , int _Size )
→ int
tempnam (Pointer <Char > _Directory , Pointer <Char > _FilePrefix )
→ Pointer <Char >
tmpfile ()
→ Pointer <FILE >
tmpfile_s (Pointer <Pointer <FILE > > _Stream )
→ int
tmpnam (Pointer <Char > _Buffer )
→ Pointer <Char >
tmpnam_s (Pointer <Char > _Buffer , int _Size )
→ int
ungetc (int _Character , Pointer <FILE > _Stream )
→ int
ungetwc (int _Character , Pointer <FILE > _Stream )
→ int
unlink (Pointer <Char > _FileName )
→ int
Typedefs
Dart__time32_t
= int
Dart__time64_t
= int
Darterrno_t
= int
Dartfpos_t
= int
Dartggml_abort_callback_tFunction
= void Function(Pointer <Char > error_message )
Dartggml_abort_callbackFunction
= bool Function(Pointer <Void > data )
Dartggml_backend_eval_callbackFunction
= bool Function(int node_index , Pointer <ggml_tensor > t1 , Pointer <ggml_tensor > t2 , Pointer <Void > user_data )
Dartggml_backend_sched_eval_callbackFunction
= bool Function(Pointer <ggml_tensor > t , bool ask , Pointer <Void > user_data )
Dartggml_backend_set_abort_callback_tFunction
= void Function(ggml_backend_t backend , ggml_abort_callback abort_callback , Pointer <Void > abort_callback_data )
Dartggml_backend_set_n_threads_tFunction
= void Function(ggml_backend_t backend , int n_threads )
Dartggml_backend_split_buffer_type_tFunction
= ggml_backend_buffer_type_t Function(int main_device , Pointer <Float > tensor_split )
Dartggml_custom1_op_tFunction
= void Function(Pointer <ggml_tensor > dst , Pointer <ggml_tensor > a , int ith , int nth , Pointer <Void > userdata )
Dartggml_custom2_op_tFunction
= void Function(Pointer <ggml_tensor > dst , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , int ith , int nth , Pointer <Void > userdata )
Dartggml_custom3_op_tFunction
= void Function(Pointer <ggml_tensor > dst , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c , int ith , int nth , Pointer <Void > userdata )
Dartggml_custom_op_tFunction
= void Function(Pointer <ggml_tensor > dst , int ith , int nth , Pointer <Void > userdata )
Dartggml_fp16_t
= int
Dartggml_from_float_tFunction
= void Function(Pointer <Float > x , Pointer <Void > y , int k )
Dartggml_log_callbackFunction
= void Function(ggml_log_level level , Pointer <Char > text , Pointer <Void > user_data )
Dartggml_opt_epoch_callbackFunction
= void Function(bool train , ggml_opt_context_t opt_ctx , ggml_opt_dataset_t dataset , ggml_opt_result_t result , int ibatch , int ibatch_max , int t_start_us )
Dartggml_to_float_tFunction
= void Function(Pointer <Void > x , Pointer <Float > y , int k )
Dartggml_vec_dot_tFunction
= void Function(int n , Pointer <Float > s , int bs , Pointer <Void > x , int bx , Pointer <Void > y , int by , int nrc )
Dartint_fast16_t
= int
Dartint_fast32_t
= int
Dartint_fast64_t
= int
Dartint_fast8_t
= int
Dartint_least16_t
= int
Dartint_least32_t
= int
Dartint_least64_t
= int
Dartint_least8_t
= int
Dartintmax_t
= int
DartLcppOnAbortCallbackFunction
= void Function(int )
DartLcppOnCancelCallbackFunction
= void Function(int )
Dartllama_opt_param_filterFunction
= bool Function(Pointer <ggml_tensor > tensor , Pointer <Void > userdata )
Dartllama_pos
= int
Dartllama_progress_callbackFunction
= bool Function(double progress , Pointer <Void > user_data )
Dartllama_seq_id
= int
Dartllama_state_seq_flags
= int
Dartllama_token
= int
DartLppChatMessageCallbackFunction
= void Function(Pointer <lcpp_common_chat_msg_t > )
DartLppProgressCallbackFunction
= void Function(double )
DartLppTokenStreamCallbackFunction
= void Function(Pointer <LcppTextStruct_t > )
Dartptrdiff_t
= int
Dartrsize_t
= int
Dartuint_fast16_t
= int
Dartuint_fast32_t
= int
Dartuint_fast64_t
= int
Dartuint_fast8_t
= int
Dartuint_least16_t
= int
Dartuint_least32_t
= int
Dartuint_least64_t
= int
Dartuint_least8_t
= int
Dartuintmax_t
= int
Dartwctype_t
= int
Dartwint_t
= int
errno_t
= Int
FILE
= _iobuf
fpos_t
= LongLong
ggml_abort_callback
= Pointer <NativeFunction <ggml_abort_callbackFunction > >
Abort callback
If not NULL, called before ggml computation
If it returns true, the computation is aborted
ggml_abort_callback_t
= Pointer <NativeFunction <ggml_abort_callback_tFunction > >
Function type used in fatal error callbacks
ggml_abort_callback_tFunction
= Void Function(Pointer <Char > error_message )
ggml_abort_callbackFunction
= Bool Function(Pointer <Void > data )
ggml_backend_buffer_t
= Pointer <ggml_backend_buffer >
ggml_backend_buffer_type_t
= Pointer <ggml_backend_buffer_type >
Get additional buffer types provided by the device (returns a NULL-terminated array)
ggml_backend_dev_t
= Pointer <ggml_backend_device >
ggml_backend_eval_callback
= Pointer <NativeFunction <ggml_backend_eval_callbackFunction > >
ggml_backend_eval_callbackFunction
= Bool Function(Int node_index , Pointer <ggml_tensor > t1 , Pointer <ggml_tensor > t2 , Pointer <Void > user_data )
ggml_backend_event_t
= Pointer <ggml_backend_event >
ggml_backend_get_features_t
= Pointer <NativeFunction <ggml_backend_get_features_tFunction > >
ggml_backend_get_features_tFunction
= Pointer <ggml_backend_feature > Function(ggml_backend_reg_t reg )
ggml_backend_graph_plan_t
= Pointer <Void >
ggml_backend_reg_t
= Pointer <ggml_backend_reg >
ggml_backend_sched_eval_callback
= Pointer <NativeFunction <ggml_backend_sched_eval_callbackFunction > >
Evaluation callback for each node in the graph (set with ggml_backend_sched_set_eval_callback)
when ask == true, the scheduler wants to know if the user wants to observe this node
this allows the scheduler to batch nodes together in order to evaluate them in a single call
ggml_backend_sched_eval_callbackFunction
= Bool Function(Pointer <ggml_tensor > t , Bool ask , Pointer <Void > user_data )
ggml_backend_sched_t
= Pointer <ggml_backend_sched >
The backend scheduler allows for multiple backend devices to be used together
Handles compute buffer allocation, assignment of tensors to backends, and copying of tensors between backends
The backends are selected based on:
ggml_backend_set_abort_callback_t
= Pointer <NativeFunction <ggml_backend_set_abort_callback_tFunction > >
Set the abort callback for the backend
ggml_backend_set_abort_callback_tFunction
= Void Function(ggml_backend_t backend , ggml_abort_callback abort_callback , Pointer <Void > abort_callback_data )
ggml_backend_set_n_threads_t
= Pointer <NativeFunction <ggml_backend_set_n_threads_tFunction > >
Set the number of threads for the backend
ggml_backend_set_n_threads_tFunction
= Void Function(ggml_backend_t backend , Int n_threads )
ggml_backend_split_buffer_type_t
= Pointer <NativeFunction <ggml_backend_split_buffer_type_tFunction > >
Split buffer type for tensor parallelism
ggml_backend_split_buffer_type_tFunction
= ggml_backend_buffer_type_t Function(Int main_device , Pointer <Float > tensor_split )
ggml_backend_t
= Pointer <ggml_backend >
ggml_custom1_op_t
= Pointer <NativeFunction <ggml_custom1_op_tFunction > >
custom operators
ggml_custom1_op_tFunction
= Void Function(Pointer <ggml_tensor > dst , Pointer <ggml_tensor > a , Int ith , Int nth , Pointer <Void > userdata )
ggml_custom2_op_t
= Pointer <NativeFunction <ggml_custom2_op_tFunction > >
ggml_custom2_op_tFunction
= Void Function(Pointer <ggml_tensor > dst , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Int ith , Int nth , Pointer <Void > userdata )
ggml_custom3_op_t
= Pointer <NativeFunction <ggml_custom3_op_tFunction > >
ggml_custom3_op_tFunction
= Void Function(Pointer <ggml_tensor > dst , Pointer <ggml_tensor > a , Pointer <ggml_tensor > b , Pointer <ggml_tensor > c , Int ith , Int nth , Pointer <Void > userdata )
ggml_custom_op_t
= Pointer <NativeFunction <ggml_custom_op_tFunction > >
ggml_custom_op_tFunction
= Void Function(Pointer <ggml_tensor > dst , Int ith , Int nth , Pointer <Void > userdata )
ggml_fp16_t
= Uint16
ieee 754-2008 half-precision float16
todo: make this not an integral type
ggml_from_float_t
= Pointer <NativeFunction <ggml_from_float_tFunction > >
ggml_from_float_tFunction
= Void Function(Pointer <Float > x , Pointer <Void > y , Int64 k )
ggml_gallocr_t
= Pointer <ggml_gallocr >
special tensor flags for use with the graph allocator:
ggml_set_input(): all input tensors are allocated at the beginning of the graph in non-overlapping addresses
ggml_set_output(): output tensors are never freed and never overwritten
ggml_guid_t
= Pointer <Pointer <Uint8 > >
ggml_log_callback
= Pointer <NativeFunction <ggml_log_callbackFunction > >
TODO these functions were sandwiched in the old optimization interface, is there a better place for them?
ggml_log_callbackFunction
= Void Function(UnsignedInt level , Pointer <Char > text , Pointer <Void > user_data )
ggml_opt_context_t
= Pointer <ggml_opt_context >
ggml_opt_dataset_t
= Pointer <ggml_opt_dataset >
ggml_opt_epoch_callback
= Pointer <NativeFunction <ggml_opt_epoch_callbackFunction > >
signature for a callback while evaluating opt_ctx on dataset, called after an evaluation
ggml_opt_epoch_callbackFunction
= Void Function(Bool train , ggml_opt_context_t opt_ctx , ggml_opt_dataset_t dataset , ggml_opt_result_t result , Int64 ibatch , Int64 ibatch_max , Int64 t_start_us )
ggml_opt_get_optimizer_params
= Pointer <NativeFunction <ggml_opt_get_optimizer_paramsFunction > >
callback to calculate optimizer parameters prior to a backward pass
userdata can be used to pass arbitrary data
ggml_opt_get_optimizer_paramsFunction
= ggml_opt_optimizer_params Function(Pointer <Void > userdata )
ggml_opt_result_t
= Pointer <ggml_opt_result >
ggml_threadpool_t
= Pointer <ggml_threadpool >
ggml_to_float_t
= Pointer <NativeFunction <ggml_to_float_tFunction > >
ggml_to_float_tFunction
= Void Function(Pointer <Void > x , Pointer <Float > y , Int64 k )
ggml_vec_dot_t
= Pointer <NativeFunction <ggml_vec_dot_tFunction > >
Internal types and functions exposed for tests and benchmarks
ggml_vec_dot_tFunction
= Void Function(Int n , Pointer <Float > s , Size bs , Pointer <Void > x , Size bx , Pointer <Void > y , Size by , Int nrc )
int_fast16_t
= Int
int_fast32_t
= Int
int_fast64_t
= LongLong
int_fast8_t
= SignedChar
int_least16_t
= Short
int_least32_t
= Int
int_least64_t
= LongLong
int_least8_t
= SignedChar
intmax_t
= LongLong
lcpp_common_chat_msg_content_part_t
= lcpp_common_chat_msg_content_part
lcpp_common_chat_msg_t
= lcpp_common_chat_msg
lcpp_common_chat_tool_call_t
= lcpp_common_chat_tool_call
lcpp_common_chat_tool_t
= lcpp_common_chat_tool
lcpp_cpu_info_t
= lcpp_cpu_info
lcpp_data_pvalue_t
= lcpp_data_pvalue
lcpp_gpu_info_t
= lcpp_gpu_info
lcpp_machine_info_t
= lcpp_machine_info
lcpp_memory_info_t
= lcpp_memory_info
lcpp_model_filepath_t
= lcpp_model_filepath
lcpp_model_info_t
= lcpp_model_info
lcpp_model_mem_t
= lcpp_model_mem
lcpp_model_rt_t
= lcpp_model_rt
lcpp_params_t
= lcpp_params
lcpp_sampling_params_t
= lcpp_sampling_params
lcpp_system_info_t
= lcpp_system_info
LcppOnAbortCallback
= Pointer <NativeFunction <LcppOnAbortCallbackFunction > >
LcppOnAbortCallbackFunction
= Void Function(Int32 )
LcppOnCancelCallback
= Pointer <NativeFunction <LcppOnCancelCallbackFunction > >
LcppOnCancelCallbackFunction
= Void Function(Int32 )
LcppTextStruct_t
= LcppTextStruct
llama_context_params_t
= llama_context_params
llama_memory_t
= Pointer <llama_memory_i >
llama_opt_param_filter
= Pointer <NativeFunction <llama_opt_param_filterFunction > >
function that returns whether or not a given tensor contains trainable parameters
llama_opt_param_filterFunction
= Bool Function(Pointer <ggml_tensor > tensor , Pointer <Void > userdata )
llama_pos
= Int32
llama_progress_callback
= Pointer <NativeFunction <llama_progress_callbackFunction > >
llama_progress_callbackFunction
= Bool Function(Float progress , Pointer <Void > user_data )
llama_sampler_context_t
= Pointer <Void >
Sampling API
llama_seq_id
= Int32
llama_state_seq_flags
= Uint32
llama_token
= Int32
LppChatMessageCallback
= Pointer <NativeFunction <LppChatMessageCallbackFunction > >
LppChatMessageCallbackFunction
= Void Function(Pointer <lcpp_common_chat_msg_t > )
LppProgressCallback
= Pointer <NativeFunction <LppProgressCallbackFunction > >
LppProgressCallbackFunction
= Void Function(Double )
LppTokenStreamCallback
= Pointer <NativeFunction <LppTokenStreamCallbackFunction > >
LppTokenStreamCallbackFunction
= Void Function(Pointer <LcppTextStruct_t > )
mbstate_t
= _Mbstatet
ptrdiff_t
= LongLong
ReasoningFamily
= ({int family , bool reasoning } )
rsize_t
= Size
time_t
= __time64_t
uint_fast16_t
= UnsignedInt
uint_fast32_t
= UnsignedInt
uint_fast64_t
= UnsignedLongLong
uint_fast8_t
= UnsignedChar
uint_least16_t
= UnsignedShort
uint_least32_t
= UnsignedInt
uint_least64_t
= UnsignedLongLong
uint_least8_t
= UnsignedChar
uintmax_t
= UnsignedLongLong
va_list
= Pointer <Char >
wctype_t
= UnsignedShort
wint_t
= UnsignedShort
Exceptions / Errors
LlamaException
A custom exception class for handling errors specific to the Llama application.