AubAiBindings class

Bindings for src/aub_ai.h.

Regenerate bindings with dart run ffigen --config ffigen.yaml.
Constructors
- AubAiBindings(DynamicLibrary dynamicLibrary) - The symbols are looked up in dynamicLibrary (see the loading sketch below).
- AubAiBindings.fromLookup(Pointer<T> lookup<T extends NativeType>(String symbolName)) - The symbols are looked up with lookup.
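A minimal sketch of constructing the bindings. The library file names and the loading strategy below are assumptions for illustration (they depend on how the package ships its native llama.cpp build per platform), not part of the generated API:

```dart
import 'dart:ffi';
import 'dart:io' show Platform;

// Hypothetical loader: adjust the library name/path to however the native
// llama.cpp build is bundled with your app.
AubAiBindings loadBindings() {
  final DynamicLibrary lib;
  if (Platform.isAndroid || Platform.isLinux) {
    lib = DynamicLibrary.open('libaub_ai.so');
  } else if (Platform.isWindows) {
    lib = DynamicLibrary.open('aub_ai.dll');
  } else {
    // iOS/macOS: assume the symbols are statically linked into the process.
    lib = DynamicLibrary.process();
  }
  return AubAiBindings(lib);
}
```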
Properties
- hashCode → int - The hash code for this object. no setter, inherited
- runtimeType → Type - A representation of the runtime type of the object. no setter, inherited
Methods
- llama_add_bos_token(Pointer<llama_model> model) → int - Returns -1 if unknown, 1 for true or 0 for false.
- llama_add_eos_token(Pointer<llama_model> model) → int - Returns -1 if unknown, 1 for true or 0 for false.
- llama_apply_lora_from_file(Pointer<llama_context> ctx, Pointer<Char> path_lora, double scale, Pointer<Char> path_base_model, int n_threads) → int - Apply a LoRA adapter to a loaded model. path_base_model is the path to a higher quality model to use as a base for the layers modified by the adapter; it can be NULL to use the currently loaded model. The model needs to be reloaded before applying a new adapter, otherwise the adapter will be applied on top of the previous one. Returns 0 on success.
- llama_backend_free() → void - Call once at the end of the program - currently only used for MPI.
- llama_backend_init(bool numa) → void - Initialize the llama + ggml backend. If numa is true, use NUMA optimizations. Call once at the start of the program.
- llama_batch_free(llama_batch batch) → void - Frees a batch of tokens allocated with llama_batch_init().
- llama_batch_get_one(Pointer<llama_token> tokens, int n_tokens, int pos_0, int seq_id) → llama_batch - Return a batch for a single sequence of tokens starting at pos_0.
- llama_batch_init(int n_tokens, int embd, int n_seq_max) → llama_batch - Allocates a batch of tokens on the heap that can hold a maximum of n_tokens. Each token can be assigned up to n_seq_max sequence ids. The batch has to be freed with llama_batch_free(). If embd != 0, llama_batch.embd will be allocated with size n_tokens * embd * sizeof(float); otherwise, llama_batch.token will be allocated to store n_tokens llama_token. The rest of the llama_batch members are allocated with size n_tokens. All members are left uninitialized. (See the batch-filling sketch after this list.)
- llama_beam_search(Pointer<llama_context> ctx, llama_beam_search_callback_fn_t callback, Pointer<Void> callback_data, int n_beams, int n_past, int n_predict) → void - @details Deterministically returns the entire sentence constructed by a beam search. @param ctx Pointer to the llama_context. @param callback Invoked for each iteration of the beam_search loop, passing in beams_state. @param callback_data A pointer that is simply passed back to callback. @param n_beams Number of beams to use. @param n_past Number of tokens already evaluated. @param n_predict Maximum number of tokens to predict. EOS may occur earlier.
- llama_context_default_params() → llama_context_params
- llama_copy_state_data(Pointer<llama_context> ctx, Pointer<Uint8> dst) → int - Copies the state to the specified destination address. The destination needs to have enough memory allocated. Returns the number of bytes copied.
- llama_decode(Pointer<llama_context> ctx, llama_batch batch) → int - A positive return value does not mean a fatal error, but rather a warning: 0 - success; 1 - could not find a KV slot for the batch (try reducing the size of the batch or increasing the context); < 0 - error. (A full decode-and-sample sketch follows this list.)
- llama_dump_timing_info_yaml(Pointer<FILE> stream, Pointer<llama_context> ctx) → void
- llama_eval(Pointer<llama_context> ctx, Pointer<llama_token> tokens, int n_tokens, int n_past) → int - Run the llama inference to obtain the logits and probabilities for the next token(s). tokens + n_tokens is the provided batch of new tokens to process; n_past is the number of tokens to use from previous eval calls. Returns 0 on success. DEPRECATED: use llama_decode() instead.
- llama_eval_embd(Pointer<llama_context> ctx, Pointer<Float> embd, int n_tokens, int n_past) → int - Same as llama_eval, but use a float matrix input directly. DEPRECATED: use llama_decode() instead.
- llama_free(Pointer<llama_context> ctx) → void - Frees all allocated memory.
- llama_free_model(Pointer<llama_model> model) → void
- llama_get_embeddings(Pointer<llama_context> ctx) → Pointer<Float> - Get the embeddings for the input. Shape: n_embd (1-dimensional).
- llama_get_kv_cache_token_count(Pointer<llama_context> ctx) → int - Returns the number of tokens in the KV cache (slow, use only for debug). If a KV cell has multiple sequences assigned to it, it will be counted multiple times.
- llama_get_kv_cache_used_cells(Pointer<llama_context> ctx) → int - Returns the number of used KV cells (i.e. cells that have at least one sequence assigned to them).
- llama_get_logits(Pointer<llama_context> ctx) → Pointer<Float> - Token logits obtained from the last call to llama_eval(). The logits for the last token are stored in the last row. Logits for which llama_batch.logits[i] == 0 are undefined. Rows: n_tokens provided with llama_batch. Cols: n_vocab.
- llama_get_logits_ith(Pointer<llama_context> ctx, int i) → Pointer<Float> - Logits for the i-th token. Equivalent to: llama_get_logits(ctx) + i*n_vocab.
- llama_get_model(Pointer<llama_context> ctx) → Pointer<llama_model>
- llama_get_model_tensor(Pointer<llama_model> model, Pointer<Char> name) → Pointer<ggml_tensor> - Get a llama model tensor.
- llama_get_state_size(Pointer<llama_context> ctx) → int - Returns the maximum size in bytes of the state (rng, logits, embedding and kv_cache) - will often be smaller after compacting tokens.
- llama_get_timings(Pointer<llama_context> ctx) → llama_timings - Performance information.
- llama_grammar_accept_token(Pointer<llama_context> ctx, Pointer<llama_grammar> grammar, int token) → void - @details Accepts the sampled token into the grammar.
- llama_grammar_copy(Pointer<llama_grammar> grammar) → Pointer<llama_grammar>
- llama_grammar_free(Pointer<llama_grammar> grammar) → void
- llama_grammar_init(Pointer<Pointer<llama_grammar_element>> rules, int n_rules, int start_rule_index) → Pointer<llama_grammar> - Grammar.
- llama_kv_cache_clear(Pointer<llama_context> ctx) → void - Clear the KV cache.
- llama_kv_cache_seq_cp(Pointer<llama_context> ctx, int seq_id_src, int seq_id_dst, int p0, int p1) → void - Copy all tokens that belong to the specified sequence to another sequence. Note that this does not allocate extra KV cache memory - it simply assigns the tokens to the new sequence. p0 < 0 : [0, p1]. p1 < 0 : [p0, inf).
- llama_kv_cache_seq_keep(Pointer<llama_context> ctx, int seq_id) → void - Removes all tokens that do not belong to the specified sequence.
- llama_kv_cache_seq_rm(Pointer<llama_context> ctx, int seq_id, int p0, int p1) → void - Removes all tokens that belong to the specified sequence and have positions in [p0, p1). seq_id < 0 : match any sequence. p0 < 0 : [0, p1]. p1 < 0 : [p0, inf).
- llama_kv_cache_seq_shift(Pointer<llama_context> ctx, int seq_id, int p0, int p1, int delta) → void - Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1). If the KV cache is RoPEd, the KV data is updated accordingly. p0 < 0 : [0, p1]. p1 < 0 : [p0, inf). (See the context-shift sketch after this list.)
- llama_kv_cache_view_free(Pointer<llama_kv_cache_view> view) → void - Free a KV cache view (use only for debugging purposes).
- llama_kv_cache_view_init(Pointer<llama_context> ctx, int n_max_seq) → llama_kv_cache_view - Create an empty KV cache view (use only for debugging purposes).
- llama_kv_cache_view_update(Pointer<llama_context> ctx, Pointer<llama_kv_cache_view> view) → void - Update the KV cache view structure with the current state of the KV cache (use only for debugging purposes).
- llama_load_model_from_file(Pointer<Char> path_model, llama_model_params params) → Pointer<llama_model>
- llama_load_session_file(Pointer<llama_context> ctx, Pointer<Char> path_session, Pointer<llama_token> tokens_out, int n_token_capacity, Pointer<Size> n_token_count_out) → bool - Save/load session file.
- llama_log_set(ggml_log_callback log_callback, Pointer<Void> user_data) → void - Set callback for all future logging events. If this is not called, or NULL is supplied, everything is output on stderr.
- llama_max_devices() → int
- llama_mlock_supported() → bool
- llama_mmap_supported() → bool
- llama_model_apply_lora_from_file(Pointer<llama_model> model, Pointer<Char> path_lora, double scale, Pointer<Char> path_base_model, int n_threads) → int
- llama_model_default_params() → llama_model_params - Helpers for getting default parameters.
- llama_model_desc(Pointer<llama_model> model, Pointer<Char> buf, int buf_size) → int - Get a string describing the model type.
- llama_model_meta_count(Pointer<llama_model> model) → int - Get the number of metadata key/value pairs.
- llama_model_meta_key_by_index(Pointer<llama_model> model, int i, Pointer<Char> buf, int buf_size) → int - Get metadata key name by index.
- llama_model_meta_val_str(Pointer<llama_model> model, Pointer<Char> key, Pointer<Char> buf, int buf_size) → int - Get metadata value as a string by key name.
- llama_model_meta_val_str_by_index(Pointer<llama_model> model, int i, Pointer<Char> buf, int buf_size) → int - Get metadata value as a string by index.
- llama_model_n_params(Pointer<llama_model> model) → int - Returns the total number of parameters in the model.
- llama_model_quantize(Pointer<Char> fname_inp, Pointer<Char> fname_out, Pointer<llama_model_quantize_params> params) → int - Returns 0 on success.
- llama_model_quantize_default_params() → llama_model_quantize_params
- llama_model_size(Pointer<llama_model> model) → int - Returns the total size of all the tensors in the model in bytes.
- llama_n_ctx(Pointer<llama_context> ctx) → int
- llama_n_ctx_train(Pointer<llama_model> model) → int
- llama_n_embd(Pointer<llama_model> model) → int
- llama_n_vocab(Pointer<llama_model> model) → int
- llama_new_context_with_model(Pointer<llama_model> model, llama_context_params params) → Pointer<llama_context>
- llama_print_system_info() → Pointer<Char> - Print system information.
- llama_print_timings(Pointer<llama_context> ctx) → void
- llama_reset_timings(Pointer<llama_context> ctx) → void
- llama_rope_freq_scale_train(Pointer<llama_model> model) → double - Get the model's RoPE frequency scaling factor.
- llama_sample_classifier_free_guidance(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, Pointer<llama_context> guidance_ctx, double scale) → void - @details Apply classifier-free guidance to the logits as described in the academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806. @param candidates A vector of llama_token_data containing the candidate tokens; the logits must be directly extracted from the original generation context without being sorted. @param guidance_ctx A separate context from the same model. Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context. @param scale Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance.
- llama_sample_grammar(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, Pointer<llama_grammar> grammar) → void - @details Apply constraints from grammar.
- llama_sample_min_p(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, double p, int min_keep) → void - @details Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841.
- llama_sample_repetition_penalties(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, Pointer<llama_token> last_tokens, int penalty_last_n, double penalty_repeat, double penalty_freq, double penalty_present) → void - @details Repetition penalty described in the CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix. @details Frequency and presence penalties described in the OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
- llama_sample_softmax(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates) → void - @details Sorts candidate tokens by their logits in descending order and calculates probabilities based on the logits.
- llama_sample_tail_free(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, double z, int min_keep) → void - @details Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/.
- llama_sample_temp(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, double temp) → void
- llama_sample_temperature(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, double temp) → void
- llama_sample_token(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates) → int - @details Randomly selects a token from the candidates based on their probabilities. (See the sampling-chain sketch after this list.)
- llama_sample_token_greedy(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates) → int - @details Selects the token with the highest probability. Does not compute the token probabilities. Use llama_sample_softmax() instead.
- llama_sample_token_mirostat(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, double tau, double eta, int m, Pointer<Float> mu) → int - @details Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words. @param candidates A vector of llama_token_data containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. @param tau The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. @param eta The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates. @param m The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm. @param mu Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
- llama_sample_token_mirostat_v2(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, double tau, double eta, Pointer<Float> mu) → int - @details Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words. @param candidates A vector of llama_token_data containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. @param tau The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. @param eta The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates. @param mu Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
- llama_sample_top_k(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, int k, int min_keep) → void - @details Top-K sampling described in the academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751.
- llama_sample_top_p(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, double p, int min_keep) → void - @details Nucleus sampling described in the academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751.
- llama_sample_typical(Pointer<llama_context> ctx, Pointer<llama_token_data_array> candidates, double p, int min_keep) → void - @details Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
- llama_save_session_file(Pointer<llama_context> ctx, Pointer<Char> path_session, Pointer<llama_token> tokens, int n_token_count) → bool
- llama_set_n_threads(Pointer<llama_context> ctx, int n_threads, int n_threads_batch) → void - Set the number of threads used for decoding. n_threads is the number of threads used for generation (single token); n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens).
- llama_set_rng_seed(Pointer<llama_context> ctx, int seed) → void - Sets the current rng seed.
- llama_set_state_data(Pointer<llama_context> ctx, Pointer<Uint8> src) → int - Set the state reading from the specified address. Returns the number of bytes read.
- llama_time_us() → int
- llama_token_bos(Pointer<llama_model> model) → int - Special tokens.
- llama_token_eos(Pointer<llama_model> model) → int
- llama_token_eot(Pointer<llama_model> model) → int
- llama_token_get_score(Pointer<llama_model> model, int token) → double
- llama_token_get_text(Pointer<llama_model> model, int token) → Pointer<Char> - Vocab.
- llama_token_get_type(Pointer<llama_model> model, int token) → int
- llama_token_middle(Pointer<llama_model> model) → int
- llama_token_nl(Pointer<llama_model> model) → int
- llama_token_prefix(Pointer<llama_model> model) → int - codellama infill tokens.
- llama_token_suffix(Pointer<llama_model> model) → int
- llama_token_to_piece(Pointer<llama_model> model, int token, Pointer<Char> buf, int length) → int - Token Id -> Piece. Uses the vocabulary in the provided context. Does not write a null terminator to the buffer. User code is responsible for removing the leading whitespace of the first non-BOS token when decoding multiple tokens.
- llama_tokenize(Pointer<llama_model> model, Pointer<Char> text, int text_len, Pointer<llama_token> tokens, int n_max_tokens, bool add_bos, bool special) → int - @details Convert the provided text into tokens. @param tokens The tokens pointer must be large enough to hold the resulting tokens. @return Returns the number of tokens on success, no more than n_max_tokens. @return Returns a negative number on failure - the number of tokens that would have been returned. @param special Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
- llama_vocab_type1(Pointer<llama_model> model) → int
- noSuchMethod(Invocation invocation) → dynamic - Invoked when a nonexistent method or property is accessed. inherited
- toString() → String - A string representation of this object. inherited
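The entries above describe the individual calls; the sketch below strings the most common ones together into a greedy prompt-completion loop. It is a rough sketch, not the package's own API: it assumes llama_token is generated as an Int32 typedef, that llama_batch exposes an n_tokens field as in llama.h, uses package:ffi for allocation, and does greedy selection by scanning the logits in Dart rather than through the llama_sample_* helpers. Error handling and the llama_backend_init/llama_backend_free pair (which should bracket the program) are omitted.

```dart
import 'dart:convert' show utf8;
import 'dart:ffi';
import 'package:ffi/ffi.dart';

/// Hedged sketch of a greedy completion loop; assumes llama_backend_init()
/// was already called once at program start.
String completeGreedy(AubAiBindings llama, String modelPath, String prompt,
    {int maxNewTokens = 64}) {
  final pathC = modelPath.toNativeUtf8();
  final model = llama.llama_load_model_from_file(
      pathC.cast<Char>(), llama.llama_model_default_params());
  final ctx = llama.llama_new_context_with_model(
      model, llama.llama_context_default_params());

  // Tokenize the prompt (add_bos = true, special = false).
  final promptC = prompt.toNativeUtf8();
  final maxTokens = promptC.length + 8;
  final tokens = calloc<Int32>(maxTokens);
  final nPrompt = llama.llama_tokenize(model, promptC.cast<Char>(),
      promptC.length, tokens.cast(), maxTokens, true, false);

  final nVocab = llama.llama_n_vocab(model);
  final pieceBuf = calloc<Uint8>(128);
  final out = StringBuffer();

  var nPast = 0;
  var batch = llama.llama_batch_get_one(tokens.cast(), nPrompt, nPast, 0);
  for (var i = 0; i < maxNewTokens; i++) {
    if (llama.llama_decode(ctx, batch) != 0) break;
    nPast += batch.n_tokens;

    // Greedy pick: argmax over the logits of the last evaluated token.
    final logits = llama.llama_get_logits_ith(ctx, batch.n_tokens - 1);
    var best = 0;
    for (var t = 1; t < nVocab; t++) {
      if (logits[t] > logits[best]) best = t;
    }
    if (best == llama.llama_token_eos(model)) break;

    // Token id -> text piece (may be a partial UTF-8 sequence).
    final n =
        llama.llama_token_to_piece(model, best, pieceBuf.cast<Char>(), 128);
    if (n > 0) {
      out.write(utf8.decode(pieceBuf.asTypedList(n), allowMalformed: true));
    }

    // Feed the chosen token back as a single-token batch.
    tokens[0] = best;
    batch = llama.llama_batch_get_one(tokens.cast(), 1, nPast, 0);
  }

  calloc.free(pieceBuf);
  calloc.free(tokens);
  malloc.free(promptC);
  malloc.free(pathC);
  llama.llama_free(ctx);
  llama.llama_free_model(model);
  return out.toString();
}
```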
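Instead of the hand-rolled argmax above, the llama_sample_* helpers operate on a llama_token_data_array built from the logits. The sketch below shows the classic top-k / top-p / temperature chain; the struct field names (id, logit, p, data, size, sorted) mirror llama.h and are assumptions that should be checked against the generated structs.

```dart
import 'dart:ffi';
import 'package:ffi/ffi.dart';

/// Hedged sketch: build a candidate array from raw logits, prune it with
/// top-k and top-p, apply temperature, then sample one token id.
int sampleTopKTopP(AubAiBindings llama, Pointer<llama_context> ctx,
    Pointer<Float> logits, int nVocab,
    {int topK = 40, double topP = 0.95, double temp = 0.8}) {
  final data = calloc<llama_token_data>(nVocab);
  for (var id = 0; id < nVocab; id++) {
    data[id].id = id;          // candidate token id
    data[id].logit = logits[id];
    data[id].p = 0.0;          // probabilities filled in by the samplers
  }
  final candidates = calloc<llama_token_data_array>();
  candidates.ref.data = data;
  candidates.ref.size = nVocab;
  candidates.ref.sorted = false;

  llama.llama_sample_top_k(ctx, candidates, topK, 1);
  llama.llama_sample_top_p(ctx, candidates, topP, 1);
  llama.llama_sample_temp(ctx, candidates, temp);
  final token = llama.llama_sample_token(ctx, candidates);

  calloc.free(candidates);
  calloc.free(data);
  return token;
}
```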
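For multi-token or multi-sequence workloads, llama_batch_init/llama_batch_free replace llama_batch_get_one. A hedged sketch of filling a batch by hand follows; the llama_batch field names (n_tokens, token, pos, n_seq_id, seq_id, logits) mirror llama.h and should be verified against the generated struct.

```dart
import 'dart:ffi';

/// Hedged sketch: evaluate a whole prompt as one batch, requesting logits
/// only for the final position.
int decodePromptBatch(AubAiBindings llama, Pointer<llama_context> ctx,
    Pointer<Int32> promptTokens, int nPrompt) {
  final batch = llama.llama_batch_init(nPrompt, 0, 1);
  for (var i = 0; i < nPrompt; i++) {
    batch.token[i] = promptTokens[i]; // token id
    batch.pos[i] = i;                 // absolute position in the sequence
    batch.n_seq_id[i] = 1;            // each token belongs to one sequence
    batch.seq_id[i][0] = 0;           // ... namely sequence 0
    batch.logits[i] = 0;              // no logits for this position
  }
  batch.logits[nPrompt - 1] = 1;      // logits only for the last token
  batch.n_tokens = nPrompt;
  final rc = llama.llama_decode(ctx, batch);
  llama.llama_batch_free(batch);      // every init needs a matching free
  return rc;
}
```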
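The llama_kv_cache_seq_* calls are typically combined to keep a long-running chat inside a fixed context window. A hedged sketch of that context-shift pattern (modelled on the approach used by llama.cpp's own examples) follows.

```dart
import 'dart:ffi';

/// Hedged sketch: once the context is full, drop some tokens of sequence 0
/// after the first nKeep, then shift the remainder back so the next decode
/// has room again. A negative p1 means "to the end", i.e. [p0, inf).
void shiftContext(AubAiBindings llama, Pointer<llama_context> ctx, int nPast,
    {int nKeep = 4}) {
  final nDiscard = (nPast - nKeep) ~/ 2;
  // Remove positions [nKeep, nKeep + nDiscard) from sequence 0 ...
  llama.llama_kv_cache_seq_rm(ctx, 0, nKeep, nKeep + nDiscard);
  // ... then slide every later position left by nDiscard (RoPE'd KV data
  // is updated accordingly).
  llama.llama_kv_cache_seq_shift(ctx, 0, nKeep + nDiscard, -1, -nDiscard);
}
```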
Operators
- operator ==(Object other) → bool - The equality operator. inherited