dart_cuda - Dart API docs

🚀 G-Tensor: High-Performance Dart & CUDA Deep Learning Engine

G-Tensor is a custom deep learning framework that combines the developer productivity of Dart with the raw computational power of NVIDIA CUDA.

Unlike standard wrappers, G-Tensor features a custom Attention Free Transformer (AFT) implementation, a manual Autograd engine, and hand-optimized CUDA kernels for operations like Causal Masking, Layer Normalization, and Cross-Entropy with Label Smoothing.

🏗 System Architecture

The engine is split into three distinct layers:

- Dart API (Frontend): High-level Tensor class with operator overloading (+, -, *, matmul) and Module classes for building neural networks.
- FFI Bridge: A low-level Dart FFI (Foreign Function Interface) layer that handles memory addresses and dispatches calls to compiled C++/CUDA binaries.
- CUDA Kernels (Backend): Hand-written .cu kernels optimized for parallel execution on the GPU, featuring custom broadcasting logic and stable gradient calculations.

🧠 The AFT Causal Mechanism (Mathematical Derivation)

The core of this engine is the Attention Free Transformer (AFT). Standard Multi-Head Attention materializes a $T \times T$ attention matrix, so its memory cost grows as $O(T^2)$. AFT re-arranges the interaction between Queries, Keys, and Values so that memory drops to $O(Td)$. With the full pairwise position bias $w_{t,i}$ used below, the amount of computation is still quadratic in $T$.

The Formulation

In this implementation, the attention-like operation is defined as:

$$Z_t = \sigma(Q_t) \odot \frac{\sum_{i=1}^t \exp(K_i + w_{t,i}) \odot V_i}{\sum_{i=1}^t \exp(K_i + w_{t,i})}$$

Where:

- $\sigma$ is the sigmoid activation.
- $\odot$ is the element-wise (Hadamard) product (verified in test_tensor2.dart).
- $w_{t,i}$ is the learned pairwise position bias.

The Causal Mask

To ensure the model cannot "cheat" by looking at future tokens, we apply a triangular causal mask.
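As a reading aid, the causal AFT formula can be sketched on the CPU in plain Dart. This is not G-Tensor's API: the function name `aftCausal` and the nested-list layout are assumptions made for illustration, and the max-subtraction step is a standard stabilization trick that the source does not describe. Restricting the inner sums to $i \le t$ has the same effect as a triangular causal mask.

```dart
import 'dart:math' as math;

double sigmoid(double x) => 1.0 / (1.0 + math.exp(-x));

/// Hypothetical CPU reference for causal AFT on one sequence.
/// q, k, v have shape [T][d]; w has shape [T][T] (pairwise position bias).
List<List<double>> aftCausal(
  List<List<double>> q,
  List<List<double>> k,
  List<List<double>> v,
  List<List<double>> w,
) {
  final seqLen = q.length;
  final dim = q[0].length;
  final z = List.generate(seqLen, (_) => List<double>.filled(dim, 0.0));
  for (var t = 0; t < seqLen; t++) {
    for (var j = 0; j < dim; j++) {
      // Subtract the running maximum before exp() to avoid overflow.
      var maxLogit = double.negativeInfinity;
      for (var i = 0; i <= t; i++) {
        maxLogit = math.max(maxLogit, k[i][j] + w[t][i]);
      }
      var numer = 0.0;
      var denom = 0.0;
      // Causal restriction: only positions i <= t contribute to step t.
      for (var i = 0; i <= t; i++) {
        final e = math.exp(k[i][j] + w[t][i] - maxLogit);
        numer += e * v[i][j];
        denom += e;
      }
      z[t][j] = sigmoid(q[t][j]) * numer / denom;
    }
  }
  return z;
}

void main() {
  final zeros = [
    [0.0, 0.0],
    [0.0, 0.0],
  ];
  final z = aftCausal([[0.0], [0.0]], [[0.0], [0.0]], [[2.0], [4.0]], zeros);
  print(z);
}
```

Because step 0 only sums over position 0, changing `v[1]` leaves `z[0]` untouched, which is the property the causal mask is meant to guarantee.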
In G-Tensor, the causal mask is applied by a specialized engine.mulTensors call during the forward pass, so that the output at any time step $t$ has exactly zero gradient with respect to positions $t+1, \dots, T$.

✨ Key Features

- Custom Autograd: Fully functional backpropagation through computational graphs.
- Efficient Memory Management: Explicit tracker and dispose system to prevent VRAM leaks in Dart's garbage-collected environment.
- Broadcasting: Support for adding row-vector biases to activation matrices via custom CUDA indexing (e.g., adding a [1, 128] bias to [64, 128] activations).
- Advanced Loss Kernels: Stable Cross-Entropy with built-in LogSoftmax and Label Smoothing ($\epsilon = 0.1$).

🛠 Installation & Setup

Prerequisites

- Dart SDK (v3.0+)
- NVIDIA CUDA Toolkit (v11.0+)
- CMake (for building the C++ backend)

Building the Backend

1. Navigate to the src directory.
2. Compile the CUDA shared library:

        mkdir build && cd build
        cmake ..
        make

3. Ensure the generated .so or .dll is on your library search path (LD_LIBRARY_PATH on Linux, PATH on Windows).

💻 Usage Example

1. Training with Memory Management

Because Dart is garbage collected but CUDA memory is not, you must use the tracker pattern:

    for (int step = 0; step < 1000; step++) {
      // Collects the tensors allocated during this step.
      List<Tensor> tracker = [];

      // Forward pass
      final logits = gpt.forward(inputIdx, dummyEnc, tracker);
      final loss = logits.crossEntropy(targetIds);

      // Backward pass
      loss.backward();
      optimizer.step();

      // CLEANUP: Free intermediate tensors to prevent CUDA OOM
      for (var t in tracker) {
        if (!gpt.parameters().contains(t)) t.dispose();
      }
      loss.dispose();
    }

2. Autoregressive Generation

The engine supports greedy and nucleus sampling for text generation:

    void generate(String prompt) {
      List

      // Fetch only the last row for prediction
      List
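
The row-vector broadcasting listed under Key Features (a [1, 128] bias added to [64, 128] activations) can be illustrated with a small CPU sketch. The function `addRowBias` and the flat row-major `Float32List` layout are assumptions for illustration only. The actual CUDA kernel is not shown in this document, so the indexing below is only one plausible way to express the idea, with each flat index standing in for one GPU thread.

```dart
import 'dart:typed_data';

/// Hypothetical helper: adds a [1, cols] bias to a [rows, cols] matrix
/// stored in row-major order.
Float32List addRowBias(
  Float32List activations,
  Float32List bias,
  int rows,
  int cols,
) {
  if (activations.length != rows * cols || bias.length != cols) {
    throw ArgumentError('Shape mismatch between activations and bias');
  }
  final out = Float32List(rows * cols);
  for (var idx = 0; idx < rows * cols; idx++) {
    // idx % cols maps every element of a row onto the matching bias column,
    // so the same bias vector is reused for all rows.
    out[idx] = activations[idx] + bias[idx % cols];
  }
  return out;
}

void main() {
  final activations = Float32List.fromList([1.0, 2.0, 3.0, 4.0]);
  final bias = Float32List.fromList([10.0, 20.0]);
  print(addRowBias(activations, bias, 2, 2)); // [11.0, 22.0, 13.0, 24.0]
}
```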

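The stable cross-entropy with label smoothing listed under Key Features can be sketched for a single example as follows. The function names are hypothetical, and the smoothing convention (target distribution $(1 - \epsilon)$ on the true class plus $\epsilon / C$ spread over all $C$ classes) is an assumption, since the document does not say which variant the CUDA kernel uses.

```dart
import 'dart:math' as math;

/// Numerically stable log-softmax: subtracts the max logit before exp().
List<double> logSoftmax(List<double> logits) {
  final maxLogit = logits.reduce(math.max);
  final sumExp =
      logits.fold<double>(0.0, (sum, x) => sum + math.exp(x - maxLogit));
  final logSumExp = maxLogit + math.log(sumExp);
  return [for (final x in logits) x - logSumExp];
}

/// Hypothetical single-example loss with label smoothing.
/// Assumed target distribution: (1 - epsilon) * oneHot + epsilon / numClasses.
double smoothedCrossEntropy(
  List<double> logits,
  int target, {
  double epsilon = 0.1,
}) {
  final logProbs = logSoftmax(logits);
  final numClasses = logits.length;
  var loss = 0.0;
  for (var c = 0; c < numClasses; c++) {
    final q = (c == target ? 1.0 - epsilon : 0.0) + epsilon / numClasses;
    loss -= q * logProbs[c];
  }
  return loss;
}

void main() {
  // Large logits stay finite because of the max-subtraction in logSoftmax.
  print(smoothedCrossEntropy([1000.0, 0.0, -1000.0], 0));
  print(smoothedCrossEntropy([0.0, 0.0], 0));
}
```

Fusing the log-softmax into the loss, as the feature list describes, avoids computing `log(softmax(x))` in two steps, where the intermediate probabilities can underflow to zero.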
Libraries

adam
aft
aft_cross_attention
aft_multi_head_attention
aft_multi_head_cross_attention
aft_muzero_transformer_decoder
aft_text_decoder_block
aft_transformer_decoder
aft_transformer_decoder_block
aft_transformer_encoder
aft_transformer_encoder_block
aft_vit_backbone
aft_vit_face_embeding
apps/face_embeddings
apps/face_training
apps/images
apps/triplet_loader
apps/triplet_loader2
audio_transformer
chess/mcts
chess/uci
core/engine
core/matrix
core/tensor
dart_cuda
dataset/chess
dataset/dataset
example_audio_video
feed_forward
gpu_tensor
hungarian_algorithm
layer_norm
main_face_gpu
mlp
mlp2
mlp3
mlp_learn
mu_zero/example
mu_zero/example2
mu_zero/example3
mu_zero/mu_zero_greedy_agent2
mu_zero/muzero_greedy_agent
mu_zero/shakespear_example
mu_zero/training
multi_modal_transformer
multi_modal_transformer2
multi_modal_trnasformer_encoder
network_utils
nn
nn/conv_2d
open_cv/open_cv
optimizers/cross_entropy
optimizers/stochastic_grad_desc
overfit
persistence
tests/tensor/mat_mul
tests/test_tensor
tests/test_tensor2
text_decoder
text_transformer
train_xor
train_xor_2
train_xor_3
triplet_loss
video_transformer
vit_object_detector
