dart_tensor_preprocessing 0.8.0
High-performance tensor preprocessing library for Flutter/Dart. NumPy-like transforms pipeline for ONNX Runtime inference.
Changelog #
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.8.0 - 2026-02-15 #
Added #
- `CoordinateTransformMode` enum - ONNX-compatible coordinate transformation modes for `ResizeOp`:
  - `halfPixel` - PyTorch default (`(x + 0.5) * scale - 0.5`)
  - `alignCorners` - PyTorch `align_corners=True`
  - `asymmetric` - TensorFlow default (`x * inSize / outSize`)
  - `pytorchHalfPixel` - Same as `halfPixel`, but maps to 0 when `outSize == 1`
- New `coordinateMode` parameter on `ResizeOp` (backward compatible with the existing `alignCorners` bool)
- `OperationCapabilities` expanded - 5 new metadata fields for framework compatibility:
  - `supportsBroadcast` - Whether the operation supports tensor broadcasting
  - `supportedDTypes` - Set of supported data types (default: `{float32, float64}`)
  - `pytorchEquivalent` - Equivalent PyTorch operation name
  - `onnxOpType` - Equivalent ONNX operator type
  - `onnxOpsetVersion` - Minimum ONNX opset version required
Changed #
- File decomposition - Large operation files split into focused modules:
  - `activation_op.dart` (1067 lines) → `activation/` subdirectory with 7 focused files:
    - `relu_ops.dart` (ReLUOp, LeakyReLUOp)
    - `sigmoid_ops.dart` (SigmoidOp, HardsigmoidOp, TanhOp)
    - `softmax_op.dart` (SoftmaxOp)
    - `gelu_op.dart` (GELUOp)
    - `swish_ops.dart` (SiLUOp, SwishOp, HardswishOp)
    - `mish_op.dart` (MishOp)
    - `elu_op.dart` (ELUOp)
  - `CenterCropOp` extracted from `resize_op.dart` to `crop_op.dart`
  - Barrel re-exports maintain backward compatibility for existing imports
Migration Notes #
- `ResizeOp`: The new `coordinateMode` parameter defaults to `null`, preserving existing behavior via the `alignCorners` bool. No code changes needed for existing users.
- `OperationCapabilities`: All new fields have default values. Existing `const OperationCapabilities(...)` calls remain valid.
- File split: `activation_op.dart` and `resize_op.dart` re-export all symbols. Existing `import` statements continue to work.
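As a sketch, opting into the new coordinate mode might look like this (the `width`/`height` parameter names are illustrative and not confirmed by this changelog):

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

void main() {
  // Existing code is unaffected: the alignCorners bool still works.
  final legacy = ResizeOp(width: 224, height: 224, alignCorners: true);

  // New ONNX-compatible mode. halfPixel matches the PyTorch default
  // mapping: (x + 0.5) * scale - 0.5.
  final resize = ResizeOp(
    width: 224,
    height: 224,
    coordinateMode: CoordinateTransformMode.halfPixel,
  );
}
```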
0.7.0 - 2026-02-02 #
Added #
- New Activation Functions (PyTorch compatible):
  - `GELUOp` - Gaussian Error Linear Unit, standard in Transformers (BERT, GPT, ViT)
    - Supports exact computation and `tanh` approximation modes
  - `SiLUOp` (Swish) - Sigmoid Linear Unit, used in EfficientNet and YOLOv5
  - `SwishOp` - Alias for `SiLUOp`
  - `HardsigmoidOp` - Hardware-efficient sigmoid approximation for MobileNetV3
  - `HardswishOp` - Hardware-efficient swish approximation for MobileNetV3
  - `MishOp` - Self-regularizing activation used in YOLOv4+
  - `ELUOp` - Exponential Linear Unit with configurable alpha
- `stack()` function - Stack tensors along a new dimension (`torch.stack` equivalent)
  - Supports arbitrary dimension insertion with negative indexing
  - All input tensors must have identical shapes
  - Dtype-specialized for Float32/Float64 performance
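A minimal usage sketch (the `dim` parameter name is an assumption based on the `torch.stack` analogy):

```dart
final a = TensorBuffer.zeros([3, 224, 224]);
final b = TensorBuffer.ones([3, 224, 224]);

// Inserts a new leading dimension, like torch.stack([a, b], dim=0).
// Result shape: [2, 3, 224, 224]. Negative dims count from the end.
final batch = stack([a, b], dim: 0);
```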
- New Normalization Operations (PyTorch compatible):
  - `InstanceNormOp` - Instance normalization for style transfer and GANs
    - Normalizes per sample per channel (each spatial region independently)
    - Supports 3D `[C,H,W]` and 4D `[N,C,H,W]` tensors
    - `InstanceNormOp.fromStateDict()` factory for loading PyTorch weights
    - Equivalent to `torch.nn.InstanceNorm2d`
  - `RMSNormOp` - Root Mean Square normalization for modern LLMs
    - More efficient than LayerNorm (no mean subtraction)
    - Used in LLaMA, Gemma, and other modern transformers
    - Factory presets: `llama7B`, `llama13B`, `llama70B`, `gemma2B`
    - `RMSNormOp.fromStateDict()` factory for loading weights
    - Equivalent to `torch.nn.RMSNorm` (PyTorch 2.4+)
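For example, the LLaMA preset and the state-dict factory might be used like this (the `'weight'` key follows PyTorch `state_dict` naming and is an assumption):

```dart
// Preset sized for LLaMA-7B's hidden dimension.
final norm = RMSNormOp.llama7B();

// Or load weights exported from torch.nn.RMSNorm (PyTorch 2.4+).
final loaded = RMSNormOp.fromStateDict({'weight': weightTensor});
final out = loaded.apply(hiddenStates);
```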
Documentation #
- Updated PyTorch compatibility table in README.md
- Added new activation functions to Available Operations list
0.6.5 - 2026-02-02 #
Added #
- OperationCapabilities metadata - All operations with the `InPlaceTransform` and `RequiresContiguous` mixins now override the `capabilities` getter:
  - `ReLUOp`, `LeakyReLUOp`, `SigmoidOp`, `TanhOp`, `SoftmaxOp`
  - `UnaryMathOp` (AbsOp, NegOp, SqrtOp, ExpOp, LogOp)
  - `ArithmeticOp` (AddOp, SubOp, MulOp, DivOp), `PowOp`
  - `BatchNormOp`, `LayerNormOp`, `GroupNormOp`
  - `NormalizeOp`, `ScaleOp`, `ClipOp`
  - `ResizeOp`, `CenterCropOp`, `RandomCropOp`, `GaussianBlurOp`, `PadOp`
  - `TypeCastOp`, `ToTensorOp`, `ToImageOp`
- NaN/Infinity edge case tests - 23 new tests in `simd_ops_test.dart`:
  - Float32/Float64 NaN handling for clip, abs, sqrt, normalize, relu operations
  - Float32/Float64 Infinity handling for clip, abs, sqrt, normalize, relu operations
  - Op-level NaN/Inf handling tests for ClipOp, AbsOp, SqrtOp, ReLUOp
Changed #
- Code consistency - Standardized `cloneForModification()` usage across all in-place operations:
  - `BatchNormOp`, `LayerNormOp`, `ClipOp`, `ArithmeticOp`, `PowOp` now use `cloneForModification()`
  - Eliminates potential double-copy issues from manual contiguity checks
Performance #
- `PowOp` dtype specialization - Added Float32/Float64 specialized loops for direct TypedList access
Documentation #
- Time/space complexity - Added Big-O complexity documentation to key operations:
  - `ResizeOp` - Complexity table for all interpolation modes (nearest, bilinear, bicubic, area, lanczos)
  - `NormalizeOp` - O(n) time with SIMD acceleration
  - `BatchNormOp` - O(n) time with pre-computed coefficients
  - `LayerNormOp` - O(n) time with Welford's algorithm
  - `GroupNormOp` - O(n) time with per-group normalization
  - `SoftmaxOp` - O(n) time with 3-pass algorithm
  - `GaussianBlurOp` - O(C×H×W×k) time using separable convolution
  - `ResizeNormalizeFusedOp` - O(C×H_out×W_out) with no intermediate tensor
Tests #
- Total test count: 897 (23 new NaN/Inf edge case tests)
0.6.4 - 2026-01-28 #
Added #
- `ResizeNormalizeFusedOp` - Fused resize + normalize operation that eliminates intermediate tensor allocation
  - Combines bilinear resize and per-channel normalization in a single pass
  - `factory ResizeNormalizeFusedOp.imagenet(...)` convenience constructor
  - Supports 3D `[C, H, W]` and 4D `[N, C, H, W]` inputs
  - Cache-friendly 64x64 blocking for optimal L1 cache usage
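A sketch of using the fused path (the `width`/`height` parameter names are illustrative):

```dart
// Two separate ops would allocate an intermediate tensor between
// resize and normalize; the fused op does both in one pass.
final fused = ResizeNormalizeFusedOp.imagenet(width: 224, height: 224);

// Accepts 3D [C,H,W] or 4D [N,C,H,W] input.
final ready = fused.apply(chwImage);
```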
Changed #
- Cache-friendly blocking for bilinear resize - Applied the 64x64 blocking pattern to `_resizeBilinear()` for both the Float32-specialized and generic fallback paths
- Cache-friendly blocking for area resize - Applied the 64x64 blocking pattern to `_resizeArea()` for both the Float32-specialized and generic fallback paths
- `ResizeNormalizeFusedOp.name` - Now includes the `alignCorners` parameter for better debugging visibility
- Generic path style consistency - Pre-computes `oneMinusFy`/`oneMinusFx` in `_bilinearNormalizeGeneric`, matching the Float32 path style
Tests #
- Added edge case tests for `ResizeNormalizeFusedOp`: 1x1 input, same-size resize, alignCorners with dim=1, 25x upscale
- Added validation tests: negative width, 5D input rejection, `alignCorners` in name
- Added path coverage tests: 4D+alignCorners, 4D+Float64 generic fallback, factory default alignCorners
- Added shape coverage tests: 4D non-contiguous, 4+ channel, batch=1 4D, `computeOutputShape` 2D behavior
0.6.3 - 2026-01-28 #
Added #
- `TensorBuffer.uninitialized()` factory - Creates a tensor buffer without zero-fill for cases where all elements will be immediately overwritten
  - Supports all DTypes and MemoryFormat options
  - Semantically signals intent to overwrite, avoiding redundant initialization
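The intended pattern, sketched below; the safety rule is that every element must be written before any element is read:

```dart
// zeros() would redundantly clear memory that the next step is about
// to overwrite anyway.
final out = TensorBuffer.uninitialized([3, 224, 224], dtype: DType.float32);

// ... fill every element, e.g. inside a resize or crop inner loop ...
// Reading before writing would observe arbitrary values, so only use
// this factory when the operation provably overwrites the whole buffer.
```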
Changed #
- Uninitialized buffer usage - Operations that fully overwrite their output now use `TensorBuffer.uninitialized()` instead of `zeros()`:
  - `ResizeOp` (3D/4D), `CenterCropOp` (3D/4D), `concat()`, `SliceOp`, `RandomCropOp` (3D/4D), `GaussianBlurOp` (3D/4D), `PadOp` (all modes)
- BufferPool integration in GaussianBlurOp - Temporary `Float64List` buffers are now acquired from `BufferPool` and properly released via `try/finally` to prevent leaks on exceptions
0.6.2 - 2026-01-20 #
Internal #
- TensorBuffer Factory Separation - Moved factory methods to a separate file:
  - `tensor_buffer_factory.dart` contains: zeros, ones, full, random, randn, eye, linspace, arange, fromFloat32List, fromFloat64List, fromUint8List
  - Reduces `tensor_buffer.dart` from ~840 lines to ~530 lines
  - No API changes
- OpValidator - Added centralized operation validation (`validation_utils.dart`):
  - `OpValidator.validateRank()` - Validates tensor rank range
  - `OpValidator.validateAxis()` - Validates and normalizes axis (supports negative indexing)
  - `OpValidator.validateChannels()` - Validates channel count
  - `OpValidator.validatePositiveDimension()` - Validates positive dimension
  - `OpValidator.validateListLength()` - Validates list length
- OperationCapabilities - Added operation metadata (`transform_op.dart`):
  - `supportsInPlace` - Whether the op can modify the tensor in place
  - `requiresContiguous` - Whether the op requires contiguous memory
  - `preservesShape` - Whether the op preserves the input shape
  - `modifiesDType` - Whether the op may change the data type
  - Default `capabilities` getter on `TransformOp`
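A custom operation could advertise its metadata by overriding the getter, roughly like this (the class name and field values are illustrative):

```dart
class MyScaleOp extends TransformOp with InPlaceTransform {
  @override
  OperationCapabilities get capabilities => const OperationCapabilities(
        supportsInPlace: true,    // can mutate its input buffer
        requiresContiguous: true, // needs a contiguous memory layout
        preservesShape: true,     // output shape == input shape
        modifiesDType: false,     // never changes the data type
      );

  // ... apply() implementation elided ...
}
```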
0.6.1 - 2026-01-17 #
Added #
- Float64 SIMD Operations - Vectorized operations for Float64 tensors (`simd_ops.dart`):
  - `SimdOps.clipF64()` - Clips values using `Float64x2.clamp()`
  - `SimdOps.absF64()` - Absolute value using `Float64x2.abs()`
  - `SimdOps.sqrtF64()` - Square root using `Float64x2.sqrt()`
  - `SimdOps.normalizeF64()` - Mean/std normalization with SIMD
  - Uses `Float64x2List.view()` for aligned data (16-byte alignment)
  - Scalar fallback for unaligned data to avoid object creation overhead
  - ~2.5x speedup for aligned Float64 data vs scalar
- SIMD Microbenchmark - Performance verification for SIMD operations (`benchmark/simd_microbenchmark.dart`):
  - Direct SimdOps performance measurement (clip, abs, sqrt, normalize)
  - Aligned vs unaligned data comparison (~4.4x performance difference)
  - Float32 SIMD vs Float64 SIMD comparison
  - Edge case testing for non-multiple-of-4 lengths
- SIMD Tests - 64 tests in `simd_ops_test.dart`:
  - Float32 SIMD tests with alignment edge cases
  - Float64 SIMD tests (clipF64, absF64, sqrtF64, normalizeF64)
  - Op integration tests for both Float32 and Float64
Changed #
- SimdOps.abs() and SimdOps.sqrt() - Now applied to `AbsOp` and `SqrtOp` for Float32 tensors
- SimdOps.clip() - Now used in `ClipOp` for Float32 tensors
- SimdOps.normalize() - Now used in `NormalizeOp` for Float32 tensors (per-channel)
- NegOp - Now uses `SimdOps.multiplyScalar(-1)` for Float32 tensors
- ClipOp - Now uses `SimdOps.clipF64()` for Float64 tensors
- AbsOp - Now uses `SimdOps.absF64()` for Float64 tensors
- SqrtOp - Now uses `SimdOps.sqrtF64()` for Float64 tensors
- NormalizeOp - Now uses `SimdOps.normalizeF64()` for Float64 tensors (3D and 4D)
Performance #
- Float32 SIMD (aligned): ~6.2 GE/s (giga-elements per second)
- Float64 SIMD (aligned): ~3.3 GE/s (53% of Float32, expected since Float64x2 processes half as many lanes per instruction as Float32x4)
- Unaligned fallback: ~1.3-1.5 GE/s
Internal #
- Integrated the SIMD microbenchmark into `benchmark/run_all.dart`
0.6.0 - 2026-01-16 #
Added #
- Multi-axis Reductions - Reduce along multiple axes at once (`tensor_buffer_reduce.dart`):
  - `sumAxes(List<int> axes, {bool keepDims})` - Sum along multiple axes
  - `meanAxes(List<int> axes, {bool keepDims})` - Mean along multiple axes
  - `minAxes(List<int> axes, {bool keepDims})` - Min along multiple axes
  - `maxAxes(List<int> axes, {bool keepDims})` - Max along multiple axes
  - Supports negative axis indexing
  - Validates duplicate axes
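A short sketch of reducing both spatial axes in one call; the result shapes in the comments follow the `keepDims` semantics described above:

```dart
final t = TensorBuffer.ones([2, 3, 4, 5]); // e.g. [N, C, H, W]

// Sum over both spatial axes at once; negative indexing is supported.
final spatialSum = t.sumAxes([-2, -1]);              // shape [2, 3]
final keepDims = t.meanAxes([2, 3], keepDims: true); // shape [2, 3, 1, 1]
```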
- GroupNormOp - Group normalization for modern CNNs (`group_norm_op.dart`):
  - Full PyTorch-compatible `torch.nn.GroupNorm` implementation
  - Normalizes across groups of channels (used in U-Net and modern CNNs with small batch sizes)
  - Supports 3D `[C,H,W]` and 4D `[N,C,H,W]` tensors
  - `GroupNormOp.withAffine()` factory for PyTorch-style initialization
  - `GroupNormOp.fromStateDict()` factory for loading PyTorch weights
  - Welford's algorithm for numerically stable mean/variance computation
  - Dtype-specialized loops for Float32/Float64
  - In-place support via `applyInPlace()`
- SIMD Operations - Vectorized tensor operations (`simd_ops.dart`):
  - Uses Float32x4 SIMD instructions for 2-4x speedup on Float32 tensors
  - `SimdOps.multiplyScalar()`, `SimdOps.addScalar()`, `SimdOps.subtractScalar()` - Scalar operations
  - `SimdOps.add()`, `SimdOps.subtract()`, `SimdOps.multiply()`, `SimdOps.divide()` - Element-wise binary operations
  - `SimdOps.relu()`, `SimdOps.leakyRelu()` - Activation functions
  - `SimdOps.normalize()` - Mean/std normalization
  - `SimdOps.copy()`, `SimdOps.fill()`, `SimdOps.sum()`, `SimdOps.clip()`
  - Handles both aligned and unaligned memory
- Interpolation Modes - Additional resize algorithms (`resize_op.dart`):
  - `InterpolationMode.area` - Weighted area averaging for high-quality downsampling with anti-aliasing (OpenCV INTER_AREA equivalent)
  - `InterpolationMode.lanczos` - Lanczos3 (6x6 kernel) for high-quality resize with sinc-based interpolation
Changed #
- BREAKING: Reduction operations moved to an extension (`TensorBufferReduce`):
  - `sum()`, `mean()`, `min()`, `max()` - Full tensor reductions
  - `sumAxis()`, `meanAxis()`, `minAxis()`, `maxAxis()` - Single-axis reductions
  - `toList()` - Data extraction
  - Existing code using these methods will work unchanged, but users importing only `tensor_buffer.dart` must now also import `tensor_buffer_reduce.dart` or the main library
Performance #
- SIMD-accelerated operations: `ScaleOp`, `ReLUOp`, `LeakyReLUOp` now use SIMD for Float32 tensors
- SIMD-accelerated ArithmeticOp: `AddOp`, `SubOp`, `MulOp`, `DivOp` now use SIMD for Float32 tensors (both scalar and tensor modes)
- Cache-friendly bicubic resize: 64x64 block processing for better L1 cache utilization on large tensors
Internal #
- Extracted reduction operations from `tensor_buffer.dart` (1170 → 740 lines) into `tensor_buffer_reduce.dart`
- Added 14 new tests for multi-axis reductions
- Added 61 new tests for SIMD operations, GroupNormOp, and resize modes
PyTorch Compatibility #
| Operation | PyTorch Equivalent |
|---|---|
| `GroupNormOp` | `torch.nn.GroupNorm` |
0.5.1 - 2026-01-13 #
Added #
- BufferPool - Memory pooling API for buffer reuse (`buffer_pool.dart`):
  - Singleton `BufferPool.instance` for global buffer reuse
  - Power-of-2 size bucketing for efficient allocation
  - Per-dtype buffer pools (Float32, Float64, Int32, Uint8, etc.)
  - `acquire(minSize, dtype)` and `release(buffer)` methods
  - `acquireFloat32()`, `acquireFloat64()`, etc. convenience extensions
  - Max buffers per bucket limit (8) to prevent unbounded memory growth
  - `pooledCount` and `pooledBytes` for monitoring
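The acquire/release discipline looks roughly like this; the `try/finally` shape mirrors how the library itself guards against leaks on exceptions:

```dart
final pool = BufferPool.instance;

// Sizes are rounded up to a power-of-2 bucket internally.
final scratch = pool.acquireFloat32(65536);
try {
  // ... use scratch as temporary workspace ...
} finally {
  // Return the buffer to the pool even if an exception was thrown.
  pool.release(scratch);
}
```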
- TypedData Views - Zero-copy tensor view utilities (`typed_data_views.dart`):
  - `TypedDataViews.float32SublistView()` - Zero-copy Float32List slicing
  - `TypedDataViews.float64SublistView()` - Zero-copy Float64List slicing
  - `TypedDataViews.viewAs()` - Create a typed view from a ByteBuffer at an offset
  - `TensorViewExtension` on TensorBuffer:
    - `sliceFirst(start, end)` - Zero-copy slice along the first dimension
    - `isViewable` - Check if the tensor can be used as a view
    - `toChannelsLast()` - NCHW to NHWC without copying
    - `toChannelsFirst()` - NHWC to NCHW without copying
    - `flatten()` - 1D view of a contiguous tensor
    - `unbind(dim)` - Split tensor into views along a dimension
    - `select(dim, index)` - Select a single index with reduced rank
    - `narrow(dim, start, length)` - Narrow a dimension without copying
- Utility Libraries (`lib/src/utils/`):
  - `dtype_dispatcher.dart` - DTypeDispatcher for dtype-specialized dispatch
  - `tensor_indexing.dart` - TensorIndexer for index calculations (index2D, index3D, index4D, linearToCoords, coordsToLinear, computeStrides)
- TensorBuffer/TensorStorage Factory Methods:
  - `TensorBuffer.fromFloat64List()` - Create a tensor from a Float64List
  - `TensorStorage.fromFloat64List()` - Create storage from a Float64List
Changed #
- SoftmaxOp Optimization - Now preserves the input dtype (Float32/Float64) instead of always using Float64. Added dtype-specialized implementations for better performance.
- Double-copy elimination - Operations now use the `cloneForModification()` pattern (`input.isContiguous ? input.clone() : input.contiguous()`) to avoid unnecessary copies:
  - `ReLUOp`, `LeakyReLUOp`, `SigmoidOp`, `TanhOp`, `SoftmaxOp`
  - `AbsOp`, `NegOp`, `SqrtOp`, `ExpOp`, `LogOp` (UnaryMathOp)
  - `NormalizeOp`, `ScaleOp`
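The pattern is small enough to show inline; this is a sketch of the logic, not the library's exact source:

```dart
// Contiguous input: clone() is the single necessary copy.
// Non-contiguous input: contiguous() already produces a fresh buffer,
// so stacking clone() on top of it would be the double copy being avoided.
TensorBuffer cloneForModification(TensorBuffer input) =>
    input.isContiguous ? input.clone() : input.contiguous();
```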
Internal #
- Added a `cloneForModification()` helper to the `RequiresContiguous` mixin in `transform_op.dart`
- Integrated `DTypeDispatcher` into activation ops (`ReLUOp`, `LeakyReLUOp`, `SigmoidOp`, `TanhOp`) for dtype-specialized loops
- Integrated `DTypeDispatcher` into `ScaleOp` for consistent dtype handling
- Replaced stride computation with `TensorIndexer.computeStrides()` in `SoftmaxOp` (removed 3x code duplication)
0.5.0 - 2026-01-10 #
Added #
- BatchNormOp - Batch normalization for CNN inference (`batch_norm_op.dart`):
  - Full PyTorch-compatible `torch.nn.BatchNorm2d` implementation
  - Pre-computed scale/shift coefficients for efficient inference: `y = x * scale + shift`
  - Supports 3D `[C,H,W]` and 4D `[N,C,H,W]` tensors
  - `BatchNormOp.fromStateDict()` factory for loading PyTorch weights
  - Dtype-specialized loops for Float32/Float64
  - In-place support via `applyInPlace()`
- LayerNormOp - Layer normalization for Transformer inference (`layer_norm_op.dart`):
  - Full PyTorch-compatible `torch.nn.LayerNorm` implementation
  - Normalizes over the last N dimensions (e.g., `[768]` for BERT)
  - Welford's algorithm for numerically stable mean/variance computation
  - `LayerNormOp.bert()` and `LayerNormOp.bertLarge()` factory presets
  - `LayerNormOp.fromStateDict()` factory for loading PyTorch weights
  - Dtype-specialized loops for Float32/Float64
  - In-place support via `applyInPlace()`
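Loading PyTorch weights might look like this (the `'weight'`/`'bias'` keys follow `torch.nn` `state_dict` naming and are an assumption):

```dart
// Preset for BERT-base's hidden size of 768.
final norm = LayerNormOp.bert();

// Or construct from an exported state dict:
final loaded = LayerNormOp.fromStateDict({
  'weight': gammaTensor, // scale (gamma)
  'bias': betaTensor,    // shift (beta)
});
final out = loaded.apply(hidden); // normalizes over the last dimension(s)
```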
PyTorch Compatibility #
| Operation | PyTorch Equivalent |
|---|---|
| `BatchNormOp` | `torch.nn.BatchNorm2d` (inference) |
| `LayerNormOp` | `torch.nn.LayerNorm` |
0.4.1 - 2026-01-09 #
Performance Optimizations #
- Dtype-specialized loops: Hot paths in transform operations now use dtype-specific code paths with direct `Float32List`/`Float64List` access, avoiding per-element switch overhead:
  - `NormalizeOp._normalize3D()`, `NormalizeOp._normalize4D()`
  - `ScaleOp._scale()`
  - `ClipOp._clip()`
  - `GaussianBlurOp._applySeparableBlur()`
  - `ResizeOp._resizeNearest()`, `_resizeBilinear()`, `_resizeBicubic()`
  - `CenterCropOp._crop3D()`, `_crop4D()`
  - `concat()` with optimized axis=0 bulk copy
- Clone-Before-Modify optimization: `ClipOp.apply()` now avoids a double copy by checking `isContiguous` before deciding whether to `clone()` or `contiguous()`
- Isolate threshold: `TensorPipeline.runAsync()` now accepts an optional `isolateThreshold` parameter (default: 100,000 elements). Small tensors skip isolate overhead and run synchronously
- Buffer reuse: `GaussianBlurOp` now pre-allocates and reuses a temp buffer across channels, reducing allocations
- Concat linear copy: `concat()` now uses pre-computed strides for linear index calculation instead of recursive index computation. Axis=0 concatenation of contiguous tensors uses a bulk `setRange()` copy
- Loop unrolling: `ResizeOp._resizeBicubic()` unrolls the 4x4 kernel with pre-computed weights and indices
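The isolate threshold can be tuned per call; a sketch, with pipeline construction elided (the value shown is the documented default):

```dart
// Tensors below the threshold run synchronously on the current isolate,
// avoiding spawn/transfer overhead; larger tensors go through Isolate.run.
final result = await pipeline.runAsync(
  input,
  isolateThreshold: 100000, // elements
);
```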
0.4.0 - 2026-01-09 #
Added #
- Arithmetic Operations (`arithmetic_op.dart`):
  - `AddOp` - Element-wise addition (scalar or tensor)
  - `SubOp` - Element-wise subtraction (scalar or tensor)
  - `MulOp` - Element-wise multiplication (scalar or tensor)
  - `DivOp` - Element-wise division (scalar or tensor)
  - `PowOp` - Element-wise power operation
- Math Operations (`math_op.dart`):
  - `AbsOp` - Element-wise absolute value
  - `NegOp` - Element-wise negation
  - `SqrtOp` - Element-wise square root
  - `ExpOp` - Element-wise exponential (e^x)
  - `LogOp` - Element-wise natural logarithm
- Activation Functions (`activation_op.dart`):
  - `ReLUOp` - Rectified Linear Unit
  - `LeakyReLUOp` - Leaky ReLU with configurable negative slope
  - `SigmoidOp` - Sigmoid activation
  - `TanhOp` - Hyperbolic tangent activation
  - `SoftmaxOp` - Softmax along a specified axis
- TensorBuffer Factory Methods:
  - `TensorBuffer.full()` - Create a tensor filled with a specified value
  - `TensorBuffer.random()` - Create a tensor with uniform random values in [0, 1)
  - `TensorBuffer.randn()` - Create a tensor with a standard normal distribution
  - `TensorBuffer.eye()` - Create an identity matrix (supports rectangular)
  - `TensorBuffer.linspace()` - Create a tensor with evenly spaced values
  - `TensorBuffer.arange()` - Create a tensor with sequence values
- Utility Libraries (`lib/src/utils/`):
  - `index_utils.dart` - Index manipulation utilities (reflectIndex, replicateIndex, circularIndex)
  - `validation_utils.dart` - Common tensor validation patterns
Changed #
- Exception Consistency: `TensorStorage._checkBounds()` now throws `IndexOutOfBoundsException` instead of `RangeError` for consistent exception handling across the library
Internal #
- Extracted duplicate `_reflectIndex` code from `pad_op.dart` and `augmentation_op.dart` into a shared utility
- Added a `TensorValidation` extension with `requireRank3Or4()`, `requireExactRank()`, `requireMinRank()` methods
0.3.1 - 2026-01-08 #
Added #
- Performance benchmark suite (`benchmark/` directory):
  - `tensor_creation_benchmark.dart` - Tensor creation performance
  - `tensor_ops_benchmark.dart` - Zero-copy and copy operations
  - `pipeline_benchmark.dart` - Pipeline sync/async comparison
  - `memory_benchmark.dart` - Memory usage measurement
  - `run_all.dart` - Unified benchmark runner
  - `utils/benchmark_utils.dart` - Benchmark utilities
Fixed #
- Removed unused variables in benchmark files
- Fixed lint issues in benchmark files
0.3.0 - 2026-01-08 #
Added #
- `ClipOp` - Element-wise value clamping with factory presets (unit, symmetric, uint8)
- `PadOp` - Padding with multiple modes (constant, reflect, replicate, circular)
- `SliceOp` - Python-like tensor slicing with support for negative indices and steps
- `RandomCropOp` - Random cropping for data augmentation with deterministic seed support
- `GaussianBlurOp` - Gaussian blur using separable convolution with factory presets
- `concat()` - Utility function for tensor concatenation along a specified axis
Fixed #
- `concat()` axis-based copy logic now correctly handles multi-axis concatenation
Changed #
- BREAKING: Unified exception handling across the library
  - All exceptions now extend the `TensorException` sealed class
  - `ArgumentError` → `ShapeMismatchException`, `InvalidParameterException`
  - `RangeError` → `IndexOutOfBoundsException`
0.2.0 - 2026-01-04 #
Added #
- `IndexOutOfBoundsException` - Thrown when an index or axis is out of valid range
- `DTypeMismatchException` - Thrown when tensor data types do not match
Changed #
- BREAKING: Unified exception handling across the library
  - All exceptions now extend the `TensorException` sealed class
  - `ArgumentError` → `ShapeMismatchException`, `InvalidParameterException`
  - `RangeError` → `IndexOutOfBoundsException`
  - `StateError` → `NonContiguousException`, `DTypeMismatchException`
- Shape validation now happens before buffer creation in `zeros()` and `ones()`
Migration Guide #
If you were catching standard Dart exceptions, update your code:
| Before | After |
|---|---|
| `on RangeError` | `on IndexOutOfBoundsException` |
| `on ArgumentError` | `on ShapeMismatchException` or `on InvalidParameterException` |
| `on StateError` | `on NonContiguousException` or `on DTypeMismatchException` |
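Migrating an existing catch block, as a sketch (the operation and its `size` parameter are illustrative):

```dart
try {
  final cropped = CenterCropOp(size: 512).apply(tinyImage);
} on IndexOutOfBoundsException catch (e) {
  // Previously this surfaced as a RangeError.
  print('index out of range: $e');
} on TensorException catch (e) {
  // Sealed base class: one clause catches any library-specific failure.
  print('tensor error: $e');
}
```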
0.1.4 - 2026-01-04 #
Added #
- Reduction operations for `TensorBuffer`:
  - `sum()` - Returns the sum of all elements
  - `mean()` - Returns the arithmetic mean of all elements
  - `min()` - Returns the minimum value
  - `max()` - Returns the maximum value
- Axis-wise reduction operations:
  - `sumAxis(int axis, {bool keepDims})` - Sum along a specific axis
  - `meanAxis(int axis, {bool keepDims})` - Mean along a specific axis
  - `minAxis(int axis, {bool keepDims})` - Min along a specific axis
  - `maxAxis(int axis, {bool keepDims})` - Max along a specific axis
- Support for negative axis indexing in axis-wise operations
- Comprehensive test coverage for all reduction operations (49 tests)
0.1.3 - 2026-01-03 #
0.1.1 - 2025-12-27 #
Added #
- Comprehensive dartdoc comments for all public API elements
- Library-level documentation with usage examples
0.1.0 - 2025-12-27 #
Added #
- Core tensor operations
  - `TensorBuffer` with shape, strides, and view/storage separation
  - `TensorStorage` as an immutable typed data wrapper
  - `DType` enum with ONNX-compatible data types
- Transform operations
  - `ResizeOp` with nearest, bilinear, bicubic interpolation
  - `ResizeShortestOp` for aspect-ratio preserving resize
  - `CenterCropOp` for center cropping
  - `NormalizeOp` with ImageNet, CIFAR-10, symmetric presets
  - `ScaleOp` for value scaling
  - `PermuteOp` for axis reordering
  - `ToTensorOp` for HWC uint8 to CHW float32 conversion
  - `ToImageOp` for CHW float32 to HWC uint8 conversion
  - `UnsqueezeOp`, `SqueezeOp`, `ReshapeOp`, `FlattenOp` for shape manipulation
  - `TypeCastOp` for dtype conversion
- Pipeline system
  - `TensorPipeline` for chaining operations
  - `PipelinePresets` with ImageNet, ResNet, YOLO, CLIP, ViT, MobileNet presets
  - Async execution via `Isolate.run`
- Zero-copy operations
  - `transpose()` via stride manipulation
  - `squeeze()`, `unsqueeze()` as shape-only changes
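Putting the 0.1.0 pieces together, a typical preprocessing pipeline might read like this (the preset accessor, parameter names, and default interpolation are illustrative assumptions):

```dart
final pipeline = TensorPipeline([
  ResizeOp(width: 224, height: 224), // bilinear assumed as the default
  ToTensorOp(),                      // HWC uint8 -> CHW float32
  NormalizeOp.imagenet(),            // ImageNet mean/std preset
]);

final output = pipeline.run(hwcImage);            // synchronous
final asyncOut = await pipeline.runAsync(hwcImage); // via Isolate.run
```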