
dart_tensor_preprocessing #


Tensor preprocessing library for Flutter/Dart. NumPy-like transforms pipeline for ONNX Runtime, TFLite, and other AI inference engines.

Features #

  • PyTorch Compatible: Matches PyTorch/torchvision tensor operations
  • Non-blocking: Isolate-based async execution prevents UI jank
  • Type-safe: ONNX-compatible tensor types (Float32, Int64, Uint8, etc.)
  • Zero-copy: View/stride manipulation for reshape/transpose operations
  • Declarative: Chain operations into reusable pipelines
  • SIMD Accelerated: Float32/Float64 vectorized operations for 2-4x speedup
  • Memory Efficient: Buffer pooling, uninitialized allocation, fused operations

Installation #

dependencies:
  dart_tensor_preprocessing: ^0.7.0

Quick Start #

import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

// Create a tensor from image data (HWC format, Uint8)
final imageData = Uint8List.fromList([/* RGBA pixel data */]);
final tensor = TensorBuffer.fromUint8List(imageData, [height, width, channels]);

// Use a preset pipeline for ImageNet models
final pipeline = PipelinePresets.imagenetClassification();
final result = await pipeline.runAsync(tensor);

// result.shape: [1, 3, 224, 224] (NCHW, Float32, normalized)

Pipeline Presets #

| Preset | Output Shape | Use Case |
|---|---|---|
| imagenetClassification() | [1, 3, 224, 224] | ResNet, VGG, etc. |
| objectDetection() | [1, 3, 640, 640] | YOLO, SSD |
| faceRecognition() | [1, 3, 112, 112] | ArcFace, FaceNet |
| clip() | [1, 3, 224, 224] | CLIP models |
| mobileNet() | [1, 3, 224, 224] | MobileNet family |
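
Each preset returns a ready-made pipeline, so swapping models is a one-line change from the Quick Start example. A minimal sketch using only the APIs shown above:

```dart
import 'dart:typed_data';
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

Future<void> main() async {
  // A blank 640x640 RGB frame stands in for real camera data.
  final frame = TensorBuffer.fromUint8List(
      Uint8List(640 * 640 * 3), [640, 640, 3]);

  // YOLO/SSD-style preprocessing per the table above.
  final pipeline = PipelinePresets.objectDetection();
  final result = await pipeline.runAsync(frame);
  print(result.shape); // [1, 3, 640, 640]
}
```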

Custom Pipeline #

final pipeline = TensorPipeline([
  ResizeOp(height: 224, width: 224),
  ToTensorOp(normalize: true),  // HWC -> CHW, scale to [0,1]
  NormalizeOp.imagenet(),       // ImageNet mean/std
  UnsqueezeOp.batch(),          // Add batch dimension
]);

// Sync execution
final result = pipeline.run(input);

// Async execution (runs in isolate)
final result = await pipeline.runAsync(input);

// Async with custom isolate threshold (default: 100,000 elements)
// Small tensors skip isolate overhead and run synchronously
final result = await pipeline.runAsync(input, isolateThreshold: 50000);

Available Operations #

Resize & Crop #

  • ResizeOp - Resize to fixed dimensions (nearest, bilinear, bicubic, area, lanczos)
  • ResizeShortestOp - Resize preserving aspect ratio
  • CenterCropOp - Center crop to fixed dimensions
  • ClipOp - Element-wise value clamping (presets: unit, symmetric, uint8)
  • PadOp - Padding with multiple modes (constant, reflect, replicate, circular)
  • SliceOp - Python-like tensor slicing with negative index support
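
These ops compose into the standard eval-time resize-then-crop recipe. A minimal sketch; the ResizeShortestOp and CenterCropOp parameter forms are assumptions (only ResizeOp's are documented above):

```dart
// Shortest-side resize to 256, then a 224x224 center crop
// (the classic torchvision eval transform).
final pipeline = TensorPipeline([
  ResizeShortestOp(256),                  // positional size argument assumed
  CenterCropOp(height: 224, width: 224),  // parameter names assumed
]);
final cropped = pipeline.run(TensorBuffer.zeros([480, 640, 3]));
// cropped.shape: [224, 224, 3]
```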

Normalization #

  • NormalizeOp - Channel-wise normalization (presets: ImageNet, CIFAR-10, symmetric)
  • ScaleOp - Scale values (e.g., [0-255] to [0-1])
  • BatchNormOp - Batch normalization for CNN inference (PyTorch compatible)
  • LayerNormOp - Layer normalization for Transformer inference (presets: BERT, BERT-Large)
  • GroupNormOp - Group normalization for modern CNNs (PyTorch compatible)
  • InstanceNormOp - Instance normalization for style transfer and GANs (PyTorch compatible)
  • RMSNormOp - Root Mean Square normalization for LLMs (LLaMA, Gemma)
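
Beyond the presets, custom statistics can be supplied directly. A sketch, assuming NormalizeOp takes mean/std named parameters in the torchvision style:

```dart
// Map [0, 1] inputs to [-1, 1] with custom per-channel statistics.
final pipeline = TensorPipeline([
  NormalizeOp(
    mean: [0.5, 0.5, 0.5], // parameter names assumed
    std: [0.5, 0.5, 0.5],
  ),
]);
final out = pipeline.run(TensorBuffer.ones([3, 224, 224]));
// every element: (1.0 - 0.5) / 0.5 == 1.0
```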

Layout #

  • PermuteOp - Axis reordering (e.g., HWC to CHW)
  • ToTensorOp - HWC uint8 to CHW float32 with optional scaling
  • ToImageOp - CHW float32 to HWC uint8
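
ToImageOp undoes ToTensorOp's layout and scaling, which is handy for previewing a preprocessed tensor. A sketch; ToImageOp is assumed to take no constructor arguments:

```dart
import 'dart:typed_data';

// HWC uint8 -> CHW float32 in [0, 1], then back to HWC uint8 for display.
final hwcImage = TensorBuffer.fromUint8List(
    Uint8List(224 * 224 * 3), [224, 224, 3]);
final chw = TensorPipeline([ToTensorOp(normalize: true)]).run(hwcImage);
final back = TensorPipeline([ToImageOp()]).run(chw); // no-arg constructor assumed
```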

Data Augmentation #

  • RandomCropOp - Random cropping with deterministic seed support
  • GaussianBlurOp - Gaussian blur using separable convolution
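
The deterministic-seed support makes augmentation reproducible, e.g. for paired image/mask crops. A sketch; RandomCropOp's parameter names (including seed) are assumptions based on the description above:

```dart
// Same seed -> same crop window, so image and mask stay aligned.
final image = TensorBuffer.zeros([3, 256, 256]);
final mask = TensorBuffer.zeros([1, 256, 256]);

final imageCrop = TensorPipeline([
  RandomCropOp(height: 224, width: 224, seed: 42), // parameter names assumed
]).run(image);
final maskCrop = TensorPipeline([
  RandomCropOp(height: 224, width: 224, seed: 42),
]).run(mask);
```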

Fused Operations #

  • ResizeNormalizeFusedOp - Combines resize + normalize in single pass (eliminates intermediate tensor)
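
Since the fused op stands in for a ResizeOp followed by a NormalizeOp, it takes the union of their settings. A minimal sketch; the exact constructor parameters are an assumption combining the two ops documented above:

```dart
// One pass: resized values are normalized as they are written,
// so no intermediate resized tensor is allocated.
final fused = ResizeNormalizeFusedOp(
  height: 224, width: 224,     // ResizeOp-style parameters (assumed)
  mean: [0.485, 0.456, 0.406], // ImageNet mean (parameter names assumed)
  std: [0.229, 0.224, 0.225],  // ImageNet std
);
final out = TensorPipeline([fused]).run(TensorBuffer.zeros([3, 480, 640]));
```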

Activation Functions #

  • ReLUOp - Rectified Linear Unit (SIMD accelerated)
  • LeakyReLUOp - Leaky ReLU with configurable slope (SIMD accelerated)
  • GELUOp - Gaussian Error Linear Unit (Transformers: BERT, GPT, ViT)
  • SiLUOp / SwishOp - Sigmoid Linear Unit (EfficientNet, YOLOv5)
  • HardsigmoidOp - Hardware-efficient sigmoid (MobileNetV3)
  • HardswishOp - Hardware-efficient swish (MobileNetV3)
  • MishOp - Self-regularizing activation (YOLOv4+)
  • ELUOp - Exponential Linear Unit
  • SigmoidOp - Sigmoid activation
  • TanhOp - Hyperbolic tangent activation
  • SoftmaxOp - Softmax along specified axis
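
Activations are ordinary pipeline ops, so inference post-processing reuses the same machinery. A sketch; SoftmaxOp's axis parameter name is an assumption (the list above only says "along specified axis"):

```dart
// Convert raw classifier logits into probabilities.
final logits = TensorBuffer.zeros([1, 1000]);
final probs = TensorPipeline([
  SoftmaxOp(axis: -1), // `axis` parameter name assumed
]).run(logits);
// probs sums to 1.0 along the class axis
```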

Math Operations #

  • AbsOp - Absolute value (SIMD accelerated)
  • NegOp - Negation (SIMD accelerated)
  • SqrtOp - Square root (SIMD accelerated)
  • ExpOp - Exponential (e^x)
  • LogOp - Natural logarithm
  • PowOp - Power operation
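
These compose like any other op. A sketch; the no-argument constructors for AbsOp/SqrtOp and a positional exponent for PowOp are assumptions (only ReLUOp() is shown no-arg later in this README):

```dart
// Element-wise chain: |x| -> sqrt(x) -> x^2.
final out = TensorPipeline([
  AbsOp(),    // no-arg constructor assumed
  SqrtOp(),   // no-arg constructor assumed
  PowOp(2.0), // positional exponent assumed
]).run(TensorBuffer.ones([3, 224, 224]));
// ones in -> ones out: sqrt(|1|)^2 == 1
```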

Arithmetic Operations #

  • AddOp / SubOp - Element-wise addition/subtraction (SIMD accelerated)
  • MulOp / DivOp - Element-wise multiplication/division (SIMD accelerated)

Utility #

  • concat() - Concatenates tensors along specified axis
  • stack() - Stacks tensors along a new dimension
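
Both functions mirror torch.cat/torch.stack, using the positional signatures listed in the PyTorch Compatibility table below:

```dart
final a = TensorBuffer.zeros([3, 224, 224]);
final b = TensorBuffer.ones([3, 224, 224]);

final batched = stack([a, b], 0); // new leading dim -> [2, 3, 224, 224]
final joined = concat([a, b], 0); // existing dim grows -> [6, 224, 224]
```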

Shape #

  • UnsqueezeOp - Add dimension
  • SqueezeOp - Remove size-1 dimensions
  • ReshapeOp - Reshape tensor (supports -1 for inference)
  • FlattenOp - Flatten dimensions
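
ReshapeOp's -1 works like PyTorch's shape inference: at most one dimension may be -1, and it is computed from the remaining element count. A sketch; the positional shape-list argument is an assumption:

```dart
// [1, 3, 224, 224] -> [3, 50176]; 50176 is inferred from 150528 / 3.
final flat2d = TensorPipeline([
  ReshapeOp([3, -1]), // positional shape list assumed
]).run(TensorBuffer.zeros([1, 3, 224, 224]));
```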

Type #

  • TypeCastOp - Convert between data types
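
A sketch of casting between the ONNX-compatible DTypes listed under Core Classes below; the positional target-type argument is an assumption:

```dart
// Float32 -> Int64, e.g. for models that take integer token ids.
final ids = TensorPipeline([
  TypeCastOp(DType.int64), // positional target type assumed
]).run(TensorBuffer.zeros([1, 128]));
```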

Core Classes #

TensorBuffer #

Tensor with shape and stride metadata over physical storage.

// Create tensors
final zeros = TensorBuffer.zeros([3, 224, 224]);
final ones = TensorBuffer.ones([3, 224, 224], dtype: DType.float32);
final fromData = TensorBuffer.fromFloat32List(data, [3, 224, 224]);

// Access elements
final value = tensor[[0, 100, 100]];

// Zero-copy operations
final transposed = tensor.transpose([2, 0, 1]);  // Changes strides only
final squeezed = tensor.squeeze();

// Copy operations
final contiguous = tensor.contiguous();  // Force contiguous memory
final cloned = tensor.clone();

DType #

ONNX-compatible data types with onnxId for runtime integration.

DType.float32  // ONNX ID: 1
DType.int64    // ONNX ID: 7
DType.uint8    // ONNX ID: 2

BufferPool #

Memory pooling for buffer reuse, reducing GC pressure in hot paths.

final pool = BufferPool.instance;

// Acquire buffer (reuses from pool if available)
final buffer = pool.acquireFloat32(1000);

// ... use buffer ...

// Release back to pool for reuse
pool.release(buffer);

// Monitor pool usage
print('Pooled: ${pool.pooledCount} buffers, ${pool.pooledBytes} bytes');

Zero-Copy View Operations #

TensorBuffer extension methods for zero-copy tensor manipulation:

// Slice along first dimension (batch slicing)
final batch = tensor.sliceFirst(2, 5);  // Views elements 2..4

// Split tensor into views
final items = tensor.unbind(0);  // List of views along dim 0

// Select single index (reduces rank)
final first = tensor.select(0, 0);  // First item, shape reduced

// Narrow dimension
final narrowed = tensor.narrow(0, 1, 3);  // 3 elements starting at 1

// Format conversion without copying
final nhwc = nchwTensor.toChannelsLast();   // NCHW -> NHWC view
final nchw = nhwcTensor.toChannelsFirst();  // NHWC -> NCHW view

// Flatten to 1D view
final flat = tensor.flatten();

In-Place Operations #

Many operations support in-place modification to avoid allocation overhead:

// In-place operations (modify tensor directly)
ReLUOp().applyInPlace(tensor);
NormalizeOp.imagenet().applyInPlace(tensor);
ClipOp(min: 0, max: 1).applyInPlace(tensor);
BatchNormOp(...).applyInPlace(tensor);

// Query operation capabilities
final op = ReLUOp();
print(op.capabilities.supportsInPlace);    // true
print(op.capabilities.requiresContiguous); // true
print(op.capabilities.preservesShape);     // true

Operations supporting in-place: ReLUOp, LeakyReLUOp, SigmoidOp, TanhOp, AbsOp, NegOp, SqrtOp, ExpOp, LogOp, PowOp, AddOp, SubOp, MulOp, DivOp, ClipOp, NormalizeOp, ScaleOp, BatchNormOp, LayerNormOp, GroupNormOp, InstanceNormOp, RMSNormOp.

Memory Formats #

| Format | Layout | Strides (for [1, 3, 224, 224]) |
|---|---|---|
| contiguous | NCHW | [150528, 50176, 224, 1] |
| channelsLast | NHWC | [150528, 1, 672, 3] |
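
Because toChannelsLast() (shown under Zero-Copy View Operations above) returns a view, switching formats rewrites only the stride metadata in this table:

```dart
final nchw = TensorBuffer.zeros([1, 3, 224, 224]);
// contiguous strides: [150528, 50176, 224, 1]
final nhwc = nchw.toChannelsLast();
// same storage, channelsLast strides: [150528, 1, 672, 3]
```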

PyTorch Compatibility #

This library is designed to produce identical results to PyTorch/torchvision operations:

| Operation | PyTorch Equivalent |
|---|---|
| TensorBuffer.zeros() | torch.zeros() |
| TensorBuffer.ones() | torch.ones() |
| tensor.transpose() | tensor.permute() |
| tensor.reshape() | tensor.reshape() |
| tensor.squeeze() | tensor.squeeze() |
| tensor.unsqueeze() | tensor.unsqueeze() |
| tensor.sum() / sumAxis() | tensor.sum() |
| tensor.sumAxes([...]) | tensor.sum(dim=[...]) |
| tensor.mean() / meanAxis() | tensor.mean() |
| tensor.meanAxes([...]) | tensor.mean(dim=[...]) |
| tensor.min() / max() | tensor.min() / max() |
| tensor.minAxes([...]) | tensor.amin(dim=[...]) |
| tensor.maxAxes([...]) | tensor.amax(dim=[...]) |
| NormalizeOp.imagenet() | transforms.Normalize(mean, std) |
| ResizeOp(mode: bilinear) | F.interpolate(mode='bilinear') |
| ResizeOp(mode: area) | F.interpolate(mode='area') |
| ResizeOp(mode: lanczos) | Lanczos3 interpolation |
| ToTensorOp() | transforms.ToTensor() |
| ClipOp(min, max) | torch.clamp(min, max) |
| PadOp(mode: reflect) | F.pad(mode='reflect') |
| SliceOp([(start, end, step)]) | tensor[start:end:step] |
| concat(tensors, axis) | torch.cat(tensors, dim) |
| stack(tensors, dim) | torch.stack(tensors, dim) |
| RandomCropOp | transforms.RandomCrop() |
| GaussianBlurOp | transforms.GaussianBlur() |
| AddOp / SubOp | torch.add() / torch.sub() |
| MulOp / DivOp | torch.mul() / torch.div() |
| PowOp | torch.pow() |
| AbsOp / NegOp | torch.abs() / torch.neg() |
| SqrtOp / ExpOp / LogOp | torch.sqrt() / exp() / log() |
| ReLUOp / LeakyReLUOp | F.relu() / F.leaky_relu() |
| GELUOp | F.gelu() |
| SiLUOp / SwishOp | F.silu() |
| HardsigmoidOp | F.hardsigmoid() |
| HardswishOp | F.hardswish() |
| MishOp | F.mish() |
| ELUOp | F.elu() |
| SigmoidOp / TanhOp | torch.sigmoid() / torch.tanh() |
| SoftmaxOp | F.softmax() |
| BatchNormOp | torch.nn.BatchNorm2d (inference) |
| LayerNormOp | torch.nn.LayerNorm |
| GroupNormOp | torch.nn.GroupNorm |
| InstanceNormOp | torch.nn.InstanceNorm2d |
| RMSNormOp | torch.nn.RMSNorm (PyTorch 2.4+) |
| TensorBuffer.full() | torch.full() |
| TensorBuffer.random() | torch.rand() |
| TensorBuffer.randn() | torch.randn() |
| TensorBuffer.eye() | torch.eye() |
| TensorBuffer.linspace() | torch.linspace() |
| TensorBuffer.arange() | torch.arange() |
| tensor.select(dim, index) | tensor.select(dim, index) |
| tensor.narrow(dim, start, len) | tensor.narrow(dim, start, len) |
| tensor.unbind(dim) | tensor.unbind(dim) |
| tensor.flatten() | tensor.flatten() |
| ResizeNormalizeFusedOp | F.interpolate() + transforms.Normalize() (fused) |
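
As a concrete instance of the mapping, the standard torchvision preprocessing recipe translates line-for-line (the PyTorch side is shown in comments; all Dart ops here are documented above):

```dart
// PyTorch:
//   t = transforms.Compose([
//       transforms.ToTensor(),
//       transforms.Normalize(mean=[0.485, 0.456, 0.406],
//                            std=[0.229, 0.224, 0.225]),
//   ])
//   x = t(img).unsqueeze(0)
final pipeline = TensorPipeline([
  ToTensorOp(normalize: true), // transforms.ToTensor()
  NormalizeOp.imagenet(),      // transforms.Normalize(ImageNet mean/std)
  UnsqueezeOp.batch(),         // .unsqueeze(0)
]);
```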

Performance Benchmarks #

Run benchmarks with `dart run benchmark/run_all.dart`.

SIMD Acceleration #

Operations with Float32x4/Float64x2 SIMD vectorization:

| Operation | SIMD Throughput | Speedup |
|---|---|---|
| ClipOp | ~6.2 GE/s (Float32) | ~4x |
| AbsOp | ~6.2 GE/s (Float32) | ~4x |
| SqrtOp | ~6.2 GE/s (Float32) | ~4x |
| NormalizeOp | ~6.2 GE/s (Float32) | ~4x |
| ReLUOp / LeakyReLUOp | ~6.2 GE/s (Float32) | ~4x |
| ScaleOp | ~6.2 GE/s (Float32) | ~4x |
| AddOp / SubOp / MulOp / DivOp | ~6.2 GE/s (Float32) | ~4x |

GE/s = giga-elements per second. Float64 SIMD reaches roughly 53% of Float32 throughput because Float64x2 processes two lanes per vector versus four for Float32x4.

Operation Complexity #

| Operation | Time Complexity | Space Complexity |
|---|---|---|
| ResizeOp (bilinear) | O(C × H × W) | O(C × H × W) |
| ResizeOp (bicubic) | O(C × H × W × 16) | O(C × H × W) |
| ResizeOp (lanczos) | O(C × H × W × 36) | O(C × H × W) |
| NormalizeOp | O(n) | O(n), or O(1) in-place |
| BatchNormOp | O(n) | O(n), or O(1) in-place |
| LayerNormOp | O(n) | O(n), or O(1) in-place |
| GaussianBlurOp | O(C × H × W × k) | O(C × H × W) |
| ResizeNormalizeFusedOp | O(C × H × W) | O(C × H × W) |

Zero-Copy Operations (O(1)) #

| Operation | Time | Ops/sec |
|---|---|---|
| transpose() | ~1µs | 700K+ |
| reshape() | ~1µs | 1.6M+ |
| squeeze() | <1µs | 3.2M+ |
| unsqueeze() | ~1µs | 780K+ |

Pipeline Performance #

| Pipeline | Input Shape | Time |
|---|---|---|
| Simple (Normalize + Unsqueeze) | [3, 224, 224] | ~3.4ms |
| ImageNet Classification | [3, 224, 224] | ~3.0ms |
| Object Detection | [3, 640, 640] | ~25ms |

Sync vs Async #

| Execution | 224x224 | 640x640 |
|---|---|---|
| run() (sync) | ~3.5ms | ~29ms |
| runAsync() (isolate) | ~11ms | ~93ms |
| Isolate overhead | ~7ms | ~64ms |

Note: runAsync() pays isolate spawn and data-transfer overhead, so it is slower in wall-clock time, but it keeps the UI thread free. Use it for large tensors or whenever UI responsiveness matters more than raw latency; tensors below isolateThreshold run synchronously anyway.

Requirements #

  • Dart SDK ^3.0.0

License #

MIT
