GpuDelegateOptionsV2 constructor

GpuDelegateOptionsV2({
  bool isPrecisionLossAllowed = false,
  int inferencePreference = TfLiteGpuInferenceUsage.TFLITE_GPU_INFERENCE_PREFERENCE_FAST_SINGLE_ANSWER,
  int inferencePriority1 = TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION,
  int inferencePriority2 = TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_AUTO,
  int inferencePriority3 = TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_AUTO,
  List<int> experimentalFlags = const [TfLiteGpuExperimentalFlags.TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT],
  int maxDelegatePartitions = 1,
})

Creates a GpuDelegateOptionsV2 with the specified parameters.
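
A minimal usage sketch, assuming the surrounding tflite_flutter API (GpuDelegateV2, InterpreterOptions, Interpreter); the asset path is a placeholder:

// Inside an async function; 'assets/model.tflite' is a hypothetical path.
final delegate = GpuDelegateV2(options: GpuDelegateOptionsV2());
final interpreterOptions = InterpreterOptions()..addDelegate(delegate);
final interpreter = await Interpreter.fromAsset(
  'assets/model.tflite',
  options: interpreterOptions,
);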

isPrecisionLossAllowed When set to false, computations are carried out in the maximal possible precision. Otherwise, the GPU may quantize tensors, downcast values, or process in FP16 to increase performance. For most models the precision loss is warranted.

inferencePreference Preference is defined in TfLiteGpuInferenceUsage.

inferencePriority1 inferencePriority2 inferencePriority3 Ordered priorities provide finer control over the desired semantics, where priority(n) is more important than priority(n+1). Each time the inference engine needs to make a decision, it consults these ordered priorities.

For example: MAX_PRECISION at priority1 would not allow precision to be decreased, but moving it to priority2 or priority3 would allow FP16 calculation.

Priority is defined in TfLiteGpuInferencePriority.

AUTO priority can only be used when higher priorities are fully specified.

For example:

VALID:   priority1 = MIN_LATENCY, priority2 = AUTO, priority3 = AUTO
VALID:   priority1 = MIN_LATENCY, priority2 = MAX_PRECISION, priority3 = AUTO
INVALID: priority1 = AUTO, priority2 = MIN_LATENCY, priority3 = AUTO
INVALID: priority1 = MIN_LATENCY, priority2 = AUTO, priority3 = MAX_PRECISION

Invalid priority combinations will result in an error.
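
As an illustration, a latency-first configuration that permits reduced precision (a sketch using the constants defined in TfLiteGpuInferencePriority):

final options = GpuDelegateOptionsV2(
  isPrecisionLossAllowed: true,
  inferencePriority1:
      TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY,
  inferencePriority2:
      TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION,
  inferencePriority3:
      TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_AUTO,
);

This ordering is valid because AUTO appears only after the higher priorities are fully specified.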

experimentalFlags List of flags to enable. See the comments in TfLiteGpuExperimentalFlags.
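
For instance, a sketch that keeps quantized-model support enabled while also requesting the OpenCL backend, assuming this binding mirrors the C header's TFLITE_GPU_EXPERIMENTAL_FLAGS_CL_ONLY flag:

final options = GpuDelegateOptionsV2(
  experimentalFlags: const [
    TfLiteGpuExperimentalFlags.TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT,
    // Assumed here from the C header; hypothetical if this binding
    // does not expose it.
    TfLiteGpuExperimentalFlags.TFLITE_GPU_EXPERIMENTAL_FLAGS_CL_ONLY,
  ],
);

The flags are folded into a single bitmask before being handed to the native delegate (see the Implementation below).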

maxDelegatePartitions A graph may contain multiple partitions that can be delegated to the GPU. This limits the maximum number of partitions that will be delegated. By default it is set to 1, matching TfLiteGpuDelegateOptionsV2Default().
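
A sketch raising the limit for a model whose graph splits into several GPU-compatible subgraphs (4 is an arbitrary illustrative value):

final options = GpuDelegateOptionsV2(maxDelegatePartitions: 4);

Allowing more partitions can delegate more of the graph, but each additional partition can add CPU/GPU synchronization overhead.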

Implementation

factory GpuDelegateOptionsV2({
  bool isPrecisionLossAllowed = false,
  int inferencePreference = TfLiteGpuInferenceUsage
      .TFLITE_GPU_INFERENCE_PREFERENCE_FAST_SINGLE_ANSWER,
  int inferencePriority1 =
      TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION,
  int inferencePriority2 =
      TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_AUTO,
  int inferencePriority3 =
      TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_AUTO,
  List<int> experimentalFlags = const [
    TfLiteGpuExperimentalFlags.TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT
  ],
  int maxDelegatePartitions = 1,
}) {
  // Allocate the native TfLiteGpuDelegateOptionsV2 struct on the C heap.
  final options = calloc<TfLiteGpuDelegateOptionsV2>();
  options.ref
    // The native field is an int, so map the Dart bool to 1 / 0.
    ..is_precision_loss_allowed = isPrecisionLossAllowed ? 1 : 0
    ..inference_preference = inferencePreference
    ..inference_priority1 = inferencePriority1
    ..inference_priority2 = inferencePriority2
    ..inference_priority3 = inferencePriority3
    // Fold the list of flags into the single bitmask the C API expects.
    ..experimental_flags =
        _TfLiteGpuExperimentalFlagsUtil.getBitmask(experimentalFlags)
    ..max_delegated_partitions = maxDelegatePartitions;

  return GpuDelegateOptionsV2._(options);
}