GpuDelegateOptionsV2 constructor
GpuDelegateOptionsV2({
  bool isPrecisionLossAllowed = false,
  int inferencePreference = TfLiteGpuInferenceUsage.TFLITE_GPU_INFERENCE_PREFERENCE_FAST_SINGLE_ANSWER,
  int inferencePriority1 = TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION,
  int inferencePriority2 = TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_AUTO,
  int inferencePriority3 = TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_AUTO,
  List<int> experimentalFlags = const [TfLiteGpuExperimentalFlags.TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT],
  int maxDelegatePartitions = 1,
})
Creates GpuDelegateOptionsV2 with the specified parameters.
isPrecisionLossAllowed
When set to false, computations are carried out in the maximal possible
precision. Otherwise, the GPU may quantize tensors, downcast values, or
process in FP16 to increase performance. For most models precision loss is
warranted.
inferencePreference
Preference is defined in TfLiteGpuInferenceUsage.
inferencePriority1
inferencePriority2
inferencePriority3
Ordered priorities provide better control over the desired semantics:
priority(n) is more important than priority(n+1), so each time the
inference engine needs to make a decision, it uses the ordered
priorities to do so.
For example, MAX_PRECISION at priority1 would not allow the engine to decrease precision, but moving it to priority2 or priority3 would permit FP16 calculation.
Priorities are defined in TfLiteGpuInferencePriority.
AUTO can only be used once all higher priorities are fully specified. For example:
VALID: priority1 = MIN_LATENCY, priority2 = AUTO, priority3 = AUTO
VALID: priority1 = MIN_LATENCY, priority2 = MAX_PRECISION, priority3 = AUTO
INVALID: priority1 = AUTO, priority2 = MIN_LATENCY, priority3 = AUTO
INVALID: priority1 = MIN_LATENCY, priority2 = AUTO, priority3 = MAX_PRECISION
Invalid priority combinations result in an error.
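The AUTO rule above amounts to: once AUTO appears in the ordered list, every later priority must also be AUTO. A minimal sketch of such a check (validPriorities is a hypothetical helper, not part of this API, and the enum values used below are placeholders rather than the real TfLiteGpuInferencePriority constants):

```dart
// Hypothetical check of the AUTO rule: AUTO may appear only after all
// earlier (higher) priorities are fully specified, i.e. once AUTO
// occurs, every remaining priority must be AUTO as well.
bool validPriorities(int p1, int p2, int p3, {int auto = 0}) {
  final ps = [p1, p2, p3];
  final firstAuto = ps.indexOf(auto);
  if (firstAuto == -1) return true; // no AUTO at all is always valid
  return ps.sublist(firstAuto).every((p) => p == auto);
}

void main() {
  // Placeholder values for illustration only.
  const auto = 0, maxPrecision = 1, minLatency = 2;
  assert(validPriorities(minLatency, auto, auto)); // VALID
  assert(validPriorities(minLatency, maxPrecision, auto)); // VALID
  assert(!validPriorities(auto, minLatency, auto)); // INVALID
  assert(!validPriorities(minLatency, auto, maxPrecision)); // INVALID
  print('all priority checks passed');
}
```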
experimentalFlags
List of flags to enable; see the comments in TfLiteGpuExperimentalFlags.
maxDelegatePartitions
A graph could have multiple partitions that can be delegated to the GPU.
This limits the maximum number of partitions that will be delegated. By
default it is set to 1 in TfLiteGpuDelegateOptionsV2Default().
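As a usage sketch, options built with this constructor might be attached to an interpreter as below. This assumes the surrounding tflite_flutter package; GpuDelegateV2, InterpreterOptions and Interpreter.fromAsset are names from that package, not part of this constructor's contract:

```dart
// Sketch: build GPU delegate options with a latency-first, valid
// priority ordering (AUTO only after fully specified priorities) and
// attach them to an interpreter.
import 'package:tflite_flutter/tflite_flutter.dart';

Future<Interpreter> loadGpuInterpreter(String assetPath) async {
  final gpuOptions = GpuDelegateOptionsV2(
    isPrecisionLossAllowed: true, // allow FP16/quantized math for speed
    inferencePriority1:
        TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY,
    inferencePriority2:
        TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_AUTO,
    inferencePriority3:
        TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_AUTO,
  );
  final options = InterpreterOptions()
    ..addDelegate(GpuDelegateV2(options: gpuOptions));
  return Interpreter.fromAsset(assetPath, options: options);
}
```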
Implementation
factory GpuDelegateOptionsV2({
bool isPrecisionLossAllowed = false,
int inferencePreference = TfLiteGpuInferenceUsage
.TFLITE_GPU_INFERENCE_PREFERENCE_FAST_SINGLE_ANSWER,
int inferencePriority1 =
TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION,
int inferencePriority2 =
TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_AUTO,
int inferencePriority3 =
TfLiteGpuInferencePriority.TFLITE_GPU_INFERENCE_PRIORITY_AUTO,
List<int> experimentalFlags = const [
TfLiteGpuExperimentalFlags.TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT
],
int maxDelegatePartitions = 1,
}) {
final options = calloc<TfLiteGpuDelegateOptionsV2>();
options.ref
..is_precision_loss_allowed = isPrecisionLossAllowed ? 1 : 0
..inference_preference = inferencePreference
..inference_priority1 = inferencePriority1
..inference_priority2 = inferencePriority2
..inference_priority3 = inferencePriority3
..experimental_flags =
_TfLiteGpuExperimentalFlagsUtil.getBitmask(experimentalFlags)
..max_delegated_partitions = maxDelegatePartitions;
return GpuDelegateOptionsV2._(options);
}
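The experimentalFlags list is collapsed into a single bitmask before being written to the experimental_flags field. Assuming _TfLiteGpuExperimentalFlagsUtil.getBitmask simply ORs the flag values together, a hypothetical standalone re-implementation (with placeholder bit values rather than the real TfLiteGpuExperimentalFlags constants) would look like:

```dart
// Hypothetical re-implementation of the bitmask fold: OR every flag
// value in the list into one int, so each enabled flag sets its bit.
int getBitmask(List<int> flags) =>
    flags.fold(0, (mask, flag) => mask | flag);

void main() {
  // Placeholder bit values for illustration; the real constants live
  // in TfLiteGpuExperimentalFlags.
  const enableQuant = 1 << 0, clOnly = 1 << 1;
  print(getBitmask([enableQuant, clOnly])); // 3
  print(getBitmask([])); // 0 (no flags enabled)
}
```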