PrePackWeight property - OrtKernelImpl class - onnxruntime_generated library

\brief Optional function to pre-pack a constant tensor (i.e., a weight) to the kernel's preferred data layout.

For example, a Conv kernel can define this function to pack input W to the channel-last data layout before inference.
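
As a rough illustration of what "packing to channel-last" can mean, the sketch below reorders a Conv weight from OIHW (output channels, input channels, height, width) to OHWI order. The helper name `PackOihwToOhwi` and the choice of OHWI as the target layout are assumptions made for this example; they are not part of the ORT API, and a real kernel may prefer a different layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an ORT function): repack a dense float Conv weight
// from OIHW order into OHWI (channel-last) order.
std::vector<float> PackOihwToOhwi(const std::vector<float>& src,
                                  std::size_t O, std::size_t I,
                                  std::size_t H, std::size_t W) {
  // The source buffer must hold exactly O*I*H*W elements.
  assert(src.size() == O * I * H * W);
  std::vector<float> dst(src.size());
  for (std::size_t o = 0; o < O; ++o) {
    for (std::size_t i = 0; i < I; ++i) {
      for (std::size_t h = 0; h < H; ++h) {
        for (std::size_t w = 0; w < W; ++w) {
          const std::size_t from = ((o * I + i) * H + h) * W + w;  // OIHW offset
          const std::size_t to = ((o * H + h) * W + w) * I + i;    // OHWI offset
          dst[to] = src[from];
        }
      }
    }
  }
  return dst;
}
```

In an actual kernel, the destination buffer would be obtained from the allocator passed to the function rather than from a `std::vector`.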

Pre-packing can operate in three different modes: no pre-packing mode, sharing mode, and non-sharing mode.

No pre-packing mode: The kernel can forgo any weight pre-packing for the given input_index by setting is_packed to false and returning a successful OrtStatus. In this mode, the kernel's OrtKernelImpl::SetSharedPrePackedWeight() function is not called for that specific input_index.
Sharing mode: Sharing is allowed if the prepacked_weight_cache argument is not NULL and the EP stores weight data in CPU-accessible memory. In this case, the kernel can optionally choose to share the packed weight with other kernels that use the same weight (compared by content hash). To do so, the kernel must allocate the packed weight with the provided allocator, store the packed weight data into prepacked_weight_cache via SharedPrePackedWeightCache_StoreWeightData(), set is_packed to true, and return a successful OrtStatus. ORT will subsequently call OrtKernelImpl::SetSharedPrePackedWeight() to provide this kernel with the actual shared weight data, whose memory location could differ from the data the kernel stored (e.g., if the shared data was allocated by a previously processed kernel).
Non-sharing mode: The prepacked_weight_cache argument is ignored. The implementation allocates the packed data with the provided allocator, sets is_packed to true, and returns a successful OrtStatus. The kernel is ultimately responsible for releasing the packed data with the same allocator. ORT may release the original (unpacked) weight, so the kernel must not access it in OrtKernelImpl::Compute(). In this mode, ORT does not call the kernel's OrtKernelImpl::SetSharedPrePackedWeight() function for that specific input_index.
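
The control flow of the three modes can be sketched with mock types. Everything in the block below (`MockCache`, `MockKernel`, `PrePackMode`, `MockPrePackWeight`) is a hypothetical stand-in invented for illustration; none of it is the real ORT C API, whose function returns an OrtStatus and takes an OrtValue, an allocator, and an OrtSharedPrePackedWeightCache. The allocator is omitted here, the "packing" is a placeholder that just reverses the data, and the function returns the chosen mode only so the behaviour is easy to check.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shared pre-packed weight cache (NOT an ORT type).
struct MockCache {
  std::vector<float> stored;
  bool has_data = false;
};

enum class PrePackMode { kNone, kSharing, kNonSharing };

// Hypothetical kernel state (NOT an ORT type).
struct MockKernel {
  std::vector<float> owned_packed;  // packed data owned in non-sharing mode
  bool wants_sharing = true;        // whether this kernel opts into sharing
};

// Assumption for this sketch: only input 1 (the weight) is ever packed.
PrePackMode MockPrePackWeight(MockKernel& kernel,
                              const std::vector<float>& weight,
                              int input_index, MockCache* cache,
                              bool* is_packed) {
  assert(is_packed != nullptr);
  if (input_index != 1) {
    // No pre-packing mode: report is_packed = false and succeed.
    *is_packed = false;
    return PrePackMode::kNone;
  }
  // Placeholder "packing": reverse the element order.
  std::vector<float> packed(weight.rbegin(), weight.rend());
  *is_packed = true;
  if (cache != nullptr && kernel.wants_sharing) {
    // Sharing mode: hand the packed data to the cache; the kernel would later
    // receive the actual shared data through a separate callback.
    cache->stored = std::move(packed);
    cache->has_data = true;
    return PrePackMode::kSharing;
  }
  // Non-sharing mode: the kernel keeps and later releases the packed data.
  kernel.owned_packed = std::move(packed);
  return PrePackMode::kNonSharing;
}
```

Note that in the sharing branch the kernel does not keep its own copy: per the description above, the data it should use at compute time is whatever ORT later passes to OrtKernelImpl::SetSharedPrePackedWeight().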

\note This function is based on the internal OpKernel::PrePack() virtual function used within ORT.

\param[in] this_ptr The OrtKernelImpl instance.
\param[in] tensor The OrtValue instance representing the constant tensor (weight). Do not cache in the kernel.
\param[in] input_index The input index of the tensor in this kernel.
\param[in] allocator Allocator for allocating the pre-packed data. Its use is required in sharing mode and recommended, but not required, in the non-sharing mode. This will be an allocator set by the application for the session/environment (e.g., via CreateAndRegisterAllocatorV2 or RegisterAllocator), or an allocator on the OrtEpDevice (read-only or default) otherwise. The allocator remains valid throughout the lifetime of the OrtKernelImpl instance.
\param[in] prepacked_weight_cache May be NULL. If not NULL, the kernel may choose to share a packed weight by first storing it in the OrtSharedPrePackedWeightCache instance and then receiving the actual shared weight data in the call to OrtKernelImpl::SetSharedPrePackedWeight(). See the above description for "sharing mode".
\param[out] is_packed Output parameter that the implementation sets to true if the kernel packed the tensor data.

\snippet{doc} snippets.dox OrtStatus Return Value

\note Implementation of this function is optional. If not implemented (set to NULL), ORT assumes the kernel does not pre-pack weight data (i.e., is_packed defaults to false).

\since Version 1.24.