windows_gpu_recovery

pub.dev pub points license: MIT platform: Windows

Flutter Windows plugin that recovers from EGL_CONTEXT_LOST / DXGI_ERROR_DEVICE_REMOVED after system sleep, Hyper-V save/restore, TDR timeout, or GPU driver reset. Without this plugin, the app freezes permanently with a white screen.

The recovery does NOT work when a debugger is attached (VS Code flutter run, F5, or any external debugger). This is a Win32 debugging model limitation — the debugger intercepts exceptions before the plugin can handle them. Run the compiled exe directly for testing and production.

Demo

https://github.com/user-attachments/assets/b769f469-1701-487c-9459-facba1dde675

What it does

After GPU device loss, a Flutter Windows app freezes forever — rendering stops, the window goes white, Windows marks it "Not Responding". CPU logic (Dart isolate, timers, method channels) continues working, but no frames are drawn.

This plugin:

  1. Detects GPU device loss via a persistent sentinel D3D11 device (2-second polling)
  2. Destroys the dead Flutter engine with crash protection (Vectored Exception Handler catches ANGLE cleanup crashes)
  3. Recreates a fresh FlutterViewController with new ANGLE display, new D3D device, new Skia context
  4. Preserves state — app saves state to SharedPreferences before recreation, restores on startup

The window stays open. The process stays alive. User sees a brief white flash (~1-3 seconds), then the app is back with restored state.

Integration

1. Add the dependency

dependencies:
  windows_gpu_recovery:
    path: ../windows_gpu_recovery  # or git URL

2. Modify windows/runner/flutter_window.cpp

Add the WM_GPU_RECOVERY handler. This is the only C++ change needed:

#include <windows_gpu_recovery/gpu_recovery_message.h>

// In FlutterWindow::MessageHandler, BEFORE the Flutter message dispatch:
if (message == WM_GPU_RECOVERY) {
  // Write a recovery marker file for the new Dart VM.
  // (Your state-saving logic here — or use SharedPreferences from Dart side)

  // Destroy old engine — VEH catches ANGLE cleanup crashes.
  // eglTerminate clears the per-process Display singleton.
  flutter_controller_.reset();

  // Poll until GPU is ready (adapter recovered).
  // D3D11CreateDevice every 200ms, max 10 seconds.
  // (See example_with_recovery for full polling implementation)
  Sleep(3000);  // simple version

  // Create fresh engine.
  RECT frame = GetClientArea();
  flutter_controller_ = std::make_unique<flutter::FlutterViewController>(
      frame.right - frame.left, frame.bottom - frame.top, project_);
  RegisterPlugins(flutter_controller_->engine());
  SetChildContent(flutter_controller_->view()->GetNativeWindow());
  flutter_controller_->ForceRedraw();
  return 0;
}

3. Persist state on Dart side

The Dart VM restarts on recreation (new isolate). SharedPreferences (on disk) survives:

// Save state continuously (every significant change):
final prefs = await SharedPreferences.getInstance();
await prefs.setInt('counter', _counter);

// On startup, check for recovery marker:
final exeDir = File(Platform.resolvedExecutable).parent.path;
final marker = File('$exeDir/gpu_recovery.marker');
if (marker.existsSync()) {
  marker.deleteSync();
  final savedCounter = prefs.getInt('counter') ?? 0;
  // Restore state, show banner, etc.
}

4. No Dart-side initialization needed

The plugin works entirely at the native C++ level. Just adding the dependency is enough — the sentinel device and VEH are set up automatically on plugin registration.

Testing

Recovery does NOT work under a debugger. Always test by running the exe directly.

Triggering GPU device loss

dxcap -forcetdr (recommended — clean, repeatable):

:: Run as Administrator. Part of Windows SDK / Visual Studio.
dxcap -forcetdr

Install: C:\Program Files (x86)\Windows Kits\10\bin\<version>\x64\dxcap.exe.

Other methods:

  • System sleep — close laptop lid or Start > Sleep (real hardware only, not VMs)
  • Hyper-V — Save VM + Start VM from host (no WM_POWERBROADCAST delivered)
  • Device Manager — Disable/Enable GPU adapter (not over RDP)
  • PowerShellDisable-PnpDevice / Enable-PnpDevice (as Administrator)

A/B test procedure

  1. flutter build windows --release in both example and example_without_recovery
  2. Launch both exe files directly (not via VS Code)
  3. Click the counter several times
  4. Run dxcap -forcetdr as Administrator
  5. Observe:
    • Without plugin (red theme): frozen white screen, counter lost
    • With plugin (green theme): brief white flash, counter restored, green banner

Logs

The example app writes to gpu_recovery.log next to the exe:

[GPU_RECOVERY] Sentinel D3D device created
[GPU_RECOVERY] Plugin initialized
[GPU_RECOVERY] Watchdog started (2 s interval)
[GPU_RECOVERY] Device loss detected — activating exception handler
[GPU_RECOVERY] Posting WM_GPU_RECOVERY
[GPU_RECOVERY] WM_GPU_RECOVERY received
[GPU_RECOVERY] Destroying old engine...
[GPU_RECOVERY] Old engine destroyed
[GPU_RECOVERY] GPU ready after 200 ms
[GPU_RECOVERY] Creating new engine...
[GPU_RECOVERY] Plugin initialized
[GPU_RECOVERY] Engine recreated successfully

What survives vs what restarts

SURVIVES (same process):            RESTARTS (new engine):
  Win32 window (same HWND)            Dart VM (new isolate)
  C++ process (same PID)              Riverpod/Provider state
  SharedPreferences (on disk)         Navigation stack
  FlutterSecureStorage (on disk)      In-memory caches
  Files, network connections          Stream subscriptions

Project structure

windows_gpu_recovery/
  lib/windows_gpu_recovery.dart           — library declaration (no Dart API needed)
  windows/
    windows_gpu_recovery_plugin.cpp       — sentinel device + VEH + watchdog
    windows_gpu_recovery_plugin.h
    include/.../gpu_recovery_message.h    — WM_GPU_RECOVERY constant
    CMakeLists.txt                        — links dxgi, d3d11
  example/                               — counter app + plugin + state restore
  example_without_recovery/               — counter app, no plugin (freezes)

How it works internally

Detection: sentinel D3D11 device

A persistent D3D11 device is created on the same DXGI adapter that ANGLE uses (FlutterDesktopViewGetGraphicsAdapter). This sentinel device stays alive for the plugin's lifetime. When the GPU resets, GetDeviceRemovedReason() on the sentinel returns DXGI_ERROR_DEVICE_REMOVED — permanently, even after the adapter recovers (~100ms).

A freshly created test device would succeed (the adapter is back), so polling with temporary devices gives false negatives. The sentinel device shares the fate of ANGLE's internal device — both get the same DEVICE_REMOVED status.

Recovery: VEH-protected engine recreation

The problem

flutter_controller_.reset() calls the engine destructor → eglTerminate → ANGLE cleanup → Renderer11::release() → tries to Release() dead D3D COM objects → ACCESS_VIOLATION crash.

Without crash protection, the process dies. With .release() (leak without destructor), eglTerminate is never called → the per-process ANGLE EGLDisplay singleton stays in mInitialized = true state → new engine gets the same dead display.

The solution

A Vectored Exception Handler (VEH) is installed via AddVectoredExceptionHandler. It catches EXCEPTION_ACCESS_VIOLATION inside flutter_windows.dll's address range and skips the faulting instruction:

  1. Decodes x64 instruction length (handles REX prefixes, ModRM, SIB, displacements)
  2. Advances RIP past the crashing instruction
  3. Sets RAX=0, RDX=0 (fake null/success return values)
  4. Sets EFLAGS ZF=1 (dead pointers treated as null in subsequent conditional jumps)

With VEH active, flutter_controller_.reset() runs the full destructor. ANGLE's cleanup partially crashes (VEH catches), but eglTerminate reaches Display::mInitialized = false — clearing the singleton. The new FlutterViewController calls eglInitialize, which sees mInitialized == false and performs a full reinitialization: new Renderer11, new D3D11Device, new EGL contexts.

Why approach 6 (without VEH) failed

In approach 6, .release() was used to avoid crashes. The engine was leaked, eglTerminate never ran, the Display singleton stayed poisoned. The VEH (developed for approach 9's binary patching experiments) turned out to be the critical enabler — it lets the destructor run safely, which clears the singleton.

Debugger limitation

When a debugger is attached, Windows sends first-chance exceptions to the debugger before the VEH. The Dart debug adapter treats any unhandled exception as fatal and terminates the process. This is a Win32 debugging model constraint, not a plugin bug. The plugin works correctly when no debugger is attached.


Investigation log — for those who enjoy a detective story

The problem

Flutter Windows applications become permanently unresponsive after any event that triggers a D3D11 device removal. The Flutter engine has no recovery path. Issue flutter#124194 has been open since April 2023 at P2 priority with no fix or assigned engineer.

We conducted an exhaustive investigation: 11 approaches, 5 binary patches to flutter_windows.dll, a custom x64 instruction length decoder, a Vectored Exception Handler, and interactive reverse engineering with x64dbg.

The rendering stack

Layer 7: Vsync scheduler (DWM)     — DEAD (swap chain gone)
Layer 6: Flutter frame scheduling   — IDLE (no vsync = no frames)
Layer 5: Skia GrDirectContext       — ABANDONED (auto-detected)
Layer 4: EGL Context (mContextLost) — PERMANENTLY LOST
Layer 3: EGL Display (mDeviceLost)  — PERMANENTLY LOST
Layer 2: ANGLE Renderer11 (D3D11)   — REMOVED
Layer 1: D3D11 adapter (hardware)   — RECOVERED in ~100ms

Approach 1: InvalidateRect

Force repaint. Result: eglMakeCurrent returns EGL_CONTEXT_LOST. Can't paint with dead context.

Approach 2: WM_POWERBROADCAST + WM_SIZE

Intercept resume from sleep, force resize. Result: swap chain resized but EGL context is dead. Also, WM_POWERBROADCAST not delivered in Hyper-V.

Approach 3: RunOnSeparateThread

UIThreadPolicy::RunOnSeparateThread. Result: fixes performance regression (#178916), not context loss.

Approach 4: SW_HIDE/SW_SHOW on Flutter child HWND

Plugin accesses FlutterDesktopViewGetHWND. Hide/show triggers surface recreation. Result: new surface on dead context → still fails.

Approach 5: Sentinel D3D11 device

Create persistent device on ANGLE's adapter, poll GetDeviceRemovedReason(). Works for detection. Temporary test devices give false negatives (adapter recovers in ~100ms).

Approach 6: FlutterViewController recreation

Destroy old controller, create new. Finding: .reset() crashes (ANGLE cleanup). .release() leaks but new engine gets same dead EGLDisplay singleton. Dead end without VEH.

Approach 7: Binary patching — ANGLE recovery (5 patches)

ANGLE has restoreLostDevice() infrastructure. Blocked by two checks:

Patches 1-2: NOP isResetNotificationEnabled check in restoreLostDevice:

Pattern: 89 c1 b0 01 ba 0e 30 00 00 84 c9 75
Patch:   75 XX → 90 90

Patches 3-5: Skip isContextLost checks in ValidateContext (block before ValidateDisplay):

Patch 3: 0x00b33247: 74 → EB
Patch 4: 0x00b3585b: 74 → EB
Patch 5: 0x00b387f5: 74 → EB

Result: restoreLostDevice reached but Renderer11::resetDevice()release() crashes on dead COM objects.

Approach 8: ForceRedraw loop

After device loss, DWM stops vsync. ForceRedraw bypasses via ScheduleFrame. Result: frames scheduled but all fail on dead ANGLE context. With patches, release() crashes.

Approach 9: Vectored Exception Handler

Custom x64 instruction decoder catches ACCESS_VIOLATION inside flutter_windows.dll. Skips faulting instructions with ZF=1. Result: 300+ AVs caught, app stays alive, but release() + initialize() compiler-inlined → skipping release skips initialize too.

Approach 10: x64dbg reverse engineering

Found GpuSurfaceGLSkia::AcquireFrame by error string reference. Traced to GrDirectContext* at [this+0x10]. Found fAbandoned at [sub+0xD8] — already true. Found releaseResourcesAndAbandonContext — never called. Manually invoked via register injection — early return (already abandoned). Nobody creates replacement GrDirectContext.

Approach 10b-c: Code cave + JMP patch

Found release() + initialize() inlined into 630-byte function. Crash at +431 (cmp [r13+0x38E0], 0). Found conditional je +0x23 at +424 that skips to initialize. Patched to jmp (one byte). Result: release skipped, but initialize reads dead Renderer11 fields → VEH returns 0 → D3D11CreateDevice gets wrong params.

9 barriers deep

# Barrier How we broke it
1 restoreLostDevice blocked Patch: NOP jnz
2 isContextLost blocks first Patch: je → jmp
3 Vsync dead ForceRedraw loop
4 release() crashes VEH
5 release+initialize inlined Patch: skip release
6 initialize reads dead fields VEH returns 0
7 Skia auto-abandoned x64dbg confirmed
8 EGLDisplay singleton ANGLE architecture
9 Renderer11 layout unknown No PDB/RTTI

Approach 11: The breakthrough

Instead of fixing ANGLE from below (9 barriers), attack from above: destroy the entire engine and create a new one. Approaches 6 failed because .release() skipped eglTerminate. But the VEH from approach 9 lets .reset() run safely — the destructor calls eglTerminate, which clears Display::mInitialized, which lets the new engine reinitialize ANGLE from scratch.

The VEH that failed for approach 9 (couldn't fix ANGLE internally) became the critical enabler for approach 11 (protects the destructor during engine recreation).


What a proper Flutter engine fix would look like

This plugin is a workaround — it restarts the Dart VM. A proper engine fix (~200 lines) would preserve the VM:

  1. egl/surface.cc — detect EGL_CONTEXT_LOST specifically, signal to engine
  2. egl/manager.cc — add Reinitialize() with SEH-protected eglTerminate
  3. flutter_windows_engine.ccHandleDeviceLost(): suspend raster thread, GrDirectContext::abandonContext(), reinitialize EGL, recreate compositor
  4. flutter_window.cc — handle WM_POWERBROADCAST

References

License

MIT

Libraries

windows_gpu_recovery
Flutter Windows plugin for recovering from EGL_CONTEXT_LOST / DXGI_ERROR_DEVICE_REMOVED after system sleep or GPU driver reset.