[LOG] [CWLCompositorResource] New wl_surface with id 44 at 55b760b26480
[LOG] [CXDGWMBase] New xdg_positioner at 55b760499710
[LOG] [CXDGWMBase] New xdg_surface at 55b760a71830
[LOG] [CXDGSurfaceResource] xdg_surface 55b760a71830 gets a popup 55b760221f80 owner 0
[LOG] CWLSurface 55b760b28980 called init()
[LOG] GetPosition with constraint [Vector2D: x: 0, y: 0] [Vector2D: x: 2580, y: 1080] and parent [Vector2D: x: 0, y: 0]
[LOG] [CXDGPopupResource] Popup 55b760221f80 gets unconstrained to [Vector2D: x: 2428, y: 34] [Vector2D: x: 148, y: 134]
[LOG] New popup at 55b760033f20
[LOG] [CXDGSurfaceResource] xdg_surface 55b760a71830 requests geometry 4x4 148x134
[LOG] [CWLCompositorResource] New wl_region with id 49 at 55b76035ec40
[LOG] popup 7f0378001160: mapped
[ERR] [EGL] Command eglDupNativeFenceFDANDROID errored out with EGL_BAD_PARAMETER (0x12300): eglDupNativeFenceFDANDROID
[ERR] eglDupNativeFenceFDANDROID failed
[ERR] renderer: Explicit sync failed, releasing resources
[CRITICAL]

ASSERTION FAILED!

Aborting, glGetGraphicsResetStatus returned GL_INNOCENT_CONTEXT_RESET. Cannot continue until proper GPU reset handling is implemented.

at: line 740 in OpenGL.cpp

Thanks for the log. This is a “Wayland explicit‑sync → EGL native fence → GL context lost” chain. Here’s what it’s telling you and how to fix it.

What the log means

Wayland / xdg‑popup setup succeeded. All the [CWL*]/[CXDG*] lines are routine Wayland protocol activity (creating a wl_surface, xdg_positioner/surface, popup geometry). They’re not the failure.
Explicit GPU sync path failed at eglDupNativeFenceFDANDROID.
```
[ERR] [EGL] ... eglDupNativeFenceFDANDROID ... EGL_BAD_PARAMETER
[ERR] eglDupNativeFenceFDANDROID failed
[ERR] renderer: Explicit sync failed, releasing resources
```
eglDupNativeFenceFDANDROID duplicates the native fence file descriptor held by an EGL sync object and returns the copy to the caller. On failure it returns EGL_NO_NATIVE_FENCE_FD_ANDROID (-1). The error is EGL_BAD_PARAMETER when the sync object is not valid for that display (wrong display, already destroyed), or when the object does not currently hold a native fence FD (for example, the native fence hasn’t been created yet). Microsoft’s Android EGL bindings (which mirror the Khronos spec) list these error cases. (Microsoft Learn)

A subtle spec detail: when you create an EGL_ANDROID_native_fence_sync object without importing an existing FD (i.e., you’re creating a release fence), the native fence isn’t created until the next flush; trying to dup before that flush can yield “no native fence,” causing this failure. (registry.khronos.org)
Your GL context reports GL_INNOCENT_CONTEXT_RESET. That means the driver detected a GPU reset that is not attributable to your current context (it could have been caused by another process or by a driver fault). The OpenGL robustness API specifies these return values and requires you to recreate GL objects after a reset. Your code asserts instead:

“Cannot continue until proper GPU reset handling is implemented.”

Per glGetGraphicsResetStatus, you must tear down and rebuild the context and its resources on any reset (guilty, innocent, or unknown). (docs.gl)
This failure pattern is seen in real compositors. Hyprland maintainers note the same error string and explain two common causes: an invalid fence from a client (so dup fails) or a separate driver crash that makes the compositor hit the reset path. Your log mirrors their output. (GitHub)

Likely root causes (ranked)

1. Dup before flush when creating a release fence (no FD exists yet). The spec says the native fence is created at the flush following fence creation; dup’ing too early produces an error. (registry.khronos.org)
2. Wrong sync object / display passed to eglDupNativeFenceFDANDROID (e.g., sync not created on that EGLDisplay, or destroyed). These trigger EGL_BAD_PARAMETER. (Microsoft Learn)
3. Missing or partial extension support (EGL_ANDROID_native_fence_sync, EGL_KHR_wait_sync). If the extension/function is present but the underlying driver path is incomplete, you can get BAD_PARAMETER at runtime. (NVIDIA, for example, only exposes this extension in certain setups; see the driver notes about enabling it with nvidia-drm.modeset=1.) (NVIDIA Developer Forums)
4. Client provided a bad acquire fence (invalid FD), leading your compositor to try to dup an invalid sync.
5. True GPU reset induced by another context/process; your code then sees INNOCENT_CONTEXT_RESET and aborts. (docs.gl)

Fix it: hardened explicit‑sync path (server/compositor side)

1) Feature detection & fallbacks

At startup, check for these EGL extensions on your EGLDisplay:
- EGL_ANDROID_native_fence_sync
- EGL_KHR_wait_sync (or EGL_ANDROID_wait_sync, which is effectively the same)

If either is missing, disable explicit sync and fall back to your implicit-sync path (or a conservative glFinish/copy path). (Android Open Source Project)
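A minimal sketch of that check in C, assuming you already have the extension string from eglQueryString(dpy, EGL_EXTENSIONS). The helper names has_extension and explicit_sync_supported are made up for illustration. The point is to match whole space-separated tokens rather than substrings, so that a name which is a prefix of another name does not produce a false positive:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` appears as a whole space-separated token in `list`.
 * `list` is expected to be the string from eglQueryString(dpy, EGL_EXTENSIONS).
 * (Hypothetical helper, not part of EGL.) */
bool has_extension(const char *list, const char *name)
{
    if (list == NULL || name == NULL || *name == '\0')
        return false;

    size_t len = strlen(name);
    const char *p = list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += 1; /* keep scanning after a partial match */
    }
    return false;
}

/* Explicit sync is only enabled when both extensions are advertised. */
bool explicit_sync_supported(const char *egl_extensions)
{
    return has_extension(egl_extensions, "EGL_ANDROID_native_fence_sync") &&
           has_extension(egl_extensions, "EGL_KHR_wait_sync");
}
```

If your driver advertises the wait-sync extension under the EGL_ANDROID_wait_sync name mentioned above, add one more has_extension call for it.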

2) Correct handling of the client’s acquire fence

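```c
// acquire_fd comes from zwp_linux_surface_synchronization_v1.set_acquire_fence
if (acquire_fd >= 0) {
    EGLint attrs[] = {EGL_SYNC_NATIVE_FENCE_FD_ANDROID, acquire_fd, EGL_NONE};
    // On success, eglCreateSyncKHR takes ownership of acquire_fd: do not close it yourself.
    // (Chromium notes this explicitly in their EGL fence utils.)
    EGLSyncKHR in_sync = eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, attrs);
    if (in_sync != EGL_NO_SYNC_KHR) {
        // GPU-side wait: later GL commands on this context wait for the fence.
        if (eglWaitSyncKHR(dpy, in_sync, 0) != EGL_TRUE) {
            // Server-side wait was rejected; block on the CPU before sampling instead.
        }
        eglDestroySyncKHR(dpy, in_sync);
    } else {
        // Import failed, so ownership was not transferred and acquire_fd is still ours.
        // Fall back (CPU wait on the fd, or block until safe), then close it.
        close(acquire_fd);  // needs <unistd.h>
    }
}
```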

Chromium’s code comments call out that importing a fence via EGL_SYNC_NATIVE_FENCE_FD_ANDROID transfers FD ownership to EGL on success; don’t reuse or close it. (Chromium Git Repositories)
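For the fallback branch, one option is a CPU-side wait on the fence FD itself. Linux sync_file FDs can be waited on with poll() and report POLLIN once the fence has signaled. A rough sketch follows; wait_fence_fd is a hypothetical helper name, and the timeout policy is left to you:

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>

/* Blocks until `fence_fd` reports readiness or `timeout_ms` elapses.
 * Returns true if the FD signaled, false on timeout or error.
 * The FD is not closed here; the caller still owns it.
 * (Hypothetical helper for illustration.) */
bool wait_fence_fd(int fence_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    int ret;

    do {
        /* Note: a retry after EINTR restarts the full timeout. */
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret <= 0)
        return false; /* timeout (0) or poll failure (<0) */

    /* POLLERR / POLLNVAL mean the FD itself is bad, not that the fence signaled. */
    if (pfd.revents & (POLLERR | POLLNVAL))
        return false;

    return (pfd.revents & POLLIN) != 0;
}
```

A false return is the cue to treat the client's fence as unusable and take the most conservative path you have, rather than sampling the buffer anyway.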

3) Creating & exporting the release fence (the common pitfall)

```c
// After all GL writes to the buffer are enqueued:
EGLSyncKHR out_sync = eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
int release_fd = EGL_NO_NATIVE_FENCE_FD_ANDROID;  // -1

if (out_sync != EGL_NO_SYNC_KHR) {
    // Ensure the fence is actually created in the native driver:
    // the spec says the native fence is created at the *flush following* creation.
    glFlush();  // or eglSwapBuffers if appropriate

    // On error, the call returns -1 and sets EGL_BAD_PARAMETER if no native fence exists.
    release_fd = eglDupNativeFenceFDANDROID(dpy, out_sync);
    eglDestroySyncKHR(dpy, out_sync);
}

if (release_fd >= 0) {
    // Send via zwp_linux_buffer_release_v1.fenced_release(release_fd), then close our copy.
} else {
    // Fallback: wait for the GPU work to finish (e.g. glFinish), then send
    // immediate_release, or avoid zero-copy.
}
```

That flush-before-dup step is required by the extension language (native fence is created at the next flush). Dup’ing before the flush commonly yields the BAD_PARAMETER you’re seeing. (registry.khronos.org)

4) Is the Wayland side correct?

Make sure you’re using zwp_linux_surface_synchronization_v1/zwp_linux_buffer_release_v1 correctly: wait on the acquire fence before sampling, and return a fenced_release for the commit that requested it (or an immediate_release as a fallback). (Wayland App)

Implement robust GPU reset handling (don’t assert)

Right now you abort when glGetGraphicsResetStatus() reports GL_INNOCENT_CONTEXT_RESET. Instead:

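Create a robust context so that resets are reported through glGetGraphicsResetStatus and out-of-bounds buffer accesses are bounds-checked rather than undefined:
- Set EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_KHR = EGL_LOSE_CONTEXT_ON_RESET_KHR
- Add EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR to EGL_CONTEXT_FLAGS_KHR (if supported)

These tokens come from EGL_KHR_create_context and map to the OpenGL robustness behavior. (GitHub) If you create an OpenGL ES context, check EGL_EXT_create_context_robustness as well, since that extension provides the equivalent attributes for ES.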

At safe points (e.g., start of a frame), check:

```c
GLenum rs = glGetGraphicsResetStatus();
if (rs != GL_NO_ERROR) {
    // Log whether it was GUILTY / INNOCENT / UNKNOWN
    // Release current objects/surfaces and destroy the context
    eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
    eglDestroyContext(dpy, ctx);  // ctx: placeholder name for the lost EGLContext
    // Recreate context, recompile shaders, re-create textures/FBOs/VAOs, etc.
    // Rebind and resume.
    return; // skip the frame
}
```

The spec requires reinitializing objects after any reset; treat it as recoverable runtime, not a fatal assert. (docs.gl)
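One way to keep that check easy to test is to separate the decision from the GL calls. The sketch below is an assumption about how you might organize it, not code from your compositor: reset_status_name and action_for_reset_status are hypothetical helpers, and the enum values are repeated only so the snippet compiles without GL headers.

```c
#include <stdbool.h>

/* Values of the OpenGL robustness enums, defined here only so this sketch
 * compiles standalone; real code would get them from the GL headers. */
#ifndef GL_NO_ERROR
#define GL_NO_ERROR 0
#endif
#ifndef GL_GUILTY_CONTEXT_RESET
#define GL_GUILTY_CONTEXT_RESET   0x8253
#define GL_INNOCENT_CONTEXT_RESET 0x8254
#define GL_UNKNOWN_CONTEXT_RESET  0x8255
#endif

typedef enum {
    FRAME_RENDER,           /* no reset: draw as usual */
    FRAME_SKIP_AND_REBUILD  /* reset seen: drop this frame, rebuild context */
} frame_action;

/* Human-readable name for logging the value from glGetGraphicsResetStatus(). */
const char *reset_status_name(unsigned status)
{
    switch (status) {
    case GL_NO_ERROR:               return "GL_NO_ERROR";
    case GL_GUILTY_CONTEXT_RESET:   return "GL_GUILTY_CONTEXT_RESET";
    case GL_INNOCENT_CONTEXT_RESET: return "GL_INNOCENT_CONTEXT_RESET";
    case GL_UNKNOWN_CONTEXT_RESET:  return "GL_UNKNOWN_CONTEXT_RESET";
    default:                        return "unrecognized reset status";
    }
}

/* Any reported reset (guilty, innocent, or unknown) means the context and
 * its objects must be rebuilt; only GL_NO_ERROR allows rendering. */
frame_action action_for_reset_status(unsigned status)
{
    return status == GL_NO_ERROR ? FRAME_RENDER : FRAME_SKIP_AND_REBUILD;
}
```

With this split, the frame loop calls glGetGraphicsResetStatus() once, logs reset_status_name(rs), and branches on action_for_reset_status(rs) instead of asserting.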

Quick triage checklist

Verify extension availability at runtime (eglQueryString(dpy, EGL_EXTENSIONS)) for EGL_ANDROID_native_fence_sync and EGL_KHR_wait_sync. If missing, don’t call the native-fence entry points; go to the fallback path. (Android Open Source Project)
Ensure correct object/display pairing when calling eglDupNativeFenceFDANDROID and flush before dup when you created the out fence without an FD. (Microsoft Learn)
Mind FD ownership rules. When you import a fence FD via eglCreateSyncKHR(..., EGL_SYNC_NATIVE_FENCE_FD_ANDROID, fd, ...) and it succeeds, EGL consumes the FD (don’t close/reuse). Only the FD returned by eglDupNativeFenceFDANDROID is yours to send/close. (See Chromium’s EGL fence utils.) (Chromium Git Repositories)
Instrument and isolate:
- Run with WAYLAND_DEBUG=1, EGL_LOG_LEVEL=debug, and MESA_DEBUG=1 to see explicit‑sync and EGL errors (Mesa).
- Watch dmesg -w for GPU hang/reset messages while reproducing; if you see resets, treat them as driver/environmental and verify you can recover via the robust path.
Vendor specifics: If you’re on NVIDIA and expect EGL_ANDROID_native_fence_sync, ensure modesetting is enabled (nvidia-drm.modeset=1) on drivers that gate the extension behind it. (NVIDIA Developer Forums)

Minimal patterns (drop‑in)

Import client’s acquire fence & wait on GPU:

```c
if (acquire_fd >= 0) {
    EGLint attrs[] = {EGL_SYNC_NATIVE_FENCE_FD_ANDROID, acquire_fd, EGL_NONE};
    EGLSyncKHR in_sync = eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, attrs); // takes FD ownership on success
    if (in_sync != EGL_NO_SYNC_KHR) { eglWaitSyncKHR(dpy, in_sync, 0); eglDestroySyncKHR(dpy, in_sync); }
    else { close(acquire_fd); } // import failed: the FD is still ours, so fall back and close it
}
```

(If in_sync is EGL_NO_SYNC_KHR, validate the FD and fall back.) (Chromium Git Repositories)

Create release fence correctly:

```c
EGLSyncKHR out_sync = eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
glFlush(); // ensure native fence is created per spec
int release_fd = eglDupNativeFenceFDANDROID(dpy, out_sync); // -1 + EGL_BAD_PARAMETER if no native fence
eglDestroySyncKHR(dpy, out_sync);
```

Send release_fd via zwp_linux_buffer_release_v1.fenced_release; otherwise send immediate_release. (registry.khronos.org)

Why this will stop the crash

You’ll no longer call eglDupNativeFenceFDANDROID in states where no native fence exists yet (fixing the BAD_PARAMETER). (registry.khronos.org)
If a fence from a client is invalid—or a driver hiccup occurs—you’ll fall back cleanly rather than asserting.
When the GPU/driver does reset (even innocently), you’ll rebuild the GL context and resources as required by the robustness spec instead of aborting. (docs.gl)

If you want, paste your explicit‑sync code paths for acquire and release and I’ll mark up the exact lines that need changes.
