A use-after-free bug in the gRPC PosixEventEngine causes crashes when LockfreeEvent::SetReady() dereferences corrupted pointers. By the time the worker thread runs, the Epoll1EventHandle memory has already been freed and reused, so the closure and thread pool pointers it reads are invalid.
Affected Version: gRPC 1.76.0 (Ruby bindings)
Impact: Multiple crashes observed in production Ruby applications.
#0 absl::lts_20250512::Status::operator= (this=0x10000000501013c)
at third_party/abseil-cpp/absl/status/status.h:779
#1 grpc_event_engine::experimental::PosixEngineClosure::SetStatus (this=0x100000005010104)
at ./src/core/lib/event_engine/posix_engine/posix_engine_closure.h:41
#2 grpc_event_engine::experimental::LockfreeEvent::SetReady (this=0x7bd5354fa988)
at src/core/lib/event_engine/posix_engine/lockfree_event.cc:236
#3 grpc_event_engine::experimental::Epoll1EventHandle::ExecutePendingActions (this=0x7bd5354fa960)
at src/core/lib/event_engine/posix_engine/ev_epoll1_linux.cc:122
#4 grpc_event_engine::experimental::Epoll1Poller::Work(...)
at src/core/lib/event_engine/posix_engine/ev_epoll1_linux.cc:445
#5 grpc_event_engine::experimental::PosixEventEngine::PollingCycle::PollerWorkInternal
#6+ ... WorkStealingThreadPool -> pthread_create
In one crash, the Epoll1EventHandle memory was reused to store a user-agent string:
(gdb) x/16s 0x7bd67bf628a0
0x7bd67bf628b0: "gl-ruby/"
0x7bd67bf628bb: ".7 gccl/2.11.1 gax/1.1.0 gapic/1.3.0 grpc/1.7"
The corrupted closure pointer 0x332f627020302e36 decodes to ASCII "6.0 pb/3" - part of the user-agent string that overwrote the handle memory.
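The decoding above can be reproduced by reading the 64-bit value as the eight bytes it occupies in little-endian memory (x86-64 in this case). The following is a minimal sketch; `DecodeLittleEndianAscii` is a hypothetical helper written for this report, not part of gRPC or gdb.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of gRPC): interpret a 64-bit value as the
// eight bytes it occupies in little-endian memory and return them as text.
std::string DecodeLittleEndianAscii(std::uint64_t value) {
  std::string out;
  for (int i = 0; i < 8; ++i) {
    // On a little-endian machine, byte i in memory is bits [8*i, 8*i+7].
    const auto byte = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    // Show non-printable bytes as '.' so genuine pointers stand out.
    const bool printable = byte >= 0x20 && byte < 0x7F;
    out.push_back(printable ? static_cast<char>(byte) : '.');
  }
  assert(out.size() == 8);  // One character per byte of the input value.
  return out;
}
```

Calling `DecodeLittleEndianAscii(0x332f627020302e36ULL)` yields `6.0 pb/3`, which lines up with the tail of `grpc/1.7` in the dumped user-agent string.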
In another crash, the handle's vtable pointer points to glibc's malloc arena:
(gdb) print *this # Epoll1EventHandle
$3 = {
  _vptr.EventHandle = 0x7f6105a08b60 <main_arena+160>, # VTABLE POINTS TO MALLOC ARENA!
  poller_ = 0xc6, # garbage
  read_closure_ = {
    state_ = 0,
    thread_pool_ = 0x97 # garbage - CRASH CAUSE
  },
  write_closure_ = { thread_pool_ = 0x50 },
  error_closure_ = { thread_pool_ = 0x100000001 }
}
This proves:
- The `Epoll1EventHandle` was deleted
- Memory was returned to malloc's free list
- malloc wrote its own bookkeeping into the freed chunk (the vtable slot now holds a pointer into `main_arena`)
- A worker thread crashed accessing the stale pointer
In ev_epoll1_linux.cc, Epoll1Poller::Work() (lines 418-448):
Poller::WorkResult Epoll1Poller::Work(...) {
  Events pending_events;
  {
    grpc_core::MutexLock lock(&mu_);
    ProcessEpollEvents(..., pending_events);  // Collects handle pointers
  }  // MUTEX RELEASED HERE
  schedule_poll_again();
  for (auto& it : pending_events) {
    it->ExecutePendingActions();  // CRASH: handle may be freed
  }
}

- `ProcessEpollEvents()` adds `Epoll1EventHandle*` raw pointers to `pending_events`
- Mutex is released
- Another thread orphans the handle (via `OrphanHandle()` -> freelist -> `Close()` -> `delete`)
- Memory is reallocated for other purposes
- Worker thread calls `ExecutePendingActions()` on freed memory

// lockfree_event.cc:235-237
auto closure = reinterpret_cast<PosixEngineClosure*>(curr);
closure->SetStatus(absl::OkStatus()); // CRASH: closure is garbage
thread_pool_->Run(closure); // CRASH: thread_pool_ is garbage

The crash occurs during normal operation (not shutdown) because Ruby GC destroys gRPC channels while EventEngine workers still hold handle pointers.
// rb_channel.c (abridged)
static void grpc_rb_channel_free(void* p) {
  grpc_rb_channel* wrapper = (grpc_rb_channel*)p;
  grpc_channel_destroy(wrapper->channel);  // Destroys C-core channel!
  xfree(p);
}
static rb_data_type_t grpc_channel_data_type = {
  "grpc_channel",
  {grpc_rb_channel_mark, grpc_rb_channel_free, ...},
  RUBY_TYPED_FREE_IMMEDIATELY  // <-- Freed during GC, not deferred
};

The EventEngine is a global singleton that outlives individual channels:
// default_event_engine.cc
std::shared_ptr<EventEngine> GetDefaultEventEngine() {
  // Returns global engine - ONE instance shared by all channels
}

T1 (Ruby Main Thread - GC):
1. Channel has no more Ruby references
2. Ruby GC collects channel
3. grpc_rb_channel_free() -> grpc_channel_destroy()
4. Channel's Epoll1EventHandle goes to freelist
5. Handle deleted or memory reused
T2 (EventEngine Worker Thread):
- Still has handle pointer in pending_events
- Calls ExecutePendingActions() on freed handle
- CRASH!
This appears related to grpc/grpc#19195. The test call_credentials_timeout_test.rb exercises a similar race condition.
- Extend mutex scope: Hold the mutex through the `pending_events` iteration
- Use shared_ptr for handles: Convert `pending_events` to hold `std::shared_ptr<Epoll1EventHandle>` instead of raw pointers
- Reference counting: Prevent handles from being freed while referenced in `pending_events`
- Additional vulnerability: `PosixEndpointImpl` stores a raw `poller_` pointer (posix_endpoint.h:593) with no lifetime guarantee. If the poller is destroyed while endpoints exist, `OrphanHandle()` will access freed memory at `poller_->posix_interface()`.
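
The shared_ptr option can be illustrated with a small single-threaded model that replays the problematic interleaving (collect under the lock, orphan, then execute). This is only a sketch under stated assumptions, not gRPC code: `FakeHandle`, `FakePoller`, `CollectPending`, `OrphanAll`, `InterleavingResult`, and `ReplayInterleaving` are hypothetical names, and a real change would also have to account for the poller's handle freelist.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for Epoll1EventHandle; not the real gRPC class.
struct FakeHandle {
  explicit FakeHandle(int* destroyed_counter) : destroyed(destroyed_counter) {}
  ~FakeHandle() { ++*destroyed; }  // Records when the handle is really freed.
  void ExecutePendingActions() { ++executed; }
  int executed = 0;
  int* destroyed;
};

// Hypothetical stand-in for Epoll1Poller.
class FakePoller {
 public:
  void CreateHandle(int* destroyed_counter) {
    auto handle = std::make_shared<FakeHandle>(destroyed_counter);
    std::lock_guard<std::mutex> lock(mu_);
    handles_.push_back(std::move(handle));
  }
  // Models ProcessEpollEvents(): copies owning pointers while holding mu_.
  std::vector<std::shared_ptr<FakeHandle>> CollectPending() {
    std::lock_guard<std::mutex> lock(mu_);
    return handles_;
  }
  // Models another thread orphaning handles: the poller drops its references.
  void OrphanAll() {
    std::lock_guard<std::mutex> lock(mu_);
    handles_.clear();
  }

 private:
  std::mutex mu_;
  std::vector<std::shared_ptr<FakeHandle>> handles_;
};

struct InterleavingResult {
  int destroyed_before_execute = 0;
  int executed = 0;
  int destroyed_after_release = 0;
};

// Replays: collect under lock -> orphan -> execute -> release pending list.
InterleavingResult ReplayInterleaving() {
  int destroyed = 0;
  InterleavingResult result;
  FakePoller poller;
  poller.CreateHandle(&destroyed);
  auto pending = poller.CollectPending();  // Worker thread, lock held.
  assert(pending.size() == 1);
  poller.OrphanAll();                      // "Other thread" orphans the handle.
  result.destroyed_before_execute = destroyed;
  for (auto& handle : pending) {
    handle->ExecutePendingActions();       // Safe: pending still owns it.
    result.executed += handle->executed;
  }
  pending.clear();                         // Last reference dropped here.
  result.destroyed_after_release = destroyed;
  return result;
}
```

In this model the handle's destructor runs only after the pending list is cleared, so `ExecutePendingActions()` never touches freed memory. The cost is an atomic reference-count update per pending event, which is the trade-off against simply holding the mutex for the whole iteration.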
| File | Lines | Issue |
|---|---|---|
| `ev_epoll1_linux.cc` | 440-445 | Mutex released before iterating `pending_events` |
| `ev_epoll1_linux.cc` | 270-275 | Freelist allows handles to be deleted while referenced |
| `lockfree_event.cc` | 235-237 | No validation before dereferencing closure pointer |
| `posix_endpoint.h` | 593 | Raw `poller_` pointer with no lifetime guarantee |
| `rb_channel.c` | 68-78 | Ruby GC frees channel immediately via `RUBY_TYPED_FREE_IMMEDIATELY` |