@KjellKod
Created April 24, 2025 22:28
MPSC lock-free proof of concept for spdlog or g3log

g++ -std=c++17 -O2 -pthread main.cpp -o hybrid_logger

Hybrid SPSC Logger: Proof-of-Concept

📌 Overview

This project is a cross-platform, modern C++17 proof-of-concept demonstrating how to implement a high-throughput, low-contention logging queue using hybrid SPSC (Single Producer Single Consumer) queues. It is designed as a drop-in backend concept that could eventually integrate with loggers such as g3log, but this demo intentionally avoids any dependency on existing frameworks.

🧠 Key Concepts

Why Hybrid SPSC?

Traditional MPSC (Multi-Producer Single Consumer) lock-free queues offer good average performance, but exhibit unpredictable latency spikes under contention.

SPSC queues, on the other hand, are highly efficient and predictable because there is no contention, but each queue connects exactly one producer to one consumer.

This proof-of-concept creates one SPSC queue per producer thread, dynamically established and managed with thread_local.

The consumer thread round-robins through all active SPSC queues, efficiently aggregating logs with minimal latency and high predictability.
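
In condensed form, the pattern looks roughly like this. This is a minimal sketch only; the full implementation, with teardown and idle back-off, is in main.cpp below, and LogQueue is the spsc::circular_fifo<std::string> alias used there.

#include <atomic>
#include <memory>
#include <mutex>
#include <string>
#include <vector>
#include "spsc_circular_fifo.hpp"

using LogQueue = spsc::circular_fifo<std::string>;

thread_local std::shared_ptr<LogQueue> myQueue;    // one SPSC queue per producer thread
std::vector<std::shared_ptr<LogQueue>> registry;   // the consumer's view of all queues
std::mutex registryMutex;                          // taken only on registration and by the consumer
std::atomic<bool> running{true};

void log(const std::string& msg) {
    if (!myQueue) {                                // first log() on this thread: register a queue
        myQueue = std::make_shared<LogQueue>(1024);
        std::lock_guard<std::mutex> lock(registryMutex);
        registry.push_back(myQueue);
    }
    std::string copy = msg;                        // push() moves from a non-const reference
    myQueue->push(copy);                           // lock-free fast path after setup
}

void consumerLoop() {
    while (running) {                              // round-robin over every registered queue
        std::lock_guard<std::mutex> lock(registryMutex);
        for (auto& q : registry) {
            std::string msg;
            while (q->pop(msg)) { /* hand msg to the sink */ }
        }
    }
}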

Key Benefits

Amortized performance: Each thread operates with no locking after initialization.

Predictability: No spikes from contention, unlike MPSC.

Managed cleanup: Threads call setConnection() and tearDownConnection() to manage queue lifetimes.

Scalable: Can handle many producers efficiently.

⚖️ Comparison with Other Queue Models

🔄 vs. MPMC Lock-Free Queue (e.g., spdlog)

Pros of Hybrid SPSC:

Much more predictable latency profile (no high tail spikes).

Easier to reason about thread-to-queue relationships.

Minimal false sharing/cache invalidation due to SPSC isolation.

Cons Compared to MPMC:

Requires dynamic setup of per-thread queues.

Consumer has to actively poll multiple queues (vs. single dequeue).

Can become inefficient if many queues are mostly idle.

🔐 vs. MPMC Mutex-Protected Queue (e.g., g3log)

Pros of Hybrid SPSC:

Producer threads avoid locks entirely in steady state after setup.

Scales far better with core count, especially under pressure.

Predictable even under high concurrency.

Cons Compared to Mutex-Based:

Slightly more complex to implement and debug.

Setup/teardown cost per thread, though amortized well.

Mutex-based queues can be simpler for small-scale applications with few producers.

⚙️ How It Works

Components

main.cpp — the core implementation and driver

spsc_circular_fifo.hpp — a lightweight, lock-free, fixed-size SPSC queue
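
As a standalone component the queue can be exercised like this; note that push() takes a non-const reference because it moves the element into the ring buffer:

#include <iostream>
#include <string>
#include "spsc_circular_fifo.hpp"

int main() {
    spsc::circular_fifo<std::string> queue(8);  // room for 8 elements

    std::string in = "hello";
    if (!queue.push(in)) {                      // push moves 'in' into the ring; returns false when full
        std::cout << "queue full, message dropped\n";
    }

    std::string out;
    while (queue.pop(out)) {                    // pop moves the oldest element into 'out'
        std::cout << "popped: " << out << "\n";
    }
    return 0;
}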

Flow

Producer thread first calls log() → this calls setConnection() if needed

A new SPSC queue is created and added to the shared consumer pool

Subsequent log() calls go directly into the thread’s queue (no locking)

The consumer thread loops over all queues and pops available messages

When the thread completes its task, it calls tearDownConnection() to unregister its queue
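
From the producer side, the whole flow reduces to three calls. This is a condensed version of the producer lambda in main.cpp below:

// Assumes a HybridLogger instance named 'logger' shared with the consumer thread.
void worker(HybridLogger& logger, int id) {
    logger.setConnection();                       // create and register this thread's SPSC queue
    for (int j = 0; j < 10; ++j) {
        logger.log("Thread " + std::to_string(id) + " says hello " + std::to_string(j));  // lock-free push
    }
    logger.tearDownConnection();                  // mark the queue for removal by the consumer
}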

Example Console Output

[INFO] Set up new thread-local queue connection on thread 140735301760768
Log: Thread 0 says hello 0
...
[INFO] Tear down called from thread 140735301760768 at time 56301230801290
[INFO] Removing queue from consumer on thread 123145302810624

🧪 How to Build and Run

macOS / Linux

g++ -std=c++17 -O2 -pthread main.cpp -o hybrid_logger
./hybrid_logger

Windows (Developer Command Prompt)

cl /std:c++17 /O2 /EHsc main.cpp
main.exe

🔌 Potential Integration: g3log / spdlog / custom sinks

You could easily adapt this model to hook into existing logging libraries:

g3log: Replace the current lock-based queue in the LogWorker backend with a hybrid SPSC queue dispatcher

spdlog: Use the SPSC structure as a log sink that receives messages per-thread and dispatches them to the formatting/backend thread

Custom: Use it to send metrics, diagnostics, or messages to a real-time system without incurring blocking or contention
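
As a rough illustration, and with hypothetical names that are not taken from g3log or spdlog, the consumer loop could hand each popped message to a pluggable sink instead of printing it:

#include <functional>
#include <iostream>
#include <string>

// Hypothetical adapter type; it only sketches how runConsumer() could forward messages.
using SinkFn = std::function<void(const std::string&)>;

class SinkDispatcher {
public:
    explicit SinkDispatcher(SinkFn sink) : sink_(std::move(sink)) {}
    void dispatch(const std::string& msg) { sink_(msg); }  // called from the consumer thread
private:
    SinkFn sink_;
};

// Usage: plug in a console sink (or a file/metrics/network sink) and call
// dispatcher.dispatch(msg) where runConsumer() currently does std::cout << msg.
// SinkDispatcher dispatcher([](const std::string& m) { std::cout << m << "\n"; });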

🧱 What Next?

Add a prioritized polling strategy to favor hot queues (sketched after this list)

Collect metrics (queue depth, drop rate, latency)

Expand for dynamic runtime scaling with thread-safe registry
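
One possible shape for the prioritized-polling idea, sketched with assumed names (QueueEntry and the hot flag are illustrative; only pop() and size() come from the existing queue):

#include <memory>
#include <string>
#include <vector>
#include "spsc_circular_fifo.hpp"

using LogQueue = spsc::circular_fifo<std::string>;

struct QueueEntry {
    std::shared_ptr<LogQueue> queue;
    bool hot = false;                              // true if the queue had messages on the previous call
};

// Drain "hot" queues first, then the rest; remember which ones produced work.
void pollOnce(std::vector<QueueEntry>& entries) {
    std::vector<bool> producedWork(entries.size(), false);
    for (int pass = 0; pass < 2; ++pass) {
        const bool wantHot = (pass == 0);          // pass 0: hot queues, pass 1: cold queues
        for (size_t i = 0; i < entries.size(); ++i) {
            if (entries[i].hot != wantHot) continue;
            std::string msg;
            while (entries[i].queue->pop(msg)) {
                producedWork[i] = true;
                // hand msg to the sink; entries[i].queue->size() gives an instantaneous depth metric
            }
        }
    }
    for (size_t i = 0; i < entries.size(); ++i) {
        entries[i].hot = producedWork[i];          // remember who was busy for the next call
    }
}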

This model offers a fresh, lightweight concurrency pattern suitable for high-performance logging, message dispatching, or real-time monitoring systems. The hybrid SPSC pattern can be the secret weapon for predictable, scalable thread communication in C++.

// main.cpp
// Proof-of-concept for a hybrid SPSC-based MPSC logging queue
// without using g3log. Cross-platform C++17+.
#include <iostream>
#include <thread>
#include <vector>
#include <atomic>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <chrono>
#include <condition_variable>
#include "spsc_circular_fifo.hpp"
using LogQueue = spsc::circular_fifo<std::string>; // Adjusted to match actual header
class HybridLogger {
public:
    void setConnection() {
        if (!producerQueue) {
            std::lock_guard<std::mutex> lock(mutex_);          // serializes first-time setup; producerQueue itself is thread_local
            producerQueue = std::make_shared<LogQueue>(1024);  // fixed capacity per thread-local queue
            {
                std::lock_guard<std::mutex> vecLock(vecMutex_);
                queues_.push_back(producerQueue);              // register the new queue with the consumer
            }
            std::cout << "[INFO] Set up new thread-local queue connection on thread " << std::this_thread::get_id() << "\n";
        }
    }

    void tearDownConnection() {
        if (producerQueue) {
            std::lock_guard<std::mutex> lock(vecMutex_);
            queuesToRemove_.insert(producerQueue);             // the consumer erases it after draining
            std::cout << "[INFO] Tear down called from thread " << std::this_thread::get_id() << " at time " << std::chrono::steady_clock::now().time_since_epoch().count() << "\n";
            producerQueue = nullptr;
        }
    }

    void log(const std::string& message) {
        if (!producerQueue) {
            setConnection();
        }
        std::string copy = message;                            // push() moves from a non-const reference
        producerQueue->push(copy);
    }

    void runConsumer() {
        while (running_) {
            bool hadWork = false;
            {
                std::lock_guard<std::mutex> lock(vecMutex_);
                for (auto it = queues_.begin(); it != queues_.end(); ) {
                    auto& q = *it;
                    std::string msg;
                    while (q->pop(msg)) {                      // drain this producer's queue
                        std::cout << "Log: " << msg << std::endl;
                        hadWork = true;
                    }
                    if (queuesToRemove_.count(q)) {
                        std::cout << "[INFO] Removing queue from consumer on thread " << std::this_thread::get_id() << "\n";
                        it = queues_.erase(it);
                        queuesToRemove_.erase(q);
                    } else {
                        ++it;
                    }
                }
            }
            if (!hadWork) {
                std::this_thread::sleep_for(std::chrono::milliseconds(10));  // back off when every queue is idle
            }
        }
    }

    void stop() {
        running_ = false;
    }

private:
    thread_local static std::shared_ptr<LogQueue> producerQueue;   // one SPSC queue per producer thread
    std::vector<std::shared_ptr<LogQueue>> queues_;                // registry polled by the consumer
    std::unordered_set<std::shared_ptr<LogQueue>> queuesToRemove_; // queues marked for removal
    std::mutex vecMutex_;                                          // guards queues_ and queuesToRemove_
    std::mutex mutex_;
    std::atomic<bool> running_ { true };
};
thread_local std::shared_ptr<LogQueue> HybridLogger::producerQueue = nullptr;
int main() {
    HybridLogger logger;
    std::thread consumer([&]() { logger.runConsumer(); });

    std::vector<std::thread> producers;
    for (int i = 0; i < 4; ++i) {
        producers.emplace_back([i, &logger]() {
            logger.setConnection();
            for (int j = 0; j < 10; ++j) {
                logger.log("Thread " + std::to_string(i) + " says hello " + std::to_string(j));
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(100)); // Staggered teardown
            logger.tearDownConnection();
        });
    }

    for (auto& t : producers) {
        t.join();
    }

    std::this_thread::sleep_for(std::chrono::seconds(1)); // Let consumer flush
    logger.stop();
    consumer.join();
    return 0;
}
// spsc_circular_fifo.hpp
/*
* Not any company's property but Public-Domain
* Do with source-code as you will. No requirement to keep this
* header if need to use it/change it/ or do whatever with it
*
* Note that there is No guarantee that this code will work
* and I take no responsibility for this code and any problems you
* might get if using it.
*
 * Code & platform dependent issues with it were originally
 * published at http://www.kjellkod.cc/threadsafecircularqueue
 * 2012-16-19 @author Kjell Hedström, [email protected]
 * First approach for this was in 2007 I think, a significant update happened in 2009.
* I also wrote an article about it a few years later:
* https://kjellkod.wordpress.com/2012/11/28/c-debt-paid-in-full-wait-free-lock-free-queue/
*
* Modified from KjellKod's code at:
* https://github.com/KjellKod/lock-free-wait-free-circularfifo
*/
// A note on naming: it is a matter of convention ("controversy") which end is called
// the head and which the tail; see:
// http://en.wikipedia.org/wiki/FIFO#Head_ortail__first
#pragma once
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>
#include <iostream>
namespace spsc {

template <typename Element>
class circular_fifo {
public:
    explicit circular_fifo(const size_t size)
        : kSize(size),
          kCapacity(kSize + 1),
          array_(kCapacity),
          tail_(0),
          head_(0) {
    }
    virtual ~circular_fifo() { std::cout << "circular_fifo is destroyed\n"; }

    bool push(Element& item);
    bool pop(Element& item);
    bool empty() const;
    bool full() const;
    size_t capacity() const;
    size_t capacity_free() const;
    size_t usage() const;
    size_t size() const;
    bool lock_free() const;
    size_t tail() const { return tail_.load(); }
    size_t head() const { return head_.load(); }

private:
    typedef char cache_line[64];   // crude padding to keep the producer and consumer indexes on separate cache lines
    size_t increment(size_t idx) const { return (idx + 1) % kCapacity; }

    const size_t kSize;
    const size_t kCapacity;
    cache_line pad_storage_;
    std::vector<Element> array_;
    cache_line padtail_;
    std::atomic<size_t> tail_;     // tail (input) index, written by the producer
    cache_line padhead_;
    std::atomic<size_t> head_;     // head (output) index, written by the consumer
    cache_line padend_;
};
template <typename Element>
bool circular_fifo<Element>::push(Element& item) {
    const auto currenttail_ = tail_.load(std::memory_order_relaxed);
    const auto nexttail_ = increment(currenttail_);
    if (nexttail_ != head_.load(std::memory_order_acquire)) {
        array_[currenttail_] = std::move(item);
        tail_.store(nexttail_, std::memory_order_release);
        return true;
    }
    return false;  // full queue
}

// Pop by the consumer can only update the head (load with relaxed, store with release);
// the tail must be read with at least acquire ordering.
template <typename Element>
bool circular_fifo<Element>::pop(Element& item) {
    const auto currenthead_ = head_.load(std::memory_order_relaxed);
    if (currenthead_ == tail_.load(std::memory_order_acquire)) {
        return false;  // empty queue
    }
    item = std::move(array_[currenthead_]);
    head_.store(increment(currenthead_), std::memory_order_release);
    return true;
}

template <typename Element>
bool circular_fifo<Element>::empty() const {
    // snapshot, accepting that this comparison is not atomic
    return (head_.load(std::memory_order_relaxed) == tail_.load(std::memory_order_relaxed));
}

// snapshot, accepting that this comparison is not atomic
template <typename Element>
bool circular_fifo<Element>::full() const {
    const auto nexttail_ = increment(tail_.load(std::memory_order_relaxed));  // relaxed: this is only an approximate snapshot
    return (nexttail_ == head_.load(std::memory_order_relaxed));
}

template <typename Element>
bool circular_fifo<Element>::lock_free() const {
    return std::atomic<size_t>{}.is_lock_free();
}

template <typename Element>
size_t circular_fifo<Element>::size() const {
    return ((tail_.load() - head_.load() + kCapacity) % kCapacity);
}

template <typename Element>
size_t circular_fifo<Element>::capacity_free() const {
    return (kCapacity - size() - 1);
}

template <typename Element>
size_t circular_fifo<Element>::capacity() const {
    return kSize;
}

// percent usage
template <typename Element>
size_t circular_fifo<Element>::usage() const {
    return (100 * size() / kSize);
}

}  // namespace spsc
@KjellKod (Author) commented:

Hybrid SPSC Logging Model Evaluation

Summary of Discussion: Hybrid SPSC Queue-Based Logging

Overview:
  • A hybrid SPSC logging system uses one SPSC queue per producer thread.
  • It avoids contention and the high latency spikes typical of MPMC/MPSC models.
  • The consumer round-robins through all queues, collecting logs.

Viability:
  • Technically viable and performant for high-concurrency scenarios.
  • Requires amortized setup but offers lock-free steady-state operation.

Accuracy:
  • Claims in the README about reduced contention and better predictability are mostly accurate.
  • Some challenges remain around idle queues and consumer-side efficiency.

Performance vs spdlog:
  • Likely better under heavy contention.
  • Lacks advanced features such as formatting, batching, or sinks.
  • Needs benchmarking to substantiate performance advantages.

Technical Merits:
  • Predictable latency and a scalable per-thread model.
  • Dynamic teardown via thread_local destructors is not always reliable.

Limitations:
  • Cleanup via thread_local destructors fails in pooled or non-terminating threads.
  • Consumer-side cleanup is required to handle idle or stale queues.
  • The registry needs careful handling to avoid race conditions.

Recommendations:
  • Combine thread_local RAII with an explicit tearDownConnection().
  • Add periodic GC in the consumer loop for inactive queues.
  • Benchmark against spdlog to quantify improvements.

Conclusion:
This hybrid SPSC model is promising for systems that need predictable, low-latency logging. With added instrumentation, dynamic cleanup, and benchmarking, it could outperform traditional logging queues in real-world high-performance systems.

Code Examples

Example: RAII-style Logging Setup
// Assumes free-function wrappers (setConnection/tearDownConnection) around a shared HybridLogger instance.
struct LoggerConnection {
    LoggerConnection() { setConnection(); }        // register this thread's SPSC queue
    ~LoggerConnection() { tearDownConnection(); }  // unregister it when the thread exits
};
thread_local LoggerConnection loggerRAII;

//Example: Logging Function
void logMessage(const std::string& msg) {
    (void)loggerRAII; // touching the thread_local forces its construction (and thus setConnection) on first use in this thread
    log(msg);         // logs to the thread-local SPSC queue
}

//Example: Consumer-side Cleanup Logic
// isMarkedDead() and isIdleLong() are illustrative names, not part of the current queue API.
for (auto it = allQueues.begin(); it != allQueues.end(); ) {
    if ((*it)->isMarkedDead() || (*it)->isIdleLong()) {
        it = allQueues.erase(it);   // erase while iterating via the returned iterator
    } else {
        ++it;
    }
}
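
For the benchmarking recommendation, a starting point could be a harness along these lines. This is a hypothetical sketch that measures only enqueue throughput of this proof-of-concept (it assumes a consumer thread is already running logger.runConsumer(), and messages dropped by full queues are not counted); the spdlog side of a comparison would be written against spdlog's own API.

// Example: Minimal Throughput Harness (hypothetical sketch against HybridLogger from main.cpp)
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

void benchmark(HybridLogger& logger, int producers, int messagesPerProducer) {
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0; i < producers; ++i) {
        threads.emplace_back([&logger, messagesPerProducer]() {
            logger.setConnection();                      // per-thread queue setup (amortized cost)
            for (int j = 0; j < messagesPerProducer; ++j) {
                logger.log("benchmark message");         // lock-free push; may be dropped if the queue is full
            }
            logger.tearDownConnection();
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    const double seconds = std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    std::cout << (static_cast<double>(producers) * messagesPerProducer) / seconds << " messages/sec enqueued\n";
}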
