Concurrency Model
Threading architecture, lock-free patterns, and async producer/consumer design for high-throughput event processing
Threading Model Overview
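Key insight: producers never block. The channel's TryWrite returns immediately. When the channel is full, the oldest message is dropped to make room, which corresponds to BoundedChannelFullMode.DropOldest; in that mode TryWrite returns false only once the channel has been completed. Dropping is intentional: for market data, freshness matters more than completeness.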
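The consumer is a single async loop over IAsyncEnumerable<T>. Because only one reader dequeues, messages are processed in the order they leave the channel without any locks, and the loop can still drain several items at once for efficient batching.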
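The producer and consumer sides described above can be sketched as follows. This is a minimal illustration, not the project's actual code: MarketTick, TickPipeline, Publish and DrainAsync are hypothetical names, and collecting items into a list stands in for real processing.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical message type; the real event type is not shown in this document.
public readonly record struct MarketTick(string Symbol, decimal Price, long Sequence);

public sealed class TickPipeline
{
    private readonly Channel<MarketTick> _channel;

    public TickPipeline(int capacity)
    {
        _channel = Channel.CreateBounded<MarketTick>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropOldest, // evict the stalest tick when full
            SingleReader = true,                          // exactly one consumer loop
            SingleWriter = false                          // several producers may publish
        });
    }

    // Never blocks. In DropOldest mode this returns false only after Complete().
    public bool Publish(MarketTick tick) => _channel.Writer.TryWrite(tick);

    public void Complete() => _channel.Writer.Complete();

    // Single consumer loop: items arrive in the order they leave the channel.
    // Returning a list is for illustration only; a real consumer would process each tick.
    public async Task<List<MarketTick>> DrainAsync(CancellationToken ct = default)
    {
        var seen = new List<MarketTick>();
        await foreach (MarketTick tick in _channel.Reader.ReadAllAsync(ct))
        {
            seen.Add(tick);
        }
        return seen;
    }
}
```

With a capacity of 2, publishing three ticks leaves only the last two in the channel, which is the freshness-over-completeness trade-off in action.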
System.Threading.Channels (Channel<T>)
Channels are the modern .NET primitive for high-performance producer/consumer scenarios. They replace BlockingCollection<T> with a fully async, allocation-efficient implementation.
Why not ConcurrentQueue<T>? It has no backpressure, no async wait, and no notion of "full," so the consumer has to poll and a slow consumer lets the queue grow without bound. A bounded Channel<T> provides all three.
Lock-Free Patterns (Interlocked Operations)
Traditional locks (lock, Mutex, SemaphoreSlim) are unacceptable on the hot path in trading systems. They introduce:
- Thread contention under high load
- Priority inversion (low-priority thread holds lock needed by high-priority)
- Unpredictable latency spikes
- Potential deadlocks in complex systems
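Instead, we use Interlocked operations: CPU-level atomic instructions (LOCK CMPXCHG and LOCK XADD on x86/x64) that execute entirely in user mode with no OS involvement. An uncontended atomic typically costs tens of cycles rather than one, but it never parks the thread or enters the kernel.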
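A minimal sketch of lock-free counters in this style is shown below. PipelineMetrics and its member names are hypothetical; the point is the two patterns: a plain atomic increment, and a CompareExchange retry loop for "store if greater".

```csharp
using System.Threading;

public sealed class PipelineMetrics
{
    private long _processed;
    private long _dropped;
    private long _maxLatencyTicks;

    // Atomic increment; safe to call from any number of threads.
    public void RecordProcessed() => Interlocked.Increment(ref _processed);

    // Atomic add for counting several drops at once.
    public void RecordDropped(long count) => Interlocked.Add(ref _dropped, count);

    // Lock-free "store if greater": retry until our value is published
    // or another thread has already stored something at least as large.
    public void RecordLatency(long ticks)
    {
        long current = Interlocked.Read(ref _maxLatencyTicks);
        while (ticks > current)
        {
            long previous = Interlocked.CompareExchange(ref _maxLatencyTicks, ticks, current);
            if (previous == current) return; // we won the race
            current = previous;              // someone else updated; re-check against their value
        }
    }

    public long Processed => Interlocked.Read(ref _processed);
    public long Dropped => Interlocked.Read(ref _dropped);
    public long MaxLatencyTicks => Interlocked.Read(ref _maxLatencyTicks);
}
```

The retry loop never waits on another thread: a failed CompareExchange simply means a competing update landed first, and the loop re-evaluates against the new value.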
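Why a power-of-2 buffer size (4096)? The slot index can be computed with a bitwise AND (& 0xFFF) instead of an integer modulo (% 4096). A general modulo compiles to a DIV instruction costing tens of cycles, while AND takes about one. The JIT may reduce % by a constant power of two on its own in some cases, but the explicit mask does not depend on that, and it also keeps the index non-negative if a signed counter ever wraps, which % does not. This matters when recording millions of samples per second.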
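The masking trick can be sketched with a small ring buffer. LatencyRingBuffer, Record and Snapshot are hypothetical names, and the snapshot is deliberately approximate: a writer may overwrite a slot while it is being copied, which is assumed to be acceptable for metrics.

```csharp
using System;
using System.Threading;

public sealed class LatencyRingBuffer
{
    private const int Size = 4096;      // must be a power of two
    private const int Mask = Size - 1;  // 0xFFF

    private readonly long[] _samples = new long[Size]; // allocated once
    private long _writeCount;

    // Claims the next slot atomically, then stores the sample.
    // Once the buffer is full, new samples overwrite the oldest ones.
    public void Record(long latencyTicks)
    {
        long slot = Interlocked.Increment(ref _writeCount) - 1;
        _samples[(int)(slot & Mask)] = latencyTicks; // AND instead of % 4096
    }

    public long TotalRecorded => Interlocked.Read(ref _writeCount);

    // Copies the valid samples into 'destination' (length >= 4096) and returns how many were copied.
    // Slot order is storage order, not chronological order.
    public int Snapshot(long[] destination)
    {
        int count = (int)Math.Min(TotalRecorded, Size);
        Array.Copy(_samples, destination, count);
        return count;
    }
}
```

After 5000 calls to Record, the buffer holds the most recent 4096 samples, with sample 4096 having overwritten slot 0.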
ArrayPool<T> — Zero-Allocation Buffers
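Every new T[] allocates on the managed heap and eventually becomes GC work. In latency-critical code, we instead rent pooled buffers from ArrayPool<T>.Shared and return them after use. A rented array may be longer than requested, so code must track the length it actually uses.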
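As a sketch of the rent/return pattern, the following computes a nearest-rank percentile over a scratch copy of the samples. The Percentiles class and the nearest-rank method are assumptions made for illustration; the document does not say which percentile definition the project uses.

```csharp
using System;
using System.Buffers;

public static class Percentiles
{
    // Nearest-rank percentile (0 < percentile <= 100) computed on a pooled scratch buffer,
    // so the caller's samples are left untouched and no new array is allocated.
    public static long Compute(ReadOnlySpan<long> samples, double percentile)
    {
        if (samples.IsEmpty) return 0;

        int length = samples.Length;
        long[] rented = ArrayPool<long>.Shared.Rent(length); // may be longer than 'length'
        try
        {
            samples.CopyTo(rented);
            Array.Sort(rented, 0, length); // sort only the part we filled

            int rank = (int)Math.Ceiling(percentile * length / 100.0);
            int index = Math.Clamp(rank - 1, 0, length - 1);
            return rented[index];
        }
        finally
        {
            // Always hand the buffer back, even if sorting throws.
            ArrayPool<long>.Shared.Return(rented);
        }
    }
}
```

The try/finally matters: a buffer that is never returned is not a leak in the GC sense, but it defeats the pool and brings the allocations back.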
Async Producer/Consumer Architecture
The entire pipeline is async/await-based with no blocking calls. This is crucial for scalability — blocking calls consume thread pool threads, which are a limited resource.
| PATTERN | USED HERE | ALTERNATIVE | WHY THIS IS BETTER |
|---|---|---|---|
| Channel<T> | Queue between producer/consumer | BlockingCollection<T> | Fully async; no thread blocking; bounded with backpressure |
| IAsyncEnumerable | Consumer loop (ReadAllAsync) | while(true) + WaitToReadAsync | Cleaner code; cooperative cancellation built-in |
| ValueTask | EnqueueAsync return type | Task | Avoids Task allocation when completing synchronously (99.9% of writes) |
| Interlocked | All counters and metrics | lock (gate) { counter++; } | CPU atomic; no OS involvement; no lock to contend on |
| BackgroundService | Long-running consumer + simulator | Task.Run / Thread | Graceful shutdown; health monitoring; DI integration |
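The ValueTask row can be illustrated with a hedged sketch of an EnqueueAsync method. EventQueue<T> is a hypothetical wrapper, and it assumes BoundedChannelFullMode.Wait purely so that the asynchronous slow path is observable; with DropOldest, as described earlier, a write should always complete synchronously.

```csharp
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class EventQueue<T>
{
    private readonly Channel<T> _channel;

    public EventQueue(int capacity)
    {
        _channel = Channel.CreateBounded<T>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait // assumption: chosen to show the slow path
        });
    }

    public ChannelReader<T> Reader => _channel.Reader;

    public ValueTask EnqueueAsync(T item, CancellationToken ct = default)
    {
        // Fast path: space is available, so finish synchronously with no Task allocation.
        if (_channel.Writer.TryWrite(item))
        {
            return ValueTask.CompletedTask;
        }

        // Slow path: the channel is full; the returned ValueTask completes once a slot frees up.
        return _channel.Writer.WriteAsync(item, ct);
    }
}
```

Callers simply await EnqueueAsync; the usual ValueTask rule applies, namely that each returned instance should be awaited or converted at most once.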
Memory & Allocation Strategy
The goal is near-zero allocation on the hot path. Every allocation eventually becomes GC work, and GC work means latency jitter.
| COMPONENT | ALLOCATION STRATEGY |
|---|---|
| Latency sample buffer | Fixed long[4096] — allocated once at startup |
| Batch processing list | Pre-allocated List<T>(50) — reused via Clear() |
| Percentile computation | ArrayPool<long>.Shared — rented and returned |
| Trade signal records | Currently record class (for SignalR compatibility); candidate for record struct as a future optimization |
| SignalR serialization | System.Text.Json with source generators (future: MessagePack) |
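The reusable batch list from the table can be sketched as a consumer loop that drains up to 50 items per iteration. BatchingConsumer, RunAsync and the handleBatch callback are hypothetical names; the handler is assumed to finish with the list before returning, because the same instance is cleared and reused.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class BatchingConsumer
{
    public const int MaxBatchSize = 50;

    public static async Task RunAsync<T>(
        ChannelReader<T> reader,
        Func<IReadOnlyList<T>, ValueTask> handleBatch,
        CancellationToken ct = default)
    {
        // Allocated once; Clear() keeps the backing array, so steady state allocates nothing here.
        var batch = new List<T>(MaxBatchSize);

        await foreach (T first in reader.ReadAllAsync(ct))
        {
            batch.Add(first);

            // Opportunistically drain whatever is already buffered, up to the batch limit.
            while (batch.Count < MaxBatchSize && reader.TryRead(out T next))
            {
                batch.Add(next);
            }

            await handleBatch(batch); // handler must not keep a reference to the list
            batch.Clear();
        }
    }
}
```

Under light load each batch holds a single item and latency stays low; under bursts the loop naturally amortizes per-batch overhead across up to 50 items.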