Performance Overview
Comparison with Official Client
How Requests Are Processed
What happens behind client.get(key) / await client.get(key) differs across three patterns.
With the sync pattern, the calling thread is completely blocked during I/O, so only one request can be processed at a time. To handle concurrent requests, you must manage Python threads yourself.
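A minimal sketch of this limitation, using a hypothetical SyncClient as a stand-in for the official sync client (its get() here just sleeps to simulate a blocking round-trip):

```python
import time
from concurrent.futures import ThreadPoolExecutor

class SyncClient:
    """Stand-in for the official sync client: get() blocks the
    calling thread for the full round-trip (simulated with sleep)."""
    def get(self, key, io_wait=0.01):
        time.sleep(io_wait)  # thread is blocked; nothing else runs here
        return f"value-of-{key}"

client = SyncClient()

# One request at a time: N requests take roughly N * io_wait seconds.
serial = [client.get(k) for k in range(8)]

# Concurrency requires managing a thread pool yourself.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(client.get, range(8)))
```

The thread-pool version overlaps the I/O waits, but every worker thread is an OS thread you now own and size by hand.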
- GIL Hold Time
- Concurrent Request Handling
- Throughput Scaling
- Memory Usage
The CPython GIL allows only one thread to execute Python code at a time. Each pattern handles GIL ownership differently.
Sync has the fewest GIL transitions, but the calling thread itself blocks during I/O, making concurrent processing impossible. Async executor enables concurrency but adds a GIL contender for each thread. Only aerospike-py achieves both minimal GIL usage and concurrent processing.
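The async executor pattern described above can be sketched as follows. The blocking call is wrapped with loop.run_in_executor, which keeps the event loop responsive, but each in-flight request occupies one worker thread, and each worker is an extra GIL contender (blocking_get is a simulated stand-in, not the official client's API):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_get(key, io_wait=0.01):
    # Simulated blocking sync-client call; every worker thread running
    # this is a separate GIL contender while executing Python code.
    time.sleep(io_wait)
    return f"value-of-{key}"

async def main():
    loop = asyncio.get_running_loop()
    pool = ThreadPoolExecutor(max_workers=4)
    # Each awaited call holds one pool thread for its full duration;
    # concurrency is capped at max_workers, not by the event loop.
    results = await asyncio.gather(
        *(loop.run_in_executor(pool, blocking_get, k) for k in range(8))
    )
    pool.shutdown()
    return results

results = asyncio.run(main())
```

By contrast, a runtime that does the I/O natively (as aerospike-py does with Tokio) parks waiting requests on the event loop without tying up a Python thread per request.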
Sync cannot handle concurrent processing at all. Async executor is limited to 64 concurrent requests under a 512MB Pod memory limit. aerospike-py can handle tens of thousands of concurrent requests.
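The 64-request ceiling follows from per-thread memory cost. A back-of-envelope check, assuming the common Linux default of roughly 8 MiB of stack reservation per thread (the per-task cost figure is likewise an illustrative assumption, not a measurement):

```python
POD_LIMIT_MIB = 512       # Kubernetes Pod memory limit
STACK_PER_THREAD_MIB = 8  # typical Linux pthread stack reservation

# Upper bound on executor threads before stacks alone exhaust the
# limit (ignores interpreter, heap, buffers; the real ceiling is lower).
max_threads = POD_LIMIT_MIB // STACK_PER_THREAD_MIB  # -> 64

# An async runtime multiplexes tasks onto a few threads instead;
# per-task cost is a small heap object (kilobytes, not megabytes).
TASK_COST_KIB = 4  # rough illustrative assumption
max_tasks = POD_LIMIT_MIB * 1024 // TASK_COST_KIB
```

Under these assumptions the same memory budget supports two orders of magnitude more concurrent async tasks than pool threads.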
Sync has low single-request latency but cannot handle concurrency, making it unsuitable for server environments. Async executor adds concurrency but is bottlenecked by thread pool limits and GIL contention. aerospike-py is fast even for single requests, and is the only pattern where the gap widens as concurrency increases.
Benchmark Limitations
The current benchmark represents conditions where aerospike-py's advantage is smallest. Here we visualize what differences emerge in real production environments.
In a localhost environment, I/O wait is nearly zero, so the cost of blocking a thread is negligible. As I/O wait grows on a real network, the gap between the thread-pool and Tokio models widens dramatically.
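This effect can be illustrated with a simple queueing model rather than a measurement. Requests beyond the pool size (64, per the memory limit above) wait for a free thread, and each queued "round" costs one I/O wait; the pool size and wait times here are illustrative assumptions:

```python
def pool_latency(concurrency, io_wait, pool_size=64):
    # Requests beyond pool_size queue for a free thread; draining one
    # batch of pool_size queued requests takes about one io_wait.
    rounds_queued = max(0, concurrency - pool_size) / pool_size
    return io_wait * (1 + rounds_queued)

def async_latency(concurrency, io_wait):
    # Every request waits on its socket concurrently; no thread queue.
    return io_wait

# Localhost: io_wait ~0.1 ms, so even heavy queueing adds microseconds.
local_penalty = pool_latency(500, 0.0001) - async_latency(500, 0.0001)

# Real network: io_wait ~2 ms; the same queue now adds milliseconds.
net_penalty = pool_latency(500, 0.002) - async_latency(500, 0.002)
```

The queueing penalty scales linearly with io_wait, which is why a localhost benchmark structurally hides it.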
- I/O x Concurrency
- Memory Efficiency
- GIL Contention Patterns
Adjust the sliders to compare how throughput changes between the official async pattern (run_in_executor) and aerospike-py (Tokio).
At concurrency=50, the difference is minimal, but at 500+, Official's p99 spikes sharply. The current benchmark measures at 50, so this gap is not captured.
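The threshold behavior can be sketched with the same queueing idea: below the 64-thread cap every request gets a thread immediately, so the pool looks as good as the async runtime; above it, tail latency jumps in steps (pool size and wait time are illustrative assumptions):

```python
def pool_p99(concurrency, io_wait, pool_size=64):
    """Rough worst-case latency for a thread-pool model."""
    if concurrency <= pool_size:
        # Under the cap: no queueing, latency is just the I/O wait.
        return io_wait
    # Over the cap: the slowest requests sit through full queue rounds.
    rounds = -(-(concurrency - pool_size) // pool_size)  # ceil division
    return io_wait * (1 + rounds)

at_50 = pool_p99(50, 0.002)    # under the cap: no penalty
at_500 = pool_p99(500, 0.002)  # over the cap: p99 spikes
```

With these numbers, concurrency=50 shows no queueing penalty at all, while 500 multiplies the tail latency several times over, matching the shape the benchmark at concurrency=50 cannot capture.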
The current benchmark answers the question "Can we match the C extension's single-request latency?" The answer: slightly faster. But the real reason aerospike-py exists is to exceed the limits of thread-based models under high concurrency and real network conditions, an area the current benchmark doesn't measure.
Results
- Benchmark Results — Interactive benchmark dashboard