Rapid-MLX - 2-4x faster local LLM inference on Apple Silicon
MLX-native inference engine with an OpenAI-compatible API. The novel piece: DeltaNet state snapshots, which bring prompt caching to non-trimmable architectures (Qwen3.5 hybrids) by restoring the RNN state in ~0.1ms. Also: 2-5x faster TTFT, native Metal kernels, and continuous batching.
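
To illustrate the snapshot idea: a recurrent state cannot be trimmed back to an earlier token position the way a KV cache can, so reusing a shared prompt prefix means saving copies of the state at chosen positions and restoring the longest one that matches. The toy sketch below shows that pattern only. It is not Rapid-MLX's code; `ToyRecurrentModel`, `SnapshotCache`, `prefill`, and the `snapshot_every` parameter are all hypothetical names, and the state update is an arbitrary stand-in for a real DeltaNet layer.

```python
from typing import Dict, List, Optional, Tuple


class ToyRecurrentModel:
    """Hypothetical stand-in for a recurrent layer with a fixed-size state."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim
        self.steps = 0  # number of per-token updates performed

    def initial_state(self) -> List[float]:
        return [0.0] * self.dim

    def step(self, state: List[float], token: int) -> List[float]:
        # Arbitrary deterministic update; a real layer would do far more work.
        self.steps += 1
        return [0.9 * s + ((token + i) % 7) / 7.0 for i, s in enumerate(state)]


class SnapshotCache:
    """Maps a token prefix to a copy of the state reached after that prefix."""

    def __init__(self) -> None:
        self._snapshots: Dict[Tuple[int, ...], List[float]] = {}

    def save(self, prefix: List[int], state: List[float]) -> None:
        self._snapshots[tuple(prefix)] = list(state)

    def longest_prefix(self, tokens: List[int]) -> Tuple[int, Optional[List[float]]]:
        # Linear scan from longest to shortest; fine for a sketch,
        # a real cache would likely use a trie or hashed block boundaries.
        for n in range(len(tokens), 0, -1):
            key = tuple(tokens[:n])
            if key in self._snapshots:
                return n, list(self._snapshots[key])
        return 0, None


def prefill(model: ToyRecurrentModel, cache: SnapshotCache,
            tokens: List[int], snapshot_every: int = 4) -> List[float]:
    """Process a prompt, resuming from the longest cached snapshot if any."""
    start, state = cache.longest_prefix(tokens)
    if state is None:
        state = model.initial_state()
    for i in range(start, len(tokens)):
        state = model.step(state, tokens[i])
        if (i + 1) % snapshot_every == 0:
            cache.save(tokens[: i + 1], state)
    return state
```

With a shared 8-token system prompt, a second request that differs only in its suffix resumes from the snapshot at position 8 and runs the update only for the new tokens. The snapshot interval trades memory (one state copy per snapshot) against how much of a prefix can be reused.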