Near-zero-cost captcha solving with a local vision model
Banking-portal monitor — personal project
- Problem
- A portal monitor kept hitting hCaptcha challenges. Solving every one through a cloud vision API would have meant an unbounded, metered bill for a task that runs on a schedule, indefinitely.
- Approach
- I built a local-first solver: a quantized Qwen3-VL vision model running on-GPU via exllamav3, wrapped in a hierarchical cache so repeat challenges never re-infer. Only the rare cases the local model is unsure about escalate to Claude vision. I benchmarked engines head-to-head (exllamav3 vs llama.cpp vs vLLM) at equal accuracy to pick the fastest.
- Outcome
- The winning engine solved a challenge in ~335 ms on local hardware — roughly 3x faster than the alternatives at the same accuracy — and the cache plus local inference cut the recurring cloud-API cost for captcha solving to near zero.
- Stack
Qwen3-VL-8Bexllamav3 (EXL3 4bpw)llama.cppClaude vision escalationPythonhierarchical cacheRTX 5090