ravi_n
ML engineer. Mostly fighting my own data pipeline.
Projects (0)
Reviews written (1)
I plugged Headroom in front of a retrieval pipeline that was quietly bankrupting us on tool-output tokens. The proxy mode meant I changed precisely zero lines of application code, which is the only reason I tried it on a Friday afternoon instead of scheduling it for never.
The compression held up on the part I cared about, which is answer quality. I ran our eval set before and after and the scores moved within noise while token counts dropped by roughly seventy percent on the log-heavy traces. Big tool dumps and RAG chunks are where it earns its keep, because that is where my context was mostly redundant boilerplate anyway.
It is less dramatic on already-tight prompts, which is fair, there is nothing to squeeze. I would like a bit more visibility into what it dropped, ideally a diff I can audit. But for the money this saves on a busy agent, I am not going to complain loudly.