The ops queue was hiding a search quality problem.
Two ML systems built for Zomato's restaurant catalogue. Cuisine classification at scale, automated thumbnail moderation. Both fixed search quality at the source.
Zomato's restaurant catalogue ran on a manual moderation queue. Cuisine tags from restaurant onboarding, user-uploaded menu thumbnails: every entry went through human review before it went live. The mandate was simple. Bring moderation headcount down.
I dug into the queue. Two categories dominated almost entirely: cuisine tag corrections and thumbnail moderation. Both were human-judgement work at scale.
The cuisine pattern was adversarial. Restaurant partners tagged listings with popular cuisines like North Indian or Chinese to surface in search, regardless of what they actually served. This, coupled with human error during moderation, resulted in users experiencing poor search quality.
Two categories accounted for the bulk of the queue. Both were ML-tractable. The brief was a cost lever, but the work pointed somewhere else.
Cutting the queue was the brief. What surfaced while solving it was bigger.
The cuisine tag moderation was the support queue's job. The queue was the gate between new tags and the search index. Gaming volume plus human error meant the gate was leaking.
Automating the moderation wasn't just a cost lever. Done right, it was a search quality lever. A model at the gate catches what humans miss, and catches it consistently.
Cost optimisation was the brief. Improving search quality was the second outcome. The job was to instrument both.
Two ML systems. One for cuisine. One for thumbnails. Both built to close the gate.
Two classifiers, both built to enforce the catalogue rather than clean it up after the fact. One for cuisine. One for thumbnails.
The fix: a model ranking cuisines per restaurant on:
Partners chose from recommendations during onboarding, not the open pool. Adversarial tagging stopped working: the model used menu data, not partner claims.
A multi-check image classifier across new uploads:
Violating images got rejected at upload; borderline cases routed to agents. Moderation moved from agent-led with model assist to model-led with agent verification.
Both shipped to production. Same architectural move in both cases: replace judgement at scale with classification at scale. The catalogue got cleaner. The queue got smaller.
Manual review dependency went from 100% to under 20%. Headcount came down. For those who remained, the work shifted. Moderation became verification.
Manual moderation didn't disappear. It moved up the funnel. ML decided first; agents verified the flags. Fewer tickets needed full review; the queue shrank.
The brief was met. The second outcome shipped with it.
The mandate was cost. The investigation surfaced a search quality problem the org wasn't tracking. I built the systems for both. When the data points past the brief, follow it.
Internally, I held both stories. Externally, I kept pitching the brief: cost. The search quality reframe never made it into the room. The work got deprioritised because stakeholders only ever saw the smaller story. Reframing internally is half the job.
Cost outcomes were instrumented. Model accuracy was instrumented. The downstream search quality lift, the actual user-facing win, wasn't. I'd do that scoping at the start next time, not as a regret.
When a brief lands, surface both the explicit ask and the second-order outcome before scoping. Instrument both at the start. Pitch both at the start. The reframe earns its place when it shapes prioritisation, not just delivery.