Zomato

The ops queue was hiding a search quality problem.

Two ML systems built for Zomato's restaurant catalogue. Cuisine classification at scale, automated thumbnail moderation. Both fixed search quality at the source.

ML at Scale
Role: Product Generalist
Timeline: 1 month
Stack: ML systems, content moderation
Stage: Shipped at scale

Optimise headcount. That was the brief.

Zomato's restaurant catalogue ran on a manual moderation queue. Cuisine tags from restaurant onboarding, user-uploaded menu thumbnails: every entry went through human review before it went live. The mandate was simple. Bring moderation headcount down.

I dug into the queue. Two categories dominated almost entirely: cuisine tag corrections and thumbnail moderation. Both were human-judgement work at scale.

The cuisine pattern was adversarial. Restaurant partners tagged listings with popular cuisines like North Indian or Chinese to surface in search, regardless of what they actually served. Combined with human error during moderation, this degraded search quality for users.

What the queue was telling me

Two categories accounted for the bulk of the queue. Both were ML-tractable. The brief was a cost lever, but the work pointed somewhere else.

Cutting the queue was the brief. What surfaced while solving it was bigger.

Solving for cost surfaced a search quality problem.

Cuisine tag moderation was the queue's job. The queue was the gate between new tags and the search index. Gaming volume plus human error meant the gate was leaking.

Automating the moderation wasn't just a cost lever. Done right, it was a search quality lever. A model at the gate catches what humans miss, and catches it consistently.

Two outcomes, one job

Cost optimisation was the brief. Improving search quality was the second outcome. The job was to instrument both.

Two ML systems. One for cuisine. One for thumbnails. Both built to close the gate.

Two ML systems at the gate.

Two classifiers, both built to enforce the catalogue rather than clean it up after the fact. One for cuisine. One for thumbnails.

System 1

Cuisine classifier

Partner-facing pool: 150 cuisine tags. Search source-of-truth: 77. The gap was where gaming lived.

The fix: a model ranking cuisines per restaurant on:

  • Menu item count
  • Search demand
  • Order value contribution

Partners chose from recommendations during onboarding, not the open pool. Adversarial tagging stopped working: the model used menu data, not partner claims.
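A minimal sketch of how a ranking over those three signals could work. The field names, the weights, and the normalise-to-share scoring are all illustrative assumptions; the production model's features and weighting are not described here.

```python
from dataclasses import dataclass


@dataclass
class CuisineSignals:
    cuisine: str
    menu_item_count: int  # menu items mapped to this cuisine
    search_demand: float  # e.g. searches for this cuisine in the locality
    order_value: float    # order value attributed to this cuisine's items


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"menu": 0.5, "demand": 0.2, "value": 0.3}


def _share(value, total):
    """Fraction of the total, or 0.0 when the total is zero."""
    return value / total if total > 0 else 0.0


def recommend_cuisines(signals, top_k=3, min_menu_items=1):
    """Rank cuisines for one restaurant and return the top_k (cuisine, score) pairs.

    Cuisines with no supporting menu items are dropped before scoring, so a
    popular tag cannot be recommended on search demand alone.
    """
    eligible = [s for s in signals if s.menu_item_count >= min_menu_items]
    total_items = sum(s.menu_item_count for s in eligible)
    total_demand = sum(s.search_demand for s in eligible)
    total_value = sum(s.order_value for s in eligible)

    scored = []
    for s in eligible:
        score = (
            WEIGHTS["menu"] * _share(s.menu_item_count, total_items)
            + WEIGHTS["demand"] * _share(s.search_demand, total_demand)
            + WEIGHTS["value"] * _share(s.order_value, total_value)
        )
        scored.append((s.cuisine, round(score, 4)))

    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up restaurant: mostly South Indian, a few Chinese dishes, no North Indian items.
menu_signals = [
    CuisineSignals("South Indian", 40, 300.0, 80000.0),
    CuisineSignals("North Indian", 0, 900.0, 0.0),
    CuisineSignals("Chinese", 5, 500.0, 10000.0),
]
recommendations = recommend_cuisines(menu_signals)
```

In this made-up example, North Indian has the highest search demand but no menu items behind it, so it never reaches the recommendation list. That is the property that matters: the ranking reads the menu, not the partner's claim.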

System 2

Thumbnail moderation

A multi-check image classifier across new uploads:

  • Blur detection
  • Food centring
  • Logo and watermark detection
  • Plating
  • Background
  • Dish completeness
  • Aspect ratio

Violating images were rejected at upload; borderline cases were routed to agents. Moderation moved from agent-led with model assist to model-led with agent verification.
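The reject, route, or approve decision can be sketched as a simple threshold rule over per-check scores. This assumes each check produces a pass-confidence between 0 and 1; the check keys, both thresholds, and the fallback for missing scores are hypothetical and not taken from the production system.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    AGENT_REVIEW = "agent_review"


# One key per check in the list above (names are illustrative).
CHECKS = (
    "blur",
    "food_centring",
    "logo_watermark",
    "plating",
    "background",
    "dish_completeness",
    "aspect_ratio",
)

# Hypothetical thresholds on a 0..1 pass-confidence per check.
REJECT_BELOW = 0.3
APPROVE_AT_OR_ABOVE = 0.7


def route_upload(check_scores):
    """Return (decision, checks that drove it) for one uploaded thumbnail."""
    missing = [c for c in CHECKS if c not in check_scores]
    if missing:
        # Assumption: an incomplete set of scores is never auto-decided.
        return Decision.AGENT_REVIEW, missing

    failed = [c for c in CHECKS if check_scores[c] < REJECT_BELOW]
    if failed:
        return Decision.REJECT, failed

    borderline = [c for c in CHECKS if check_scores[c] < APPROVE_AT_OR_ABOVE]
    if borderline:
        return Decision.AGENT_REVIEW, borderline

    return Decision.APPROVE, []


clean_image = {check: 0.9 for check in CHECKS}
blurry_image = {**clean_image, "blur": 0.1}
unclear_plating = {**clean_image, "plating": 0.5}

clean_result = route_upload(clean_image)
blurry_result = route_upload(blurry_image)
plating_result = route_upload(unclear_plating)
```

Returning the checks that triggered the decision gives the verifying agent a specific thing to confirm rather than a full re-review, which is what model-led with agent verification implies.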

Both shipped to production. Same architectural move in both cases: replace judgement at scale with classification at scale. The catalogue got cleaner. The queue got smaller.

Figure: Cuisine tagging workflow, before and after the ML system

Reduced costs. Improved search quality.

Manual review dependency went from 100% to under 20%. Headcount came down. For those who remained, the work shifted. Moderation became verification.

Manual review dependency: 100% → <20%
Model accuracy: 89%+
Ticket reduction: -42%

Manual moderation didn't disappear. It moved up the funnel. ML decided first; agents verified the flags. Fewer tickets needed full review; the queue shrank.

The brief was met. The second outcome shipped with it.

What worked. What I'd scope better.

Followed the data past the brief.

The mandate was cost. The investigation surfaced a search quality problem the org wasn't tracking. I built the systems for both. When the data points past the brief, follow it.

Pitched the brief, not the reframe.

Internally, I held both stories. Externally, I kept pitching the brief: cost. The search quality reframe never made it into the room. The work got deprioritised because stakeholders only ever saw the smaller story. Reframing internally is half the job.

I should have scoped the user impact.

Cost outcomes were instrumented. Model accuracy was instrumented. The downstream search quality lift, the actual user-facing win, wasn't. Next time I'd scope that measurement at the start, not discover the gap in hindsight.

What I'd take into the next room.

When a brief lands, surface both the explicit ask and the second-order outcome before scoping. Instrument both at the start. Pitch both at the start. The reframe earns its place when it shapes prioritisation, not just delivery.

Problem Reframing ML Product Management Stakeholder Management Ops at Scale