A deep dive into our two-stage detection pipeline (regex pattern matching followed by LLM verification) that processes 2M+ videos daily with 95%+ accuracy.
Every day, thousands of YouTube creators publish videos with sponsorship deals embedded in their descriptions. Affiliate links, promo codes, "sponsored by" mentions, tracking URLs: they're all there, hiding in plain text.
The question is: how do you extract structured data from unstructured video descriptions at scale?
We didn't start with an LLM. We started with regex.
The first pass processes every video description through a pattern matching engine. We look for:
This stage is fast, cheap, and catches ~80% of obvious sponsorships. But it misses nuanced mentions and can't identify the specific brand from a generic tracking URL.
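To make the first pass concrete, here is a minimal sketch of what a regex scan over a description could look like. The pattern names, the regexes themselves, and the toy confidence score are all assumptions for illustration; they are not the production rule set.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; real rules would be far broader.
PATTERNS = {
    "sponsored_by": re.compile(r"\b(?:sponsored|brought to you)\s+by\b", re.IGNORECASE),
    "promo_code": re.compile(
        r"\b(?:use|promo|coupon|discount)\s+code\s*:?\s*[A-Z0-9]{3,20}\b", re.IGNORECASE
    ),
    "tracking_url": re.compile(
        r"https?://\S+[?&](?:utm_[a-z]+|ref|aff(?:iliate)?_?id)=[\w-]+", re.IGNORECASE
    ),
}


@dataclass
class Stage1Result:
    signals: list[str]   # names of the patterns that matched
    confidence: float    # toy score: fraction of pattern types that fired


def scan_description(text: str) -> Stage1Result:
    """Run every pattern over the description and summarize which ones matched."""
    signals = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    confidence = len(signals) / len(PATTERNS)  # assumed scoring, not the real one
    return Stage1Result(signals, confidence)


if __name__ == "__main__":
    description = (
        "This video is sponsored by Acme. Use code TECH20 at "
        "https://example.com/deal?utm_source=youtube"
    )
    print(scan_description(description))
```

Note that a matched tracking URL tells you a sponsorship signal exists, but not which brand is behind it. That gap is what the second stage is for.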
Videos flagged with high confidence in Stage 1 go to the LLM for brand identification and relationship classification. The model:
We use structured output with strict JSON schemas to ensure consistent, parseable results.
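As a rough illustration of why strict schemas matter, the sketch below validates a model response before it enters the rest of the pipeline. The field names (`brand`, `relationship`, `confidence`) and the allowed relationship values are hypothetical, and the check is hand-rolled with the standard library rather than tied to any particular LLM provider's structured-output feature.

```python
import json

# Hypothetical schema: field names and allowed values are assumptions.
REQUIRED_FIELDS = {"brand": str, "relationship": str, "confidence": float}
ALLOWED_RELATIONSHIPS = {"sponsorship", "affiliate", "none"}


def parse_llm_response(raw: str) -> dict:
    """Parse a model response and reject anything that deviates from the schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(REQUIRED_FIELDS):
        mismatch = sorted(set(data) ^ set(REQUIRED_FIELDS))
        raise ValueError(f"missing or unexpected fields: {mismatch}")
    for name, expected_type in REQUIRED_FIELDS.items():
        value = data[name]
        # JSON has one number type, so accept integers for float fields (but not booleans).
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            value = data[name] = float(value)
        if not isinstance(value, expected_type):
            raise ValueError(f"field {name!r} must be {expected_type.__name__}")
    if data["relationship"] not in ALLOWED_RELATIONSHIPS:
        raise ValueError(f"unknown relationship: {data['relationship']!r}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return data


if __name__ == "__main__":
    print(parse_llm_response('{"brand": "Acme", "relationship": "affiliate", "confidence": 0.9}'))
```

Rejecting extra fields as well as missing ones keeps downstream code from silently depending on keys the model only emits some of the time.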
After 6 months of running this pipeline:
We're exploring transcript analysis as a third detection stage — catching verbal sponsor mentions that never appear in the description. Early tests show this could increase detection by another 15-20%.
This is part of our Engineering series where we share how SponsorTrace is built.