REFERENCE · UPDATED QUARTERLY

What's actually possible this quarter.

Every playbook in The Lens has to be grounded in what AI can really do today, not what the keynote demo promised. This page is the honest read, by capability, with concrete examples of what holds up and what falls apart. We update it every quarter as the models shift.

Last updated 2026-06-29 · Next review Q3 2026

The model landscape this quarter

Four frontier models matter for production marketing work right now. Anthropic's Claude Opus 4.8 (28 May 2026) currently leads the Artificial Analysis Intelligence Index at 61.4, narrowly ahead of OpenAI's GPT-5.5 at 60.2, Google's Gemini 3.1 Pro at 57, and xAI's Grok 4.3 at 53. Gemini 3.5 Pro is shipping this month off the back of I/O, and OpenAI has a GPT-5.6 in the wings. The order changes roughly every quarter and the gaps are narrow, so we build playbooks model-agnostic and let the eval gates decide which model ships the work.

June made the case for that discipline. Claude Fable 5 landed on the 9th as the new reasoning leader and was pulled on the 12th under a US export-control order: three days as the best model available, then not available at all. Any team that had hard-wired a pipeline to a single named model spent that week rebuilding. The Lens playbooks name a model in the frontmatter as a sensible default, not a dependency, and the prompts run on whatever frontier model you have. When a model moves a capability rating below, we annotate the entry with the date and the evidence.
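One way to picture "default, not dependency" is the sketch below. It is illustrative only: the `Playbook` class, the `resolve_model` function and the model names are hypothetical, and we assume the list of available models arrives already ordered by preference (for example, by eval-gate pass rate).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Playbook:
    name: str
    default_model: str  # the frontmatter default: a preference, not a dependency


def resolve_model(playbook: Playbook, available: list[str]) -> str:
    """Use the playbook's default if it is available, else the first available model."""
    if not available:
        raise RuntimeError("no frontier model available to run this playbook")
    if playbook.default_model in available:
        return playbook.default_model
    # The default was withdrawn or never provisioned: fall back instead of failing.
    return available[0]


# Hypothetical model names, for illustration only.
playbook = Playbook(name="brand-voice-extraction", default_model="model-a")
print(resolve_model(playbook, ["model-a", "model-b"]))  # model-a
print(resolve_model(playbook, ["model-b", "model-c"]))  # model-b
```

The point of the design is that a withdrawn model changes one lookup result rather than breaking the pipeline.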

The grading

Each capability gets one of four ratings.

Production-ready: used on live client work without supervision beyond standard QA. The output ships.
Production-with-guardrails: works in the pipeline but needs an eval gate, a fact-check pass, or a human spot-check before it ships.
Experimental: useful for internal drafting or as a starting point a human heavily edits. Not yet shippable as the final artefact.
Not yet: the demo looks great, but real-world reliability is too low for production work. Don't promise it to a client.

Text generation and structured analysis

Production-ready for drafting, restructuring, summarising, classifying, extracting structured data, and cross-referencing corpora.

This is where AI compounds time hardest. Drafting a long-form piece from a structured brief, extracting a brand voice from a corpus, classifying 500 search queries by intent, surfacing contradictions across a brand's public surface, all of this works reliably with the right scaffolding. The Lens's drafting, voice, positioning and SEO-cluster playbooks all sit in this category.

Caveat. Factual claims need grounding. The model will assert plausible-sounding numbers and quotes if you don't constrain it. Every production drafting pipeline in The Lens runs a fact-grounded eval gate that verifies claims against approved sources.
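A fact-grounded gate can be sketched in a few lines. This is a simplified illustration, not the gate The Lens runs: it checks only numeric claims, by exact string match, and the function names `ungrounded_numbers` and `fact_gate` are hypothetical. A real gate would also need to verify quotes and the context each number appears in.

```python
import re

# Matches figures such as 4,200 or 61.4 or 12%
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def ungrounded_numbers(draft: str, approved_sources: list[str]) -> list[str]:
    """Return numeric claims in the draft that appear in no approved source."""
    approved: set[str] = set()
    for source in approved_sources:
        approved.update(NUMBER.findall(source))
    return [n for n in NUMBER.findall(draft) if n not in approved]


def fact_gate(draft: str, approved_sources: list[str]) -> bool:
    """Pass only when every number in the draft is backed by a source."""
    return not ungrounded_numbers(draft, approved_sources)


sources = ["Organiser report: 4,200 finishers."]
draft = "The race drew 4,200 finishers, up 12% on last year."
print(ungrounded_numbers(draft, sources))  # ['12%']
print(fact_gate(draft, sources))           # False
```

Here the draft fails because the 12% figure has no approved source behind it, which is exactly the kind of plausible-sounding number the gate exists to catch.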

Voice and tone matching

Production-with-guardrails. Reliable only if the voice profile is observable (sentence patterns, punctuation counts, lexical choices) rather than adjective-based ("warm, confident").

A draft scored against a 12-check voice rubric ships. A draft based on "write in our voice" without a profile produces generic copy every time. The brand-voice-extraction playbook builds the observable profile, and everything downstream relies on it.
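To show what "observable" means in practice, here is a minimal sketch that scores a draft against measurable ranges. It covers three illustrative checks, not the full 12-check rubric, and the metric names, the ranges and the `failed_checks` function are assumptions made for the example.

```python
import re


def voice_metrics(text: str) -> dict[str, float]:
    """Measure countable features of a draft rather than judging adjectives."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    per_100 = 100 / max(len(words), 1)
    return {
        "avg_sentence_words": len(words) / max(len(sentences), 1),
        "commas_per_100_words": text.count(",") * per_100,
        "exclamations_per_100_words": text.count("!") * per_100,
    }


def failed_checks(text: str, profile: dict[str, tuple[float, float]]) -> list[str]:
    """Return the names of the checks whose metric falls outside the profile range."""
    metrics = voice_metrics(text)
    return [
        name
        for name, (low, high) in profile.items()
        if not low <= metrics[name] <= high
    ]


# Hypothetical profile: (minimum, maximum) allowed for each metric.
profile = {"avg_sentence_words": (8, 18), "exclamations_per_100_words": (0, 0)}

on_voice = "We build for long days. Every seam is tested on the road, in the rain, by riders who notice."
print(failed_checks(on_voice, profile))                  # []
print(failed_checks("Wow! Amazing! Buy now!", profile))  # ['avg_sentence_words', 'exclamations_per_100_words']
```

A profile like this is something a gate can pass or fail. "Warm, confident" is not.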

Original research and fact-finding

Production-with-guardrails when paired with web search and explicit source citation. Never trust the model's "knowledge" for anything date-sensitive or industry-specific.

Race results, athlete bios, sponsorship deal terms, regulatory changes, these need real sources retrieved at run-time, not recall from training. The earned-media-pitch playbook hard-fails any reference that wasn't fetched and verified, precisely because a hallucinated journalist quote burns relationships permanently.
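The hard-fail rule can be sketched as below. This is an assumption-laden illustration rather than the playbook's actual code: the `Reference` fields, the `require_verified` function and the exact-substring check are all hypothetical, and the example URLs are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


class UnverifiedReferenceError(Exception):
    """Raised when a reference cannot be traced to a page fetched at run-time."""


@dataclass
class Reference:
    url: str
    quote: str
    fetched_text: Optional[str] = None  # page text retrieved at run-time, None if never fetched


def require_verified(references: list[Reference]) -> None:
    """Stop the pipeline on the first reference that was not fetched and matched."""
    for ref in references:
        if ref.fetched_text is None:
            raise UnverifiedReferenceError(f"not fetched: {ref.url}")
        if ref.quote not in ref.fetched_text:
            raise UnverifiedReferenceError(f"quote not found in source: {ref.url}")


verified = Reference("https://example.com/a", "a tough day", "It was a tough day out there.")
recalled = Reference("https://example.com/b", "best race of my life")  # never fetched

require_verified([verified])  # passes silently
try:
    require_verified([verified, recalled])
except UnverifiedReferenceError as err:
    print(err)  # not fetched: https://example.com/b
```

Raising an exception, rather than logging a warning, is the design choice that matters: an unverified quote should never reach a journalist's inbox.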

Image generation, generic subjects

Production-with-guardrails for context shots, mood imagery, hero backdrops, and lifestyle scenes with generic subjects.

Generating a wide shot of a road cyclist at dawn on a coastal switchback, a trail runner cresting a ridge, a swimmer at lap 60 of a session, this works. The output is usable for social backdrops, brand mood pieces, blog hero images, paid ad creative variants. Eval gate is human review for hand quirks, bike geometry, and gear plausibility.

Image generation, branded product

Not yet for product imagery where logo legibility, colourway accuracy, geometry, or model-specific details matter.

We've tested every frontier model on this every quarter for two years. None produce reliable brand-accurate product imagery. Wheelsets get the wrong spoke count. Logos warp. Decals end up in the wrong place. The shoe outsole pattern is generic. The kit's chevrons go the wrong way. For product hero shots, you still need a real shoot. AI is for the b-roll and the moodboard, not the product card.

The workaround that does work is to shoot the product once, then use image-to-image with the real product as the source to produce variants in different environments (different light, different terrain, different season). The model edits the surroundings, not the product. Eval gate is to pixel-diff the product region against the source.
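A pixel-diff gate on the product region might look like the sketch below. It assumes the source and the variant are the same size with the product in the same position, and it represents images as nested lists of RGB tuples to stay self-contained (in practice you would load them with an image library). The function names, the per-channel tolerance and the 1% threshold are illustrative assumptions.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]        # rows of RGB pixels
Box = tuple[int, int, int, int]  # left, top, right, bottom (right and bottom exclusive)


def changed_fraction(source: Image, variant: Image, box: Box, tolerance: int = 8) -> float:
    """Fraction of pixels inside the box whose largest channel difference exceeds tolerance."""
    left, top, right, bottom = box
    total = (right - left) * (bottom - top)
    changed = 0
    for y in range(top, bottom):
        for x in range(left, right):
            diff = max(abs(a - b) for a, b in zip(source[y][x], variant[y][x]))
            if diff > tolerance:
                changed += 1
    return changed / total


def product_intact(source: Image, variant: Image, box: Box, max_changed: float = 0.01) -> bool:
    """Pass when at most max_changed of the product region differs from the source."""
    return changed_fraction(source, variant, box) <= max_changed


# 4x4 toy image: the product occupies the central 2x2 block.
source = [[(200, 30, 30)] * 4 for _ in range(4)]
variant = [[(20, 60, 120)] * 4 for _ in range(4)]  # new surroundings everywhere
for y in (1, 2):
    for x in (1, 2):
        variant[y][x] = source[y][x]               # product pixels left untouched

product_box = (1, 1, 3, 3)
print(product_intact(source, variant, product_box))    # True
variant[1][1] = (0, 0, 0)                              # the model altered one product pixel
print(changed_fraction(source, variant, product_box))  # 0.25
```

The surroundings can change completely and the gate still passes. Only edits inside the product box count against the variant.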

Video generation, generic subjects

Production-with-guardrails for short clips (typically 4 to 10 seconds) of generic subjects in named environments.

A wide of a cyclist on a Tuscany hill at golden hour, a runner climbing wooden stairs in a forest, a swimmer's underwater push-off from a wall, these work. Useful for social cuts, paid creative variants, atmospheric brand films where named athletes and named products aren't required. Eval gate is human review for limb physics, water dynamics, and the recurring "wrong number of pedal rotations per crank stroke" issue.

Video generation, branded product or named athlete

Not yet. Same constraints as branded image generation, compounded across frames.

Generating video of your specific bike being ridden through an environment, with logos intact through every frame, with accurate geometry maintained through motion, doesn't work today. Logos warp. The bike morphs across frames. The kit colour drifts. Generating video of a named athlete who hasn't been licensed for AI use is legally fraught even where the model is convincing.

The workaround that does work is to shoot the athlete and the product practically. Use AI for the supplementary cuts that don't need the named athlete or product in frame, such as environmental shots, time-of-day variants, and weather-condition variants for the same campaign.

Audio generation, synthetic voice

Production-with-guardrails for podcast pre-rolls, ad voiceover, internal training material. Disclose AI use where the audience expectation is human delivery.

Quality is good enough for production in most contexts. The ethical gate matters more than the technical one, because listeners feel misled if a brand pretends a synthetic voice is the founder's actual voice. Be honest. The brands that disclose are winning trust, and the brands that don't are losing it when found out.

Audio generation, music

Production-with-guardrails for short pieces. Track length and structural complexity push the limits fast.

Pieces of 15 to 60 seconds work for social cuts and reels. Full-length original tracks for hero films still struggle with structural coherence past the 90-second mark. Licensing is also unsettled: the legal landscape for AI music in commercial use changes every few months, so we re-check this section every quarter.

Browser automation and research

Production-ready for research, scraping, structured data extraction, and pipeline orchestration.

Pulling a journalist's last five articles, scraping a competitor's product range, gathering GSC data, classifying a corpus of reviews, all production-ready. The browser-using agents have become reliable enough that this is now a workhorse capability in most Lens playbooks.

Data analysis and visualisation

Production-ready for analytical reasoning over structured data. Generated charts need a human eye for axis sanity and label legibility.

Channel-mix simulators, attribution tear-downs, retention cohort analyses, all reliable with structured data input. The model writes the analysis script (Python, R, SQL), runs it, interprets the result. Just don't trust visualisations the model invents from prose, since the chart should come from the data, not from the description.

The honest summary for endurance brands

Most of the leverage is in analysis, drafting and orchestration. The model compounds time across positioning, voice, content production, channel-mix decisions, lifecycle journeys, race-result recaps, and the long tail of operational work that fills a marketing team's week.

The shoot still matters. Athletes, products, named team imagery, these are practical work. You still hire a photographer. You still fly to Tuscany. You still pay the athletes. AI augments the shoot's output (variants, social cuts, environmental b-roll) but doesn't replace the shoot itself.

Brands that pretend otherwise, with claims like "we used AI for our whole campaign, including the product hero", are either lying or shipping work that's hurting the brand. The brands actually winning treat AI as the most productive intern they've ever had, brilliant at the groundwork, supervised on the edge cases, and never trusted with the family silver until the eval gate proves they should be.

How this page evolves

We re-grade every capability quarterly. When a model lands that moves a rating, we annotate the entry with the date and the evidence. When a capability is downgraded, and yes, it happens, we explain what stopped working and what the workaround is.

Subscribers get the change log, and everyone gets the current state.

THE LENS

Honest scope. Real work.

Every playbook in the library is grounded in what's on this page. When the capability landscape shifts, the playbooks shift with it.
