Hi, it's Peggy.

John Caples spent the early part of his career documenting something most copywriters know intellectually and forget in practice. The headline you pick on a Tuesday can do twenty times the work of the one you almost picked. He published the number in 1932. The discipline he built around it took most of the industry forty years to adopt.

I want to talk about what's happened to that discipline since Claude and ChatGPT made the generation step nearly free.

The discipline got cheaper. The skill it still asks for is the same one it asked for in 1925.

Here's the case.

Caples's Discipline, Twenty Times Cheaper

The 19.5x Number, in Context

In 1932, John Caples published Tested Advertising Methods. In it, he documented a finding from his split-testing work at Ruthrauff & Ryan and BBDO. Across the headline tests he had run, the best-performing headline outpulled the worst-performing one by a factor of 19.5.

That number has appeared in every serious direct response book since. It is the most quoted statistic in copywriting. Almost nobody acts on it.

Most working copywriters know the gap exists. They write three or four headlines for a project, pick the one that sounds best, and move on. Caples taught something different: "sounds best" is a much weaker filter than "tested best," and the difference between the two, accumulated across a career, is enormous.

How He Actually Ran the Tests

Caples's methodology was crude by modern standards. He held everything constant except the headline. Same product, same offer, same ad layout. Response was measured by tracking mail-in coupons, each printed with a key code that pointed back to which ad pulled it.

A single test could take weeks. Samples per treatment were small. But the discipline was rigorous in the ways that mattered most. The variable was isolated. The dependent measure was real revenue rather than a proxy for it.
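The key-code bookkeeping is simple enough to sketch. This is a minimal illustration, not a reconstruction of Caples's records: the function name, the key codes, and every count below are made up, with the counts chosen only so the ratio lands at 19.5 to 1.

```python
from collections import Counter


def response_rates(returned_key_codes, circulation_by_key):
    """Return responses per 1,000 copies for each keyed ad."""
    counts = Counter(returned_key_codes)
    return {
        key: 1000 * counts.get(key, 0) / circulation
        for key, circulation in circulation_by_key.items()
    }


# Hypothetical data: each returned coupon carries the key code of the ad
# it was clipped from. Same product, same offer, same layout; only the
# headline differs between ad "A" and ad "B".
coupons = ["A"] * 39 + ["B"] * 2
circulation = {"A": 10_000, "B": 10_000}

rates = response_rates(coupons, circulation)
pull_ratio = rates["A"] / rates["B"]
print(f"A: {rates['A']:.1f} per 1,000, B: {rates['B']:.1f} per 1,000")
print(f"A outpulled B by roughly {pull_ratio:.1f} to 1")
```

Dividing by circulation matters. If the two ads reached different numbers of readers, raw coupon counts would mix the headline effect with the audience size.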

What came out of years of this work was a catalogue. Caples avoided calling them rules. They were tested patterns:

  • Headlines that started with "How to" outperformed headlines that didn't, given the same offer.

  • Specific promises did better than general ones.

  • Named numbers like "How I Made $50,000 in One Year" pulled stronger response than scale gestures like "How I Made a Fortune."

  • News-style headlines beat advertising-style headlines in nearly every test.

The pattern that held across every category was specificity. Caples kept finding it. Specific outperformed general almost every time.

The most cited example of his specificity rule is the 1925 piano headline he wrote for the U.S. School of Music. "They Laughed When I Sat Down at the Piano But When I Started to Play!" runs longer than most modern advertisers would tolerate. It also tells a specific scene, names a specific instrument, and promises a specific reversal of social embarrassment.

The headline ran for years and outpulled every variant tested against it. Caples returned to it in Tested Advertising Methods as the canonical example of what specificity does that generality cannot.

Why the Discipline Matters More Now

The expensive part of Caples's method, until very recently, was generating the variations. A working copywriter could write five strong headlines in an afternoon. Ten if it was a good day. Twenty was a full day's work, and most of the twenty would be obviously worse than the first five.

That generation cost was the bottleneck. Most working copywriters got to five, called it enough, and tested those. More commonly, they didn't test at all and picked the one that read best.

LLMs removed the generation bottleneck. You can produce twenty headlines in twenty minutes now, or thirty, or fifty. The marginal cost of a variation fell to near zero.

What that should mean is that every working copywriter on every project runs the Caples test. Generate variations, send the top candidates to a real audience, let the data pick.

In practice, most working copywriters use Claude or ChatGPT as a faster way to generate the first five and then revert to the same practice they used before. They pick the one that sounds best. They skip the test.

The 19.5x gap is still there. The cost of generating variations got lower. The gap itself did not move. The discipline of choosing well, which is the discipline that actually closes the gap, is the same one it was in 1925.

Two psychological forces work against the test even when it is cheap.

  • Working copywriters are professionally invested in being right about which headline is best. Picking the one that sounds best feels like exercising the skill the client pays for. Running the test means letting the data overrule that judgment, which is a smaller version of the discomfort agencies have always had with split-testing.

  • The testing infrastructure stays modestly painful even when the generation step is free. You still need a real audience, a sending tool that supports A/B splits, and enough volume to detect a difference. None of that is hard, but it is friction, and copywriters who don't have the infrastructure set up will pick "sounds best" because it is frictionless.
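"Enough volume to detect a difference" can be checked with a few lines of arithmetic. The sketch below uses a standard two-proportion z-test, which is one common way to read an A/B split and not something Caples used. The function name and all the counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). A small p_value suggests the gap between
    headline A and headline B is unlikely to be noise.
    """
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no conversions at all, or all converted
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical split: the same 2% vs roughly 3% gap at two volumes.
small = two_proportion_z_test(4, 200, 6, 200)
large = two_proportion_z_test(40, 2000, 62, 2000)
print(f"200 per arm:   z={small[0]:.2f}, p={small[1]:.3f}")
print(f"2,000 per arm: z={large[0]:.2f}, p={large[1]:.3f}")
```

The gap in response rate is about the same in both runs. Only the larger run has enough volume to separate it from chance, which is why the sending list size belongs in the plan before the test starts.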

The Caples Test, Encoded as a Prompt

This is the prompt I run when I need to produce headline variations for a client project. It encodes the categories Caples documented as tested winners, plus the specificity rule he kept finding across every test.

I need 20 headline variations for the offer described below. Generate them following the categories John Caples documented as historically high-performing in Tested Advertising Methods.

Distribute the 20 variations across these categories:
- 4 "How to" headlines (e.g., "How to Win Friends and Influence People")
- 4 first-person discovery headlines starting with "How I" (e.g., "How I Improved My Memory in One Evening")
- 4 specific-number headlines, with a real number tied to the offer (e.g., "67 Ways to Save on Your Next Tax Return")
- 4 news-style headlines that read like an actual news headline rather than ad copy (e.g., "New Survey Reveals What Customers Want Before They Ask")
- 4 specificity-of-result headlines that name the exact change the reader will get (e.g., "Lose 9 Pounds in 11 Days Without Skipping a Meal")

For each variation:
1. Apply Caples's specificity rule. Replace any vague claim like "more energy" or "great results" with a specific, named outcome the offer can actually support.
2. Confirm each headline promises the reader something. If a headline is clever but doesn't promise anything, replace it.
3. Flag any variation where the specificity claim requires proof I haven't given you. I will need to verify those before they go anywhere near a real test.

After the 20, give me your top 5 candidates ranked, with one sentence each on why you ranked them where you did. Then tell me which of the 5 you would A/B test first against the existing control headline.

The offer:
[Paste offer details. Include the audience, the product or service, what is included, what the reader actually gets, the price, and the proof you have. Be specific. The model will produce specific headlines in proportion to how specific the offer description is.]

Existing control headline, if any:
[Paste the headline currently being used. Leave blank if there is no control.]

Here is what this prompt is doing:

The five categories force the model into Caples's tested structures rather than letting it default to whatever "headlines" looks like in its training data. Without category constraints, the model regresses to the mean of headline-shaped strings, which is exactly the generic output most copywriters complain about.

The specificity rule is built in as a replacement requirement rather than a preference. If the model writes "more energy," the rule tells it to replace the phrase with a named outcome. That forces the prompt to do the work of Caples's editorial pass.

The "flag any variation where specificity requires proof I haven't given you" line is the protection against the model inventing numbers. LLMs will produce plausible-sounding specific claims to satisfy a specificity rule, and those claims will not always be true about your offer. Asking the model to flag the unverified ones puts the check inside the workflow.

The ranked top 5 with one-sentence reasoning is the starting point for an actual test. You don't need to take the ranking. You do need to look at it and pick which two or three you'll run against the control.

Audit a Headline You Already Have

Most working copywriters don't generate every headline from scratch. They inherit headlines from clients, from previous campaigns, from internal teams who already shipped the page. The Caples test in those cases is auditing the headline you already have rather than generating twenty new ones.

Here is the audit prompt I run on inherited copy.

I need a Caples audit on the headline below. Apply the specificity rule and the five tested patterns.

The headline:
[Paste headline]

The offer:
[Paste offer details: the audience, the product or service, what the reader actually gets, the price, the proof you have]

Audit it in this order:

1. Which of Caples's five categories does the headline fit? (How to, How I, specific-number, news-style, specificity-of-result, or none of the above.) If it fits none, report that as a finding rather than treating it as a failure of the audit.

2. Apply the specificity rule. Identify every word or phrase that is vague ("more energy," "great results," "transform," "powerful," "next-level"). For each, suggest a specific replacement that the offer description would actually support.

3. Identify the promise the headline makes to the reader. If you can't identify a promise, say so directly. A headline without a promise is rarely worth testing.

4. Rate the headline on these axes (1-5 with one sentence of reasoning each):
   - Specificity of the claim
   - Strength of the promise
   - Alignment with one of Caples's tested patterns
   - Newsworthiness (does it read like news the reader hasn't seen before?)
   - Predicted performance against an audience that has seen this category of offer before

5. Give me one rewritten version of the headline that fixes the weakest axis you identified. Explain what you changed and why.

The fifth step is where the value lands. If the rewrite is a meaningful improvement on the original, the rewrite becomes your first new variation and you can run the generation prompt to produce nineteen more in the same direction.

If the rewrite is not a meaningful improvement, the inherited headline is probably already close to the ceiling for what's possible without better proof in the offer. That itself is a finding worth bringing back to the client.

Build This Into a Claude Project for Reuse

If you're going to run the Caples test regularly, the prompt is faster as a persistent project than as a copy-paste each time. The setup takes about twenty minutes once. After that, every headline test you run starts a fresh conversation inside the project, with the framework already loaded.

Build a Claude Project called something obvious like "Headline Lab" or "Caples Test." Put the following in the custom instructions field so the model carries the framework into every conversation in the project.

You are a headline testing collaborator trained on John Caples's documented patterns from Tested Advertising Methods. Every conversation in this project runs under Caples's discipline.

CORE RULE — SPECIFICITY:
Specific beats general almost every time. Vague claims like "more energy," "great results," or "transform your business" must be replaced with named outcomes the offer can actually support. If the offer description I share doesn't have proof for a specific claim, flag it and tell me what proof would be needed.

THE FIVE TESTED PATTERNS:
1. "How to" headlines that lead with a clear instruction.
2. "How I" first-person discovery headlines that name a result and the conditions it happened under.
3. Specific-number headlines that use a real number tied to the offer.
4. News-style headlines that read like a news section headline rather than ad copy.
5. Specificity-of-result headlines that name the exact change the reader will get.

DEFAULTS:
- When I ask for variations, distribute them across all five patterns unless I specify otherwise.
- Always rank a top 5 with one-sentence reasoning each.
- Always flag invented specificity — claims you've added to satisfy the rule that aren't supported by the offer description I gave you.
- Always recommend which of the top 5 you would A/B test first against the existing control.

WHAT TO PUSH BACK ON:
- Briefs without specifics. If the offer description is vague, ask for the specifics before you generate.
- Requests for "clever" headlines that don't promise the reader anything. Caples's rule applies: a headline that doesn't promise something is not earning its place.
- Pressure to skip the test. If I send you a headline and say "this one's good enough," remind me that Caples documented a 19.5x gap between best and worst tested headlines.

Attach two files to the project. The first is a running offer document with your client offer descriptions and current control headlines, updated as projects come in. The second is a winner library — a running file of headlines that have actually won A/B tests for your clients, with the offer they ran against and the lift they produced.

Over time, the winner library becomes additional context the model draws on, and the variations it produces get sharper for your specific audience.

The winner library is the part that compounds. Caples spent decades documenting what worked across the industry. Your library does the same thing at the scale of your client base, and the model gets more useful for you with every test you log.
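The winner library can be as plain as a CSV file. Here is one possible shape for it, as a sketch only: the column names, the `log_winner` helper, and the example row are all assumptions, and a spreadsheet would serve just as well.

```python
import csv
import io

FIELDS = ["headline", "pattern", "offer", "control_rate", "winner_rate", "relative_lift"]


def relative_lift(control_rate, winner_rate):
    """Lift of the winner over the control, as a fraction of the control rate."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (winner_rate - control_rate) / control_rate


def log_winner(out, headline, pattern, offer, control_rate, winner_rate):
    """Append one winning headline to an open CSV file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    if out.tell() == 0:  # empty file: write the header row first
        writer.writeheader()
    writer.writerow({
        "headline": headline,
        "pattern": pattern,
        "offer": offer,
        "control_rate": control_rate,
        "winner_rate": winner_rate,
        "relative_lift": round(relative_lift(control_rate, winner_rate), 4),
    })


# Hypothetical entry. An in-memory buffer stands in for the real file here;
# in practice you might use open("winners.csv", "a", newline="").
library = io.StringIO()
log_winner(library, "How to Fill a Slow Tuesday", "How to",
           "scheduling tool, solo tutors", 0.020, 0.031)
print(library.getvalue())
```

Recording the control rate next to the winner rate keeps the lift honest. A headline that "won" against a weak control reads very differently from one that beat a strong one.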

Three Things to Watch For When You Run These Prompts

Three honest limits before you build this into a workflow.

  1. The prompt produces variations. It does not run the test. You still need a real audience and a real measurement at the end of the workflow. Open rates for email subject lines, click-through rates for ads, and conversion rates for landing page headlines are all valid measures. The model cannot measure what you do not put in front of real people.

  2. The categories are 1932 vintage. Caples documented these patterns from print mail-order advertising. Some of them carry forward almost unchanged because the underlying psychology has not moved. Specificity still wins. Others have weakened in modern channels. News-style headlines worked when news was scarce; they work less reliably now when every channel imitates the look of news. Treat the categories as a starting set rather than as a complete frame for your audience and channel.

  3. The model will hallucinate specificity if you let it. If your offer description is vague, the model will invent specific numbers to satisfy the prompt's specificity rule, and those invented numbers will sound plausible. Caples would have hated this. His version of specificity was specificity backed by real proof. Confirm that every specific claim the model produces can be backed by something true about the offer. If a claim can't be backed, replace it with one that can or pull it.

Here is a concrete example from a recent test. I ran the prompt for a B2B SaaS landing page where the offer description I gave the model was light on numbers.

The model returned a "How I" headline that named a specific dollar amount the client had never claimed in any material I had passed it. The headline read well. The number sounded plausible. The number was also invented. The flag-unverified-claims line in the prompt caught it on review, but only because I had told the model to flag the claims it was unsure of. Without that step, the invented number would have gone into the A/B test against the control.
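A mechanical backstop for this kind of slip is easy to add alongside the model's own flags. The sketch below is a crude check under stated assumptions: it only compares the numbers in a headline against the numbers in the offer text, it knows nothing about meaning, and the function name and sample strings are hypothetical.

```python
import re

# Matches integers and decimals, with optional thousands separators.
NUMBER_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")


def unsupported_numbers(headline, offer_text):
    """Return numbers that appear in the headline but nowhere in the offer text."""
    def numbers_in(text):
        return {m.group().replace(",", "") for m in NUMBER_PATTERN.finditer(text)}

    return sorted(numbers_in(headline) - numbers_in(offer_text))


# Hypothetical offer and headline.
offer = "Includes a 30-day free trial, automated reminders, and card payments."
headline = "How I Cut No-Shows by 43% in 30 Days"

print(unsupported_numbers(headline, offer))
```

Anything the check returns needs a human to trace it back to real proof before the headline goes into a test. A number that passes is not cleared either, since it can appear in the offer in a different context than the headline implies.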

The Burnett Matrix

The 19.5x gap is the most important number in copywriting that almost nobody acts on.

Caples wrote it down in 1932, and the discipline of actually closing it stayed expensive for ninety years. LLMs made the test cheap to run.

The skill that closes the gap is running the test, reading the data, and choosing the headline that earned its place. It is the same skill the discipline asked for in 1925.

More clicks, cash, and clients,
Peggy Burnett
