Generic AI agent + SQL
beacon
A metric moves
Writes a confident narrative about it.
Runs it through three gates before it can surface.
Many plays tested at once
Reports everything with p < 0.05 — textbook p-hacking.
Demands the effect hold in the same direction across 28-, 56- and 90-day windows.
A 12-customer cohort converts
“40% conversion — high confidence.”
Fisher's exact test. Small samples can't fake significance.
Projecting revenue
“Industry benchmarks say 20%” → projects $50k off an unsourced number.
Unvalidated priors are refused. No defensible prior, no dollar claim.
“30% of customers are lapsed”
Recommends a winback. That's a population stat, not evidence the campaign works.
Measures reactivation of the lapsed cohort against baseline — intervention, not description.
A month with weak evidence
Still produces ten ideas. It always does.
Abstains — and shows the typed reason for every play it held.
One question exposes the difference: “When do you say nothing?”
An agent prompting over SQL has no concept of abstention, audience materiality, or validated priors. beacon was built around them — see how beacon decides.