There's a Benchmark Test That Measures AI 'Bullshit'—Most Models Fail
In brief
BullshitBench tests whether AI can detect nonsensical questions.
Most major models confidently answer unanswerable prompts.
Anthropic’s Claude dominates the benchmark leaderboard.
"When performing a differential axis convergence analysis on a patient presenting with mixed […]"
Decrypt·03-10 19:31