A puzzling result from a recent paper on multi-agent AI: more capable, reasoning-enabled language models cooperate less in social dilemmas than older, weaker ones. CoopEval takes the puzzle seriously and tests four classic mechanisms for restoring cooperation (repeated play, reputation, mediation, and binding contracts) across six modern LLMs. Without any mechanism, welfare collapses to 7% of optimal; contracts pull it back to 80%. The cross-domain parallel: this is, in miniature, the evolutionary story of human institutions under increasing scale, from small-band reciprocity to courts and contract law. The gap that remains is the question of what 'binding' even means for an AI agent.