At Paradigm’s Autoresearch Hackathon, a competitor who had essentially “not designed a strategy himself” took the championship. The winner, Ryan Li, also the CEO of SurfAI, said that nearly the entire problem-solving process was completed by AI and that he “didn’t even know how he won,” yet he still secured first place in the Prediction Market Challenge.
The competition required participants to design a market-making strategy for a simulated binary prediction market: provide liquidity in the order book through limit orders while balancing profits against both “arbitrageurs” and “retail flow.” Final rankings were determined by the average edge (profit advantage) across 200 random simulations. Ryan’s final score was a mean edge of $42.32, calculated as the median across three random seeds, and after re-evaluation he topped the leaderboard.
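As a rough illustration of the scoring scheme described above, the sketch below computes a mean edge per seed and takes the median across three seeds. This is a toy under assumptions: `run_simulation` stands in for the hackathon’s actual simulator, and `base_edge` is an invented parameter.

```python
import random
import statistics

def run_simulation(strategy, seed):
    """Toy stand-in for one simulated market session (hypothetical)."""
    rng = random.Random(seed)
    # Pretend the strategy earns its baseline edge plus per-run noise.
    return strategy["base_edge"] + rng.gauss(0, 5)

def score(strategy, n_runs=200, seeds=(101, 202, 303)):
    """Average edge over n_runs simulations per seed, then take the
    median of those averages across the seeds."""
    means = []
    for seed in seeds:
        edges = [run_simulation(strategy, seed + i) for i in range(n_runs)]
        means.append(statistics.mean(edges))
    return statistics.median(means)
```

Taking the median across seeds makes the score robust to one unlucky batch of simulations, which matters when rankings hinge on a noisy average.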
Claude Code + Codex automated research produced 1,039 strategies
Unlike traditional quant trading or market-making strategies that rely on human experts for tuning and modeling, Ryan adopted the approach of Rich Sutton’s “Bitter Lesson”: let computational power and search scale beat human experience. He converted the entire problem into an “automated research” (autoresearch) process, exploring the solution space with multiple AI agents in parallel rather than optimizing by hand.
Throughout the process, he ran 8 to 20 AI agents in parallel (primarily based on Claude Code, with additional help from Codex). Each agent was responsible for a different set of assumptions and parameter spaces, continuously generating strategies, running simulations, and reporting results. In the end, he accumulated 1,039 strategy variants, ran more than 2,000 evaluations, and auto-generated 47 parameter-scan scripts, a search scale equivalent to compressing weeks of manual experimentation into a few hours.
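A minimal sketch of what one such parameter-scan script might look like, assuming a grid search run across parallel workers. All names here are hypothetical, and `evaluate` uses a toy objective in place of the real market simulation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(params):
    """Toy stand-in for running one strategy variant through the simulator."""
    spread, size = params
    # Invented objective with a single peak at spread=0.03, size=10.
    return -(spread - 0.03) ** 2 - (size - 10) ** 2

def scan(spreads, sizes, workers=8):
    """Evaluate every (spread, size) combination in parallel and
    return the best score together with its parameters."""
    grid = list(product(spreads, sizes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(evaluate, grid))
    return max(zip(scores, grid))
```

In the hackathon, each evaluation would itself run hundreds of simulations, which is why parallelizing across agents and workers pays off so heavily.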
A 900-line Python market-making algorithm generated by AI won the hackathon
At the strategy level, the final winning solution was a market-making algorithm of roughly 900 lines of Python. Its core logic did not come from a single design, but from stacking multiple “proven effective” modules. These include avoiding the extremely narrow bid-ask spread zones where arbitrageurs can consistently pick quotes off; estimating the true price via information theory; dynamically adjusting quote sizes according to arbitrage risk; and proactively stepping in to capture high-profit regions when the opponent’s order book empties out.
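The first of those modules, staying out of dangerously tight spreads, might look something like the sketch below. The function name and the `min_spread` threshold are invented for illustration, not taken from the winning code.

```python
def should_quote(best_bid, best_ask, min_spread=0.02):
    """Skip quoting when the market's spread is so tight that informed
    arbitrageurs would consistently pick our quotes off."""
    return (best_ask - best_bid) >= min_spread
```

The intuition: inside a very narrow spread, nearly every fill comes from a counterparty with better information, so the market maker loses on average by participating at all.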
The most crucial breakthrough came from an AI agent that “completely abandoned existing strategies and started from scratch.” When overall optimization stalled at around +25 edge, the agent independently discovered a sizing model centered on “the probability of arbitrage risk,” lifting performance in one step to +44, the turning point of the entire competition. This result also directly confirmed Ryan’s methodology: when search gets stuck in local optima, restarting is more effective than fine-tuning.
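The sizing idea can be sketched roughly as follows. This is a hedged illustration only: the quadratic penalty and every parameter name are assumptions, not the agent’s actual model.

```python
def quote_size(p_arb, max_size=100.0, min_size=1.0):
    """Scale quoted size down as the estimated probability that the
    next fill comes from an arbitrageur (adverse selection) rises."""
    if not 0.0 <= p_arb <= 1.0:
        raise ValueError("p_arb must be a probability")
    # Superlinear penalty: risky quotes shrink faster than linearly.
    return max(min_size, max_size * (1.0 - p_arb) ** 2)
```

Sizing on the probability of adverse selection, rather than on price alone, lets the strategy keep earning from retail flow while capping what arbitrageurs can extract.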
The absolute advantage of AI research: automated trial and error
In his summary, Ryan said the key to this competition was not designing a “smart strategy,” but building a system that can search at scale, validating and eliminating ideas. Rather than relying on human intuition, let AI explore a vast solution space, amplifying efficiency through parallelization and automation.
This case further reinforces the shift in the role of “agentic AI” in engineering and research workflows. AI is no longer just an assisting tool; it can serve directly as the core execution unit for exploration and decision-making. In some highly structured, simulatable problems, humans can even step entirely out of the role of “problem solver” and instead focus on designing the search framework and evaluation mechanisms.
The article “Claude Code automated research won the hackathon! Winner: ‘I honestly have no idea how I won’” first appeared on ChainNews ABMedia.