OpenAI officially announced on June 20 the release of LifeSciBench, a new evaluation benchmark designed to assess AI systems on real-world scientific research tasks. The benchmark comprises 750 expert-written tasks spanning 7 research workflows and 7 biology domains, created by 173 Ph.D.-level researchers with biotech or pharmaceutical industry experience.
Over 79% of tasks require multi-step reasoning, with an average of approximately 4 reasoning steps per question. The benchmark includes 1,062 real research data attachments, such as papers, charts, sequence data, and structural files. It emphasizes complex research capabilities, including evidence integration, experimental design, data analysis, scientific reasoning, and research communication, rather than the ability to answer simple factual questions.