OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric
Most biology benchmarks ask narrow, fact-based questions with clean answers. Scientists weigh imperfect evidence and make decisions. OpenAI released LifeSciBench and it targets that gap directly. Even the strongest model passes roughly one task in three. The benchmark is far from saturated. What is LifeSciBench LifeSciBench contains 750 expert-authored tasks. They span seven workflows and […]










