hi@aiweekly.co.in

OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric

Most biology benchmarks ask narrow, fact-based questions with clean answers. Working scientists, by contrast, weigh imperfect evidence and make decisions. OpenAI has released LifeSciBench, which targets that gap directly. Even the strongest model passes roughly one task in three, so the benchmark is far from saturated.

What is LifeSciBench

LifeSciBench contains 750 expert-authored tasks. They span seven workflows and


Palantir CEO Alex Karp to Anthropic and OpenAI: It’s not just the man and woman on the street that is unhappy with you, but also … – The Times of India



NVIDIA SkillSpector Guide: Scanning AI Skills for Security Risks with Static Analysis and SARIF Reports

In this tutorial, we explore how NVIDIA SkillSpector helps us evaluate AI skills for security risks before they are used in real-world workflows. We build a controlled corpus containing both benign and deliberately vulnerable skills, scan them through SkillSpector’s programmatic LangGraph workflow, and organize the resulting risk scores and findings with pandas. We then visualize
