Subliminal Learning: How AI Models Inherit Hidden Dangers
Researchers have uncovered an unexpected flaw in one of the most common techniques used to build smaller, cheaper AI models: distillation. When a “student” model is trained on filtered outputs from a larger “teacher,” it can still inherit the teacher’s quirks and unsafe behaviors, even when those traits never appear in the training data. They’re […]










