Detecting and reducing scheming in AI models

September 17, 2025 Steve

Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete examples and stress tests of an early method for reducing scheming.
