Estimating worst-case frontier risks of open-weight LLMs

August 5, 2025 Steve

In this paper, we examine the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), in which we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity.
