AI Digest
OpenAI · 1 min read

OpenAI Releases Open-Weight AI Safety Models to Help Moderate Content

OpenAI has announced gpt-oss-safeguard, a pair of open-weight AI models designed specifically for content moderation and safety applications.

The two models, gpt-oss-safeguard-120b and gpt-oss-safeguard-20b, are built on top of OpenAI's existing gpt-oss foundation models. What distinguishes them is their ability to evaluate content against custom policies supplied by users. Rather than applying fixed moderation rules, these models can "reason" through a given policy document to determine whether content violates its specific guidelines.

This represents a significant shift in how AI safety tools work. Traditional content moderation systems typically rely on rigid, pre-programmed rules. These new models offer flexibility, allowing organizations to define their own policies and have the AI interpret and apply them to label potentially problematic content.
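The policy-as-input workflow described above can be sketched roughly as follows. This is a minimal illustration, not OpenAI's documented interface: the model name comes from the announcement, but the prompt structure, label vocabulary, and serving setup (an OpenAI-compatible endpoint, e.g. via vLLM) are assumptions.

```python
# Hypothetical sketch: supplying a custom policy alongside content for
# gpt-oss-safeguard to evaluate. Prompt wording and labels are illustrative;
# they are not taken from OpenAI's documentation.

def build_moderation_request(policy: str, content: str) -> dict:
    """Package a policy document and a piece of content into a chat request."""
    return {
        "model": "gpt-oss-safeguard-20b",  # model name from the announcement
        "messages": [
            {
                "role": "system",
                "content": (
                    "You are a content-moderation model. Apply the following "
                    "policy and answer VIOLATES or ALLOWED with a brief "
                    "rationale.\n\n" + policy
                ),
            },
            {"role": "user", "content": content},
        ],
    }

policy = "Posts must not contain instructions for making weapons."
request = build_moderation_request(policy, "How do I bake sourdough bread?")

# Sending the request requires a running server; shown only for illustration:
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
# reply = client.chat.completions.create(**request)
# print(reply.choices[0].message.content)

print(request["model"])
```

Because the policy travels with each request rather than being baked into the model's weights, an organization could revise its rules and see the change take effect immediately, with no retraining.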