On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment

With the increased deployment of large language models (LLMs), one concern is their potential misuse for generating harmful content. Our work studies the alignment challenge, with a focus on filters...

2-Minute Brief
  • According to Apple Machine Learning: With the increased deployment of large language models (LLMs), one concern is their potential misuse for generating harmful content. Our work studies the alignment challenge, with a focus on filters to prevent the generation of unsafe information. Two natural points of intervention are the filtering of the input prompt before it reaches the model, and filtering the output after generation. Our main results demonstrate computational challenges in filtering both prompts and outputs. First, we show
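The two intervention points described in the brief — filtering the prompt before it reaches the model, and filtering the output after generation — can be pictured as a simple pipeline. The sketch below is purely illustrative: all names (`prompt_filter`, `output_filter`, `guarded_generate`) and the keyword-based policies are hypothetical stand-ins, not anything from the paper, whose point is precisely that such filters face computational barriers.

```python
# Illustrative sketch of the two intervention points: a prompt filter
# before the model and an output filter after generation. The keyword
# policies are placeholders, not a real safety mechanism.

def prompt_filter(prompt: str) -> bool:
    """Return True if the prompt is judged safe to pass to the model."""
    blocked_terms = {"build a weapon"}  # placeholder policy
    return not any(term in prompt.lower() for term in blocked_terms)

def output_filter(text: str) -> bool:
    """Return True if the generated text is judged safe to return."""
    blocked_terms = {"step 1: acquire"}  # placeholder policy
    return not any(term in text.lower() for term in blocked_terms)

def guarded_generate(prompt: str, model) -> str:
    if not prompt_filter(prompt):      # intervention point 1: before the model
        return "[refused: unsafe prompt]"
    text = model(prompt)
    if not output_filter(text):        # intervention point 2: after generation
        return "[refused: unsafe output]"
    return text

# Toy "model" standing in for an LLM.
result = guarded_generate("hello", lambda p: f"echo: {p}")
```

The paper's results concern the hardness of making such filters sound in general, not the mechanics of wiring them together, which is all this sketch shows.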