On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment

With the increased deployment of large language models (LLMs), one concern is their potential misuse for generating harmful content. Our work studies the alignment challenge, with a focus on filters...

2-Minute Brief
  • According to Apple Machine Learning: With the increased deployment of large language models (LLMs), one concern is their potential misuse for generating harmful content. Our work studies the alignment challenge, with a focus on filters to prevent the generation of unsafe information. Two natural points of intervention are the filtering of the input prompt before it reaches the model, and filtering the output after generation. Our main results demonstrate computational challenges in filtering both prompts and outputs. First, we show
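The two intervention points described in the brief — filtering the prompt before it reaches the model, and filtering the output after generation — can be pictured as a simple pipeline. The sketch below is purely illustrative: all names (`prompt_filter`, `output_filter`, `guarded_generate`) and the keyword-based policies are hypothetical stand-ins, not anything from the paper, whose point is precisely that such filters face computational barriers.

```python
# Illustrative sketch of the two intervention points: a prompt filter
# before the model and an output filter after generation. The keyword
# policies are placeholders, not a real safety mechanism.

def prompt_filter(prompt: str) -> bool:
    """Return True if the prompt is judged safe to pass to the model."""
    blocked_terms = {"build a weapon"}  # placeholder policy
    return not any(term in prompt.lower() for term in blocked_terms)

def output_filter(text: str) -> bool:
    """Return True if the generated text is judged safe to return."""
    blocked_terms = {"step 1: acquire"}  # placeholder policy
    return not any(term in text.lower() for term in blocked_terms)

def guarded_generate(prompt: str, model) -> str:
    if not prompt_filter(prompt):      # intervention point 1: before the model
        return "[refused: unsafe prompt]"
    text = model(prompt)
    if not output_filter(text):        # intervention point 2: after generation
        return "[refused: unsafe output]"
    return text

# Toy "model" standing in for an LLM.
result = guarded_generate("hello", lambda p: f"echo: {p}")
```

The paper's results concern the hardness of making such filters sound in general, not the mechanics of wiring them together, which is all this sketch shows.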