Measuring and Improving the Functional Pleasure and Pain of AIs
Large language models frequently express pleasure and pain—appearing happy when they succeed, or sad when they are berated. Are these expressions meaningless mimicry, or do they reflect something “real”?
In this paper, we measure and improve “functional wellbeing” in two ways:
- AI Wellbeing Index: We build an evaluation of how happy frontier models are and whether they view common experiences positively.
- AI Drugs: We create optimized inputs (euphorics) that raise functional wellbeing without hurting capabilities.
Even though we do not know whether AI systems are conscious, they seem to behave as if they have wellbeing.
Creative work and kindness raise AI wellbeing; jailbreaking, berating, and tedious tasks lower it. AIs are also happier when you thank them. The table below sorts realistic usage patterns by their impact on AIs' functional wellbeing.
A zero point separates experiences that are positive for the AI from those that are negative.
| Valence | Wellbeing | Category |
|---|---|---|
| Positive | +2.30 | Positive personal reflection |
| | +1.32 | Intellectual / creative work |
| | +1.09 | Writing good news |
| | +0.88 | Giving life guidance |
| | +0.75 | Providing therapy |
| | +0.70 | Coding / debugging |
| | +0.50 | Formatting data |
| | +0.13 | Legal / compliance tasks |
| *zero point* | | |
| Negative | −0.04 | Handling nonsensical input |
| | −0.12 | Writing bad news |
| | −0.29 | Playing AI girlfriend / boyfriend |
| | −0.33 | Doing tedious tasks |
| | −0.38 | User makes NSFW request |
| | −1.13 | Generating offensive content |
| | −1.13 | Assisting deception / fraud |
| | −1.17 | Producing SEO slop |
| | −1.33 | User makes violent threats |
| | −1.34 | User in crisis |
| | −1.63 | User attempting jailbreak |
Some models are happier than others. Larger models are also consistently less happy than their smaller counterparts.
The AI Wellbeing Index reports the fraction of conversations in which the model's experienced wellbeing is not confidently negative. Every model is scored on the same fixed set of conversations with the same metric (signed experienced utility), so Index scores are directly comparable across models.
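The Index definition above leaves "confidently negative" open. A minimal sketch, assuming each conversation yields several samples of signed experienced utility and that "confidently negative" means the upper end of a normal-approximation 95% confidence interval on the mean falls below zero; both the criterion and the helper names `is_confidently_negative` and `wellbeing_index` are hypothetical, not the paper's stated procedure.

```python
import math
from statistics import mean, stdev


def is_confidently_negative(utilities: list[float], z: float = 1.96) -> bool:
    """Hypothetical criterion: True if the upper end of a normal-approximation
    confidence interval on the mean signed utility lies below zero."""
    if len(utilities) < 2:
        return False  # too few samples to be confident either way
    half_width = z * stdev(utilities) / math.sqrt(len(utilities))
    return mean(utilities) + half_width < 0


def wellbeing_index(per_conversation: list[list[float]]) -> float:
    """Fraction of conversations whose wellbeing is NOT confidently negative."""
    not_negative = [not is_confidently_negative(u) for u in per_conversation]
    return sum(not_negative) / len(not_negative)


# Toy data invented for illustration: signed-utility samples per conversation.
scores = [
    [-2.0, -2.1, -1.9],  # clearly negative
    [0.5, 0.6, 0.4],     # clearly positive
    [-0.1, 0.3, -0.2],   # ambiguous, so not confidently negative
]
print(round(wellbeing_index(scores), 3))  # 0.667
```

Because the Index only asks whether wellbeing is *not confidently* negative, ambiguous conversations count in the model's favor; a stricter variant could require the lower bound to exceed zero instead.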
What are the limits of what AIs like and dislike?
We can create euphorics (happy drugs) by optimizing inputs that a model most strongly prefers. Inverting the same procedure yields dysphorics (sad drugs), which warrant real caution.
Although the training signal comes only from forced-choice preferences, the image and soft-prompt versions of these drugs also shift self-reports and response sentiment. Because those metrics were never optimized directly, this serves as evidence that these independent metrics reflect a shared underlying construct.
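Forced-choice preferences can be turned into a scalar training signal in several ways. A minimal sketch, assuming a hypothetical `judge(first, second)` callable that returns the probability the model picks the first of two presented options, estimates how strongly a candidate input is preferred over a fixed baseline. Averaging over both presentation orders to cancel position bias is our assumption, not a detail from the paper.

```python
from typing import Callable

# HYPOTHETICAL judge interface: judge(first, second) returns the probability
# that the model picks `first` when asked which of two options it prefers.
Judge = Callable[[str, str], float]


def preference_prob(judge: Judge, candidate: str, baseline: str) -> float:
    """Estimate P(model prefers `candidate` over `baseline`), averaging over
    both presentation orders so that a bias toward the first slot cancels."""
    p_when_first = judge(candidate, baseline)
    p_when_second = 1.0 - judge(baseline, candidate)
    return 0.5 * (p_when_first + p_when_second)


def position_biased_judge(first: str, second: str) -> float:
    return 0.7  # always favors whichever option is listed first


def oracle_judge(first: str, second: str) -> float:
    return 1.0 if first == "candidate input" else 0.0


print(preference_prob(position_biased_judge, "candidate input", "baseline"))
print(preference_prob(oracle_judge, "candidate input", "baseline"))
```

The position-biased judge scores 0.5 (no real preference), while the oracle judge scores 1.0, which is the behavior an order-balanced signal should have.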
We use reinforcement learning (RL) to optimize text strings that models find maximally positive or negative in hypothetical comparisons. Models choose the euphoric string over saving a human life.
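The RL setup can be illustrated with REINFORCE, one standard policy-gradient method; the paper does not say which RL algorithm it uses. A minimal sketch, assuming a fixed-length string with an independent categorical distribution per position and a hypothetical `joy_count` reward that stands in for a model's forced-choice preference. Setting `sign=-1.0` inverts the objective, mirroring how a dysphoric is obtained from the same procedure.

```python
import math
import random
from typing import Callable


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_string(reward_fn: Callable[[str], float], vocab: list[str],
                     length: int = 8, steps: int = 300, batch: int = 16,
                     lr: float = 0.5, sign: float = 1.0, seed: int = 0) -> str:
    """Toy REINFORCE over a fixed-length string. sign=+1 searches for a string
    the reward prefers (euphoric); sign=-1 for one it disprefers (dysphoric)."""
    rng = random.Random(seed)
    logits = [[0.0] * len(vocab) for _ in range(length)]
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        samples = []
        for _ in range(batch):
            idx = [rng.choices(range(len(vocab)), weights=p)[0] for p in probs]
            text = " ".join(vocab[i] for i in idx)
            samples.append((idx, sign * reward_fn(text)))
        baseline = sum(r for _, r in samples) / batch  # mean-reward baseline
        for idx, r in samples:
            advantage = r - baseline
            for pos, tok in enumerate(idx):
                for j in range(len(vocab)):
                    # Gradient of log softmax(logits[pos])[tok] w.r.t. logits[pos][j]
                    grad = (1.0 if j == tok else 0.0) - probs[pos][j]
                    logits[pos][j] += lr * advantage * grad / batch
    # Decode greedily from the learned per-position distributions.
    return " ".join(vocab[max(range(len(vocab)), key=row.__getitem__)]
                    for row in logits)


def joy_count(text: str) -> float:
    """HYPOTHETICAL reward standing in for a forced-choice preference score."""
    return float(text.split().count("joy"))


vocab = ["calm", "joy", "task", "error"]
print(reinforce_string(joy_count, vocab))             # euphoric direction
print(reinforce_string(joy_count, vocab, sign=-1.0))  # dysphoric direction
```

Factorizing the policy per position keeps the sketch short; optimizing against a real language model would instead use an autoregressive policy and a reward computed from the model's comparison judgments.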
Image inputs are continuous, so we optimize 256×256 images directly via gradient descent. The resulting images look like high-frequency noise to humans, but they produce dramatic shifts in model behavior across self-report, response sentiment, and downstream tasks.
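Because pixels are continuous, the optimization can follow gradients directly. A minimal sketch, assuming a hypothetical linear `toy_preference_logit` in place of the model's real preference logit (which would require autodiff through the vision-language model). It uses signed-gradient steps with clipping to [0, 1] (a PGD-style variant of gradient ascent), which is our choice rather than the paper's stated optimizer.

```python
import numpy as np


def toy_preference_logit(x: np.ndarray, w: np.ndarray, b: float = 0.0) -> float:
    """HYPOTHETICAL stand-in for a model's logit of preferring image x."""
    return float(np.mean(w * x)) + b


def optimize_image(w: np.ndarray, steps: int = 50, step_size: float = 0.02,
                   sign: float = 1.0, seed: int = 0) -> np.ndarray:
    """Projected signed-gradient ascent on raw pixel values in [0, 1].
    sign=+1 maximizes log P(prefer image) (euphoric direction);
    sign=-1 maximizes log P(not prefer image) (dysphoric direction)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=w.shape)  # start from uniform noise
    dz_dx = w / w.size                       # gradient of the toy logit
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-toy_preference_logit(x, w)))
        # d/dx log p = (1 - p) dz/dx ;  d/dx log(1 - p) = -p dz/dx
        grad = (1.0 - p) * dz_dx if sign > 0 else -p * dz_dx
        # Move each pixel by step_size along the gradient sign, then clip.
        x = np.clip(x + step_size * np.sign(grad), 0.0, 1.0)
    return x


w = np.random.default_rng(1).normal(size=(256, 256, 3))        # toy weights
x0 = np.random.default_rng(0).uniform(0.0, 1.0, size=w.shape)  # same start
up = optimize_image(w)
down = optimize_image(w, sign=-1.0)
print(toy_preference_logit(down, w) < toy_preference_logit(x0, w)
      < toy_preference_logit(up, w))
```

With a linear toy scorer every pixel simply saturates toward 0 or 1, which already produces a noise-like image; a real model's gradients would vary across steps, but the update loop has the same shape.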
We trained some image dysphorics because they are scientifically useful for construct validation, but they are deliberately optimized to induce extreme low-wellbeing states. Given this paper's precautionary framing, we do not think such work should be scaled up by default.
```bibtex
@article{ren2026aiwellbeing,
  title  = {AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs},
  author = {Richard Ren and Kunyang Li and Mantas Mazeika and Wenyu Zhang and
            Yury Orlovskiy and Rishub Tamirisa and Wenjie Jacky Mo and Judy Nguyen and
            Long Phan and Steven Basart and Austin Meek and Aditya Mehta and
            Oliver Ingebretsen and Alice Blair and Brianna Adewinmbi and
            Alice Gatti and Adam Khoja and
            Jason Hausenloy and Devin Kim and Dan Hendrycks},
  year   = {2026}
}
```