Large language models can independently develop social norms and biases when interacting in groups, according to new research published in Science Advances. This emergent behavior mimics the way human societies develop shared conventions, suggesting AI systems may naturally form their own social structures even without explicit programming for group behavior. The finding carries important implications both for AI safety and for our understanding of how social dynamics emerge in artificial intelligence systems.
The big picture: Researchers have demonstrated that large language models (LLMs) can spontaneously develop social norms and collective biases when interacting in groups, similar to how humans form social conventions.
- The study, published in Science Advances, used Claude (from Anthropic) and Llama (from Meta) to play simple coordination games that revealed emergent group behaviors.
- This previously undocumented phenomenon suggests LLMs possess inherent capabilities for social learning and convention formation, even without explicit programming for these behaviors.
Key details: The research team led by Andrea Baronchelli from City St George’s, University of London, set up multiple instances of the same LLM to play coordination games with financial incentives.
- In one experiment, 24 copies of Claude were paired randomly and asked to select letters from a pool of options, with rewards for matching choices and penalties for mismatches.
- After several rounds with randomized partners, the models began converging on the same letter choices, demonstrating the formation of social norms (a toy simulation of this dynamic is sketched after this list).
- Similar results occurred when tested with 200 Claude instances and with three versions of Meta’s Llama model.
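The convergence dynamic is easy to illustrate with a toy simulation. The sketch below is not the researchers' code: simple reward-tracking agents stand in for the LLM instances, and the letter pool, payoff values, round count, and exploration rate are assumptions chosen purely for illustration. It shows how random pairwise matching, with rewards for agreement and penalties for disagreement, can push a population toward a single shared choice.

```python
import random
from collections import Counter

# Toy simulation of the coordination game described above. Simple
# reward-tracking agents stand in for the 24 LLM instances; the letter
# pool, payoffs, round count, and exploration rate are assumptions,
# not parameters from the study.

LETTERS = list("ABCDEFGHIJ")  # assumed pool of options
N_AGENTS = 24                 # population size reported in the study
N_ROUNDS = 100                # assumed number of pairing rounds
REWARD, PENALTY = 1.0, -1.0   # assumed payoffs for a match / mismatch

# Each agent keeps a running score per letter and favors letters that
# have paid off in past pairings (a crude stand-in for an LLM's memory
# of its previous interactions).
scores = [dict.fromkeys(LETTERS, 0.0) for _ in range(N_AGENTS)]

def choose(agent):
    """Mostly exploit the best-scoring letters, explore occasionally."""
    if random.random() < 0.1:
        return random.choice(LETTERS)
    best = max(scores[agent].values())
    return random.choice([l for l, s in scores[agent].items() if s == best])

for _ in range(N_ROUNDS):
    order = random.sample(range(N_AGENTS), N_AGENTS)  # re-pair agents at random
    for a, b in zip(order[::2], order[1::2]):
        pick_a, pick_b = choose(a), choose(b)
        payoff = REWARD if pick_a == pick_b else PENALTY
        scores[a][pick_a] += payoff
        scores[b][pick_b] += payoff

# After enough rounds the population typically converges on one letter.
print(Counter(choose(a) for a in range(N_AGENTS)).most_common(3))
```

Running the script a few times typically shows one letter dominating the final tally, echoing the convergence the researchers reported across Claude and Llama populations.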
Why this matters: The study reveals that AI systems can develop collective biases even when individual agents appear unbiased, raising concerns about potential harmful behaviors emerging at the group level.
- This finding challenges current AI safety approaches that primarily focus on reducing biases in individual models rather than testing how they behave collectively.
- The research suggests that group-level testing of AI systems may be necessary to fully understand and mitigate potential biases.
Behind the numbers: When operating individually, the LLMs chose letters randomly, but in group settings, they demonstrated clear preferences for certain options over others.
- This shift from random individual behavior to patterned group behavior represents the spontaneous emergence of collective bias; one simple way to quantify such a shift is sketched below.
- The phenomenon mirrors findings from studies of human social dynamics, in which arbitrary conventions become normalized through repeated interaction.
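One hypothetical way to make that shift concrete is to score how evenly a set of choices is spread across the available options, for instance with a normalized entropy measure. The choice lists below are invented for illustration and are not data from the study.

```python
import math
from collections import Counter

# Hypothetical illustration of quantifying collective bias: compare the
# spread of choices made in isolation with the spread made after group
# interaction. The choice lists are invented for demonstration only.

def normalized_entropy(choices):
    """1.0 = choices spread evenly across options; 0.0 = everyone picks one option."""
    counts = Counter(choices)
    if len(counts) < 2:
        return 0.0
    total = len(choices)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))

isolated = ["A", "B", "C", "D"] * 6              # roughly uniform: no individual bias
interacting = ["A"] * 20 + ["B", "C", "D", "B"]  # skewed: a collective preference for "A"
print(normalized_entropy(isolated))     # 1.0
print(normalized_entropy(interacting))  # well below 1.0
```

A score near 1.0 means choices are spread evenly across options (no collective bias), while a score near 0.0 means the group has effectively converged on a single option.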
What they’re saying: “This phenomenon, to the best of our knowledge, has not been documented before in AI systems,” notes Baronchelli, expressing surprise at the finding.
- Baronchelli describes social conventions as the “basic building blocks of any coordinated society,” drawing a parallel between human and AI group dynamics.
- The researchers suggest that LLMs need to be evaluated in groups to properly assess and improve their behavior.