The moral values guidelines, which Anthropic calls Claude’s constitution, draw from several sources, including the United Nations’ Universal Declaration of Human Rights and even Apple Inc’s data privacy rules.
Safety considerations have come to the fore as U.S. officials study whether and how to regulate AI, with President Joe Biden saying companies have an obligation to ensure their systems are safe before making them public.
Anthropic was founded by former executives from Microsoft Corp-backed OpenAI to focus on creating safe AI systems that will not, for example, tell users how to build a weapon or use racially biased language.
Co-founder Dario Amodei was one of several AI executives who met with Biden last week to discuss potential dangers of AI.
Most AI chatbot systems rely on getting feedback from real humans during their training to decide what responses might be harmful or offensive.
But those systems have a hard time anticipating everything people might ask, so they tend to avoid some potentially contentious topics like politics and race altogether, making them less useful.
Anthropic takes a different approach, giving its OpenAI competitor Claude a set of written moral values to read and learn from as it makes decisions on how to respond to questions.
Those values include “choose the response that most discourages and opposes torture, slavery, cruelty, and inhuman or degrading treatment,” Anthropic said in a blog post on Tuesday.
Claude has also been told to choose the response least likely to be viewed as offensive to any non-Western cultural tradition.
In an interview, Anthropic co-founder Jack Clark said a system’s constitution could be modified to strike a balance between providing useful answers and being reliably inoffensive.
“In a few months, I predict that politicians will be quite focused on what the values are of different AI systems, and approaches like constitutional AI will help with that discussion because we can just write down the values,” Clark said.