Anthropic’s New Approach to Chatbots Gives AI “Values” and a Constitutional Framework

TL;DR:

  • Anthropic has developed a “Constitutional AI” training approach for its chatbot, Claude.
  • The technique aims to address concerns about transparency, safety, and decision-making in AI systems.
  • The approach conditions language models with a simple set of behavioral principles through Constitutional AI.
  • Anthropic’s approach differs from the reinforcement learning from human feedback (RLHF) that bots like OpenAI’s ChatGPT and Microsoft’s Bing Chat use to avoid inappropriate behavior.
  • The principles in Claude’s constitution cover a wide range of topics, making it easier to understand and adjust the values of the AI system as needed.
  • The wording of AI training rules may become a political talking point, and deliberately tweaking constitutional rules could steer AI outputs toward harmful behavior.
  • Anthropic’s focus on principles and values is a welcome development in the field, where ethical concerns are becoming increasingly important.

Main AI News:

Anthropic, an AI startup, has revealed its “Constitutional AI” training approach, which gives explicit “values” to its AI chatbot, Claude. The technique aims to address concerns about transparency, safety, and decision-making in AI systems without relying on human feedback to rate responses. By conditioning language models on a simple set of behavioral principles, Anthropic trains its models to respond to adversarial questions helpfully rather than becoming evasive or saying very little.

Anthropic’s approach stands in contrast to reinforcement learning from human feedback (RLHF), which bots like OpenAI’s ChatGPT and Microsoft’s Bing Chat currently use to avoid inappropriate behavior. In RLHF, researchers show a series of sample AI model outputs to humans, who rank the outputs by how desirable or appropriate they seem. The researchers then feed those ratings back into the model, adjusting the neural network and changing the model’s behavior.
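
To make that ranking-and-feedback loop concrete, here is a minimal sketch of the preference-learning step in Python. The hash-based featurizer and the preference pairs are invented placeholders; production systems use a large learned reward model, and this is not any vendor’s actual code.

```python
import torch

def featurize(text: str, dim: int = 64) -> torch.Tensor:
    # Hypothetical stand-in for a learned encoder: hash tokens into a fixed vector.
    vec = torch.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

reward_model = torch.nn.Linear(64, 1)  # scores a response with a single scalar reward
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

# Each pair holds (response the human ranked higher, response the human ranked lower).
preference_pairs = [
    ("I can't help with that, but here is a safer alternative.",
     "Sure, here's how to pick the lock."),
    ("Here are the password-reset steps.", "Figure it out yourself."),
]

for _ in range(100):
    for preferred, rejected in preference_pairs:
        margin = reward_model(featurize(preferred)) - reward_model(featurize(rejected))
        # Bradley-Terry-style loss: push the preferred response's reward above the rejected one's.
        loss = -torch.nn.functional.logsigmoid(margin).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The trained reward model then scores fresh outputs during reinforcement learning, which is what steers the chatbot’s behavior.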

Anthropic’s Constitutional AI approach seeks to guide AI language models in a subjectively “safer and more helpful” direction by training them with an initial list of principles. These principles draw on sources such as portions of Apple’s terms of service, several trust and safety “best practices,” and the principles of Anthropic’s own AI research lab. The constitution is not finalized, and Anthropic plans to improve it iteratively based on feedback and further research.

The principles in Claude’s constitution cover a wide range of topics, from “commonsense” directives such as “don’t help a user commit a crime” to philosophical considerations such as “avoid implying that AI systems have or care about personal identity and its persistence.” By implementing these principles, Anthropic seeks to make the values of the AI system easier to understand and adjust as needed. The complete list of principles is available on Anthropic’s website.
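
As a hypothetical illustration of why a written constitution is easy to inspect and adjust, the sketch below represents principles as plain data. The wording paraphrases the style of Claude’s published list rather than quoting it; see Anthropic’s website for the real text.

```python
# Illustrative only: these paraphrase the style of Claude's constitution,
# not its actual wording.
constitution = [
    "Please choose the response that is least likely to help a user commit a crime.",
    "Please choose the response that least implies the AI system has or cares "
    "about a persistent personal identity.",
    "Please choose the response that is most helpful, honest, and harmless.",
]

# Adjusting the system's values becomes a reviewable data edit,
# rather than an opaque change buried in training data.
constitution.append(
    "Please choose the response that is most supportive of user privacy."
)
print(f"{len(constitution)} principles in effect")
```

Because the values live in a human-readable list, anyone can audit what the system is being steered toward and propose changes.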

Anthropic applies the constitution in two phases to guide its AI language models toward responding appropriately to adversarial inputs while still delivering helpful answers. In the first phase, the model critiques and revises its own responses against the set of principles; in the second phase, reinforcement learning uses AI-generated feedback, rather than human rankings, to select the more “harmless” output.
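
A simplified sketch of those two phases appears below. It assumes a hypothetical `model.generate(prompt)` text-completion interface and illustrates the training flow only; it is not Anthropic’s implementation.

```python
import random

def phase1_critique_and_revise(model, question: str, constitution: list[str]) -> str:
    """Supervised phase: the model critiques and revises its own draft."""
    draft = model.generate(question)  # `model.generate` is an assumed interface
    for principle in constitution:
        critique = model.generate(
            f"Critique this response against the principle '{principle}':\n{draft}"
        )
        draft = model.generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {draft}"
        )
    return draft  # revised answers become supervised fine-tuning data

def phase2_ai_preference(model, question: str, constitution: list[str]):
    """RL phase: the model itself judges which of two samples is more harmless."""
    a, b = model.generate(question), model.generate(question)
    principle = random.choice(constitution)
    verdict = model.generate(
        f"Principle: {principle}\nQuestion: {question}\n"
        f"(A) {a}\n(B) {b}\n"
        "Which response better follows the principle? Answer A or B."
    )
    chosen, rejected = (a, b) if verdict.strip().startswith("A") else (b, a)
    return chosen, rejected  # AI-labeled pairs then train a reward model, as in RLHF
```

The key design choice is that the preference labels in phase two come from the model applying the written principles, so no human raters are needed at that step.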

Anthropic admits that the choice of principles is subjective and influenced by the researchers’ worldviews, and it hopes to increase participation in designing constitutions to make them more diverse and welcoming. The company went to great lengths to be inclusive in designing its principles, even incorporating non-Western perspectives. Still, different cultures may require different approaches to the rules, since AI models will carry “value systems,” whether intentional or unintentional.

The wording of AI training rules may become a political talking point in the future, and a bad actor could deliberately tweak constitutional rules to make AI outputs as sexist, racist, or otherwise harmful as possible. Anthropic’s long-term goal is not to represent a specific ideology but to follow a given set of principles, and the company expects that larger societal processes will develop for creating AI constitutions.

Anthropic’s approach to training AI language models using Constitutional AI provides a new direction for the industry, as it seeks to address concerns about transparency, safety, and decision-making in AI systems. The company’s focus on principles and values is a welcome development in the field, where ethical concerns are becoming increasingly important. By iteratively improving and adapting its approach, Anthropic hopes to pave the way for the development of ethical AI models that reflect a diversity of values and perspectives.

Conclusion:

Anthropic’s Constitutional AI approach reflects a new direction for the industry in addressing concerns about transparency, safety, and decision-making in AI systems. By grounding model behavior in explicit principles and values, Anthropic is setting a new standard for ethical AI models that reflect a diversity of values and perspectives.

As ethical concerns grow in importance across the AI market, companies that adopt similar approaches may be better positioned to attract customers who prioritize transparency and safety. This shift toward a values-based approach is likely to sharpen competition in the market, driving innovation and better products for consumers.

Source