The White House’s voluntary AI safety commitments missed the most essential component of safeguarding AI: human oversight.

I am an AI optimist, but I am not naive. Like many others, I have been heartened by AI leaders’ dedication to proactive regulation, which is why I was encouraged to see the White House’s action last week on AI risk mitigation, including eight commitments from Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI.

These commitments make for a laudable start and include many of the elements most experts agree are necessary to safeguard AI: watermarking, reporting, safety research, security testing, and more. However, one crucial word was omitted from those commitments: human.  

This omission is notable given that AI safety research almost always emphasizes the need for alignment with human values. Perhaps “values” was deemed too subjective a term. If so, let me propose a more objective ninth commitment: human oversight.

A commitment to human oversight in artificial intelligence development is paramount to mitigating risk because it ensures that AI technologies align with ethical standards and societal norms, enables the creation of better, more reliable AI, and gives governments a tangible basis for monitoring and regulation.

As the White House develops its upcoming Executive Order, it is important to make sure that explicit human oversight is not left out again. 

Human values are critical to the creation of harmless AI. 

Here’s the truth: Every AI model, no matter how extraordinary, requires training, retraining, and monitoring to remain harmless.  

AI systems routinely face “alignment” problems, where outputs are inconsistent with the intentions of their creators. We have already seen this in consumer apps like Google Photos, whose image-labeling algorithm infamously misidentified Black people as gorillas. Similarly, in recidivism risk models used to inform parole and sentencing decisions – systems like COMPAS – bias can produce unjust outcomes and perpetuate systemic inequalities. Mitigating these biases is crucial to ensuring fair and equitable decisions, especially as AI proliferates across society.

Of course, solving these problems will take effort. The real-world deployment of AI systems is a never-ending and painstaking series of optimizations. Every optimization, every tweak of a weight or bias, is an opportunity to put human values and human outcomes first. When we forget this, our AI commitments ring hollow. It should reassure us, however, that big tech companies have already been using human feedback at scale to make these exact kinds of adjustments in deep learning projects. The only thing that has actually changed is the importance of the work. With generative AI, machines have moved from prediction to creation, and so human oversight becomes more important, not less. 

Human feedback leads to better AI. 

Just as consequential: when we neglect the human element of artificial intelligence development, our AI is less helpful. Human feedback provides better-quality training data than machine feedback, a fact underscored by recent research from UK and Canadian universities, posted as a preprint on arXiv.

The study warns of a phenomenon called model collapse, which occurs when AI models are trained on data generated by other AI models, triggering a degenerative process in which the models progressively forget the true underlying data distribution. AI-generated data tends to overfit to popular patterns and misrepresent rarer ones, producing a distorted picture of reality. Over time, errors in the generated data compound, causing models to misperceive reality even further. The implications are serious, from the societal (discrimination) to the economic (inferior enterprise AI).
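
To make the dynamic concrete, here is a minimal, illustrative simulation – not the paper’s experiment – in which a one-dimensional Gaussian stands in for the true data distribution and each “generation” of the model is fit only to samples from the previous one. Because sampling and refitting repeatedly under-represent the tails, the fitted distribution tends to narrow across generations, a toy version of collapse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "human" data drawn from the true distribution.
data = rng.normal(loc=0.0, scale=1.0, size=200)

for generation in range(1, 16):
    # Fit a simple generative model (here, a Gaussian) to the current data.
    mu, sigma = data.mean(), data.std()
    # Train the next generation only on the previous model's samples.
    # Sampling plus refitting under-represents the tails, so the learned
    # distribution tends to drift and narrow as generations accumulate.
    data = rng.normal(loc=mu, scale=sigma, size=200)
    print(f"gen {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
```

Run it and watch the standard deviation wander away from the true value of 1.0: once the original human data is out of the loop, nothing anchors the model to reality.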

Machine feedback offers genuine benefits, and sourcing human data carries real costs. Economic pressure will therefore push toward AI training AI, creating an over-reliance on synthetic data and putting us at greater risk of misalignment.

Human monitoring offers a path to dynamic regulation and public discourse.

Even if we imagine an AI model that is fully helpful and harmless, that does not eliminate the need for continuous human monitoring and control. This is especially relevant because human values and societal norms are not static but evolve over time.

Take self-driving vehicles as an example. Even when these AI-controlled machines surpass human drivers in terms of safety, they will still need to adapt to shifting laws, redesigned city layouts, and changing societal attitudes towards transportation. 

Regardless of how advanced or benign an AI system may appear, it is crucial that humans remain in control, ensuring AI technologies align with our evolving values and societal contexts. Modern feedback techniques provide the vigilant human monitoring that allows governments to mitigate AI’s potential harms and gives society an avenue to discuss how we want our AI to behave – all while fostering innovation. Our present course relies too heavily on just “getting the technology right” and on the hope that AI leaders will overcome their financial incentives to course-correct on their own.

That is why, in addition to the commitments in the White House Fact Sheet, AI leaders should commit to align AI to human values through human oversight by:

  1. Training with human feedback: AI systems should not be developed exclusively with machine feedback and should use some form of Reinforcement Learning from Human Feedback (RLHF); a minimal sketch of the reward-modeling step follows this list.
  2. External human monitoring of AI: AI systems should be subject to continuous human monitoring to ensure their behavior remains aligned with desired outcomes and to enable timely intervention when necessary.
  3. Sourcing feedback from representative humans: AI systems should be developed with a commitment to a diversity of human perspectives, experiences, and needs. This includes avoiding bias and discriminatory practices, ensuring fairness, and promoting inclusivity. 
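
On the first point, here is a minimal sketch of the reward-modeling step at the heart of RLHF, written in PyTorch. The embeddings and preference pairs are random placeholders; a production pipeline would instead use embeddings of actual model responses ranked by human annotators.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 32  # placeholder dimensionality for response embeddings

# Toy reward model: maps a response embedding to a scalar preference score.
reward_model = nn.Sequential(
    nn.Linear(EMBED_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Each pair holds embeddings of two responses to the same prompt: one a
# human annotator preferred, one they rejected. Random stand-ins here.
chosen = torch.randn(256, EMBED_DIM)
rejected = torch.randn(256, EMBED_DIM)

for step in range(100):
    # Bradley-Terry objective: maximize the log-probability that the
    # human-preferred response scores higher than the rejected one.
    loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The trained reward model then guides policy optimization (for example, via PPO), injecting human judgment directly into the training loop rather than leaving the machine to grade itself.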

The commitments outlined in the White House Fact Sheet are encouraging, but they must be supplemented with further dedication to human involvement, continuous monitoring, and a demonstrable pursuit of positive human outcomes. The development and deployment of AI systems should not be only about advances in technology but, more importantly, about the progress of humanity. As the Biden administration develops an Executive Order to guide the industry into this new era, it should explicitly commit to human oversight that ensures artificial intelligence does not diverge from our collective values and aspirations.
