
DR IASON GABRIEL
Philosopher, Senior Staff Research Scientist at Google DeepMind

My work focuses on the ethics of artificial intelligence, including questions about AI value alignment, distributive justice, language ethics and human rights.

More generally, I’m interested in AI and human values, and in ensuring that technology works well for the benefit of all. I’ve contributed to several projects that promote responsible innovation in AI. 

Before joining DeepMind, I taught moral and political philosophy at Oxford University and worked for the United Nations Development Programme in Lebanon and Sudan.

Media Highlights

Named as one of the 100 most influential people in AI by TIME Magazine. 

Check out my Google DeepMind Podcast with Prof. Hannah Fry on the ethics of AI. 

Speaking about AI and justice with Matt Clifford CBE, Chair of ARIA.

Research Highlights

The Ethics of Advanced AI Assistants (2024) [link] 

Author(s): Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks et al.

Abstract: This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user’s expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.

Toward a theory of justice for artificial intelligence (2023) Daedalus [link] 

Author(s): Iason Gabriel

Abstract: This essay explores the relationship between artificial intelligence and principles of distributive justice. Drawing upon the political philosophy of John Rawls, it holds that the basic structure of society should be understood as a composite of sociotechnical systems, and that the operation of these systems is increasingly shaped and influenced by AI. Consequently, egalitarian norms of justice apply to the technology when it is deployed in these contexts. These norms entail that the relevant AI systems must meet a certain standard of public justification, support citizens' rights, and promote substantively fair outcomes, something that requires particular attention to the impact they have on the worst-off members of society.

Using the Veil of Ignorance to align AI systems with principles of justice (2023) Proceedings of the National Academy of Sciences [link]

Author(s): Laura Weidinger, Kevin R McKee, Richard Everett, Saffron Huang, Tina O Zhu, Martin J Chadwick, Christopher Summerfield, Iason Gabriel

Abstract: The philosopher John Rawls proposed the Veil of Ignorance (VoI) as a thought experiment to identify fair principles for governing a society. Here, we apply the VoI to an important governance domain: artificial intelligence (AI). In five incentive-compatible studies (N = 2,508), including two preregistered protocols, participants choose principles to govern an AI assistant from behind the veil: that is, without knowledge of their own relative position in the group. Compared to participants who have this information, we find a consistent preference for a principle that instructs the AI assistant to prioritize the worst-off. Neither risk attitudes nor political preferences adequately explain these choices. Instead, they appear to be driven by elevated concerns about fairness: Without prompting, participants who reason behind the VoI more frequently explain their choice in terms of fairness, compared to those in the Control condition. Moreover, we find initial support for the ability of the VoI to elicit more robust preferences: In the studies presented here, the VoI increases the likelihood of participants continuing to endorse their initial choice in a subsequent round where they know how they will be affected by the AI intervention and have a self-interested motivation to change their mind. These results emerge in both a descriptive and an immersive game. Our findings suggest that the VoI may be a suitable mechanism for selecting distributive principles to govern AI.

In conversation with artificial intelligence: aligning language models with human values (2023) Philosophy & Technology [link] 

Author(s): Atoosa Kasirzadeh, Iason Gabriel

Abstract: Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents. Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains. We conclude by discussing the practical implications of our proposal for the design of conversational agents that are aligned with these norms and values.

Ethical and social risks of harm from language models (2021) [link] 

Author(s): Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

Abstract: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences.

We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, IV. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly inferring sensitive information. The third addresses risks arising from poor, false or misleading information, including in sensitive domains, and knock-on risks such as the erosion of trust in shared information. The fourth considers risks from actors who try to use LMs to cause harm. The fifth focuses on risks specific to LLMs used to underpin conversational agents that interact with human users, including unsafe use, manipulation or deception. The sixth discusses the risk of environmental harm, job automation, and other challenges that may have a disparate effect on different social groups or communities.

In total, we review 21 risks in-depth. We discuss the points of origin of different risks and point to potential mitigation approaches. Lastly, we discuss organisational responsibilities in implementing mitigations, and the role of collaboration and participation. We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs.

Artificial intelligence, values, and alignment (2020) Minds and Machines [link] 

Author(s): Iason Gabriel

Abstract: This paper looks at philosophical questions that arise in the context of AI alignment. It defends three propositions. First, normative and technical aspects of the AI alignment problem are interrelated, creating space for productive engagement between people working in both domains. Second, it is important to be clear about the goal of alignment. There are significant differences between AI that aligns with instructions, intentions, revealed preferences, ideal preferences, interests and values. A principle-based approach to AI alignment, which combines these elements in a systematic way, has considerable advantages in this context. Third, the central challenge for theorists is not to identify ‘true’ moral principles for AI; rather, it is to identify fair principles for alignment that receive reflective endorsement despite widespread variation in people’s moral beliefs. The final part of the paper explores three ways in which fair principles for AI alignment could potentially be identified.

Effective altruism and its critics (2017) Journal of Applied Philosophy [link] 

Author(s): Iason Gabriel

Abstract: Effective altruism is a philosophy and a social movement that aims to revolutionise the way we do philanthropy. It encourages individuals to do as much good as possible, typically by contributing money to the best‐performing aid and development organisations. Surprisingly, this approach has met with considerable resistance among activists and aid providers, who argue that effective altruism is insensitive to justice insofar as it overlooks the value of equality, urgency and rights. They also hold that the movement suffers from methodological bias and, for that reason, reaches mistaken conclusions about how best to act. Finally, concerns have been raised about the ability of effective altruism to achieve systemic change. This article weighs the force of each objection in turn and looks at responses to the challenges they pose.