DR IASON GABRIEL
Philosopher, Senior Staff Research Scientist at Google DeepMind
My work focuses on the ethics of artificial intelligence, including questions about AI value alignment, distributive justice, language ethics and human rights.
More generally, I’m interested in AI and human values, and in ensuring that technology works well for the benefit of all. I’ve contributed to several projects that promote responsible innovation in AI.
Before joining DeepMind, I taught moral and political philosophy at Oxford University and worked for the United Nations Development Programme in Lebanon and Sudan.
Media Highlights
Named as one of the 100 most influential people in AI by TIME Magazine.
Speaking about living well in the age of AI at the Mind & Life Institute in India.
Check out my Google DeepMind Podcast with Prof. Hannah Fry on the ethics of AI.
Research Highlights
Author(s): Iason Gabriel, Geoff Keeling, Arianna Manzini & James Evans
Abstract: The deployment of capable AI agents raises fresh questions about safety, human-machine relationships and social coordination. We argue for greater engagement by scientists, scholars, engineers and policymakers with the implications of a world increasingly populated by AI agents. We explore key challenges that must be addressed to ensure that interactions between humans and agents, and among agents themselves, remain broadly beneficial.
Author(s): Iason Gabriel, Geoff Keeling
Abstract: The normative challenge of AI alignment centres upon what goals or values ought to be encoded in AI systems to govern their behaviour. A number of answers have been proposed, including the notion that AI must be aligned with human intentions or that it should aim to be helpful, honest and harmless. Nonetheless, both accounts suffer from critical weaknesses. On the one hand, they are incomplete: neither specification provides adequate guidance to AI systems, deployed across various domains with multiple parties. On the other hand, the justification for these approaches is questionable and, we argue, of the wrong kind. More specifically, neither approach takes seriously the need to justify the operation of AI systems to those affected by their actions – or what this means for pluralistic societies where people have different underlying beliefs about value. To address these limitations, we propose an alternative account of AI alignment that focuses on fair processes. We argue that principles that are the product of these processes are the appropriate target for alignment. This approach can meet the necessary standard of public justification, generate a fuller set of principles for AI that are sensitive to variation in context, and has explanatory power insofar as it makes sense of our intuitions about AI systems and points to a number of hitherto underappreciated ways in which an AI system may cease to be aligned.
Author(s): Hannah Rose Kirk, Iason Gabriel, Chris Summerfield, Bertie Vidgen & Scott A. Hale
Abstract: Humans strive to design safe AI systems that align with our goals and remain under our control. However, as AI capabilities advance, we face a new challenge: the emergence of deeper, more persistent relationships between humans and AI systems. We explore how increasingly capable AI agents may generate the perception of deeper relationships with users, especially as AI becomes more personalised and agentic. This shift, from transactional interaction to ongoing sustained social engagement with AI, necessitates a new focus on socioaffective alignment—how an AI system behaves within the social and psychological ecosystem co-created with its user, where preferences and perceptions evolve through mutual influence. Addressing these dynamics involves resolving key intrapersonal dilemmas, including balancing immediate versus long-term well-being, protecting autonomy, and managing AI companionship alongside the desire to preserve human social bonds. By framing these challenges through a notion of basic psychological needs, we seek AI systems that support, rather than exploit, our fundamental nature as social and emotional beings.
Author(s): Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks et al.
Abstract: This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user’s expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.
Author(s): Iason Gabriel
Abstract: This essay explores the relationship between artificial intelligence and principles of distributive justice. Drawing upon the political philosophy of John Rawls, it holds that the basic structure of society should be understood as a composite of sociotechnical systems, and that the operation of these systems is increasingly shaped and influenced by AI. Consequently, egalitarian norms of justice apply to the technology when it is deployed in these contexts. These norms entail that the relevant AI systems must meet a certain standard of public justification, support citizens' rights, and promote substantively fair outcomes, something that requires particular attention to the impact they have on the worst-off members of society.
Author(s): Laura Weidinger, Kevin R McKee, Richard Everett, Saffron Huang, Tina O Zhu, Martin J Chadwick, Christopher Summerfield, Iason Gabriel
Abstract: The philosopher John Rawls proposed the Veil of Ignorance (VoI) as a thought experiment to identify fair principles for governing a society. Here, we apply the VoI to an important governance domain: artificial intelligence (AI). In five incentive-compatible studies (N = 2,508), including two preregistered protocols, participants choose principles to govern an AI assistant from behind the veil: that is, without knowledge of their own relative position in the group. Compared to participants who have this information, we find a consistent preference for a principle that instructs the AI assistant to prioritize the worst-off. Neither risk attitudes nor political preferences adequately explain these choices. Instead, they appear to be driven by elevated concerns about fairness: Without prompting, participants who reason behind the VoI more frequently explain their choice in terms of fairness, compared to those in the Control condition. Moreover, we find initial support for the ability of the VoI to elicit more robust preferences: In the studies presented here, the VoI increases the likelihood of participants continuing to endorse their initial choice in a subsequent round where they know how they will be affected by the AI intervention and have a self-interested motivation to change their mind. These results emerge in both a descriptive and an immersive game. Our findings suggest that the VoI may be a suitable mechanism for selecting distributive principles to govern AI.
Author(s): Atoosa Kasirzadeh, Iason Gabriel
Abstract: Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents. Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains. We conclude by discussing the practical implications of our proposal for the design of conversational agents that are aligned with these norms and values.
Author(s): Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel
Abstract: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences.
We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, IV. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly inferring sensitive information. The third addresses risks arising from poor, false or misleading information including in sensitive domains, and knock-on risks such as the erosion of trust in shared information. The fourth considers risks from actors who try to use LMs to cause harm. The fifth focuses on risks specific to LMs used to underpin conversational agents that interact with human users, including unsafe use, manipulation or deception. The sixth discusses the risk of environmental harm, job automation, and other challenges that may have a disparate effect on different social groups or communities.
In total, we review 21 risks in-depth. We discuss the points of origin of different risks and point to potential mitigation approaches. Lastly, we discuss organisational responsibilities in implementing mitigations, and the role of collaboration and participation. We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs.
Author(s): Iason Gabriel
Abstract: This paper looks at philosophical questions that arise in the context of AI alignment. It defends three propositions. First, normative and technical aspects of the AI alignment problem are interrelated, creating space for productive engagement between people working in both domains. Second, it is important to be clear about the goal of alignment. There are significant differences between AI that aligns with instructions, intentions, revealed preferences, ideal preferences, interests and values. A principle-based approach to AI alignment, which combines these elements in a systematic way, has considerable advantages in this context. Third, the central challenge for theorists is not to identify ‘true’ moral principles for AI; rather, it is to identify fair principles for alignment that receive reflective endorsement despite widespread variation in people’s moral beliefs. The final part of the paper explores three ways in which fair principles for AI alignment could potentially be identified.
Author(s): Iason Gabriel
Abstract: Effective altruism is a philosophy and a social movement that aims to revolutionise the way we do philanthropy. It encourages individuals to do as much good as possible, typically by contributing money to the best‐performing aid and development organisations. Surprisingly, this approach has met with considerable resistance among activists and aid providers who argue that effective altruism is insensitive to justice insofar as it overlooks the value of equality, urgency and rights. They also hold that the movement suffers from methodological bias, reaching mistaken conclusions about how best to act for that reason. Finally, concerns have been raised about the ability of effective altruism to achieve systemic change. This article weighs the force of each objection in turn, and looks at responses to the challenge they pose.