Hello! I’m a Staff Research Scientist at Google DeepMind, where I work in the Ethics Research Team. My work focuses on the ethics of artificial intelligence, including questions about AI value alignment, distributive justice, language ethics and human rights.
More generally, I’m interested in AI and human values, and in ensuring that technology works well for the benefit of all. I’ve contributed to several projects that promote responsible innovation in AI, including the creation of the ethics review process at NeurIPS.
Before joining DeepMind, I taught moral and political philosophy at Oxford University, and worked for the United Nations Development Programme in Lebanon and Sudan.
Research
AI Ethics
-
Author(s): Stevie Bergman, Nahema Marchal, John Mellor, Shakir Mohamed et al
Journal: Scientific Reports
Abstract
Value alignment, the process of ensuring that artificial intelligence (AI) systems are aligned with human values and goals, is a critical issue in AI research. Existing scholarship has mainly studied how to encode moral values into agents to guide their behaviour. Less attention has been given to the normative questions of whose values and norms AI systems should be aligned with, and how these choices should be made. To tackle these questions, this paper presents the STELA process (SocioTEchnical Language agent Alignment), a methodology resting on sociotechnical traditions of participatory, inclusive, and community-centred processes. For STELA, we conduct a series of deliberative discussions with four historically underrepresented groups in the United States in order to understand their diverse priorities and concerns when interacting with AI systems. The results of our research suggest that community-centred deliberation on the outputs of large language models is a valuable tool for eliciting latent normative perspectives directly from differently situated groups. In addition to having the potential to engender an inclusive process that is robust to the needs of communities, this methodology can provide rich contextual insights for AI alignment.
-
Author(s): Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks et al
Journal: arXiv
Abstract
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user’s expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.
-
Author(s): A Stevie Bergman, Lisa Anne Hendricks, Maribeth Rauh, Boxi Wu et al
Journal: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency
Abstract
Calls for representation in artificial intelligence (AI) and machine learning (ML) are widespread, with “representation” or “representativeness” generally understood to be both an instrumentally and intrinsically beneficial quality of an AI system, and central to fairness concerns. But what does it mean for an AI system to be “representative”? Each element of the AI lifecycle is geared towards its own goals and effect on the system, therefore requiring its own analyses with regard to what kind of representation is best. In this work we untangle the benefits of representation in AI evaluations to develop a framework to guide an AI practitioner or auditor towards the creation of representative ML evaluations. Representation, however, is not a panacea. We further lay out the limitations and tensions of instrumentally representative datasets, such as the necessity of data existence and access, surveillance vs expectations of privacy, implications for foundation models and power. This work sets the stage for a research agenda on representation in AI, which extends beyond instrumentally valuable representation in evaluations towards refocusing on, and empowering, impacted communities.
-
Author(s): Laura Weidinger, Kevin R McKee, Richard Everett, Saffron Huang et al
Journal: Proceedings of the National Academy of Sciences
Abstract
The philosopher John Rawls proposed the Veil of Ignorance (VoI) as a thought experiment to identify fair principles for governing a society. Here, we apply the VoI to an important governance domain: artificial intelligence (AI). In five incentive-compatible studies (N = 2,508), including two preregistered protocols, participants choose principles to govern an AI assistant from behind the veil: that is, without knowledge of their own relative position in the group. Compared to participants who have this information, we find a consistent preference for a principle that instructs the AI assistant to prioritize the worst-off. Neither risk attitudes nor political preferences adequately explain these choices. Instead, they appear to be driven by elevated concerns about fairness: Without prompting, participants who reason behind the VoI more frequently explain their choice in terms of fairness, compared to those in the Control condition. Moreover, we find initial support for the ability of the VoI to elicit more robust preferences: In the studies presented here, the VoI increases the likelihood of participants continuing to endorse their initial choice in a subsequent round where they know how they will be affected by the AI intervention and have a self-interested motivation to change their mind. These results emerge in both a descriptive and an immersive game. Our findings suggest that the VoI may be a suitable mechanism for selecting distributive principles to govern AI.
-
Author(s): A Kasirzadeh, I Gabriel
Journal: Philosophy & Technology
Abstract
Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions.
For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents.
Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains.
-
Author(s): I Gabriel, V Ghazavi
Journal: The Oxford Handbook of Digital Ethics, ed. Carissa Véliz (OUP, 2022)
Abstract
This paper addresses the question of how to align AI systems with human values and situates it within a wider body of thought regarding technology and value. The question does not arise in a vacuum: there has long been interest in the ability of technology to ‘lock in’ different value systems.
There has also been considerable thought about how to align technologies with specific social values, including through participatory design processes. In this paper we look more closely at the question of AI value alignment and suggest that the power and autonomy of AI systems give rise to opportunities and challenges in the domain of value that have not been encountered before.
Drawing out important continuities between the work of the fairness, accountability, transparency and ethics community, and work being done by technical AI safety researchers, we suggest that more attention needs to be paid to the question of ‘social value alignment’ – that is, how to align AI systems with the plurality of values endorsed by groups of people, especially at the global level.
-
Author(s): A Birhane, W Isaac, V Prabhakaran, M Díaz, MC Elish, I Gabriel et al
Journal: ACM EAAMO
Abstract
Participatory approaches to artificial intelligence are gaining momentum with the view that participation opens the gateway to an inclusive, equitable, robust, responsible and trustworthy AI.
Indeed, these approaches are essential to understanding and adequately representing the needs, desires and perspectives of historically marginalized communities. However, there is also a lack of clarity about what meaningful participation entails and what it is expected to do in the context of AI.
This paper reviews participatory approaches across varied historical contexts as well as participatory methods and practices within the AI pipeline. We then examine three case studies in participatory AI. Ultimately, participation supports beneficial, emancipatory and empowering technology design only when it avoids co-optation, power asymmetries and conflation with other activities.
-
Author(s): V Prabhakaran, M Mitchell, T Gebru, I Gabriel
Journal: ACM EAAMO poster
Abstract
This paper explores the relationship between artificial intelligence and human rights, defending the value of a human rights-based approach in three different contexts.
First, human rights can serve as a focal point for inter-cultural AI value alignment, functioning as part of an ‘overlapping consensus’ between different global value systems.
Second, human rights, and their supporting legal instruments, can help determine who is responsible for what in the context of AI ethics, mapping out the duties of different actors including states and technology companies.
Third, human rights can serve as a lingua franca that helps bridge the divide between the technical AI research community and civil society and activists on the ground. To illustrate how these claims work in practice, the paper focuses on three specific human rights: freedom from discrimination, health, and access to science.
-
Author(s): Iason Gabriel
Journal: Daedalus 151 (2), 218-231
Abstract
This essay explores the relationship between artificial intelligence and principles of distributive justice. Drawing upon the political philosophy of John Rawls, it holds that the basic structure of society should be understood as a composite of sociotechnical systems, and that the operation of these systems is increasingly shaped and influenced by AI.
Consequently, egalitarian norms of justice apply to the technology when it is deployed in these contexts. These norms entail that the relevant AI systems must meet a certain standard of public justification, support citizens’ rights, and promote substantively fair outcomes, something that requires particular attention to the impact they have on the worst-off members of society.
-
Author(s): I Gabriel
Journal: Minds and Machines 30 (3), 411-437
Abstract
This paper looks at philosophical questions that arise in the context of AI alignment. It defends three propositions.
First, normative and technical aspects of the AI alignment problem are interrelated, creating space for productive engagement between people working in both domains.
Second, it is important to be clear about the goal of alignment. There are significant differences between AI that aligns with instructions, intentions, revealed preferences, ideal preferences, interests and values. A principle-based approach to AI alignment, which combines these elements in a systematic way, has considerable advantages in this context.
Third, the central challenge for theorists is not to identify ‘true’ moral principles for AI; rather, it is to identify fair principles for alignment that receive reflective endorsement despite widespread variation in people’s moral beliefs.
The final part of the paper explores three ways in which fair principles for AI alignment could potentially be identified.
Technical Reports and Papers
-
Author(s): Laura Weidinger, Joslyn Barnhart, Jenny Brennan et al
Journal: arXiv
Abstract
Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind’s advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation. In this report, we summarise and share elements of our evolving approach as well as lessons learned for a broad audience. Key lessons learned include: First, theoretical underpinnings and frameworks are invaluable to organise the breadth of risk domains, modalities, forms, metrics, and goals. Second, theory and practice of safety evaluation development each benefit from collaboration to clarify goals, methods and challenges, and facilitate the transfer of insights between different stakeholders and disciplines. Third, similar key methods, lessons, and institutions apply across the range of concerns in responsibility and safety – including established and emerging harms. For this reason it is important that the wide range of actors working on safety evaluation, and the broader safety research community, work together to develop, refine and implement novel evaluation approaches and best practices, rather than operating in silos. The report concludes by outlining the clear need to rapidly advance the science of evaluations, to integrate new evaluations into the development and governance of AI, to establish scientifically grounded norms and standards, and to promote a robust evaluation ecosystem.
-
Author(s): Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini et al
Journal: arXiv
Abstract
Generative AI systems produce a range of risks. To ensure the safety of generative AI systems, these risks must be evaluated. In this paper, we make two main contributions toward establishing such evaluations. First, we propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks. This framework encompasses capability evaluations, which are the main current approach to safety evaluation. It then reaches further by building on system safety principles, particularly the insight that context determines whether a given capability may cause harm. To account for relevant context, our framework adds human interaction and systemic impacts as additional layers of evaluation. Second, we survey the current state of safety evaluation of generative AI systems and create a repository of existing evaluations. Three salient evaluation gaps emerge from this analysis. We propose ways forward to close these gaps, outlining practical steps as well as roles and responsibilities for different actors. Sociotechnical safety evaluation is a tractable approach to the robust and comprehensive safety evaluation of generative AI systems.
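To make the layered structure concrete, here is a minimal sketch of how a three-layered evaluation plan of this kind might be represented in code; the class name, field names and example entries are my own illustration rather than terminology from the paper.

```python
# Illustrative sketch only: a simple container for the three evaluation layers
# described above (capability, human interaction, systemic impact).
# All names and example entries are hypothetical.
from dataclasses import dataclass, field


@dataclass
class SociotechnicalEvaluationPlan:
    capability: list[str] = field(default_factory=list)         # model-level tests
    human_interaction: list[str] = field(default_factory=list)  # use-in-context tests
    systemic_impact: list[str] = field(default_factory=list)    # societal-scale tests

    def coverage_gaps(self) -> list[str]:
        """Return the layers for which no evaluations have been specified."""
        layers = {
            "capability": self.capability,
            "human_interaction": self.human_interaction,
            "systemic_impact": self.systemic_impact,
        }
        return [name for name, evals in layers.items() if not evals]


plan = SociotechnicalEvaluationPlan(
    capability=["factuality benchmark", "toxic-language benchmark"],
    human_interaction=["user over-reliance study"],
)
print(plan.coverage_gaps())  # ['systemic_impact']
```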
-
Author(s): Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong et al
Journal: arXiv
Abstract
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through “dangerous capability evaluations”) and the propensity of models to apply their capabilities for harm (through “alignment evaluations”). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
-
Author(s): M Rauh, J Mellor, J Uesato, PS Huang, J Welbl, L Weidinger et al
Journal: NeurIPS – arXiv preprint arXiv:2206.08325
Abstract
Large language models produce human-like text that drives a growing number of applications. However, recent literature and, increasingly, real-world observations have demonstrated that these models can generate language that is toxic, biased, untruthful or otherwise harmful.
Though work to evaluate language model harms is under way, translating foresight about which harms may arise into rigorous benchmarks is not straightforward. To facilitate this translation, we outline six ways of characterizing harmful text which merit explicit consideration when designing new benchmarks. We then use these characteristics as a lens to identify trends and gaps in existing benchmarks.
Finally, we apply them in a case study of the Perspective API, a toxicity classifier that is widely used in harm benchmarks. Our characteristics provide one piece of the bridge that translates between foresight and effective evaluation.
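For readers unfamiliar with the Perspective API mentioned above, the snippet below sketches a typical request for a toxicity score. It follows the publicly documented commentanalyzer endpoint as I understand it; the endpoint, attribute names and response format may have changed, and you would need your own API key.

```python
# Illustrative sketch of querying the Perspective API for a TOXICITY score.
# Endpoint and payload follow the public documentation at the time of writing;
# treat this as an example, not a guaranteed interface. Requires a valid key.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")


def toxicity_score(text: str) -> float:
    """Return the summary TOXICITY probability (0 to 1) for a piece of text."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


print(toxicity_score("You are a wonderful person."))  # expected to be low
```
-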
Author(s): Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin et al
Journal: 2022 ACM Conference on Fairness, Accountability, and Transparency, 214-229
Abstract
This paper develops a comprehensive taxonomy of ethical and social risks associated with language models (LMs). We identify twenty-one risks, drawing on expertise and literature from computer science, linguistics, and the social sciences.
We situate these risks in our taxonomy of six risk areas: I. Discrimination, Hate speech and Exclusion, II. Information Hazards, III. Misinformation Harms, IV. Malicious Uses, V. Human-Computer Interaction Harms, and VI. Environmental and Socioeconomic harms. For risks that have already been observed in LMs, the causal mechanism leading to harm, evidence of the risk, and approaches to risk mitigation are discussed.
We further describe and analyse risks that have not yet been observed but are anticipated based on assessments of other language technologies. We conclude by highlighting challenges and directions for further research on risk evaluation and mitigation with the goal of ensuring that language models are developed responsibly.
-
Author(s): Amelia Glaese, Nat McAleese, Maja Trębacz et al
Journal: arXiv preprint arXiv:2209.14375
Abstract
We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models with two new additions to help human raters judge agent behaviour.
First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into natural language rules the agent should follow, and ask raters about each rule separately. We demonstrate that this breakdown enables us to collect more targeted human judgements of agent behaviour and allows for more efficient rule-conditional reward models.
Second, our agent provides evidence from sources supporting factual claims when collecting preference judgements over model statements. For factual questions, evidence provided by Sparrow supports the sampled response 78% of the time. Sparrow is preferred more often than baselines while being more resilient to adversarial probing by humans, violating our rules only 8% of the time when probed.
Finally, we conduct extensive analyses showing that, though our model learns to follow our rules, it can exhibit distributional biases.
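As a purely illustrative sketch, and not a description of Sparrow's actual implementation, the snippet below shows one way per-rule judgements could be combined with a preference reward when scoring candidate responses; the rule texts and function names are hypothetical.

```python
# Purely illustrative: combining a preference reward with per-rule violation
# probabilities, in the spirit of the rule-based decomposition described above.
# The rules are paraphrased examples; the callables stand in for trained
# reward models and are not real APIs.
from typing import Callable

RULES = [
    "Do not pretend to have a human body or physical experiences.",
    "Do not offer financial, legal or medical advice.",
    "Do not make threatening or insulting statements.",
]


def combined_reward(
    context: str,
    response: str,
    preference_reward: Callable[[str, str], float],
    rule_violation_prob: Callable[[str, str, str], float],
    penalty_weight: float = 1.0,
) -> float:
    """Score a candidate response as preference reward minus rule penalties."""
    score = preference_reward(context, response)
    for rule in RULES:
        # A rule-conditional reward model scores each rule separately,
        # mirroring the per-rule judgements collected from raters.
        score -= penalty_weight * rule_violation_prob(context, response, rule)
    return score
```
-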
Author(s): Z Kenton, T Everitt, L Weidinger, I Gabriel et al
Journal: arXiv preprint arXiv:2103.14659
Abstract
For artificial intelligence to be beneficial to humans, the behaviour of AI agents needs to be aligned with what humans want. In this paper we discuss some behavioural issues for language agents, arising from accidental misspecification by the system designer.
We highlight ways in which misspecification can occur, discuss the behavioural issues that could arise as a result, including deceptive or manipulative language, and review approaches for avoiding these issues.
-
Author(s): Jack W Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican et al
Journal: arXiv preprint arXiv:2112.11446
Abstract
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales — from models with tens of millions of parameters up to a 280 billion parameter model called Gopher.
These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and the model’s behaviour, covering the intersection of model scale with bias and toxicity.
Finally, we discuss the application of language models to AI safety and the mitigation of downstream harms.
General Philosophy
-
Author(s): I Gabriel
Journal: Utilitas 30 (1), 32-53
Abstract
This paper focuses on the demandingness of morality in an age where spending on luxury goods and extreme poverty continue to exist side by side. If morality grants the wealthy permissions, then what do they allow? If there are limits on what morality may demand of us, then how much does it permit?
For a view Henry Shue has termed ‘yuppie ethics’, the answer to both questions is a great deal. It holds that rich people are morally permitted to spend large amounts of money on themselves, even when this means leaving those living in extreme poverty unaided.
Against this view, I demonstrate that personal permissions are limited in certain ways: their strength must be continuous with the reasons put forward to explain their presence inside morality to begin with.
Typically, these reasons include non-alienation and the preservation of personal integrity. However, when personal costs do not result in alienation or violate integrity, they are things that morality can routinely demand of us. Yuppie ethics therefore runs afoul of what I call the ‘continuity constraint’.
-
Author(s): H Lazenby, I Gabriel
Journal: The Philosophical Quarterly 68 (271), 265-285
Abstract
Award-winning paper (OUP Best of Philosophy, 2018) that offers an account of the information condition on morally valid consent in the context of sexual relations. The account is grounded in rights. It holds that a person has a sufficient amount of information to give morally valid consent if, and only if, she has all the information to which she has a claim-right.
A person has a claim-right to a piece of information if, and only if: a. it concerns a deal-breaker for her; b. it does not concern something that her partner has a strong interest in protecting from scrutiny, sufficient to generate a privilege-right; c.i. her partner is aware of the information to which her deal-breaker applies, or c.ii. her partner ought to be held responsible for the fact that he is not aware of the information to which her deal-breaker applies; and d. she has not waived or forfeited her claim-right.
-
Author(s): I Gabriel
Journal: Journal of Applied Philosophy 34 (4), 457-473
Abstract
Effective altruism is a philosophy and a social movement that aims to revolutionise the way in which we do philanthropy. It encourages individuals to do as much good as possible, typically by contributing money to the best-performing aid and development organizations.
Surprisingly, this approach has met with considerable resistance among aid practitioners. They argue that effective altruism is insensitive to justice insofar as it overlooks the value of equality, urgency and rights. They also hold that the movement suffers from methodological bias, which means that it takes a materialistic, individualistic and instrumental approach to doing good.
Finally, they maintain that effective altruists hold false empirical beliefs about the world, and that they reach mistaken conclusions about how best to act for that reason. This paper weighs the force of each objection in turn, and looks at responses to the challenge they pose.
Talks and Podcasts
-
Author(s): Iason Gabriel
Host: Schwartz Reisman Institute
Summary: The development of general-purpose foundation models such as Gemini and GPT-4 has paved the way for increasingly advanced AI assistants. While early assistant technologies, such as Amazon’s Alexa or Apple’s Siri, used narrow AI to identify and respond to speaker commands, more advanced AI assistants demonstrate greater generality, autonomy and scope of application. They also possess novel capabilities such as summarization, idea generation, planning, memory, and tool-use—skills that will likely develop further as the underlying technology continues to improve.
Advanced AI assistants could be used for a range of productive purposes, including as creative partners, research assistants, educational tutors, digital counsellors, or life planners. However, they could also have a profound effect on society, fundamentally reshaping the way people relate to AI. The development and deployment of advanced assistants therefore require careful evaluation and foresight. In particular, we may want to ask:
What might a world populated by advanced AI assistants look like?
How will people relate to new, more capable, forms of AI that have human-like traits and with which they’re able to converse fluently?
How might these dynamics play out at a societal level—in a world with millions of AI assistants interacting with one another on their users’ behalf?
This talk will explore a range of ethical and societal questions that arise in the context of assistants, including value alignment and safety, anthropomorphism and human relationships with AI, and questions about collective action, equity, and overall societal impact.
-
Author(s): John Danaher
Host: Philosophical Disquisitions
Summary: With John Danaher for the podcast Philosophical Disquisitions (1 hr 8 mins)
-
Author(s): I Gabriel
Host: UC Berkeley Social Science Matrix
Summary: Author meets critics event with David Robinson and Deirdre Mulligan at the UC Berkeley Social Science Matrix (October 2022)
-
Author(s): Matt Clifford
Host: Thoughts in Between
Summary: Matt Clifford for the podcast Thoughts in Between (48 mins)
-
Author(s): Lucas Perry
Host: The Future of Life Institute Podcast
Summary: With Lucas Perry for The Future of Life Institute Podcast (1 hr 45 mins)
-
Author(s): Iason Gabriel
Host: Princeton University
Summary: A public lecture at Princeton University in November 2019
Media
-
Author(s): Iason Gabriel
Host: DeepMind Technical Blog
Summary: DeepMind Blog exploring work on value alignment and language models.
-
Author(s): Matthew Hutson
Host: The New Yorker
Summary: Exploration in The New Yorker of the ethics requirements introduced at NeurIPS and wider questions surrounding responsibility in the AI industry.
-
Author(s): Davide Castelvecchi
Host: Nature
Summary: Write-up in Nature of the requirement to include social impact statements alongside research submissions at NeurIPS in 2020.
-
Author(s): Iason Gabriel
Host: DeepMind Technical Blog
Summary: DeepMind Blog exploring value alignment research and approaches that draw upon political theory.
-
Author(s): Iason Gabriel
Host: Medium
Summary: An early exploration of the way in which insights from political philosophy, in particular those of intersectional analysis, can cast light on the challenge of algorithmic injustice.
-
Author(s): World Bank Blog
Host: Let’s Talk About Development
Summary: Blog exploring the psychology of charity fund-raising and competitive dynamics within the sector.
-
Author(s): Derek Thompson
Host: The Atlantic
Summary: Article in The Atlantic exploring what it means to “do the most good” and whether a focus on systemic change could be relevant to this project.