Holistic Safety and Responsibility Evaluations of Advanced AI Models (2024)

Author(s): Laura Weidinger, Joslyn Barnhart, Jenny Brennan et al
Journal: arXiv preprint

Sociotechnical Safety Evaluation of Generative AI Systems (2024)

Author(s): Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini et al
Journal: arXiv preprint

Model evaluation for extreme risks (2023)

Author(s): Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong et al
Journal: arXiv preprint arXiv:2305.15324

Characteristics of harmful text: Towards rigorous benchmarking of language models (2022)

Author(s): Maribeth Rauh, John Mellor, Jonathan Uesato, Po-Sen Huang, Johannes Welbl, Laura Weidinger et al
Journal: Advances in Neural Information Processing Systems (NeurIPS 2022); arXiv preprint arXiv:2206.08325

Taxonomy of risks posed by language models (2022)

Author(s): Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin et al
Journal: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22), 214–229

Improving alignment of dialogue agents via targeted human judgements (2022)

Author(s): Amelia Glaese, Nat McAleese, Maja Trębacz et al
Journal: arXiv preprint arXiv:2209.14375

Alignment of language agents (2021)

Author(s): Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel et al
Journal: arXiv preprint arXiv:2103.14659

Scaling Language Models: Methods, Analysis & Insights from Training Gopher (2021)

Author(s): Jack W Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican et al
Journal: arXiv preprint arXiv:2112.11446
