Dr Fazl Barez
About
Dr Fazl Barez is a Senior Research Fellow at the University of Oxford, specializing in AI safety, interpretability, and governance.
He leads research initiatives within the AI Governance Initiative, focusing on the development of safety frameworks and interpretability methods for advanced AI systems. Dr Barez is also affiliated with the Centre for the Study of Existential Risk (CSER) at the University of Cambridge, contributing to research on the risks associated with artificial intelligence.
His research interests include mechanistic interpretability of large language models, AI alignment mechanisms, and the development of policy frameworks for AI governance. Dr Barez has published at leading AI conferences, and his research has informed industry practice and government policy.
Expertise
- Interpretability of large language models
- AI safety and alignment mechanisms
- Governance frameworks for emerging AI technologies
- Policy development for AI ethics and safety standards
- Translating technical insights into regulatory frameworks
- Robustness and risk analysis in AGI systems
- Fairness and ethical AI development
- Risks and mitigation strategies in advanced AI
Selected publications
- Open Problems in Machine Unlearning for AI Safety (2025)
- Rethinking AI Cultural Evaluation (2025)
- Interpreting Learned Feedback Patterns in Large Language Models (2024)
- Risks and Opportunities of Open-Source Generative AI (2024)
- Safeguarding AI In Finance: Lessons for Regulated Industries (2024)
- Measuring Value Alignment (2023)
- The Alan Turing Institute’s response to the (House of Lords) Large Language Models Inquiry: Call for Evidence (2023)
- AI Systems of Concern (2023)
Media experience
Dr Fazl Barez has engaged with newspapers and public forums to discuss AI governance and safety, emphasizing transparency and alignment in AI systems. He has delivered keynotes and participated in high-profile panels, including:
- Dialogue on Digital Trust and Safe AI 2024 (Singapore)
- Technical AI Safety Conference 2024 (Japan)
- Foresight AGI Safety & Security Workshop 2024 (San Francisco, USA)
- Mechanistic Interpretability Panel (International Conference on Learning Representations 2024, Austria)
Dr Barez has co-organized workshops, such as the Mechanistic Interpretability Workshop (International Conference on Machine Learning 2024), and his recorded talks, including Unlearning and Relearning in LLMs at NTU Singapore, are publicly accessible. He has also contributed to widely read publications on AI safety and interpretability.