My primary research is in moral psychology, cognitive science, and the philosophy of AI. It aims to explain motivation in biological and artificial agents and to explore what such explanations imply for accounts of rationality and well-being. To do so, I draw on computational methods from cognitive science and AI research. I also have interests in decision theory and other areas of formal philosophy.

Publications

“Do Expected Utility Maximizers Have Commitment Issues?”, Philosophy and Phenomenological Research, Forthcoming

“What makes discrimination wrong?”, Journal of Practical Ethics, Vol. 5(2):105-113, 2017

Works in progress (drafts available upon request)

“What Stands to Desire as Perception Stands to Belief?”

“A Conflation of Valences”

Works on AI

“Machine Theory of Mind and the Structure of Human Values”, NeurIPS 2023 MP2 Workshop

“Generative Theory of Mind and the Value Misgeneralization Problem”, 2023 AI Alignment Awards Final Prize winner

“Alignment as a Dynamic Process”, NeurIPS 2022 AI Safety Workshop, AI Risk Analysis award winner

“AI: The Consequences for Human Rights: Initial findings on the expected future use of AI in authoritarian regimes”, expert testimony report submitted to the House of Representatives Commission on Human Rights, 2017

Writing for a popular audience

“Why gender studies are generally misunderstood”, Expressen, 2019 (in Swedish)

“Where is the meaning of art?”, Medium, 2017