My newest column for @QuantaMagazine:
What Does It Mean to Align AI With Human Values?
https://www.quantamagazine.org/what-does-it-mean-to-align-ai-with-human-values-20221213?swcfpc=1
@melaniemitchell I really enjoyed this article. AI risk problems can't be understood without understanding how ethical reasoning takes place in real time.
Paradigms like Inverse RL will probably fail because RL agents don't reason about complex goals, intentions, or theory of mind in real time, and frankly, I doubt that RL as a paradigm will be able to handle it.
Also, processes like deception will always require extra computational work, so it's hard to imagine that they wouldn't be detectable.