Last week, amid the #ICML2023 rush, we learned that our paper (w/ Sonali Parbhoo and Marzyeh Ghassemi), "Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning," was accepted to
#TMLR! #ReinforcementLearning #Healthcare (1/7)
In safety-critical environments, such as healthcare, it is important to be mindful of worst-case outcomes when deciding which actions to avoid. With an estimated distribution over returns, we can use the conditional value at risk (CVaR) to characterize this risk. (2/7)
We use distributional RL to provide this rich representation of possible outcomes for each action, and use the CVaR to assess the risk of any action leading to a dead-end. Extending our prior work on dead-end discovery (https://tinyurl.com/Neurips21DeD), we introduce DistDeD! (3/7)
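To make the CVaR idea concrete, here is a minimal sketch (not the paper's implementation) of computing CVaR from samples of an estimated return distribution; the function name and the toy bimodal distribution are illustrative assumptions:

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Conditional value at risk: the mean of the worst alpha-fraction
    of sampled returns (lower return = worse outcome)."""
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * returns.size)))  # number of tail samples
    return returns[:k].mean()

# Illustrative action whose outcomes are usually fine but rarely catastrophic
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(1.0, 0.1, 900),    # typical outcome
                          rng.normal(-5.0, 0.5, 100)])  # rare catastrophe
print(np.mean(samples))       # the mean hides the bad tail
print(cvar(samples, 0.1))     # CVaR exposes it
```

The point: the mean of this distribution is positive, while its 10% CVaR is strongly negative, which is exactly the kind of worst-case signal a mean-based value estimate would miss.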
There are two immediate benefits of using CVaR and distributional RL in dead-end discovery:
1) We enable *even earlier indication* of when things may go wrong.
2) The implementation of DistDeD is tunable, so it can be adapted to the specific needs of the intended use case. (4/7)
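One natural reading of the tunability point is that the CVaR risk level and the flagging threshold act as knobs; this is a hypothetical sketch of that idea (the function, the threshold, and the toy distributions are all assumptions, not the paper's code):

```python
import numpy as np

def flag_risky_actions(return_samples, alpha=0.1, threshold=-1.0):
    """Flag actions whose tail risk (CVaR of sampled returns) falls
    below `threshold`. Smaller `alpha` focuses on rarer, more extreme
    outcomes; a higher `threshold` flags actions earlier (more
    conservatively). Both are tuning knobs."""
    flagged = {}
    for action, samples in return_samples.items():
        s = np.sort(np.asarray(samples, dtype=float))
        k = max(1, int(np.ceil(alpha * s.size)))  # size of the worst tail
        flagged[action] = s[:k].mean() < threshold
    return flagged

# Two toy actions: one benign, one with a heavy downside tail
dists = {"a": np.linspace(-0.5, 1.5, 200),
         "b": np.linspace(-6.0, 2.0, 200)}
print(flag_risky_actions(dists, alpha=0.05, threshold=-1.0))
# → {'a': False, 'b': True}
```

Under this sketch, tightening alpha or raising the threshold trades earlier warnings against more false alarms, which is the kind of use-case-specific trade-off the thread alludes to.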
The improvements we've made with DistDeD are exciting! This is one promising direction toward making RL useful in the real world.
We are working with several clinical collaborators to determine how best to use DistDeD. Lots to come in the near future, stay tuned! (5/7)
There are lots of people to thank. Foremost, I want to publicly acknowledge those whose enthusiasm and encouragement pushed me to keep building on these ideas. Thank you Mehdi Fatemi, Marc Bellemare, Will Dabney, Vinith Suriyakumar, and Haoran Zhang. (6/7)
I also need to mention the #RLDM and #TMLR communities. Having venues like these that welcome cross-disciplinary work is such a benefit to our ML research community.
Our paper can be found at: https://openreview.net/forum?id=oKlEOT83gI
And our code (soon!): https://github.com/MLforHealth/DistDeD
(7/7)