I keep going back to this question about #TemporalCreditAssignment and #HippocampalReplay:
As an "agent" you want to learn the value of places and which places are likely to lead to reward;
-1) if a place leads to higher than expected reward, you'll want to propagate that reward information backwards through the places that led to it. If replay does that, you should see an increase in replay at a new reward site, and the replay sequences should start at the reward and reflect the path you just took to reach it. Right?
-2) if a place leads to lower than expected reward, you'll also want to propagate that lowered value in pretty much the same way, so if replay does that, you should see similar replay rates and content at sites with increased OR decreased reward. Right?
-3) if a place has had unchanged reward for a while and you're just in exploitation mode (going there again and again because you know it's the best place in the environment), then you shouldn't need to update anything, and the replay rate at that unchanged reward site should be quite low. Right?
That's not at all what replay is doing IRL, so does that mean replay is not used for temporal credit assignment? Or did I (very likely) miss something?
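For what it's worth, the three predictions above fall straight out of a prioritized-sweeping-style account, where replay events are backups triggered by prediction error. A minimal sketch (assuming a toy linear track with reward at the last state, and counting each value backup as one "replayed" state; all names here are made up for illustration):

```python
GAMMA = 0.9
N = 5  # states 0..4 on a linear track; reward is delivered at state 4

def update_with_replay(V, reward, tol=1e-6):
    """Sweep value backups from the reward site until TD errors vanish.
    Each backup stands in for one replayed state; returns the count."""
    backups = 0
    while True:
        max_err = 0.0
        # sweep backward from the reward, like reverse replay
        for s in reversed(range(N)):
            target = reward if s == N - 1 else GAMMA * V[s + 1]
            err = abs(target - V[s])
            if err > tol:
                V[s] = target
                backups += 1
                max_err = max(max_err, err)
        if max_err <= tol:
            return backups

V = [0.0] * N
b1 = update_with_replay(V, reward=1.0)  # prediction 1: new reward -> lots of backups
b2 = update_with_replay(V, reward=1.0)  # prediction 3: unchanged reward -> zero backups
b3 = update_with_replay(V, reward=0.2)  # prediction 2: devaluation -> lots of backups again
print(b1, b2, b3)  # -> 5 0 5
```

Increases and decreases produce the same |TD error| and hence the same amount of replay, and an unchanged reward produces none, which is exactly the pattern the empirical replay data don't seem to show.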