@janeadams @jfpuget a bit of context for the riddle:
The student uses a pretrained BERT as the text vectorizer for their PyTorch model's input; they are not trying to continue training BERT. When they initialize their NN, they set the BERT model to evaluation mode (assume they also did everything @jfpuget suggested). Then they call their own model's .train() and print the BERT token ids and embeddings inside .forward(): same ids across runs, but different embeddings.
How can this happen?
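For anyone who wants to poke at it, here's a minimal sketch of how I picture the student's setup (the class name, the classifier head, and the frozen-parameter loop are my assumptions for illustration, not their actual code):

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class StudentModel(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.bert.eval()                      # BERT used only as a frozen vectorizer
        for p in self.bert.parameters():      # assuming grads are also disabled,
            p.requires_grad = False           # per @jfpuget's suggestions
        self.head = nn.Linear(self.bert.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask):
        print(input_ids)                      # same token ids in every run
        with torch.no_grad():
            emb = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
        print(emb)                            # ...yet different embeddings per run
        return self.head(emb[:, 0])

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = StudentModel()
model.train()                                 # the student's own .train() call
batch = tokenizer(["a test sentence"], return_tensors="pt")
model(batch["input_ids"], batch["attention_mask"])
```

Run it a couple of times and you should be able to reproduce the symptom the student saw.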
@janeadams @jfpuget I’ll probably give it one more day before sharing the (perhaps somewhat disappointing?) current solution tomorrow. So cast your answers today!