Chris Offner

I'm running into some unexpected and significant non-determinism when running a diffusion model on my Apple GPU.

On the left we see the progression of cross-attention maps for time steps from t = 0 to t = 900 when running the model via the CPU.

We see that each cross-attention map undergoes a gradual "refinement" progression as we go from t = 0 to t = 900.

On the right we see the same but on the GPU.

It's a much more erratic and discontinuous progression.

For example, check the second row, fifth column and how it changes between t = 600 and t = 700.

Is this some bug specific to Apple GPUs or does this also happen with CUDA?

For t = 0, the CPU and GPU images look identical. For higher t, the GPU run produces *very* different results even when re-running with the exact same model inputs (including the same time step t).

Any idea why that is?
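One way to quantify what is being described here (a hedged sketch; `model` and `inputs` are placeholders, not anything from the thread) is to call the same Keras model twice on identical inputs and measure the largest deviation between the two outputs:

```python
import numpy as np

def max_rerun_diff(model, inputs):
    # Run the exact same forward pass twice and compare the outputs.
    a = np.asarray(model.predict(inputs))
    b = np.asarray(model.predict(inputs))
    return np.max(np.abs(a - b))

# On CPU this is typically 0.0; on a nondeterministic GPU backend it can be > 0.
```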

@chrisoffner3d GPU computations can be nondeterministic because floating point addition is not associative, and GPU libraries are allowed to change the order of computations when doing operations like reduce. See tensorflow.org/xla/operation_s
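A minimal sketch of that effect (an illustration added here, using NumPy float32 rather than anything GPU-specific): summing the same values in different orders gives slightly different results, which is exactly what a reordered GPU reduction does.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

s_forward  = np.sum(x)                                   # one reduction order
s_reversed = np.sum(x[::-1])                             # same values, reversed order
s_chunked  = sum(np.sum(c) for c in np.split(x, 1000))   # yet another grouping

print(s_forward, s_reversed, s_chunked)  # often differ slightly in the last bits
```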


@BartWronski Ah, thank you. This is a Keras model, so I'm not actually using PyTorch or its MPS backend (unlike what my image claims). I'm even less familiar with Keras/TF than I am with (the backend of) PyTorch, so I'll look into whether an equivalent setting exists for TF.
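A hedged pointer for that lookup (not something confirmed in the thread): recent TensorFlow versions (2.8 and later) expose an op-determinism switch roughly analogous to PyTorch's torch.use_deterministic_algorithms(True). A minimal sketch:

```python
import tensorflow as tf

tf.keras.utils.set_random_seed(42)               # seeds the Python, NumPy, and TF RNGs
tf.config.experimental.enable_op_determinism()   # request deterministic op implementations
# Ops without a deterministic implementation will raise an error,
# and determinism usually comes at some performance cost.
```

Note that this targets run-to-run reproducibility on a single device; CPU and GPU results can still differ from each other.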

@chrisoffner3d How sensitive is this computation to small changes in inputs? As others have said, GPUs are allowed to reorder arithmetic (CPUs can too sometimes, but often not as blatantly). Since computer arithmetic is non-associative, this can change the answer. It should only change the answer *significantly* if the underlying model is sensitive to small perturbations.
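A toy illustration of that last point (not the diffusion model from the thread, just a deliberately chaotic map): when a computation amplifies small differences, a float32-scale perturbation in the input grows into a large difference in the output.

```python
import numpy as np

def iterate(x0, steps=60):
    # Logistic map in its chaotic regime: extremely sensitive to the starting value.
    x = np.float32(x0)
    for _ in range(steps):
        x = np.float32(3.9) * x * (np.float32(1.0) - x)
    return x

print(iterate(0.3))          # baseline
print(iterate(0.3 + 1e-7))   # tiny perturbation -> very different value after 60 steps
```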