Chris Offner

I'm running into some unexpected and significant non-determinism when running a diffusion model on my Apple GPU.

On the left we see the progression of cross-attention maps for time steps from t = 0 to t = 900 when running the model via the CPU.

We see that each cross-attention map undergoes a gradual "refinement" progression as we go from t = 0 to t = 900.

On the right we see the same but on the GPU.

It's a much more erratic and discontinuous progression.

For example, check the second row, fifth column and how it changes between t = 600 and t = 700.

Is this some bug specific to Apple GPUs or does this also happen with CUDA?

For t = 0, the CPU and GPU images look identical. For higher t, the GPU run produces *very* different results even when re-running with the exact same model inputs (including the same time step t).

Any idea why that is?
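One way to quantify what is being described here (a hedged sketch; `model` and `inputs` are placeholders, not anything from the thread) is to call the same Keras model twice on identical inputs and measure the largest deviation between the two outputs:

```python
import numpy as np

def max_rerun_diff(model, inputs):
    # Run the exact same forward pass twice and compare the outputs.
    a = np.asarray(model.predict(inputs))
    b = np.asarray(model.predict(inputs))
    return np.max(np.abs(a - b))

# On CPU this is typically 0.0; on a nondeterministic GPU backend it can be > 0.
```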

@chrisoffner3d GPU computations can be nondeterministic because floating point addition is not associative, and GPU libraries are allowed to change the order of computations when doing operations like reduce. See tensorflow.org/xla/operation_s
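A minimal sketch of that effect (an illustration added here, using NumPy float32 rather than anything GPU-specific): summing the same values in different orders gives slightly different results, which is exactly what a reordered GPU reduction does.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

s_forward  = np.sum(x)                                   # one reduction order
s_reversed = np.sum(x[::-1])                             # same values, reversed order
s_chunked  = sum(np.sum(c) for c in np.split(x, 1000))   # yet another grouping

print(s_forward, s_reversed, s_chunked)  # often differ slightly in the last bits
```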


@BartWronski Ah, thank you. This is a Keras model, so I'm not actually using PyTorch or its MPS backend (unlike what my image claims). I'm even less familiar with Keras/TF than I am with (the backend of) PyTorch, so I'll look into whether an equivalent setting exists for TF.
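A hedged pointer for that lookup (not something confirmed in the thread): recent TensorFlow versions (2.8 and later) expose an op-determinism switch roughly analogous to PyTorch's torch.use_deterministic_algorithms(True). A minimal sketch:

```python
import tensorflow as tf

tf.keras.utils.set_random_seed(42)               # seeds the Python, NumPy, and TF RNGs
tf.config.experimental.enable_op_determinism()   # request deterministic op implementations
# Ops without a deterministic implementation will raise an error,
# and determinism usually comes at some performance cost.
```

Note that this targets run-to-run reproducibility on a single device; CPU and GPU results can still differ from each other.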

@chrisoffner3d How sensitive is this computation to small changes in inputs? As others have said, GPUs are allowed to reorder arithmetic (CPUs can too sometimes, but often not as blatantly). Since computer arithmetic is non-associative, this can change the answer. It should only change the answer *significantly* if the underlying model is sensitive to small perturbations.
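A toy illustration of that last point (not the diffusion model from the thread, just a deliberately chaotic map): when a computation amplifies small differences, a float32-scale perturbation in the input grows into a large difference in the output.

```python
import numpy as np

def iterate(x0, steps=60):
    # Logistic map in its chaotic regime: extremely sensitive to the starting value.
    x = np.float32(x0)
    for _ in range(steps):
        x = np.float32(3.9) * x * (np.float32(1.0) - x)
    return x

print(iterate(0.3))          # baseline
print(iterate(0.3 + 1e-7))   # tiny perturbation -> very different value after 60 steps
```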