Excited to share the Complex AutoEncoder (CAE):

✨ The CAE decomposes images into objects without supervision by taking inspiration from the temporal coding patterns found in biological neurons. ✨

Now accepted at TMLR!

📜 arxiv.org/abs/2204.02075

with @phillip_lippe, Maja Rudolph, and Max Welling

1/5

🧠 In the brain, objects are theorized to be represented through temporal spiking patterns: a neuron’s firing rate represents whether a feature is present; and if neurons fire in sync, their respective features are bound together to represent one object.

🤖 We employ a similar mechanism by using complex-valued activations: a neuron's magnitude represents whether a feature is present; and if neurons have similar phases, their respective features are bound together to represent one object.
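
Rough PyTorch sketch of this coding scheme (hand-picked toy values for illustration, not the paper's code):

```python
import torch

# Illustrative values for four hypothetical features and two objects.
# Magnitude says whether a feature is present; phase says which object it belongs to.
magnitudes = torch.tensor([1.0, 0.9, 0.0, 0.8])  # feature 2 is absent
phases     = torch.tensor([0.3, 0.3, 0.0, 2.1])  # features 0 and 1 share a phase

# Complex activations z = m * exp(i * phi)
z = torch.polar(magnitudes, phases)

print(z.abs())    # feature presence: tensor([1.0000, 0.9000, 0.0000, 0.8000])
print(z.angle())  # binding: features with (nearly) equal phases form one object
```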

2/5

We implement this coding scheme by augmenting all activations in an autoencoder with a phase dimension. When trained to reconstruct the input image (left), the CAE learns to represent the disentangled object identities in its phases without supervision (right).
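
Here's a simplified PyTorch sketch of how a layer could act on such phase-augmented activations, with real-valued weights shared between the real and imaginary parts; the actual CAE layers have more to them, see the paper:

```python
import torch
import torch.nn as nn

class ComplexLinearSketch(nn.Module):
    """Simplified sketch: a real-weighted layer applied to complex activations.
    The same real-valued weights act on the real and imaginary parts, the
    nonlinearity acts on magnitudes only, and phases are carried through."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.fc = nn.Linear(d_in, d_out, bias=False)  # bias omitted for simplicity

    def forward(self, z):  # z: complex tensor of shape (batch, d_in)
        out = torch.complex(self.fc(z.real), self.fc(z.imag))
        m = torch.relu(out.abs())            # nonlinearity on the magnitude only
        return torch.polar(m, out.angle())   # recombine magnitude and phase

# Hypothetical usage: real-valued inputs enter with an all-zero phase dimension.
x = torch.rand(4, 64)
z = torch.polar(x, torch.zeros_like(x))
print(ComplexLinearSketch(64, 32)(z).shape)  # torch.Size([4, 32])
```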

This simple setup works surprisingly well! The CAE learns to create object-centric representations, and to segment objects accurately, as highlighted in the predictions below.
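
One hypothetical way to read a segmentation out of the learned phases (an illustration, not the paper's evaluation code): cluster the per-pixel phases on the unit circle.

```python
import torch
from sklearn.cluster import KMeans

def segment_from_phases(phases, n_objects):
    """Hypothetical readout: cluster per-pixel output phases on the unit circle
    (so phases near 0 and 2*pi land in the same cluster) to get a segmentation."""
    points = torch.stack([phases.cos(), phases.sin()], dim=-1).reshape(-1, 2)
    labels = KMeans(n_clusters=n_objects, n_init=10).fit_predict(points.numpy())
    return torch.from_numpy(labels).reshape(phases.shape)

# Made-up phases for a 4x4 "image" containing two objects:
fake_phases = torch.tensor([[0.1, 0.1, 2.0, 2.0]] * 4)
print(segment_from_phases(fake_phases, n_objects=2))
```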

3/5

On the simple, grayscale datasets that we consider, it even achieves performance competitive with, or better than, SlotAttention - a state-of-the-art object discovery method.

⚡ And it’s lightning fast: compared to SlotAttention, the CAE trains 10 to 100 times faster! ⚡

4/5

This model has some other cool properties. For example, it seems to express uncertainty about object identity in its phase values, and it’s equivariant to global rotations. Take a look at the paper to learn more!

📜 arxiv.org/abs/2204.02075

5/5