Johannes Gasteiger

You can sample nodes for scalable GNNs. But how do you choose which nodes to sample?

In our latest paper (Oral) we introduce influence-based mini-batching (IBMB) for both fast inference and training, achieving up to 130x and 17x speedups, respectively!

1/8 in 🧵

IBMB uses influence scores to select the most important neighbors, instead of a random set.

It works in 2 steps:
1. Partition the output nodes (for which we want predictions) into batches.
2. For each mini-batch, select the auxiliary nodes that help most with predictions.
2/8
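
A minimal sketch of those two steps, assuming a naive contiguous split for step 1 and a caller-supplied `influence_score` placeholder for step 2 (both are stand-ins for the paper's actual PPR-based choices; the names are illustrative):

```python
import numpy as np

def make_ibmb_batches(output_nodes, num_batches, influence_score, k_aux):
    """Step 1: partition the output nodes; step 2: pick top-k auxiliary nodes per batch.

    `influence_score(batch)` is a placeholder: it should return one score per graph
    node measuring how strongly that node influences the batch's predictions.
    """
    # Step 1: naive contiguous partition (the paper uses PPR / graph partitioning instead).
    batches = np.array_split(np.asarray(output_nodes), num_batches)

    mini_batches = []
    for batch in batches:
        # Step 2: keep the k auxiliary nodes with the highest influence on this batch.
        scores = np.asarray(influence_score(batch), dtype=float)
        scores[batch] = -np.inf                       # output nodes are in the batch anyway
        aux = np.argpartition(-scores, k_aux)[:k_aux]
        mini_batches.append((batch, aux))
    return mini_batches
```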

Luckily, the influence scores simplify to personalized PageRank (PPR) if we make some assumptions.

Step 2 then becomes an application of PPR or topic-sensitive PageRank.

Step 1 is trickier and requires falling back to heuristics like PPR or graph partitioning.
3/8
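
As an illustration of step 2, here is a self-contained, naive dense-vector version of topic-sensitive PPR via power iteration, with the restart distribution spread over a batch's output nodes. The function names and the top-k selection are illustrative, not the paper's API:

```python
import numpy as np
import scipy.sparse as sp

def topic_sensitive_ppr(adj: sp.csr_matrix, topic_nodes, alpha=0.25, num_iter=50):
    """PPR with the restart distribution concentrated on `topic_nodes` (the batch's output nodes)."""
    n = adj.shape[0]
    deg = np.asarray(adj.sum(axis=1)).ravel()
    deg[deg == 0] = 1.0                        # avoid division by zero for isolated nodes
    trans = sp.diags(1.0 / deg) @ adj          # row-stochastic transition matrix

    restart = np.zeros(n)
    restart[np.asarray(topic_nodes)] = 1.0 / len(topic_nodes)

    pi = restart.copy()
    for _ in range(num_iter):                  # power iteration: pi <- a*restart + (1-a)*P^T pi
        pi = alpha * restart + (1 - alpha) * (trans.T @ pi)
    return pi                                  # high score = strong influence on the batch

def select_auxiliary(adj, output_batch, k_aux):
    scores = topic_sensitive_ppr(adj, output_batch)
    scores[np.asarray(output_batch)] = -np.inf  # exclude the output nodes themselves
    return np.argpartition(-scores, k_aux)[:k_aux]
```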

This results in very efficient batches and an up to 130x speedup over the baseline.

These plots show the accuracy vs. speed trade-off for 3 datasets, 3 GNNs, and multiple mini-batching methods when varying their hyperparameters. Note the logarithmic x-axis.
4/8

This results in a fixed set of batches. You might think "WTF, SGD with fixed batches?!" And you're almost right. Almost.

Adaptive optimization and momentum (Adam) can handle these sparse gradients quite well. For the remaining problems we propose a batch scheduling scheme.
5/8
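
The paper's actual scheduling scheme is not reproduced here; as a generic sketch, this is the usual pattern of training on a fixed, precomputed batch list while reshuffling the batch order every epoch:

```python
import random

def train(model_step, mini_batches, num_epochs, rng_seed=0):
    """Train on a fixed set of precomputed IBMB batches.

    `model_step(batch)` is a placeholder for one optimizer step (e.g. Adam) on that batch.
    Reshuffling the batch order each epoch is a generic stand-in for the paper's
    batch scheduling scheme; it decorrelates consecutive updates despite fixed batches.
    """
    rng = random.Random(rng_seed)
    order = list(range(len(mini_batches)))
    for _ in range(num_epochs):
        rng.shuffle(order)                     # new batch order every epoch
        for i in order:
            model_step(mini_batches[i])
```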

And the fixed batches can be precomputed and cached in one contiguous block of memory, substantially speeding up training as well.

Note again the logarithmic x-axis.
6/8
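
One way such a precomputed cache could look (illustrative names, not the paper's code): materialize each batch's induced subgraph and feature block once as contiguous arrays, so every epoch only reads from the cache:

```python
import numpy as np
import scipy.sparse as sp

def precompute_batch_cache(adj: sp.csr_matrix, features: np.ndarray, mini_batches):
    """Materialize each (output_nodes, aux_nodes) batch as a contiguous subgraph."""
    cache = []
    for output_nodes, aux_nodes in mini_batches:
        nodes = np.concatenate([output_nodes, aux_nodes])
        sub_adj = adj[nodes][:, nodes].tocsr()           # induced subgraph of the batch
        sub_x = np.ascontiguousarray(features[nodes])    # contiguous feature block
        out_mask = np.zeros(len(nodes), dtype=bool)
        out_mask[:len(output_nodes)] = True              # loss is computed on these rows only
        cache.append((sub_adj, sub_x, out_mask))
    return cache
```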

Since IBMB only looks at the output nodes and their surroundings, its runtime is actually independent of the overall graph size! So the speedup becomes even greater for sparser training sets, such as hand-labeled nodes.

Again, note the logarithmic x-axis.
7/8
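
One way to see why the cost can be independent of graph size is the local "push" approximation of PPR: the number of push operations is bounded by roughly 1/(alpha * eps), no matter how many nodes the graph has, because only nodes near the seed ever accumulate enough residual mass to be touched. A minimal sketch (not the paper's implementation):

```python
from collections import defaultdict

def local_push_ppr(neighbors, seed, alpha=0.25, eps=1e-4):
    """Approximate PPR from `seed`, touching only nodes near the seed.

    `neighbors[u]` is the adjacency list of u. Each push removes at least
    alpha * eps of residual mass, so the total work is bounded independently
    of the overall graph size.
    """
    p = defaultdict(float)                     # PPR estimate
    r = {seed: 1.0}                            # residual mass still to distribute
    frontier = [seed]
    while frontier:
        u = frontier.pop()
        deg = len(neighbors[u]) or 1
        if r.get(u, 0.0) < eps * deg:          # not enough mass left to push
            continue
        mass = r.pop(u)
        p[u] += alpha * mass                   # keep the alpha fraction at u
        share = (1 - alpha) * mass / deg
        for v in neighbors[u]:                 # spread the rest to the neighbors
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * (len(neighbors[v]) or 1):
                frontier.append(v)
    return dict(p)
```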