Getting Started with Scatterbrained
This is a quickstart tutorial for the Scatterbrained federated learning library. It assumes that you have already installed the library and have a working, high-level understanding of federated learning.
- For installation instructions, see Installation.
- For a refresher on federated learning, see this Wikipedia article.
Step 1: Building a model
For this example, we're going to build a basic linear regression model using torch. (You can also use other machine learning frameworks, such as TensorFlow or sklearn!) We'll load a simple built-in dataset from sklearn and train the model on it.
import sklearn.datasets
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
from torch import optim
from torch.utils.data import DataLoader, TensorDataset
# Load the data
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
# Generate train/test splits
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2
)
# Create the data loaders for torch
train_loader = DataLoader(
    TensorDataset(
        # Cast to float32 to match nn.Linear's default dtype
        torch.from_numpy(X_train).float(),
        torch.from_numpy(y_train).float(),
    ),
    batch_size=32, shuffle=True
)
val_loader = DataLoader(
    TensorDataset(
        torch.from_numpy(X_val).float(),
        torch.from_numpy(y_val).float(),
    ),
    batch_size=32, shuffle=True
)
# Create the model
model = nn.Linear(X_train.shape[1], 1)
def create_optimizer(model):
    return optim.AdamW(model.parameters())
In the code block above, we've created a model just like we would in a traditional ML pipeline. We've also created a function that returns an optimizer. So far, we're not doing anything "FL-flavored": this is just plain old ML.
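To make that concrete, here's what training this model would look like without any federation. This loop is purely illustrative (MSE is an assumed loss choice here) and you don't need to run it: Scatterbrained will drive the training loop for us in Step 2.

import torch.nn.functional as F

optimizer = create_optimizer(model)
for epoch in range(5):
    model.train()
    for X_batch, y_batch in train_loader:
        optimizer.zero_grad()
        # nn.Linear returns shape (batch, 1); squeeze to match targets
        preds = model(X_batch).squeeze(1)
        loss = F.mse_loss(preds, y_batch)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val_loss = sum(
            F.mse_loss(model(Xb).squeeze(1), yb).item()
            for Xb, yb in val_loader
        ) / len(val_loader)
    print(f"epoch {epoch}: val MSE {val_loss:.2f}")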
Step 2: Training the model
It's now time to introduce the Scatterbrained library. Training a model is still quite simple:
import scatterbrained as sb
NUM_EPOCHS = 10
# Create a new scatterbrained Node to hold all of the logic
# for a federated learning compute node:
async with sb.Node() as node:
    # Create a new Namespace. This is a unique identifier
    # shared by all nodes in the same FL community. Other
    # nodes on the same network that know this name can join
    # your FL cluster:
    async with node.namespace(
        "MyFirstNamespace", model, create_optimizer
    ) as ns:
        # Finally, we can ask scatterbrained to perform the
        # training loop for us in a background thread:
        await ns.train(NUM_EPOCHS, train_loader, val_loader)
        # At the same time, we can also serve our node's
        # resources to other nodes on the network:
        await ns.serve()
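The async with blocks above need to run inside an asyncio event loop (for example, an async REPL or a Jupyter notebook that already provides one). If you want to run this as a standalone script, one common pattern is to wrap the snippet in a coroutine and hand it to asyncio.run. This is only a sketch, assuming Scatterbrained's Node works with the standard asyncio event loop; it uses no calls beyond the ones shown above.

import asyncio

async def main():
    async with sb.Node() as node:
        async with node.namespace(
            "MyFirstNamespace", model, create_optimizer
        ) as ns:
            # Train locally, then serve this node's resources
            # to peers on the network:
            await ns.train(NUM_EPOCHS, train_loader, val_loader)
            await ns.serve()

if __name__ == "__main__":
    asyncio.run(main())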
Step 3: Joining the cluster
From another machine on the same network (or the same machine on a different port), you can join the cluster by running the following code.
Unlike the code blocks above, where we built up a file gradually, this is all the code you need to run from your second machine:
import scatterbrained as sb
async with sb.Node() as node:
    async with node.join("MyFirstNamespace") as ns:
        await ns.sync_model()  # This line is different!
        await ns.serve()
Wow, that's succinct! Let's take a look at what's happening behind the scenes when we run this code.
First, we create a new scatterbrained Node, just like before. But instead of defining a Namespace from scratch, we join the existing Namespace by name, so that the nodes know they're allowed to talk to each other. (A sb.Node can't cooperatively learn with a Node training a different model, so we use the Namespace to indicate to the Node that this networked peer is speaking the same language, i.e., using the same model.)
But here we don't specify any training data or model architecture: instead, we use the sync_model method so that this node can download the model from another peer Node on the network. This means that you can use Scatterbrained to quickly transfer a model from one machine to another. (In other words, your nodes' models stay in sync, with one node serving as the source of truth.)
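If you want to run the joining node as a standalone script as well, the same asyncio.run pattern from Step 2 applies. Again, this is only a sketch, using no calls beyond the ones already shown.

import asyncio
import scatterbrained as sb

async def main():
    async with sb.Node() as node:
        async with node.join("MyFirstNamespace") as ns:
            # Download the current model from a peer, then
            # serve this node's resources to the network:
            await ns.sync_model()
            await ns.serve()

if __name__ == "__main__":
    asyncio.run(main())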
Next Steps
In this tutorial, we've covered the basics of using Scatterbrained to train a model. But there are many other features that you can use to tailor your decentralized federated learning code. For example,
- You can emulate traditional, centralized federated learning if you want a single machine to serve as an authority
- You can train a model on multiple machines in parallel with different datasets
- You can specify different optimizers for different Nodes
- You can change your network topology so that nodes communicate only with certain peers and ignore others
- You can design custom network topologies so that information can only flow in certain directions