# Shared Jupyterhub Workflow

This doc describes how we (the gossipsub-hardening team at Protocol Labs) have been running the tests in this repo.

## Connecting to the Jupyterhub server

We have an EC2 instance running Ubuntu 18.04, with [The Littlest Jupyterhub][tljh] installed. It doesn't
have a persistent domain or SSL cert, so we've been connecting to it using an SSH tunnel.

The current incantation is:

```shell
ssh -A -L 8666:localhost:80 jupyter-protocol@ec2-3-122-216-37.eu-central-1.compute.amazonaws.com
```

This will open a shell as the `jupyter-protocol` user, and forward connections to port 8666 on your local
machine to port 80 on the remote machine.

If your ssh key isn't authorized, ping @yusefnapora (or someone else with access, if I'm sleeping or something)
to get added to the `authorized_keys` file.

The `-A` flag enables ssh-agent forwarding, which will let you pull from this repo while you're shelled in, assuming
your SSH key is linked to your github account & you have read access to this repo. Note that agent forwarding
doesn't seem to work if you're inside a tmux session on the remote host. There's probably a way to
get it working, but I've just been doing `git pull` outside of tmux.
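
To sanity-check the forwarding once you're shelled in, `ssh-add -l` queries the (forwarded) agent, and a pull from the repo checkout should then authenticate with your own key:

```shell
# on the remote host, outside of tmux
ssh-add -l                       # should list the key(s) from your local agent
cd ~/repos/gossipsub-hardening
git pull                         # authenticates via the forwarded agent
```
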
Once the tunnel is up, you can go to `http://localhost:8666`, where you'll be asked to sign in. Sign in as
user `protocol` with an empty password.

## Server Environment

There are some things specific to the environment that are worth mentioning.

The testground daemon is running inside a tmux session owned by the `jupyter-protocol` user. Running `tmux attach`
while shelled in should open it for you. You may have to switch to a different pane; it's generally running in
the first pane.
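
If you're rusty on tmux, the minimal moves (with default key bindings) are:

```shell
tmux attach     # attach to the running session
# inside tmux: Ctrl-b then o cycles panes; Ctrl-b then d detaches
```
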
|
||||
|
||||
If for some reason testground isn't running, (e.g. `ps aux | grep testground` comes up empty), you can start the
|
||||
daemon with:
|
||||
|
||||
```shell
|
||||
testground --vv daemon
|
||||
```
|
||||
|
||||
The testground that's on the `$PATH` is a symlink to `~/repos/testground/testground`, so if you pull in changes
|
||||
to testground and rebuild, it should get picked up by the runner scripts, etc automatically.
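
A sketch of the rebuild loop (this assumes the daemon binary is built with a plain `go build` at the repo root, which is what would produce the `~/repos/testground/testground` path the symlink points at):

```shell
cd ~/repos/testground
git pull
go build .    # rewrites ./testground in place, so the $PATH symlink picks it up
# restart the daemon in the tmux session so it runs the new binary
```
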
This repo is checked out to `~/repos/gossipsub-hardening`, and there's a symlink to it in `~/testground/plans`, so that
the daemon can find it and run our plans.
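
If that symlink ever goes missing, something like this should restore it (the link name here is an assumption; match whatever name the daemon expects):

```shell
ln -s ~/repos/gossipsub-hardening ~/testground/plans/gossipsub-hardening
```
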
## Cluster setup

The [testground/infra](https://github.com/testground/infra) repo is checked out at `~/repos/infra`. It contains
the scripts for creating and deleting the k8s cluster. The infra README has more detail and some helpful commands,
but here are the most relevant ones, plus some things to try if things break.

Before running any of the commands related to the cluster, you'll need to source some environment vars:

```shell
source ~/k8s-env.sh
```
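
For reference, that script needs to export at least the variables the `kops` commands below rely on. A hypothetical minimal version (the real contents may differ):

```shell
# hypothetical ~/k8s-env.sh contents
export NAME=<cluster-name>                   # kops cluster name
export KOPS_STATE_STORE=s3://<state-bucket>  # S3 bucket holding the kops state
```
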
To see the current status of the cluster:

```shell
kops validate cluster
```

If that command can't connect to the cluster VMs at all, it either means the cluster has been deleted,
or you need to export the kubectl config:

```shell
kops export kubecfg --state $KOPS_STATE_STORE --name=$NAME
```
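
After exporting, a quick sanity check that `kubectl` can reach the API server:

```shell
kubectl get nodes    # should list the master and worker instances
```
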
If `kops validate cluster` still can't connect to anything, someone probably deleted the cluster when they were
done with it. To create it:

```shell
cd ~/repos/infra/k8s
./install.sh cluster.yaml
```

This will take a few minutes, and the newly created cluster will only have 4 workers. To resize it, run

```shell
kops edit ig nodes
```

and edit the `maxSize` and `minSize` params, setting both to the desired node count. Then, apply the changes with

```shell
kops update cluster $NAME --yes
```

After a few minutes, `kops validate cluster` should show all the instances up, and the cluster will be ready.
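
One way to wait for that without re-running the command by hand:

```shell
watch -n 30 kops validate cluster    # re-run validation every 30s; Ctrl-C when it's happy
```
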
## Running Tests

Everything in the [main README](./README.md) should apply when running tests on the server, but you can ignore
the parts that tell you to run `jupyter notebook` manually.

When you log into the Jupyterhub server, you should see a file browser interface. Navigate to
`repos/gossipsub-hardening/scripts` to open the `Runner.ipynb` notebook.

There are a bunch of `config-*.json` files next to the runner notebook. These are a good starting point for
configuring test variations; `config-1k.json` is the baseline test described in the report.
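
If you want to experiment without clobbering the shared configs, copy one as a starting point (the new filename is just an example) and load the copy from the Runner notebook:

```shell
cd ~/repos/gossipsub-hardening/scripts
cp config-1k.json config-my-experiment.json
```
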
At the moment, some of the config JSON files may still be targeting the `feat/hardening` branch and will give an
error right away if you run them. Change the branch in the Pubsub config panel to `master` and it should be all good.

If you want to target "vanilla" gossipsub (v1.0), you can set the branch to `release-v0.2` and uncheck the
"target hardened API" checkbox in the UI.

After a successful run, you should see the path to the analysis notebook printed. Navigate there with the Jupyter
file browser to run the analysis notebook and generate the charts, etc.

## Troubleshooting

Sometimes, especially if you're running with lots of instances, `weave` (the thing that manages the k8s data network)
will give up the ghost, and one or more test instances will get stuck and be unable to communicate with the others.

If you never see the `All networks initialized` message in the testground output, or if it takes several minutes to
get to `All networks initialized` after all instances are in the Running state, it's likely that you've hit this issue.

If weave has failed, you may see some weave pods stuck in a "not ready" state if you run

```shell
kops validate cluster
```
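
You can also inspect the weave pods directly. Assuming the standard weave-net daemonset labels, something like:

```shell
kubectl get pods -n kube-system -l name=weave-net -o wide   # one pod per node; healthy pods show READY 2/2
```
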
You can try forcibly removing the stuck weave pods, although I can't find the magic commands for that at the moment.
What I've been doing instead is scaling the cluster down to a single worker and then back up, to start with a clean
slate. Scaling down to zero would probably be better, now that I think of it...
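
Concretely, that's the same resize dance as in the cluster setup section:

```shell
kops edit ig nodes               # set minSize and maxSize down to 1 (or 0)
kops update cluster $NAME --yes
# wait for things to settle, then edit the sizes back up and update again
```
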
If you do hit the weave issue, you can try lowering the number of connections for the attacker, having fewer attackers,
or packing the attacker peers less tightly into their containers by adjusting the number of attacker nodes and
the number of attack peers per container. Spreading the attackers out over more containers may help, but you may
also need to resize the cluster and add more worker VMs.

If you don't have enough resources, testground will fail the run right away and helpfully tell you what the limit is.

You can control the CPU and RAM allocated for each test container by editing `~/testground/.env.toml` and restarting
the testground daemon.
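
After editing, restart the daemon in its tmux pane so the new limits take effect:

```shell
# in the tmux pane running the daemon: Ctrl-C to stop it, then
testground --vv daemon
```
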
[tljh]: http://tljh.jupyter.org/en/latest/