Monero Dataset Pipeline
A pipeline that automates creating Monero wallets and sending transactions between them in order to collect a dataset suitable for supervised learning applications.
Installation
sudo apt update
sudo apt install vim git jq expect tmux parallel python3 python3-tk bc curl python3-pip -y
pip3 install numpy
cd ~ && wget https://downloads.getmonero.org/cli/monero-linux-x64-v0.17.3.0.tar.bz2
tar -xvf monero-linux-x64-v0.17.3.0.tar.bz2 && cd monero-x86_64-linux-gnu-v0.17.3.0 && sudo cp monero* /usr/bin/ && cd ..
git clone git@github.com:ACK-J/Monero-Dataset-Pipeline.git && cd Monero-Dataset-Pipeline
chmod +x ./run.sh && chmod 777 -R Funding_Wallets/
# Make sure global variables are set
./run.sh
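Before launching the pipeline, it can be worth confirming that the prerequisites installed above are actually reachable. A minimal, optional sanity check in Python (not part of the repository):

# Optional pre-flight check (not part of the repository): confirm the Monero CLI
# binaries copied into /usr/bin and the numpy dependency are available.
import shutil
import sys

for binary in ("monerod", "monero-wallet-cli", "monero-wallet-rpc"):
    if shutil.which(binary) is None:
        sys.exit(binary + " not found on PATH; re-check the Monero CLI install step")

try:
    import numpy  # noqa: F401
except ImportError:
    sys.exit("numpy is missing; run 'pip3 install numpy'")

print("Prerequisites look OK")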
Stagenet Dataset
| File | Size | Serialized | Description |
|---|---|---|---|
| dataset.csv | 1.4GB | | The exhaustive dataset including all metadata for each transaction, in CSV format. |
| dataset.json | 1.5GB | ✅ | The exhaustive dataset including all metadata for each transaction, in JSON format. |
| X.csv | 4.1GB | | A modified version of dataset.csv with all features irrelevant to machine learning removed, in CSV format. |
| X.pkl | 6.5GB | ✅ | A modified version of dataset.json with all features irrelevant to machine learning removed, as a pickled pandas DataFrame. |
| y.pkl | 9.5MB | ✅ | A pickled list of Python dictionaries containing private information about the corresponding index of X.pkl. |
| X_Undersampled.csv | 1.4GB | | A modified version of X.csv with all data points shuffled and undersampled. |
| X_Undersampled.pkl | 2.3GB | ✅ | A modified version of X.pkl with all data points shuffled and undersampled. |
| y_Undersampled.pkl | 325kB | ✅ | A pickled list containing the labels corresponding to the indices of X_Undersampled.pkl. |
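The CSV variants can be inspected directly, without unpickling anything. A small sketch, assuming pandas is installed (it is not installed by the steps above) and the files sit in ./Dataset_Files/:

# Sketch only: peek at the first rows of the CSV feature matrix with pandas.
# pandas is an assumed extra dependency (pip3 install pandas).
import pandas as pd

sample = pd.read_csv("./Dataset_Files/X.csv", nrows=1000)  # X.csv is ~4.1GB, so only read a slice
print(sample.shape)               # (rows read, number of feature columns)
print(list(sample.columns)[:10])  # first few feature names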
Dataset Download Link
| File | Size | Description | Link |
|---|---|---|---|
| Stagenet_Dataset_7_2_2022.7z | 837 MB | Includes all files listed in the Stagenet Dataset table above, compressed using 7-Zip. | https://drive.google.com/file/d/1cmkb_7_cVe_waLdVJ9USdK07SPWgdgva/view |
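Once downloaded, the archive can be extracted with any 7-Zip tool. If you prefer staying in Python, the third-party py7zr package works as well; a sketch, where the archive location and output directory are only example paths:

# Sketch: extract the downloaded archive using py7zr (pip3 install py7zr).
# Paths are examples; adjust them to wherever the archive was saved.
import py7zr

with py7zr.SevenZipFile("Stagenet_Dataset_7_2_2022.7z", mode="r") as archive:
    archive.extractall(path="./Dataset_Files/")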
How to load the dataset using Python and pickle
import json
import pickle

# Full dataset with all transaction metadata
with open("./Dataset_Files/dataset.json", "r") as fp:
    data = json.load(fp)
# Feature matrix (pickled pandas DataFrame)
with open("./Dataset_Files/X.pkl", "rb") as fp:
    X = pickle.load(fp)
# Labels: one dict of private ground-truth information per row of X
with open("./Dataset_Files/y.pkl", "rb") as fp:
    y = pickle.load(fp)
# Shuffled and undersampled feature matrix
with open("./Dataset_Files/X_Undersampled.pkl", "rb") as fp:
    X_Undersampled = pickle.load(fp)
# Labels corresponding to X_Undersampled
with open("./Dataset_Files/y_Undersampled.pkl", "rb") as fp:
    y_Undersampled = pickle.load(fp)
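Continuing from the snippet above, a typical next step is a train/test split on the undersampled data. A minimal sketch, assuming scikit-learn is installed (it is not installed by the steps above):

# Sketch only: split the undersampled features/labels for supervised learning.
# scikit-learn is an assumed extra dependency (pip3 install scikit-learn).
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_Undersampled, y_Undersampled, test_size=0.2, random_state=42
)
print(len(X_train), "training rows,", len(X_test), "test rows")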
Problem Solving and Useful Commands
If collect.sh throws the error Failed to create a read transaction for the db: MDB_READERS_FULL: Environment maxreaders limit reached, clear the stale LMDB readers with:
/home/user/monero/external/db_drivers/liblmdb/mdb_stat -rr ~/.bitmonero/testnet/lmdb/
Check the progress of collect.sh while it's running
find ./ -iname "*.csv" | cut -d '/' -f 2 | sort -u
After running collect.sh, gather the ring positions
find . -name "*outgoing*" | xargs cat | cut -f 6 -d ',' | grep -v Ring_no/Ring_size | cut -f 1 -d '/'
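The same ring-position tally can be done in Python if a counted distribution is more useful than a raw list. A sketch under the same assumption as the command above, namely that the sixth comma-separated field of every outgoing CSV is Ring_no/Ring_size:

# Sketch: count how often the true spend lands in each ring position.
# Assumes the sixth comma-separated field of every *outgoing* CSV is
# "Ring_no/Ring_size", as in the shell pipeline above.
from collections import Counter
from pathlib import Path

positions = Counter()
for path in Path(".").rglob("*outgoing*"):
    if not path.is_file():
        continue
    for line in path.read_text().splitlines():
        fields = line.split(",")
        if len(fields) < 6:
            continue  # malformed or short line
        field = fields[5].strip()
        if field.startswith("Ring_no/Ring_size"):
            continue  # header row
        positions[field.split("/")[0]] += 1

for ring_no, count in sorted(positions.items()):
    print(ring_no, count)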
Data Collection Pipeline Flowcharts
Flowchart images for Run.sh, Collect.sh, and Create_Dataset.py (diagrams not reproduced here).