Monero Dataset Pipeline

A pipeline that automates the creation of Monero wallets and the transactions between them, in order to collect a dataset suitable for supervised learning applications.

Installation

sudo apt update
sudo apt install vim git jq expect tmux parallel python3 python3-tk bc curl python3-pip -y
pip3 install numpy
cd ~ && wget https://downloads.getmonero.org/cli/monero-linux-x64-v0.17.3.0.tar.bz2
tar -xvf monero-linux-x64-v0.17.3.0.tar.bz2 && cd monero-x86_64-linux-gnu-v0.17.3.0 && sudo cp monero* /usr/bin/ && cd ..
git clone git@github.com:ACK-J/Monero-Dataset-Pipeline.git && cd Monero-Dataset-Pipeline
chmod +x ./run.sh && chmod 777 -R Funding_Wallets/
# Make sure global variables are set
./run.sh

Stagenet Dataset

File | Serialized Size | Description
--- | --- | ---
dataset.csv | 1.4 GB | The exhaustive dataset, including all metadata for each transaction, in CSV format.
dataset.json | 1.5 GB | The exhaustive dataset, including all metadata for each transaction, in JSON format.
X.csv | 4.1 GB | A modified version of dataset.csv with all features irrelevant to machine learning removed, in CSV format.
X.pkl | 6.5 GB | A modified version of dataset.json with all features irrelevant to machine learning removed, as a pickled pandas DataFrame.
y.pkl | 9.5 MB | A pickled list of Python dictionaries containing private information about the corresponding index of X.pkl.
X_Undersampled.csv | 1.4 GB | A modified version of X.csv with all data points shuffled and undersampled.
X_Undersampled.pkl | 2.3 GB | A modified version of X.pkl with all data points shuffled and undersampled.
y_Undersampled.pkl | 325 kB | A pickled list containing the labels corresponding to the indices of X_Undersampled.pkl.

Download: Stagenet_Dataset_7_2_2022.7z (837 MB)

How to load the dataset using Python and pickle

import json
import pickle

with open("./Dataset_Files/dataset.json", "r") as fp:
    data = json.load(fp)
    
with open("./Dataset_Files/X.pkl", "rb") as fp:
    X = pickle.load(fp)
    
with open("./Dataset_Files/y.pkl", "rb") as fp:
    y = pickle.load(fp)
    
with open("./Dataset_Files/X_Undersampled.pkl", "rb") as fp:
    X_Undersampled = pickle.load(fp)
    
with open("./Dataset_Files/y_Undersampled.pkl", "rb") as fp:
    y_Undersampled = pickle.load(fp)
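The undersampled files above are described as shuffled and undersampled versions of X and y. As an illustration only (the repo's actual logic lives in Create_Dataset.py and may differ), a minimal sketch of shuffling index-aligned features and labels and then undersampling every class down to the size of the rarest one:

```python
import random

def shuffle_and_undersample(X, y, seed=0):
    """Shuffle (X, y) pairs together, then keep an equal number of
    samples per class (the size of the rarest class)."""
    rng = random.Random(seed)
    pairs = list(zip(X, y))
    rng.shuffle(pairs)

    # Group the shuffled pairs by label
    by_label = {}
    for xi, yi in pairs:
        by_label.setdefault(yi, []).append((xi, yi))

    # Trim every class to the size of the smallest one
    n_min = min(len(v) for v in by_label.values())
    kept = [p for v in by_label.values() for p in v[:n_min]]
    rng.shuffle(kept)

    Xs, ys = zip(*kept)
    return list(Xs), list(ys)

# Toy example: class 1 is the minority with 3 samples
X = [[i] for i in range(10)]
y = [0] * 7 + [1] * 3
X_u, y_u = shuffle_and_undersample(X, y)
print(len(X_u))  # 6 -> 3 samples per class
```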

Problem Solving and Useful Commands

If collect.sh throws the error Failed to create a read transaction for the db: MDB_READERS_FULL: Environment maxreaders limit reached, run mdb_stat with -rr to list and clear stale reader slots:

/home/user/monero/external/db_drivers/liblmdb/mdb_stat -rr ~/.bitmonero/testnet/lmdb/

Check the progress of collect.sh while it's running

find ./ -iname "*.csv" | cut -d '/' -f 2 | sort -u
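The one-liner above lists the top-level directories that already contain CSV output. A rough Python equivalent (an approximation: glob matching is case-sensitive, unlike find -iname) that can be dropped into a monitoring script:

```python
import glob
import os

def completed_wallets(root="."):
    """Top-level directories under root that already hold CSV output;
    a Python analogue of: find ./ -iname "*.csv" | cut -d '/' -f 2 | sort -u"""
    names = set()
    for path in glob.glob(os.path.join(root, "**", "*.csv"), recursive=True):
        rel = os.path.relpath(path, root)
        parts = rel.split(os.sep)
        if len(parts) > 1:  # skip CSVs sitting directly in root
            names.add(parts[0])
    return sorted(names)

# e.g. print(completed_wallets("."))
```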

After running collect.sh, gather the ring positions

find . -name "*outgoing*" | xargs cat | cut -f 6 -d ',' | grep -v Ring_no/Ring_size | cut -f 1 -d '/'
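The pipe above pulls the sixth CSV column (formatted Ring_no/Ring_size), drops the header, and keeps the ring position. The same extraction can be done in Python when further analysis is needed; this sketch assumes, as the shell one-liner does, that column index 5 of the outgoing CSVs holds that field:

```python
import csv
import glob

def ring_positions(paths):
    """Collect the spend's ring position from each outgoing CSV.
    Assumes column 6 (index 5) holds "Ring_no/Ring_size", matching
    the shell one-liner above."""
    positions = []
    for path in paths:
        with open(path, newline="") as fp:
            for row in csv.reader(fp):
                # Skip the header row and malformed short rows
                if len(row) > 5 and row[5] != "Ring_no/Ring_size":
                    positions.append(int(row[5].split("/")[0]))
    return positions

# e.g. ring_positions(glob.glob("./**/*outgoing*.csv", recursive=True))
```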

Data Collection Pipeline Flowcharts

Run.sh

Collect.sh

Create_Dataset.py
