mirror of
https://github.com/MAGICGrants/MoneroAna.git
synced 2026-01-08 04:23:54 -05:00
submitted doc
This commit is contained in:
@@ -7,6 +7,7 @@
|
||||
\usepackage{subfigure}
|
||||
\usepackage{lineno}
|
||||
\usepackage{rotating}
|
||||
\usepackage[title]{appendix}
|
||||
\linenumbers
|
||||
|
||||
\usepackage{glossaries}
|
||||
@@ -14,10 +15,10 @@
|
||||
\input{snippets/glossary.tex}
|
||||
\makenoidxglossaries
|
||||
|
||||
\begin{document}
|
||||
\begin{document}
|
||||
\topmargin=.05in
|
||||
|
||||
\title{Report Update June 2023; Epistemology of Decoy Systems; Probing the Attacks on the Privacy of the Monero Blockchain}
|
||||
\title{Report Update July-August 2023; Epistemology of Decoy Systems; Probing the Attacks on the Privacy of the Monero Blockchain}
|
||||
\author{Nathan Borggren}
|
||||
\affiliation{CompDec}
|
||||
\date{\today}
|
||||
@@ -30,17 +31,25 @@
|
||||
|
||||
\tableofcontents
|
||||
|
||||
\section{Introduction}
|
||||
\input{snippets/introduction}
|
||||
|
||||
\input{snippets/value}
|
||||
|
||||
Check \gls{EAE} and \gls{TDA} \gls{rct}
|
||||
|
||||
\input{snippets/fits.tex}
|
||||
|
||||
\input{snippets/RingTDA.tex}
|
||||
\input{snippets/RingTDA}
|
||||
|
||||
\input{snippets/probTDA}
|
||||
|
||||
\input{snippets/EAE}
|
||||
|
||||
\input{snippets/software.tex}
|
||||
|
||||
|
||||
\input{snippets/notations.tex}
|
||||
|
||||
|
||||
\printnoidxglossaries
|
||||
|
||||
\bibliography{monero}
|
||||
|
||||
Binary file not shown.
|
Before Width: | Height: | Size: 22 KiB After Width: | Height: | Size: 27 KiB |
@@ -5,6 +5,7 @@ booktitle = {arXiv},
|
||||
title = {{Simulated blockchains for machine learning traceability and transaction values in the Monero network}},
|
||||
year = {2020}
|
||||
}
|
||||
|
||||
@misc{Borggren2020a,
|
||||
abstract = {Copyright {\textcopyright} 2020, arXiv, All rights reserved. A variety of correlations are detected in the Monero blockchain. The joint distribution of the time-since-last-transaction between elements of pairs of RingCTs is enhanced in comparison with the product of the marginal distributions. Similarly there is an enhancement in the joint distribution of the hour timestamps between the same pairs. Lastly, we find another enhancement when the correlation is measured between the hour timestamps of the transaction itself and the elements of the RingCTs. We calculate some adjustments to the probabilities of which input in a RingCT is real, providing an additional heuristic to denoising the Monero blockchain.},
|
||||
author = {Borggren, N. and Yao, L.},
|
||||
@@ -15,8 +16,45 @@ year = {2020}
|
||||
|
||||
@misc{MiersZcash,
|
||||
title={{Blockchain Privacy; Equal Parts Theory and Practice}},
|
||||
author={Miers, Ian},
|
||||
url={https://zfnd.org/blockchain-privacy-equal-parts-theory-and-practice/},
|
||||
year = {2023}
|
||||
year={2023}
|
||||
}
|
||||
|
||||
@misc{MMF,
|
||||
title={{EAE Attack and Churning}},
|
||||
author={Borggren, Nathan},
|
||||
url={https://monerofund.org/projects/eae_attack_and_churning},
|
||||
year={2023}
|
||||
}
|
||||
|
||||
@misc{breakingChurn,
|
||||
title={{Breaking Monero Episode 09: Poisoned Outputs (EAE Attack)}},
|
||||
author={Monero Community Workgroup},
|
||||
url={https://www.youtube.com/watch?v=iABIcsDJKyM},
|
||||
year={2019}
|
||||
}
|
||||
|
||||
@book{villani2009optimal,
|
||||
title={Optimal transport: old and new},
|
||||
author={Villani, C{\'e}dric and others},
|
||||
volume={338},
|
||||
year={2009},
|
||||
publisher={Springer}
|
||||
}
|
||||
|
||||
@misc{linear,
|
||||
title={{Linear Programming Review}},
|
||||
author={Burke, James},
|
||||
url={https://sites.math.washington.edu/~burke/crs/409/LP-rev/lp_rev_notes.pdf},
|
||||
year={2023}
|
||||
}
|
||||
|
||||
@book{wilmott2007paul,
|
||||
title={Paul Wilmott introduces quantitative finance},
|
||||
author={Wilmott, Paul},
|
||||
year={2007},
|
||||
publisher={John Wiley \& Sons}
|
||||
}
|
||||
|
||||
@article{hagberg2020networkx,
|
||||
@@ -120,3 +158,4 @@ year={2022}}
|
||||
year={2021}
|
||||
}
|
||||
|
||||
|
||||
|
||||
16
docs/snippets/EAE.tex
Normal file
@@ -0,0 +1,16 @@
|
||||
\section{EAE Experiments}
|
||||
|
||||
Although this section is very incomplete, I'll describe the experiments that are underway.
|
||||
For the sake of this effort 15 transactions were made.
|
||||
\begin{itemize}
|
||||
\item 5 churning transactions from myself (monero-cli) to myself.
|
||||
\item 5 transactions from myself to a popular self-custodial wallet.
|
||||
\item 5 transactions from myself to an exchange.
|
||||
\end{itemize}
|
||||
|
||||
These repeated transactions form the basis for our investigations of the 2-AE through 5-AE attacks.
|
||||
The code and results are still being verified, and I'll find a way to present the information in a redacted way for the sake of privacy.
|
||||
The preliminary results are that historical transactions can be found, but spurious connections are also present.
|
||||
|
||||
We will be establishing experiments to find intersections of uncorrelated transactions to provide a background; ideally, any false pair of transactions will also have intersections with similar statistics.
|
||||
|
||||
@@ -4,7 +4,7 @@ We will be using homology and persistent homology in multiple ways, the implemen
|
||||
|
||||
The main idea of TDA is that of \textit{persistence}; a sub-complex of a simplicial complex is constructed by providing a parameter and watching how that sub-complex grows from sub-complexes to the full simplicial complex as the parameter is swept. In our context, the transactions composing the ring are the vertices, the parameter being swept is the block height, and a vertex is joined with another vertex if its distance is within that height of the vertex. Persistent Homology uses the Union Find algorithm to find unions. In Table~\ref{UnionFind} we show which set each transaction in a ring is a member of as the algorithm progresses. Each vertex begins as the singleton set containing just that vertex.
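As a sketch of what this filtration computes in dimension zero (our own illustration; the symbols $h_{(i)}$ and $g_i$ are introduced here and are not part of the MoneroAna code), sort the block heights of the $n$ ring members as $h_{(1)} \le \dots \le h_{(n)}$ and let $g_i = h_{(i+1)} - h_{(i)}$ be the consecutive gaps. Because the points lie on a line, two components merge exactly when the parameter reaches the gap between them. Assuming the convention that an edge enters the filtration at a parameter equal to the distance between its endpoints, the zero-dimensional birth-death pairs are
\[
(0,\, g_i), \quad i = 1, \dots, n-1, \qquad \text{together with} \qquad (0,\, \infty)
\]
for the one component that never dies. A single very old ring member therefore shows up as one gap, and hence one death value, far larger than the rest.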
|
||||
|
||||
In practice we will simply call \textit{Giotto's} Vietoris-Rips functionality and output a persistence diagram. Indeed this occurs when the \texit{MoneroAna.tx} object is instantiated. We expect these block height persistence diagrams to be used in a multitude of ways.
|
||||
In practice we will simply call \textit{Giotto's} Vietoris-Rips functionality and output a persistence diagram. Indeed this occurs when the \textit{MoneroAna.tx} object is instantiated. We expect these block height persistence diagrams to be used in a multitude of ways.
|
||||
\begin{itemize}
|
||||
\item Unsupervised Machine Learning; the diagrams themselves occupy a metric space and can be used for clustering (bottleneck distances, Wasserstein distances, Fr\'echet mean)
|
||||
\item Supervised Machine Learning; the decoy algorithm is implemented at the wallet level, not the protocol level; as such, multiple decoy models exist in the wild. An experiment could be to generate transactions from a variety of wallets and develop a model to predict which wallet a signer of some transaction is using. In the context of EAE attacks, an exchange can potentially ascertain the external wallet used by a customer.
|
||||
@@ -15,6 +15,44 @@ In practice we will simply call \textit{Giotto's} Vietoris-Rips functionality an
|
||||
|
||||
\subsection{Worked example}
|
||||
|
||||
For a given RingCT we'd like to be able to evaluate the likelihood that its subrings came from a decoy selection algorithm, and to find similar and comparable rings.
|
||||
We can also develop summary statistics about the nature of these rings and representations appropriate for Machine Learning.
|
||||
|
||||
\begin{figure}[h]
|
||||
\includegraphics[scale=0.5]{block_height}
|
||||
\caption{Persistence diagram showing the birth-death pairs of a single ring. A log scale is shown to separate the points on the graph. A persistence diagram is a concise representation of all the information shown in Tables \ref{tab:pers2}, \ref{tab:pers}, and \ref{UnionFind}. These diagrams can be analyzed in bulk to find means and anomalies, and to provide a basis for machine learning.}
|
||||
\label{fig:alpha}
|
||||
\end{figure}
|
||||
|
||||
\begin{center}
|
||||
\begin{table}
|
||||
\begin{tabular}{|c|c|c|}
|
||||
|
||||
\hline
|
||||
\includegraphics[scale=0.3]{Pers2_00} & \includegraphics[scale=0.3]{Pers2_01} & \includegraphics[scale=0.3]{Pers2_02} \\ \hline
|
||||
\includegraphics[scale=0.3]{Pers2_12} & \includegraphics[scale=0.3]{Pers2_13} & \includegraphics[scale=0.3]{Pers2_14} \\ \hline
|
||||
\end{tabular}
|
||||
\caption{As the filtration progresses, holes are filled, joining neighboring transactions into a larger simplex. Only the first three and last three steps of the algorithm are shown; all of the structure during the intermediate heights is confined to the band on the right side, shown in greater detail in the next diagram.
|
||||
In this particular ring, one of the transactions is far older than the others, requiring a large parameter for the height to join the tx with the other transactions.}
|
||||
\label{tab:pers2}
|
||||
\end{table}
|
||||
\end{center}
|
||||
|
||||
\begin{center}
|
||||
\begin{table}
|
||||
\begin{tabular}{|c|c|c|}
|
||||
\hline
|
||||
\includegraphics[scale=0.3]{Pers_00} & \includegraphics[scale=0.3]{Pers_01} & \includegraphics[scale=0.3]{Pers_02} \\ \hline
|
||||
\includegraphics[scale=0.3]{Pers_03} & \includegraphics[scale=0.3]{Pers_04} & \includegraphics[scale=0.3]{Pers_05} \\ \hline
|
||||
\includegraphics[scale=0.3]{Pers_06} & \includegraphics[scale=0.3]{Pers_07} & \includegraphics[scale=0.3]{Pers_08} \\ \hline
|
||||
\includegraphics[scale=0.3]{Pers_09} & \includegraphics[scale=0.3]{Pers_10} & \includegraphics[scale=0.3]{Pers_11} \\ \hline
|
||||
\includegraphics[scale=0.3]{Pers_12} & \includegraphics[scale=0.3]{Pers_13} & \includegraphics[scale=0.3]{Pers_14} \\ \hline
|
||||
\end{tabular}
|
||||
\caption{
|
||||
As the filtration progresses, holes are filled, joining neighboring transactions into a larger simplex. The fine structure at the different orders of the filtration is evident, as we have zoomed into just the right side of the previous diagram.}
|
||||
\label{tab:pers}
|
||||
\end{table}
|
||||
\end{center}
|
||||
|
||||
|
||||
\begin{center}
|
||||
|
||||
@@ -1,7 +1,44 @@
|
||||
\section{Fitting Decoy Distributions}
|
||||
|
||||
The obfuscation of the history of a transaction is a fascinating feature of the Monero blockchain.
|
||||
Every transaction is constructed with one or more rings and the real outputs are hidden amongst decoys.
|
||||
As a physicist, whose colleagues can tease Higgs bosons out of a slurry of particles, gravitational waves from the rest of the cosmic background, and quantum coherence in a Faraday cage, the idea that one could hide a transaction among decoys, on a graph no less, was an offensive one to me.
|
||||
Yet the decoy selection does seem to introduce enough Fear-Uncertainty-Doubt into a history to achieve the desired outcome of keeping the true history hidden.
|
||||
It certainly generates a mess to explore, and those smart alecks who use 300 inputs and 4000+ decoys in a transaction do successfully bring my brute-force approaches to a screeching halt.
|
||||
However, my suspicions do remain, hence the methodologies conceived herein.
|
||||
|
||||
A few things about the implementation of the decoys are noteworthy.
|
||||
|
||||
\begin{itemize}
|
||||
\item Transaction outputs are held for 10 blocks before they can be reused.
|
||||
\item To account for changes in volume that do occur, a dynamic approach is used in selection for the recent transactions.
|
||||
\item By default, decoys follow a gamma distribution, which has very thin tails at both long and short times; it renders a poor fit for recent times and makes old transactions in rings rather surprising.
|
||||
\item Decoys are administered at the wallet level, not the protocol level, and multiple decoy selection algorithms have been deployed in the wild. Some even repeat entire rings, or otherwise trivialize the detection of the real transaction.
|
||||
\item The decoy selection improves with time, but heuristics noted from the past persist through some block range.
|
||||
\item Methods have gone from static to dynamic, and efforts are being made to replace decoys with zero-knowledge proof setups.
|
||||
\end{itemize}
|
||||
|
||||
The details with which we are most concerned are the particular values for the probabilities associated with a given element of a given ring.
|
||||
We fit a gamma distribution to provide ourselves with a parameterized probability distribution we can subsequently call to determine the filtration parameter we will use in the Persistent Homology by Probability section.
|
||||
It has been pointed out to me that I used $\log(\mathrm{block\ height})$ rather than $\log(\mathrm{seconds})$, which could explain the deviation from expectations for the parameter results.
|
||||
This error changes the scale but not the ordering.
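To make the scale argument concrete, here is a sketch under three assumptions of ours: the rate parameterization of the gamma density, natural logarithms, and a constant block time of 120 seconds (the Monero target). With $x$ the logarithm of the age of a ring member,
\[
f(x; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}, \qquad x > 0 .
\]
If the age is $b$ blocks, that is roughly $120\, b$ seconds, then
\[
\log(\mathrm{seconds}) \approx \log(b) + \log(120) \approx \log(b) + 4.79 ,
\]
so a change of scale in the age becomes the same constant displacement of every data point in the logarithm. The map is monotonic, which is why the ordering of ring members is unchanged, while the fitted $\alpha$ and $\beta$ need not match values quoted for $\log(\mathrm{seconds})$.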
|
||||
|
||||
The resulting fits are shown in Figures~\ref{fig:alpha}, \ref{fig:beta}, and \ref{fig:fits} below.
|
||||
|
||||
\begin{figure}[h]
|
||||
\includegraphics[scale=0.5]{gamma_alpha}
|
||||
\caption{Fits of the gamma parameter $\alpha$. Inset zooms in on the region of convergence.}
|
||||
\label{fig:alpha}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\includegraphics[scale=0.5]{gamma_beta}
|
||||
\caption{Fits of the gamma parameter $\beta$. Inset zooms in on the region of convergence.}
|
||||
\label{fig:beta}
|
||||
\end{figure}
|
||||
|
||||
\includegraphics[scale=0.5]{fits}
|
||||
\begin{figure}[h]
|
||||
\includegraphics[scale=0.5]{fits}
|
||||
\caption{The empirical, measured, and theoretical distributions (erratum: wrong scale as described in text).}
|
||||
\label{fig:fits}
|
||||
\end{figure}
|
||||
@@ -1,4 +1,14 @@
|
||||
\newglossaryentry{AML}
|
||||
{
|
||||
name=Anti-Money-Laundering,
|
||||
description=An envelope term for laws and regulations enacted to counter terrorism financing and money laundering.
|
||||
}
|
||||
|
||||
\newglossaryentry{KYC}
|
||||
{
|
||||
name=Know-Your-Customer,
|
||||
description=Laws and regulations that require banking and other financial services to collect identifying information of customers using their service.
|
||||
}
|
||||
|
||||
\newglossaryentry{EAE}
|
||||
{
|
||||
|
||||
145
docs/snippets/introduction.tex
Normal file
@@ -0,0 +1,145 @@
|
||||
\section{Introduction}
|
||||
|
||||
This document serves to report efforts and progress as the recipient of a Magic Monero Fund grant \cite{MMF}. As such it does not aim to be a completed scientific work, so much as to be a starting point for discussion, collaboration, future effort and a source of mathematical definitions for issues raised in \cite{MiersZcash} and \cite{breakingChurn}.
|
||||
|
||||
In summary these issues correspond to information one party can glean about their counter-party through repeated transactions.
|
||||
The terms \textit{EAE Attack} and \textit{Overseer attack} have both been used. EAE stands for Eve-Alice-Eve, with the role of Eve usually being played by a government or an exchange.
|
||||
We would prefer to omit the words `attack' and `adversary' in favor of `analysis' and `counterparty' to adopt a more non-partisan stance.
|
||||
Eve is only acting in accordance with optimal play in a game-theoretic sense, making do with all information available to her. Were this information gained by illicit acts, then `attack' might be more instructive, but the information spoken of, connections and transaction values, can largely be gained through ordinary transactions.
|
||||
It is a point of confusion, at least for me, when the ends of a transaction are referred to by their moral proclivities rather than their name.
|
||||
In the intelligence community, `attack' may refer to the attempt to launder money, whereas in the privacy-oriented community `attack' refers to the attempt to trace the flow of funds.
|
||||
My concern here is with the actual fungibility of Monero; I am concerned with senders and receivers, not attackers and victims.
|
||||
Through the invention of Tor (US Navy) and the establishment of security standards (NSA), we can see that the sometime bogeymen are equally active and encouraging in developing protocols for privacy.
|
||||
|
||||
\subsection{Global vs Local}
|
||||
|
||||
A juxtaposition of scales occurs naturally in the Monero blockchain.
|
||||
Connections can occur locally, for example through direct interaction with a counter-party, or globally, from the spending of coins minted at some particular block.
|
||||
Mesoscales, neither local nor global, are also created courtesy of connections to decoys that occurred at intermediate scales.
|
||||
Scales in value as well as time occur; though this information is generally hidden, it can be collected over time by actors that interact with numerous parties through numerous transactions.
|
||||
|
||||
Recovering hidden values or assigning hidden values is a typical task in quantitative finance.
|
||||
From evaluating the prices of IPOs to derivatives on stocks or other assets, or even more abstract notions like risk and liquidity, determining hidden values or getting bounds on hidden values is commonplace.
|
||||
In the \textit{Value in the Monero Network} section some global approaches to determining value are discussed, one of them quantitatively.
|
||||
The EAE attack is generally local; even if the repeated transactions are spread over 6 months to a year, this still represents a small fraction of the total blockchain.
|
||||
Furthermore, not all transactions need to be explored; only a fraction of the total transactions will be present in a given taint tree.
|
||||
However, the information leaks of value are local, and the information propagating outwards constrains the expectation of value in other transactions.
|
||||
In \cite{Borggren2020} it was estimated that at the time $10$--$15\%$ of transactions involved ShapeShift as a counter-party.
|
||||
Thus ShapeShift or anyone substantially observant of the API for that time period has substantial information about the values of transactions.
|
||||
|
||||
The analysis for these global characteristics tends to be more computationally intense, but can be more straightforward to express mathematically.
|
||||
The attacks of interest in this study are more of the local variety.
|
||||
A typical motif we'd analyze might be just a few transactions occurring over just minutes, hours, or days.
|
||||
A common and simple scenario is a small transaction, intended to verify receipt at an address, followed by the intended substantial transaction.
|
||||
We call this scenario the 2-AE motif and it already has the potential to leak some transaction history of the sender.
|
||||
Privacy-focused users may want to skip this verification step.
|
||||
|
||||
The information gained by an exchange (that is, \textit{E}, the receiver) in a 2-AE analysis is that the receiver can assume the sender has signing capability for both transactions.
|
||||
They can also ascertain which output belongs to the sender and follow it forward through successive transactions.
|
||||
E can then check for intersections between the ring constituents, looking for overlap of previous transactions.
|
||||
Efforts have begun to detect and quantify these overlaps, which do indeed occur; common histories can indeed be found.
|
||||
What remains to be shown is that any two transactions would also have these common histories generated through the decoy selection methods.
|
||||
It is interesting that the lack of asymptotic statistics, that each block has 10s to 100s of transactions rather than 1000s to 10000s, is actually helping matters, since it increases the likelihood that these spurious connections, common histories that are not real common histories, do actually occur.
|
||||
We do also find overlap of taint trees of these random pairs of transactions, and are investigating further whether or not these share sufficient statistics to obfuscate real EAE patterns.
|
||||
In the \textit{EAE Experiments} section we set out experiments to quantify this issue more succinctly.
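One way to phrase the overlap test is sketched here, in notation introduced purely for illustration (the symbols $A_d$ and $O_d$ are ours and are not used elsewhere in this report). Let $A_1(tx)$ be the set of all ring members appearing in the inputs of $tx$, and define the depth-$d$ taint tree recursively by
\[
A_d(tx) = A_{d-1}(tx) \cup \bigcup_{tx' \in A_{d-1}(tx)} A_1(tx') .
\]
For the two transactions $tx_a$ and $tx_b$ received by E, the overlap at depth $d$ is
\[
O_d(tx_a, tx_b) = \left| A_d(tx_a) \cap A_d(tx_b) \right| .
\]
The open question then becomes whether the distribution of $O_d$ over pairs with a common sender can be distinguished from the distribution of $O_d$ over uncorrelated pairs.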
|
||||
|
||||
\subsection{Statistical Attacks vs Deterministic Attacks}
|
||||
From what we can gather, the tracing capabilities purported by Chainalysis and others on the Monero blockchain are of the statistical variety.
|
||||
We speculate as to how these analyses may proceed and what disbelief must be suspended to believe these analyses.
|
||||
Almost surely no judge on our great planet will sit through an \textit{almost surely} proof that there exists some possibility that the proposed transaction chain actually occurred and wait for this burden of proof to present itself.
|
||||
Warrants can be issued and subpoenas made on relatively sparse information.
|
||||
Property can be seized long before or entirely without a court decreeing to do so.
|
||||
We can imagine a range of responses along the Draconian spectrum ranging from tolerance to outright ban of Monero.
|
||||
On the laissez-faire end, law enforcement would rely entirely on the time stamps, the indelible truth of a transaction on Monero, and have to pull the thread of the actual humans/weapons/drugs being trafficked rather than the flow of cryptocurrency.
|
||||
Next would be guilt-by-association, which is similar to the logic of KYC laws already established.
|
||||
Herein interacting with scoundrels is tantamount to being a scoundrel oneself.
|
||||
Next would be guilt-by-bad-luck, where a party is considered a scoundrel by sharing a ring with a scoundrel.
|
||||
Finally, just guilt, you use Monero, ergo you are trying to hide your devious methods.
|
||||
We imagine, but don't know, that the United States is operating somewhere between guilt-by-association and guilt-by-bad-luck, as in if the probabilities are high enough, the federal jackets will sweep the floor.
|
||||
We can also imagine federal orders to Monero developers/miners that render it a violation of KYC to verify transactions over 10000 USD.
|
||||
|
||||
A retired NYPD officer, once upon a time implicated through spurious connections to the theft of the \textit{Star Ruby} from the American Museum of Natural History, confided to me, `My innocence was besides the point. When all arrows point at you, all arrows are pointing at you.'
|
||||
There is thus the need to ensure that the mixing and decoy selections that are occurring on the blockchain have the largest possible anonymity set, rendering each transaction virtually indistinguishable from the other transactions that occurred at the same time.
|
||||
|
||||
This indistinguishability property is reminiscent of the early 20th century developments of Statistical Mechanics and ultimately Quantum Mechanics.
|
||||
Boltzmann inserted a $1/n!$ factor by hand to the partition functions in order to be consistent with the laws of thermodynamics.
|
||||
It took the introduction of Quantum Mechanics to explain what this factor was doing; accounting for the indistinguishability of the particles involved.
|
||||
No coloring of atoms or molecules was possible; one could never say `it was this $H_2O$ molecule not that one.'
|
||||
All $H_2O$ molecules are effectively and actually the same, i.e.\ indistinguishable, the history of the trajectory of a molecule washed completely by thermodynamics and quantum mechanics.
|
||||
This level of indistinguishability should be a goal of Monero; currently transactions are like a red dye propagating outwards, tainting its path as it goes.
|
||||
We expect analogies from heat equations or fluid equations that quantify this mixing to be useful in the future, but we don't go down this pathway at this stage.
|
||||
|
||||
In the \textit{Fitting Decoy Distributions} section we measure some empirical distributions; we can then, for any given ring, look at all subrings of size $n-1$ to order the ring constituents by likelihood.
|
||||
We suspect the algorithms pushed as tracing to be of this variety, and one merely chooses to believe the order of likelihoods the algorithm suggests, which could be sufficient to sell a product to a government or other Overseer, and issue warrants, regardless of the actual quality of the algorithm.
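A sketch of such an ordering, assuming (our simplification, not a description of any vendor's method) that the decoys in a ring are independent draws from a fitted density $f$ of the age variable $x$: for a ring with members $tx_1, \dots, tx_n$ at ages $x_1, \dots, x_n$, the likelihood that the subring omitting $tx_k$ consists entirely of decoys is
\[
L_k = \prod_{i \neq k} f(x_i) = \frac{\prod_{i=1}^{n} f(x_i)}{f(x_k)} .
\]
The numerator is common to all $k$, so ordering the members by $L_k$ is the same as ordering them by $1/f(x_k)$: the member least typical of the decoy distribution is ranked as the most likely real spend. Nothing in this ranking certifies that the top candidate is the true one.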
|
||||
|
||||
The EAE attacks are not of this variety, though; the connections an Overseer seeks are deterministic connections, demanding consistency between possible histories until only one true history remains.
|
||||
The random variables we use for transaction values also collapse to their deterministic values; the counter-parties do indeed know the value of the transactions.
|
||||
|
||||
\subsection{Hybrid Attacks}
|
||||
|
||||
Hybrid Attacks would involve pursuing EAE determinism through statistical means, namely sampling.
|
||||
We explore sampling methods as we were defeated when trying to exhaustively explore all paths.
|
||||
At this stage these methods sample from uniform distributions, but we are developing the Bayesian update steps to explore the more likely transactions in a ring first.
|
||||
We are also developing our sampling methods to be exhaustive, removing paths as they arise so that none is sampled twice.
|
||||
|
||||
|
||||
\subsection{Partial vs Complete Information}
|
||||
|
||||
It is a goal for the privacy of Monero to be robust to small leakages of information; it should not matter globally if an exchange knows a few values and connections locally on the chain.
|
||||
Even large leaks, where mass amounts of transaction information are present, should ideally be of negligible utility for transactions outside of that set.
|
||||
|
||||
\subsection{Connectivity in the Monero Network}
|
||||
|
||||
I can't speak for all parliaments across all nations and times, but we can suspect some common desires and choices with respect to the tracing of flows of funds across the Monero or any network.
|
||||
The fear from the government perspective is that funds from illicit activity change hands, or that funds change hands to finance illicit activity; the countermeasures that evolved are known as \gls{AML} (anti-money laundering).
|
||||
Obligations are placed upon exchanges to \gls{KYC}, ``know your customer,'' to prevent such matters.
|
||||
If a currency comes about that can clear transactions while bypassing these measures it is likely that legal measures will evolve to mitigate or prevent this.
|
||||
This process has begun in many jurisdictions.
|
||||
Currencies like this already exist, however: the dollar, the euro, the yuan, etc., and this fungibility is generally considered a necessary condition on a money.
|
||||
|
||||
However, with the advent of cryptocurrencies, opportunist surveillance industries took advantage of the lack of fungibility implicit to most blockchains to trace the flows of funds, so much so that they've come to expect this capability.
|
||||
Parallels exist for end-to-end private messaging, with government reactions spanning the whole spectrum from tolerance to outright ban.
|
||||
Monero is also experiencing the same range of reactions across the planet.
|
||||
This effort here in no way promotes money laundering, indeed I discourage it.
|
||||
It does, however, seek to make improvements towards removing the historical traceability of cryptocurrencies to push it towards a more cash-like state.
|
||||
Just as the onus is on a cash-only bagel store to honestly report their earnings and pay taxes etc.\ accordingly, the onus is on a monero-only bagel store to do the same.
|
||||
Whether or not they do so is neither my concern nor that of the developers of cash, credit, or crypto.
|
||||
|
||||
At the same time I have no moral objection to a person, government, or exchange using all information legally available to them to get a clearer picture of the world around them and understand the interactions they are engaged in.
|
||||
In the end we have a classic evolutionary Red Queen scenario with all parties sharper as a result.
|
||||
|
||||
Pardon the interlude/disclaimer; just some heat blowing on my neck.
|
||||
|
||||
Monero seeks to hide the sand at the beach, anonymity through obscurity, and does so by adding decoys to inputs to hide the true input.
|
||||
From a traceability perspective the lack of decoys in the outputs is problematic though.
|
||||
From a tracer's perspective every output is important: if it isn't the sender's it is the receiver's, and both parties are of interest.
|
||||
In the case of churning, both parties are even the same; all paths forward are relevant and in some sense equivalent.
|
||||
From either the sender's or the receiver's perspective, the outputs are wholly de-anonymized; both parties know which output is theirs and which isn't.
|
||||
This fact is important in the context of the EAE attacks as it allows parties to build up a profile of their counterparty.
|
||||
|
||||
Perhaps an equally important issue with the large ratio of decoys/outputs is simply that there is an inefficiency present.
|
||||
More entropy, paths/kbyte on the blockchain, is available with more outputs.
|
||||
Let $m$ be the number of ring members (decoys plus real outputs) present at the input of a transaction and $n$ be the number of outputs.
|
||||
The number of paths goes as $m \cdot n$ whereas the space on the blockchain goes as $m+n$.
|
||||
For $m+n = C$ for some constant $C$, the maximum number of paths occurs when $m$ and $n$ are equal (or differ by one when $C$ is odd).
|
||||
For a typical transaction with one ring input with 16 transactions and 2 outputs, C is thus 18, and the number of paths could be 81 rather than 32 for the same byte-cost on the blockchain.
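The arithmetic behind this claim can be written out as follows (a short worked derivation; $P(m)$ is our shorthand for the number of paths). With $n = C - m$,
\[
P(m) = m\,(C - m) = \frac{C^2}{4} - \left( m - \frac{C}{2} \right)^2 ,
\]
which is largest at $m = n = C/2$. For $C = 18$,
\[
P(9) = 9 \cdot 9 = 81 \qquad \text{versus} \qquad P(16) = 16 \cdot 2 = 32 .
\]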
|
||||
This could perhaps be implemented by generating multiple stealth addresses for either the sender/the receiver or both and splitting the corresponding outputs between those.
|
||||
This however ignores the issue that all outputs would still be of interest.
|
||||
It could be interesting either to use the additional outputs to pay for mining rewards, rather than aggregating the mining rewards into a coinbase transaction, or to have 0 XMR transactions to ghost addresses.
|
||||
This could also have the added entropic benefit of some of the coinbase transactions appearing like any other transaction as the outputs get reused in the future.
|
||||
|
||||
In a previous work, correlations among the different rings of a multi-input transaction were shown \cite{Borggren2020a}.
|
||||
This fact was purely statistical in nature, measured through counting, but it is possible that fears related to the EAE attack are already present at the multi-ring level.
|
||||
For example, for each pairwise combination between the two rings, run the taint tree backwards, just as you would investigating two transaction histories in the 2-AE attack.
|
||||
We know that there is an enhancement in counts present when there is similarity between block heights, but it could be the case that not only are they the same height, but coming from the same transactions.
|
||||
That is to say, if the decoys are not effectively mixing then the histories of the \textit{true} pair will overlap more than any other pair.
|
||||
Further efforts will explore if this is actually the case and if this statistical correlation can be rendered deterministic by deeper scrutiny of these pairwise taint trees.
|
||||
|
||||
This approach of course is rendered possible by the fact that there is one real transaction present in each ring.
|
||||
If there were rings entirely of decoys, or multiple real outputs in a single ring, the correlations could be mitigated.
|
||||
Another approach could be to simply aggregate all the txs of all the rings into a single large ring, shuffle, and connect to the same outputs.
|
||||
With RingCT at 16, a transaction with two ring inputs has 256 possible pairs, whereas one RingCT of 32, two of which are real, would have ${32 \choose 2} = 496$; nearly doubled.
|
||||
The situation is even more dramatic as the number of ring inputs increases. For the case of three ring inputs we would have $\frac{{48 \choose 3}}{16^3} \approx 4.22$, more than quadrupling the number of possibilities.
|
||||
A bonus is a small drop in transaction bytes, since multiple ring hashes are no longer needed.
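The counting argument above can be checked with a short script. This is only a sketch of the arithmetic; the function names are illustrative and are not part of the repository code. It assumes exactly one real member per separate ring, and exactly as many real members as inputs in the merged ring.

```python
from math import comb

RING_SIZE = 16  # ring size assumed in the text


def separate_ring_combinations(n_inputs, ring_size=RING_SIZE):
    """Candidate real-input sets when each input has its own ring."""
    return ring_size ** n_inputs


def merged_ring_combinations(n_inputs, ring_size=RING_SIZE):
    """Candidate real-input sets when all rings are merged and shuffled."""
    return comb(n_inputs * ring_size, n_inputs)


for n_inputs in (2, 3):
    separate = separate_ring_combinations(n_inputs)
    merged = merged_ring_combinations(n_inputs)
    print(n_inputs, separate, merged, round(merged / separate, 2))
```

For two inputs this gives 256 against 496, and for three inputs 4096 against 17296, the ratio of roughly 4.22 quoted above.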
|
||||
|
||||
|
||||
23
docs/snippets/notations.tex
Normal file
@@ -0,0 +1,23 @@
|
||||
|
||||
\section{Notations}
|
||||
|
||||
\begin{table}[h]
|
||||
\begin{tabular}{|c|c|}
|
||||
\multicolumn{2}{c}{\bf{Notations}} \\ \hline
|
||||
$tx$ & transaction identifier (hash)\\
|
||||
$tx_j$ & j-th transaction in set (often a ring) \\
|
||||
$tx_{o,j}$ & transaction output \\
|
||||
$v(tx)$ & transaction value \\
|
||||
$r_j(tx)$ & j-th ring input to tx \\
|
||||
$r_j$ & j-th ring input when particular tx is implied \\
|
||||
$v(r_j(tx))$ & value of j-th ring input \\
|
||||
$r_j,tx_k;r_l,tx_m;\ldots$ & path identifier: the kth transaction of the jth ring \\
|
||||
& followed by the mth transaction of the lth ring. \\
|
||||
$\{$ & start of a branching along a path \\
|
||||
$\}$ & end of a branch and return to parent node \\
|
||||
$r_0,\{0;2,5;1,3\}\{1;3,1;2,4\}$ & e.g.\ two paths out of the zeroth ring \\
|
||||
& 0th tx of $r_0$ followed by 5th tx of 2nd ring etc. \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
%\caption{Notation for\ldots}
|
||||
\end{table}
|
||||
34
docs/snippets/probTDA.tex
Normal file
@@ -0,0 +1,34 @@
|
||||
\section{Persistent Homology by probability}
|
||||
|
||||
(preliminary)
|
||||
|
||||
While persistence by height allows us to do some basic accounting and comparisons, it is not capturing the graph connectivity questions we are after.
|
||||
All transactions at a given time occupy the same set; they are not distinguishable from one another.
|
||||
We introduce another construction that lets us try to connect with graph approaches to the analysis.
|
||||
|
||||
We will need a notion of distance, and we use the cdfs and fits computed in the fits section to define one.
|
||||
|
||||
\subsection{Taint Trees}
|
||||
|
||||
Persistence behaves a little differently from the usual intuition for probabilities.
|
||||
For example, a path two hops deep, with probability 0.9 on the first hop and 0.9 on the second, has probability 0.81 of occurring, yet the two txs will already be connected when the filtration parameter reaches 0.9.
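The distinction can be made concrete with a small sketch. It assumes that the filtration sweeps a probability threshold downward from 1 and admits an edge once the threshold drops to that edge's probability; this convention, and both function names, are assumptions made for illustration.

```python
from math import prod


def path_probability(edge_probs):
    """Probability that every hop on the path is the real one."""
    return prod(edge_probs)


def connection_threshold(edge_probs):
    """Filtration value at which the path's endpoints become connected.

    Under the assumed convention the path is complete as soon as its
    weakest edge has been admitted.
    """
    return min(edge_probs)


two_hops = [0.9, 0.9]
print(path_probability(two_hops))      # close to 0.81
print(connection_threshold(two_hops))  # 0.9
```

The path probability multiplies along the hops, while the filtration only sees the weakest hop, so long chains of likely hops connect early even though the chain as a whole is unlikely.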
|
||||
|
||||
\subsection{Sampling Paths}
|
||||
|
||||
To sample paths, each ring has a pymc categorical distribution over the RingCT that we can draw from. This distribution is also used whenever the value of a ring or tx is requested.
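A minimal sketch of path sampling follows. It replaces the pymc categorical distribution with the standard library's weighted choice so that it runs without dependencies, and it uses a hypothetical toy taint tree; the data layout and the function name are assumptions, not the repository's interface. When a tx has several rings, the sketch picks one uniformly at random, which is also an assumption.

```python
import random

# Hypothetical toy taint tree: tx -> list of rings,
# each ring maps a member tx to its probability of being real.
TOY_RINGS = {
    "tx_a": [{"tx_b": 0.5, "tx_c": 0.5}],
    "tx_b": [{"cb_1": 0.25, "cb_2": 0.75}],
    "tx_c": [{"cb_2": 1.0}],
    "cb_1": [],  # coinbase transactions have no ring inputs
    "cb_2": [],
}


def sample_path(start, rings_of, rng, max_depth=10):
    """Walk backwards from start until a coinbase tx or max_depth."""
    path = [start]
    current = start
    for _ in range(max_depth):
        rings = rings_of[current]
        if not rings:
            break  # reached a coinbase transaction
        ring = rng.choice(rings)
        members = list(ring)
        weights = [ring[m] for m in members]
        # Categorical draw over the ring members.
        current = rng.choices(members, weights=weights, k=1)[0]
        path.append(current)
    return path


rng = random.Random(0)
paths = [sample_path("tx_a", TOY_RINGS, rng) for _ in range(5)]
for p in paths:
    print(" <- ".join(p))
```

Each sampled path starts at the queried tx and ends at a coinbase transaction, which is what the paths-to-coinbase figure summarizes over many draws.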
|
||||
|
||||
\includegraphics[scale=0.5]{pathstocoinbase}
|
||||
|
||||
\begin{center}
|
||||
\begin{table}
|
||||
\begin{tabular}{|c|c|}
|
||||
|
||||
\hline
|
||||
\includegraphics[scale=0.3]{pds_0} & \includegraphics[scale=0.3]{pds_1} \\ \hline
|
||||
\includegraphics[scale=0.3]{pds_2} & \includegraphics[scale=0.3]{pds_3} \\ \hline
|
||||
\end{tabular}
|
||||
\caption{As the filtration progresses, holes are filled.}
|
||||
\end{table}
|
||||
\end{center}
|
||||
|
||||
\includegraphics[scale=0.5]{valscoinbase}
|
||||
@@ -4,19 +4,63 @@ A git repository containing this documentation and of the python codes generated
|
||||
|
||||
\subsection{Basic Classes}
|
||||
|
||||
Basic python classes were created to query and load the data as well as maintain close contact and syntax with the mathematics we will be using. As this analysis is primarily concerned with churning, EAE attacks, and other scenarios which can be characterized by involving relatively few actors and short time scales, the designs were made with composability and easy access in mind and to be used in a generative sense. For example $<, >, = , +, *, ^$ are being overwritten so as to extend the functionality and convenience of the objects.
|
||||
Basic Python classes were created to query and load the data while staying close to the syntax of the mathematics we will be using. As this analysis is primarily concerned with churning, EAE attacks, and other scenarios that involve relatively few actors and short time scales, the classes were designed with composability and easy access in mind, and to be used in a generative sense. For example, the operators $<, >, = , +, *$ are overloaded to extend the functionality and convenience of the objects.
|
||||
|
||||
Other options exist for loading and interacting with the data; database or csv approaches might be of more use for statistical analysis of the entire blockchain. The use case here is directed towards the user (or attacker) who is trying to understand the history and co-history of a potentially small set of transactions. The objects have a registry keyword that provides a context, basically a dictionary of what has been looked at already, whose keys are the hashes and whose values are the corresponding object instances in memory.
|
||||
|
||||
One can count on an adversary to have access to reasonable time and computing resources and willingness to spend hours, days, and months tracing the history of transactions.
|
||||
We therefore aim for any outcome of such a query to result in maximal confusion across the maximal number of transactions.
|
||||
|
||||
It was a design choice, since the focus of this work is the local behavior in n-AE analysis, to keep a registry of every transaction visited over the course of a taint-tree exploration.
|
||||
This registry is a Python dictionary whose keys are tx hashes and whose values point to instances of the Tx object described here.
|
||||
The Tx object maintains a list of inputs and a list of outputs and appends to them as the tx arises in other contexts.
|
||||
The persistent homology by probability is implemented by providing a distance matrix directly and is the focus of the research.
|
||||
From these registries the relevant distances can be computed and the homology may commence.
|
||||
|
||||
10000 blocks is around two weeks of blockchain, and holding all transactions therein simultaneously in memory was manageable on a common laptop.
|
||||
When an instance of Block or Tx is created, a single query is made to an explorer and the instance is populated with the information returned.
|
||||
Maintaining the registry avoids repeated calls to the API.
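The registry pattern described above can be sketched as follows. The class body, the helper function, and the stand-in explorer are all hypothetical; the actual classes take the registry as a keyword and query a real explorer.

```python
class Tx:
    """Minimal stand-in for the Tx object (illustrative only)."""

    def __init__(self, tx_hash, data):
        self.tx_hash = tx_hash
        self.data = data
        self.sources = []  # contexts in which this tx appeared as an input
        self.sinks = []    # contexts in which this tx appeared as an output


def get_tx(tx_hash, registry, fetch):
    """Return the registered Tx, querying the explorer only on a miss."""
    if tx_hash not in registry:
        registry[tx_hash] = Tx(tx_hash, fetch(tx_hash))
    return registry[tx_hash]


api_calls = []


def fake_explorer(tx_hash):
    """Hypothetical explorer query; records each call it receives."""
    api_calls.append(tx_hash)
    return {"tx_hash": tx_hash}


registry = {}
first = get_tx("abc123", registry, fake_explorer)
second = get_tx("abc123", registry, fake_explorer)
print(first is second, len(api_calls))
```

Because both lookups return the same instance, anything appended to the sources or sinks of a tx in one context is visible in every other context that reaches the same hash.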
|
||||
|
||||
\subsubsection{Block}
|
||||
|
||||
The block object is instantiated given a block height.
|
||||
\begin{itemize}
|
||||
\item called with block height
|
||||
\item txs attribute provides list of tx hashes for the block
|
||||
\item \texttt{get\_txs} attribute is a function that instantiates the Tx class for all the txs.
|
||||
\item obeys arithmetic properties using the block height as an integer. (in dev)
|
||||
\end{itemize}
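Since the arithmetic behaviour is still in development, the following is only one possible sketch of how a block could compare and add using its height. The class here makes no explorer query and omits the txs attribute; every method shown is an assumption about the eventual design.

```python
from functools import total_ordering


@total_ordering
class Block:
    """Illustrative block that behaves like its integer height."""

    def __init__(self, height):
        self.height = int(height)

    def __int__(self):
        return self.height

    def __eq__(self, other):
        return self.height == int(other)

    def __lt__(self, other):
        return self.height < int(other)

    def __hash__(self):
        return hash(self.height)

    def __add__(self, offset):
        # Adding an integer moves forward along the chain.
        return Block(self.height + int(offset))

    def __sub__(self, other):
        # Subtracting gives the height difference as a plain integer.
        return self.height - int(other)

    def __repr__(self):
        return f"Block({self.height})"


print(Block(100) < Block(101), Block(100) + 5, Block(110) - Block(100))
```

With this convention, the age of a ring member relative to its parent transaction is a simple subtraction of two blocks.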
|
||||
|
||||
\subsubsection{Tx}
|
||||
|
||||
\begin{itemize}
|
||||
\item instantiated with a call to the tx hash
|
||||
\item possesses attributes with the same names as the explorer api
|
||||
\item has a list of sources and a list of sinks maintaining a history of contexts the tx has arisen in
|
||||
\item \texttt{get\_rings} instantiates ring objects for each ring input of the transaction.
|
||||
\item \texttt{taint}: an iterator over the rings and mixins (in dev)
|
||||
\item value attribute: usually zero for non-coinbase transactions, to be replaced with the (pymc) random variable discussed in the text. (in dev)
|
||||
\item required for taint tree sampling path computations
|
||||
\end{itemize}
|
||||
\gls{tx}
|
||||
|
||||
\subsubsection{Ring}
|
||||
|
||||
A ring instance is called with a dictionary of inputs and a tx to serve as the parent node.
|
||||
Usually these are rings that have actually occurred on the blockchain, but we can do more.
|
||||
We can take the same ring of inputs and attach it to a different parent transaction.
|
||||
|
||||
\begin{itemize}
|
||||
\item called with a collection of tx inputs and a txo, providing a parent node
|
||||
\item txs attribute provides list of tx hashes for the ring
|
||||
\item \texttt{get\_txs} attribute is a function that instantiates the Tx class for all the txs.
|
||||
\item obeys arithmetic properties using the block height of parent node as an integer. (in dev)
|
||||
\end{itemize}
|
||||
\gls{rct}
|
||||
|
||||
\subsection{Taint Trees, Sampling Paths, and Paths to Coinbase}
|
||||
|
||||
Various functions have been created to enumerate and annotate the taint trees, sample paths up to a certain height, and create a bar code from the paths to coinbase as described in the text.
|
||||
The functions and documentation are in development.
|
||||
|
||||
|
||||
|
||||
99
docs/snippets/value.tex
Normal file
@@ -0,0 +1,99 @@
|
||||
\subsection{Value in the Monero Network}
|
||||
|
||||
It is generally the case that tracers, like most folks, are more interested in large transactions than small ones.
|
||||
Although transaction values are obfuscated on the Monero blockchain there may be ways to recover some bounds.
|
||||
A few ideas towards this end are briefly discussed below. One of these avenues has been explored quantitatively.
|
||||
|
||||
\subsubsection{Value through Optimal Transport}
|
||||
If you replace the word `sand' with `coins' and `holes' with `wallets' in \cite{villani2009optimal} the rest follows.
|
||||
The classic picture in optimal transport is a pile of sand, distributed over one region $X$, that is moved into a distribution of sand over a region $Y$.
|
||||
It takes some effort to move the sand from $x \in X$ to $y \in Y$, quantified by some cost $c(x,y)$.
|
||||
A `plan' is some strategy, a probability measure on the product space, $\pi \in P(X,Y)$. This plan specifies exactly which sand in $X$ goes to which hole in $Y$.
|
||||
Optimal transport then seeks to find the optimal plan; the one which minimizes the cost to execute.
|
||||
|
||||
For our situation with Monero, we need the relaxed Kantorovich formulation (as opposed to the Monge formulation), since the coins can be, and generally are, split.
|
||||
|
||||
Specifically, let $X$ be the set of all coinbase transactions and let $Y$ be the set of all utxos.
|
||||
Usually we would normalize to unit mass, though here it could be more natural to normalize to coins in circulation.
|
||||
The constraining equation $\int_Y d\pi(x,y) = d\mu(x)$ fixes the mass leaving $x$, where $\mu(x)$ is simply the coinbase value of transaction $x$, read directly off of the blockchain.
|
||||
The equation is a fancy way of saying \textit{The coinbase coins are now somewhere}.
|
||||
The complementary equation $\int_X d\pi(x,y) = d\nu(y)$ fixes the mass arriving at $y$, where $\nu(y)$ is the value corresponding to output $y$.
|
||||
It is a fancy way of saying \textit{the coins in this output came from somewhere}.
|
||||
|
||||
A countable set of comparable equations can be created, constraining the number of plans we need to optimize over, by noticing this equation has to hold regardless of what time we look for utxos.
|
||||
For any block height we can consider the utxos as of that block height.
|
||||
|
||||
The cost used to evaluate a plan could be the probability, as measured by inverting the measured cdfs, to move from coinbase to the utxo.
|
||||
Some of these costs are infinite, indeed all costs outside of the taint tree for a transaction would be infinite.
|
||||
They have the interpretation that no coins in transaction $y$ could have come from transaction $x$.
|
||||
Similar infinite values will occur when we look at TDA through the filtration probability.
|
||||
Again it means that there is no transaction history present that can link the two transactions.
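A toy instance may help fix the ideas. The sketch below checks the two marginal constraints for a hand-written plan and evaluates its cost, treating any pair outside the taint tree as infinitely costly. All identifiers, values, and costs are invented for illustration; no real cdf is inverted here.

```python
import math

# Hypothetical toy data: coinbase values (mu) and utxo values (nu).
coinbase = {"cb_1": 0.6, "cb_2": 0.6}
utxos = {"out_a": 0.9, "out_b": 0.3}

# Finite costs exist only for pairs linked by a taint tree.
cost = {("cb_1", "out_a"): 1.0, ("cb_2", "out_a"): 2.0, ("cb_2", "out_b"): 1.0}

plan = {("cb_1", "out_a"): 0.6, ("cb_2", "out_a"): 0.3, ("cb_2", "out_b"): 0.3}


def marginals_hold(plan, mu, nu, tol=1e-9):
    """Check that the plan moves exactly mu out of X and nu into Y."""
    row = {x: 0.0 for x in mu}
    col = {y: 0.0 for y in nu}
    for (x, y), mass in plan.items():
        row[x] += mass
        col[y] += mass
    rows_ok = all(abs(row[x] - mu[x]) <= tol for x in mu)
    cols_ok = all(abs(col[y] - nu[y]) <= tol for y in nu)
    return rows_ok and cols_ok


def plan_cost(plan, cost):
    """Total cost; pairs with no linking history cost infinity."""
    total = 0.0
    for pair, mass in plan.items():
        if mass > 0:
            total += mass * cost.get(pair, math.inf)
    return total


print(marginals_hold(plan, coinbase, utxos), plan_cost(plan, cost))
```

A plan that satisfies both marginals but routes any mass along a pair with no linking history receives infinite cost, so the optimization is effectively restricted to the taint trees.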
|
||||
|
||||
We do not explore this approach more at this stage, but we note that the sampling methods we develop are indeed sampling these types of plans.
|
||||
|
||||
\subsubsection{Value through Derivative Pricing}
|
||||
|
||||
In a `risk neutral' framework the price of a derivative is simply the expectation value, the sum over all paths from present time to the expiry of the derivative, with each path weighted by the payout of that path times the likelihood of that path occurring~\cite{wilmott2007paul}.
|
||||
Whereas it is the uncertainty of the future that sets the price of a stock derivative, it is the uncertainty of the past that sets the price of Monero in this analogy.
|
||||
This is to motivate the use of a stochastic variable in the place of the unknown value.
|
||||
We describe a preliminary approach to sampling this distribution, which will also relate to the distribution of the number of possible path histories for a given transaction.
|
||||
|
||||
Let us define a notion for `implied paths,' a stochastic variable, for a given path to a coinbase sample. Notice these paths are also sampling the space described in the previous Optimal Transport section.
|
||||
|
||||
\begin{equation}
|
||||
\#Implied\, Paths = \prod_{j=1}^{Max\, Depth} \frac{\# rings_j*\#mixins_j}{\#outputs_j}
|
||||
\end{equation}
|
||||
|
||||
Application of this equation and more discussion are included in the software section.
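As a quick sketch of how the implied-paths product is evaluated for one sampled path, the function below takes one triple of counts per depth. The input layout and the function name are assumptions; exact fractions are used so that no rounding enters the product.

```python
from fractions import Fraction
from math import prod


def implied_paths(levels):
    """Product over depth j of rings_j * mixins_j / outputs_j.

    levels holds one (n_rings, n_mixins, n_outputs) tuple per depth.
    """
    return prod(Fraction(r * m, o) for r, m, o in levels)


# Hypothetical two-level sample: 2 rings of 16 with 2 outputs,
# then 1 ring of 16 with 2 outputs.
sample = [(2, 16, 2), (1, 16, 2)]
print(implied_paths(sample))
```

For the hypothetical sample the factors are 16 and 8, giving 128 implied paths.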
|
||||
A coupled set of equations is also used to describe value through these random variables.
|
||||
|
||||
\begin{equation}
|
||||
tx\, value = \sum_{j=1}^{rings}ring\, value(j)
|
||||
\end{equation}
|
||||
|
||||
\begin{equation}
|
||||
ring\, value = \sum_{j=1}^{decoys}tx\, value(j)*P(j)
|
||||
\end{equation}
|
||||
|
||||
Where $P(j)$ is the probability the jth transaction is the real transaction of the ring.
|
||||
Without additional knowledge this number is simply $\frac{1}{\#\, decoys}$.
|
||||
As information is revealed, these probabilities could change, and even collapse to 0 or 1.
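The coupled equations can be illustrated with a small sketch that uses plain numbers in place of random variables. The member values are invented, and the function names are illustrative rather than taken from the repository.

```python
def ring_value(member_values, probs=None):
    """Expected ring value: sum over members of value times P(real)."""
    n = len(member_values)
    if probs is None:
        probs = [1.0 / n] * n  # no additional knowledge: uniform
    return sum(v * p for v, p in zip(member_values, probs))


def tx_value(ring_values):
    """A tx's value is the sum of the values of its rings."""
    return sum(ring_values)


# Hypothetical ring of four members with these implied values.
members = [0.6, 1.2, 0.0, 0.2]

uniform = ring_value(members)
collapsed = ring_value(members, probs=[0.0, 1.0, 0.0, 0.0])
print(uniform, collapsed, tx_value([uniform, collapsed]))
```

With uniform probabilities the ring value is the mean of its members, about 0.5 here; once the probabilities collapse onto the second member, the ring value becomes that member's value of 1.2.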
|
||||
|
||||
The implied value of a tx from a single sample is simply $\#rings * coinbase\, value$.
|
||||
|
||||
Although these formulas only supply a stochastic look at the value of a given transaction, and thus do not achieve the deterministic goal we have for an EAE analysis, it is the belief of this author that these random variables, when studied in bulk, can lead to some interesting measurements of the macroeconomics of Monero while maintaining privacy at the microeconomic level, which would be an achievement for the Monero developers.
|
||||
Also, as more gets known about the network, these distributions may end up getting tighter and tighter around particular values.
|
||||
Examples of such macroeconomic variables might be the effective money multipliers, average holding times, and average transaction values. With some additional assumptions, factorization methods might be able to find `sectors' of the Monero economy; Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), and the `Mapper' algorithm often associated with TDA come to mind.
|
||||
|
||||
|
||||
\subsubsection{Value through Linear Programming}
|
||||
|
||||
Despite the vast number of unknown values for unknown transactions, there are equally many constraints on these values~\cite{linear}.
|
||||
Furthermore, these constraints are linear.
|
||||
|
||||
The first constraint is that the sum of the values of the inputs is equal to the sum of the values of the outputs (for simplicity of notation we will consider the contribution to the miner's reward as an output).
|
||||
|
||||
\begin{equation}
|
||||
\sum_{tx_i} v(tx_i) = \sum_{tx_o} v(tx_o)
|
||||
\end{equation}
|
||||
|
||||
The second constraint, in its most unassuming form, is that the transaction value is greater than zero and less than the total number of coins in circulation.
|
||||
A much tighter constraint can be pulled from the taint tree.
|
||||
If we trace back the taint tree, every path originates as a coinbase transaction of some value.
|
||||
The upper bound then is merely the sum of all these coinbase values.
|
||||
This value would also be too large, since some paths exclude others yet all are counted, but it will still be much smaller than the total number of coins in circulation.
|
||||
Still, it gives us an equation for the constraint.
|
||||
|
||||
\begin{equation}
|
||||
0 < v(tx) < \sum_{coinbase_i} v(coinbase_i)
|
||||
\end{equation}
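The two constraints can be written as simple feasibility checks on a candidate assignment of values. This is a sketch with invented numbers and function names; a real linear program would hand these same constraints to a solver rather than test them one at a time.

```python
def conserves_value(input_values, output_values, tol=1e-9):
    """First constraint: inputs sum to outputs (fee counted as an output)."""
    return abs(sum(input_values) - sum(output_values)) <= tol


def within_taint_bound(value, coinbase_values):
    """Second constraint: 0 < v(tx) < sum of reachable coinbase values."""
    return 0 < value < sum(coinbase_values)


# Hypothetical candidate: two inputs, two outputs plus a fee.
inputs = [1.5, 0.5]
outputs = [1.2, 0.79, 0.01]

# Hypothetical coinbase values at the leaves of the taint tree.
reachable_coinbases = [0.6, 0.6, 0.6, 0.6]

print(conserves_value(inputs, outputs))
print(within_taint_bound(sum(inputs), reachable_coinbases))
```

Both checks are linear in the unknown values, which is what allows the full problem to be posed as a linear program.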
|
||||
|
||||
We would still need an objective function to optimize over these constraints. It remains to be discovered, but the natural candidate is a functional that assigns a likelihood to each configuration of values based on the measured cdfs.
|
||||
As an estimate, pretending we have a hundred transactions in a block, and three million blocks, we are left with an unholy linear programming problem of 300 million unknown variables.
|
||||
Unholy, perhaps, but not entirely out of the realm of computational tractability.
|
||||
We'd also have 600 million constraints.
|
||||
These constraints are also incredibly sparse and might be deeply parallelizable, and are not dissimilar to the Traveling Salesman type problems that a company such as Amazon or Uber has to try to solve.
|
||||
|
||||
This framework could also be important in the Overseer context, since an exchange that has collected thousands to millions of these transaction details can simply adjust the constraints to include the additional information it has gleaned, potentially simplifying the problem dramatically.
|
||||
|
||||
|
||||