mirror of
https://github.com/MAGICGrants/MoneroAna.git
synced 2026-01-06 19:43:56 -05:00
code and document updates.
@@ -18,6 +18,13 @@ In practice we will simply call \textit{Giotto's} Vietoris-Rips functionality an
For a given RingCT we would like to evaluate the likelihood that its subrings came from a decoy selection algorithm, and to find similar and comparable rings.
We can also develop summary statistics describing these rings, and representations suitable for machine learning.

In Figure \ref{fig:alpha} we show the persistence diagram of a single ring. A log scale is used to separate the points on the graph. A persistence diagram is a concise representation of all the information shown in Tables \ref{tab:pers2}, \ref{tab:pers}, and \ref{UnionFind}.

Persistence diagrams are good at capturing large-scale structure. In Table \ref{tab:pers2} we see large-scale structures; when diagrams are compared, these guide the search to the bands of interest, so that we can skip, or at least postpone, intersection queries over a large number of blocks. Zooming in (Table \ref{tab:pers}), we see the structure reappear as the filtration parameter is reduced.

Ordinary histograms wash away much of this information and require a choice of bin width, which this process can circumvent (or guide).
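A minimal sketch of 0-dimensional persistence by block height, written with a small union-find rather than the Giotto Vietoris-Rips call used in the code. It assumes the points are the block heights of ring members and that the filtration parameter is the absolute height difference. For points on a line, the only edges that ever merge two components are the gaps between neighbouring heights, so every feature is born at 0 and dies at one of those gaps; the function name is ours.

```python
from typing import List, Tuple


def h0_persistence(heights: List[int]) -> List[Tuple[float, float]]:
    """0-dimensional Vietoris-Rips persistence of points on a line.

    Assumption: an edge (i, j) enters the filtration at |h_i - h_j|.
    Returns (birth, death) pairs, including the one component that never dies.
    """
    pts = sorted(heights)
    parent = list(range(len(pts)))

    def find(i: int) -> int:
        # Find the root of i, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # On a line the minimum spanning tree uses only neighbour gaps,
    # so processing those edges in increasing order is sufficient.
    edges = sorted((pts[i + 1] - pts[i], i, i + 1) for i in range(len(pts) - 1))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri                     # merge: one component dies here
            pairs.append((0.0, float(length)))
    if pts:
        pairs.append((0.0, float("inf")))       # the surviving component
    return pairs
```

Since every finite death is a gap between neighbouring heights, the diagram records the full multiset of gaps without any choice of bin width.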
\begin{figure}[h]
\includegraphics[scale=0.5]{block_height}
\caption{Persistence diagram showing the birth-death pairs of a single ring. A log scale is used to separate the points on the graph. A persistence diagram is a concise representation of all the information shown in Tables \ref{tab:pers2}, \ref{tab:pers}, and \ref{UnionFind}. These diagrams can be analyzed in bulk to find means and anomalies, and can serve as a basis for machine learning.}
@@ -1,12 +1,33 @@
\section{Persistent Homology by probability}

(preliminary)

While persistence by height allows us to do some basic accounting and comparisons, it does not capture the graph connectivity questions we are after.
Nor does it allow us to explore the taint tree probabilistically.
All transactions at a given time occupy the same set, so they cannot be distinguished from one another.
We introduce another construction that lets us connect with graph-based approaches to the analysis.

We need a notion of distance, and for this we use the cdfs and fits computed in the fit section.
Instantiating a Ring object requires a set of decoys \textit{and} a reference tx, which we label txo (origin transaction).
This allows us to do a few things.
The ring need not actually exist on the blockchain: we can instantiate it with a different txo and so place the ring in the context of that txo.
Rings have in fact been reused for different transactions\footnote{Isthmus, Rucknium private communications}, but such rings still differ in their txo (and must also differ in the real input) and in their hash.

A different txo changes the offset, i.e. how far one must integrate to obtain the proper cdf, and so the probabilities shift monotonically as well.
Furthermore, given a height persistogram and a parameter, we can find a different tx that could be `confused' with our tx (i.e. one that occupies the same simplex, and thus points to the same representative).
Setting this parameter to zero forces the sampled tx to come from the same block as the target tx, in which case the probabilities are identical.

The cdf evaluations we are interested in are the integrals of the pdf from time zero (the height of the txo) to the height of each ring constituent.
These give us the probabilities of the constituents being the real transaction.
We can also consider relative probabilities by normalizing, i.e. dividing by the sum of the evaluated cdfs.
This has the more intuitive interpretation of a weighted (currently) 16-sided die.
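A minimal sketch of these two evaluations. The fitted cdf from the fit section is not reproduced here; as a clearly labeled stand-in we use a hypothetical exponential cdf in block age with an arbitrary rate, so only the mechanics carry over: integrate from the txo height back to each constituent, then normalize into a weighted die. The function names are ours, not the Ring API.

```python
import math
from typing import List


def stand_in_cdf(age: float, rate: float = 1e-3) -> float:
    """HYPOTHETICAL stand-in for the fitted cdf: exponential in block age."""
    return 1.0 - math.exp(-rate * age) if age > 0 else 0.0


def constituent_cdfs(txo_height: int, member_heights: List[int]) -> List[float]:
    # Time zero is the txo height; integrate back to each constituent's height.
    return [stand_in_cdf(txo_height - h) for h in member_heights]


def relative_probabilities(q: List[float]) -> List[float]:
    # Divide by the sum of the evaluated cdfs: the weighted die.
    total = sum(q)
    return [x / total for x in q]
```

Rolling the die once is then, for example, `random.choices(range(len(w)), weights=w)[0]` with `w = relative_probabilities(q)`. Because a cdf is non-decreasing, moving the txo later increases every age and shifts every evaluated probability in the same direction, which is the monotone offset described above.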
We pivot to a notion of distance by taking $1-q$ rather than the probability $q$, so that more likely constituents are closer together and certainties ($q=1$) land on top of each other.

The registry objects (essentially a dictionary with tx hashes as keys and tx objects as values) can be used to construct the distance matrices we need to compute the homologies, or other graph metrics.
We can recover spectra and other metrics for the corresponding graphs (1-skeletons) by setting the distance to one for each ring constituent.
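A sketch of one way to build these matrices, assuming a simplified registry (a plain dict from a txo hash to its ring-member hashes) and a separate dict of per-member probabilities $q$; neither is the actual registry layout, and the names are hypothetical. Ring members sit at distance $1-q$ from their txo and unrelated transactions at infinity. Treating every finite off-diagonal entry as an edge of weight one gives the 1-skeleton, whose graph Laplacian spectrum is computed with NumPy.

```python
from typing import Dict, List, Tuple

import numpy as np


def distance_matrix(registry: Dict[str, List[str]],
                    q: Dict[Tuple[str, str], float]):
    """Symmetric matrix with d = 1 - q between each txo and its ring members."""
    nodes = sorted(set(registry) | {m for ms in registry.values() for m in ms})
    index = {h: i for i, h in enumerate(nodes)}
    D = np.full((len(nodes), len(nodes)), np.inf)   # unrelated txs: infinitely far
    np.fill_diagonal(D, 0.0)
    for txo, members in registry.items():
        for m in members:
            i, j = index[txo], index[m]
            d = 1.0 - q[(txo, m)]
            D[i, j] = D[j, i] = min(D[i, j], d)     # keep the closest link
    return nodes, D


def laplacian_spectrum(D: np.ndarray) -> np.ndarray:
    """Eigenvalues of the Laplacian of the 1-skeleton (every ring edge has weight one)."""
    A = np.isfinite(D).astype(float)
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.eigvalsh(L)
```

The number of (numerically) zero eigenvalues counts the connected components. The same distance matrix can be handed to a Vietoris-Rips computation that accepts precomputed distances, such as Giotto's `VietorisRipsPersistence(metric="precomputed")`.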
A first attempt at constructing these matrices is included in the Taint-Explorer notebook.

\subsection{Taint Trees}
@@ -16,8 +37,24 @@ For example a path two-hops deep with .9 connecting the first and .9 connecting
\subsection{Sampling Paths}

To sample paths, each ring has a pymc categorical distribution over the RingCT that we can draw from. This distribution is also used when computing the value of a ring or tx.
At this stage, all paths are given equal opportunity.
Figure \ref{fig:ptoc} shows a histogram of the lengths of 3300 sampled paths from a single transaction to coinbase.
We have not yet parameterized this histogram, but we expect it to be exponential, with mean related to the probability of drawing a coinbase transaction out of a ring, which terminates the sampling path.
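The expected shape can be checked by simulation under a deliberately simple, hypothetical ring model: every ring on the path has 16 equally weighted members, exactly one of which is a coinbase output. Each hop then terminates with probability $p = 1/16$, so the path length is geometric (the discrete counterpart of the exponential) with mean $1/p = 16$. The sketch uses Python's \texttt{random.choices} as a stand-in for the pymc categorical draw.

```python
import random
from statistics import mean
from typing import Optional


def sample_path_length(ring_size: int = 16, coinbase_members: int = 1,
                       rng: Optional[random.Random] = None) -> int:
    """Hops until a coinbase member is drawn (hypothetical toy ring model)."""
    if rng is None:
        rng = random.Random()
    weights = [1.0] * ring_size           # equal weight for every ring member
    length = 0
    while True:
        length += 1
        pick = rng.choices(range(ring_size), weights=weights)[0]
        if pick < coinbase_members:       # members 0..coinbase_members-1 are coinbases
            return length


def mean_path_length(n_paths: int = 3300, seed: int = 0) -> float:
    # Same number of samples as the histogram in the text.
    rng = random.Random(seed)
    return mean(sample_path_length(rng=rng) for _ in range(n_paths))
```

Fitting a geometric (or exponential) distribution to the observed histogram and comparing its mean with $1/p$ would be one way to parameterize it.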
We can construct persistence diagrams for any of these paths; the four diagrams in Table \ref{tab:pdsSP} are built from the block heights along sampled paths.
For a given decoy selection algorithm (or series of algorithms, since this changes with block height), we can evaluate the likelihood of a given path occurring.
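If each hop is treated as an independent categorical draw (an assumption of this sketch), the likelihood of a path is the product of the chosen members' normalized weights, one factor per ring. The sketch sums logarithms to avoid underflow on long paths. The (ring id, member index) path encoding and the weight lookup are hypothetical; the weights would come from whichever decoy selection algorithm, or series of them, is being evaluated.

```python
import math
from typing import Dict, List, Sequence, Tuple


def path_log_likelihood(path: List[Tuple[str, int]],
                        ring_weights: Dict[str, Sequence[float]]) -> float:
    """Sum of log-probabilities of the member chosen at each hop."""
    ll = 0.0
    for ring_id, idx in path:
        w = ring_weights[ring_id]
        ll += math.log(w[idx] / sum(w))   # normalize, in case weights are raw cdf values
    return ll
```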
Dynamical partition functions, i.e. weighted path integrals like this one, are the central object of Maximum Caliber, which is useful in statistical mechanics when the observables are not the energy but a categorical state (folded/unfolded, orbiting stationary points A, B, C, etc.).
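For reference, the usual Maximum Caliber form is written below in our own notation; it is not derived elsewhere in this document. A prior weight $\pi_\Gamma$ for each path $\Gamma$ (here it could be the decoy-selection path likelihood just described) is tilted by Lagrange multipliers $\lambda_i$ that enforce observed averages of path observables $A_i(\Gamma)$, and $Z$ is the dynamical partition function:

```latex
\begin{equation}
  p_\Gamma = \frac{\pi_\Gamma \, e^{-\sum_i \lambda_i A_i(\Gamma)}}{Z},
  \qquad
  Z = \sum_{\Gamma} \pi_\Gamma \, e^{-\sum_i \lambda_i A_i(\Gamma)} .
\end{equation}
```

With every $\lambda_i = 0$, $p_\Gamma$ reduces to the normalized prior $\pi_\Gamma / \sum_{\Gamma'} \pi_{\Gamma'}$.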
These diagrams are used to estimate the value of a given tx, and to sample the taint tree probabilistically.

Figure \ref{fig:valsc} shows the distribution of the coinbase values reached by these paths.
We expect the taint trees of different txs with a common true source to have comparable statistics.
We still need to check whether transactions that could have been used interchangeably as decoys also generate similar statistics.
As mentioned in the Value section, these distributions of coinbase values can be used to generate a probabilistic notion of the value of an unknown tx.
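The point estimate stated in the caption of Figure \ref{fig:valsc} (the mean coinbase value times the number of inputs, divided by the number of outputs) is short enough to write down directly. The sketch assumes the coinbase values of the sampled paths are already collected in a list; the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def estimate_value(coinbase_values: Sequence[float], n_inputs: int, n_outputs: int) -> float:
    """Point estimate of an unknown tx's value from its sampled paths' coinbase values."""
    if n_outputs <= 0:
        raise ValueError("n_outputs must be positive")
    return mean(coinbase_values) * n_inputs / n_outputs
```

Keeping the whole distribution rather than only its mean (for example its quantiles via \texttt{statistics.quantiles}) gives the probabilistic notion of value mentioned above.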
\begin{figure}[h]
\includegraphics[scale=0.5]{pathstocoinbase}
\caption{A histogram of the number of hops needed to reach a coinbase, from 3300 paths sampled from a single transaction. These values can be used in the value expectation.}
\label{fig:ptoc}
\end{figure}

\begin{center}
\begin{table}
@@ -27,8 +64,14 @@ To sample paths each ring has a pymc categorical distribution over the RingCT th
\includegraphics[scale=0.3]{pds_0} & \includegraphics[scale=0.3]{pds_1} \\ \hline
\includegraphics[scale=0.3]{pds_2} & \includegraphics[scale=0.3]{pds_3} \\ \hline
\end{tabular}
\caption{Persistence diagrams of four sampled paths to coinbase. Diagrams with few points correspond to short trips to coinbase; diagrams with many points pass through many transactions before reaching coinbase. The spacing within each diagram indicates how large the block jumps along the path were.}
\label{tab:pdsSP}
\end{table}
\end{center}
\begin{figure}[h]
\includegraphics[scale=0.5]{valscoinbase}
\caption{The distribution of coinbase values reached by the separate paths.
An estimate of the value of an unknown tx is the mean of this distribution times the number of inputs divided by the number of outputs.}
\label{fig:valsc}
\end{figure}
17
overseer.txt
Normal file
@@ -0,0 +1,17 @@
2927514 in unlocked 2023-07-11 17:08:25Z 0.100000000000 9c57c572f9f9f5192bd06389d5ea4a166a4c18588f2910a255fc3561b3e85b0b 0000000000000000 0.000000000000 83aHpK:0.100000000000 1 -
2927523 in unlocked 2023-07-11 17:27:53Z 85.900000000000 17aafff5003c91f1f3d33fdab293077d9d839e3d5147442004b4fae592aaab75 0000000000000000 0.000000000000 83aHpK:85.900000000000 1 -
2931910 out - 2023-07-17 17:56:55Z 0.000000000000 8c827b848a4bd07c55fe959e4f5389692d9e2625e87ec33a92a49be82c7666a1 0000000000000000 0.000030640000 - 0 -
2931914 out - 2023-07-17 18:05:28Z 0.000000000000 3e84234aa990efe2342b3b32cfc186299660c6e9c014b38e90962d5f82908e93 0000000000000000 0.000030720000 - 0 -
2931919 out - 2023-07-17 18:16:37Z 0.000000000000 d7fe2c66925084abfa7afa4c620399fd4c374b386dc35d72713c441a21133a0b 0000000000000000 0.000030720000 - 1 -
2931927 out - 2023-07-17 18:35:52Z 0.000000000000 6b1ca86936d9a924380829a4fc48983413d4b769374b31c29227f982b6b78189 0000000000000000 0.000030560000 - 0 -
2931928 out - 2023-07-17 18:40:23Z 0.000000000000 ee45eddf5fb6bc97b5f9e7ab80ae64850cf78c462fdbb9ff63c050e1b47f7317 0000000000000000 0.000030600000 - 3 -
2931944 out - 2023-07-17 19:02:26Z 0.000000000000 8126a96f50eac5fabcb0eb0d100c0e639564f689bb746c2b408632acfb5181b6 0000000000000000 0.000085600000 - 0,4 -
2944703 out - 2023-08-04 14:31:39Z 1.000000000000 be869adbc71473428431989d976497e7a4b3f222440ab5e85edca25c5183bb23 0000000000000000 0.000030560000 - 5 -
2944805 out - 2023-08-04 18:06:34Z 2.000000000000 5c6be4eac2c548156c49befe2632091680c07cd20722ef5ac6c0fc0e31be6dad 0000000000000000 0.000030700000 - 0 -
2944958 out - 2023-08-04 23:35:56Z 3.000000000000 f3d856c31fe5ce8d2bc64d17ca83cc7e61db8b8068aec6b6bb7dc3b44fe64c13 0000000000000000 0.000044260000 - 3 -
2945027 out - 2023-08-05 02:04:11Z 4.000000000000 66895cf13c0cb525fddfbd9b58db92fc5ce733a8fda7d33a236ddb72697304a1 0000000000000000 0.000030620000 - 0 -
2945295 out - 2023-08-05 11:44:01Z 0.100000000000 e6da16f7147a37125aebf6ec36faf9fb0d55473e5f94a09175f6541deb15bdbb 0000000000000000 0.000030740000 - 0 -
2945306 out - 2023-08-05 12:10:04Z 3.000000000000 2deb6a7ff79f400340440af51a953de1a194c9af627dc9ced9857a638b1c343e 0000000000000000 0.000030660000 - 0 -
2945306 out - 2023-08-05 12:10:04Z 2.000000000000 a98835a38f507bcad90fded2690ac66489f32357c99398addccd5d5d1d077ede 0000000000000000 0.000030660000 - 0 -
2945333 out - 2023-08-05 12:59:59Z 4.000000000000 a1874949d1aece80a383d790f0d2c06a572057a7da1dc386bc86a72b9c128267 0000000000000000 0.000030620000 - 0 -
2945387 out - 2023-08-05 14:30:46Z 5.000000000000 4f17f31d3abe7b833c448ed6c4d39c0fde102dfdc7bc30ac76f29a1cab7ae661 0000000000000000 0.000044180000 - 0 -
1112
sandbox/Taint_Explorer.ipynb
Normal file
File diff suppressed because one or more lines are too long
@@ -7,6 +7,7 @@ import gtda
from gtda import homology
import pymc as pm
import random
import pandas as pd

VR = homology.VietorisRipsPersistence(homology_dimensions=[0])
@@ -106,11 +107,13 @@ class ring:
            #self.txo.sinks.append(j)
            if j in registry:
                registry[j].sources.append(self.txo.tx_hash)
                registry[self.txo.tx_hash].sinks.append(j)
                mixins.append(registry[j])
            else:
                registry[j] = tx(j, registry=registry)

                registry[j].sources.append(self.txo.tx_hash)
                registry[self.txo.tx_hash].sinks.append(j)
                mixins.append(registry[j])

        return mixins