Monero-Dataset-Pipeline

mirror of https://github.com/MAGICGrants/Monero-Dataset-Pipeline.git synced 2026-01-08 21:17:57 -05:00

Author	SHA1	Message	Date
Jack	c7deececab	Added Paper	2022-08-31 19:34:11 -04:00
Jack	164ea4ad15	pip requirements command added	2022-08-28 12:51:23 -04:00
ACK-J	f1cbd8c354	Merge remote-tracking branch 'origin/main' into main	2022-08-17 22:04:21 -04:00
ACK-J	f77f48e02c	- Filled in comments	2022-08-17 22:04:14 -04:00
Jack	792ed7d22e	Added model weights	2022-08-17 21:22:54 -04:00
ACK-J	e7a292e1d4	- changed samples to weighted	2022-08-15 22:21:58 -04:00
ACK-J	551bf1f458	- changed micro f1 to avg f1	2022-08-15 10:14:51 -04:00
ACK-J	02990dc2bd	- added micro f1 to testing data	2022-08-14 12:56:02 -04:00
ACK-J	4e6db7f16c	- fixed NN plots	2022-08-14 11:54:31 -04:00
ACK-J	eca377f1f9	- changed weighted f1 to macro	2022-08-13 15:56:54 -04:00
ACK-J	8c22ac88b6	- fixed charts plotting precision - reworked neural network	2022-08-13 13:12:48 -04:00
ACK-J	187a1010b0	fixed small training bugs	2022-08-12 11:45:03 -04:00
ACK-J	0ea9be222a	Merge remote-tracking branch 'origin/main' into main	2022-08-10 17:15:48 -04:00
ACK-J	1ed0157962	reduced runs from 10 to 5	2022-08-10 17:15:34 -04:00
Jack	eb641af17a	Added name of paper	2022-08-09 22:59:27 -04:00
Jack	07e6708d5e	Added comments around sample code	2022-08-09 22:52:46 -04:00
Jack	354b93d0af	Added more debugging detail	2022-08-09 22:46:41 -04:00
ACK-J	7b1c511408	- added multi processing for training GBC and RF	2022-08-09 22:28:34 -04:00
ACK-J	62e45dc1c2	- prepped ml files for hyper param run for paper	2022-08-09 01:08:13 -04:00
ACK-J	51a1bbd7f1	Merge remote-tracking branch 'origin/main' into main	2022-08-03 18:57:11 -04:00
ACK-J	8b01f5b3de	- ml hyperparam tuning	2022-08-03 18:57:03 -04:00
Jack	f40227d96a	added timing infor for datasets	2022-08-02 15:22:51 -04:00
Jack	b3b0679d16	Added diagrams of dataset fields	2022-08-01 16:36:17 -04:00
Jack	05f786780e	Added testnet dataset link	2022-08-01 16:09:37 -04:00
Jack	8b5823e292	Placeholder for testnet dataset added	2022-08-01 14:16:31 -04:00
Jack	096d35ea0e	Added download link	2022-07-31 21:55:44 -04:00
Jack	a3a740e8c4	Centered Table columns	2022-07-31 21:11:02 -04:00
Jack	8d79ee31e5	updated file info	2022-07-31 21:07:50 -04:00
ACK-J	a22aa609eb	Merge remote-tracking branch 'origin/main' into main	2022-07-30 22:05:35 -04:00
ACK-J	6756ceac1e	-restructure of ML files - Added keep alive to postgresql queries - code cleanup	2022-07-30 22:05:24 -04:00
Jack	801f69261f	Update README.md	2022-07-29 20:05:57 -04:00
Jack	873fd2018d	Update README.md	2022-07-29 20:04:08 -04:00
Jack	a1e9e5d2af	Update README.md	2022-07-29 20:03:43 -04:00
Jack	af740fc16c	Update README.md	2022-07-28 23:31:51 -04:00
Jack	0c987f2042	Update README.md	2022-07-28 21:35:17 -04:00
ACK-J	87b418daf2	- small clean up of ML files	2022-07-17 21:53:02 -04:00
ACK-J	b455246a1d	- added comments - fixed bug in undersampling rename - improved error checking - drops undersampled columns which provide no value to ML	2022-07-08 12:54:47 -04:00
ACK-J	8a412c500a	- fixed csv detections for errors - added params for database	2022-07-07 14:47:14 -04:00
ACK-J	faea9efe17	- Removed old commented out decoy code - Fixed SQL query and added new decoy features to dataset - added more error checks	2022-07-06 20:25:37 -04:00
ACK-J	67fab266c6	- Commented out irrelevant processing for future decoys which would have no impact on past spends - Added a SQL query to Neptunes database to find relevant decoys - Split ML into multiple files	2022-06-29 21:45:05 -04:00
ACK-J	5cd0542d68	- removed output future decoys since they were not used - Added option to not undersample if predicting - Fixed undersample validation	2022-06-21 22:01:51 -04:00
ACK-J	fda45cf1ca	- added mainnet option to collect.sh - cleaned up create_dataset.py comments - fixed data integrity bug of undersampled dataset	2022-06-18 15:14:58 -04:00
ACK-J	d8a9da8deb	- two critical bug fixes which processed the data incorrectly and messed up the order of the records - added two integrity checks throughout the processing to ensure data stays in the correct order - improved memory management	2022-06-16 20:42:29 -04:00
ACK-J	94b4141ae2	- rewrote undersampling to use pandas series instead of single row dataframes. - Chunked the enumerated_X into 10 files to reduce RAM usage.	2022-06-14 14:12:24 -04:00
ACK-J	9a81f11b31	minor bug fix	2022-06-13 21:45:09 -04:00
ACK-J	fe7372bb8e	major efficiency improvements	2022-06-13 21:43:06 -04:00
ACK-J	fd60b3af54	Added dataset export to CSV	2022-06-10 00:02:06 -04:00
ACK-J	f67ed53150	Added multiprocessing of the undersampling process	2022-06-08 19:23:29 -04:00
ACK-J	78a338f115	Removed caching	2022-06-07 10:58:09 -04:00
ACK-J	8bca257b4b	Pandas concat was used 4 bytes of ram for each byte of data... this did not scale well. After a ton of debugging I found a slightly more memory efficient way that wouldn't crash the server.	2022-06-07 00:12:42 -04:00

327 Commits