327 Commits

Author SHA1 Message Date
Jack
c7deececab Added Paper 2022-08-31 19:34:11 -04:00
Jack
164ea4ad15 pip requirements command added 2022-08-28 12:51:23 -04:00
ACK-J
f1cbd8c354 Merge remote-tracking branch 'origin/main' into main 2022-08-17 22:04:21 -04:00
ACK-J
f77f48e02c - Filled in comments 2022-08-17 22:04:14 -04:00
Jack
792ed7d22e Added model weights 2022-08-17 21:22:54 -04:00
ACK-J
e7a292e1d4 - changed samples to weighted 2022-08-15 22:21:58 -04:00
ACK-J
551bf1f458 - changed micro f1 to avg f1 2022-08-15 10:14:51 -04:00
ACK-J
02990dc2bd - added micro f1 to testing data 2022-08-14 12:56:02 -04:00
ACK-J
4e6db7f16c - fixed NN plots 2022-08-14 11:54:31 -04:00
ACK-J
eca377f1f9 - changed weighted f1 to macro 2022-08-13 15:56:54 -04:00
ACK-J
8c22ac88b6 - fixed charts plotting precision
- reworked neural network
2022-08-13 13:12:48 -04:00
ACK-J
187a1010b0 fixed small training bugs 2022-08-12 11:45:03 -04:00
ACK-J
0ea9be222a Merge remote-tracking branch 'origin/main' into main 2022-08-10 17:15:48 -04:00
ACK-J
1ed0157962 reduced runs from 10 to 5 2022-08-10 17:15:34 -04:00
Jack
eb641af17a Added name of paper 2022-08-09 22:59:27 -04:00
Jack
07e6708d5e Added comments around sample code 2022-08-09 22:52:46 -04:00
Jack
354b93d0af Added more debugging detail 2022-08-09 22:46:41 -04:00
ACK-J
7b1c511408 - added multiprocessing for training GBC and RF 2022-08-09 22:28:34 -04:00
ACK-J
62e45dc1c2 - prepped ml files for hyper param run for paper 2022-08-09 01:08:13 -04:00
ACK-J
51a1bbd7f1 Merge remote-tracking branch 'origin/main' into main 2022-08-03 18:57:11 -04:00
ACK-J
8b01f5b3de - ml hyperparam tuning 2022-08-03 18:57:03 -04:00
Jack
f40227d96a added timing info for datasets 2022-08-02 15:22:51 -04:00
Jack
b3b0679d16 Added diagrams of dataset fields 2022-08-01 16:36:17 -04:00
Jack
05f786780e Added testnet dataset link 2022-08-01 16:09:37 -04:00
Jack
8b5823e292 Placeholder for testnet dataset added 2022-08-01 14:16:31 -04:00
Jack
096d35ea0e Added download link 2022-07-31 21:55:44 -04:00
Jack
a3a740e8c4 Centered Table columns 2022-07-31 21:11:02 -04:00
Jack
8d79ee31e5 updated file info 2022-07-31 21:07:50 -04:00
ACK-J
a22aa609eb Merge remote-tracking branch 'origin/main' into main 2022-07-30 22:05:35 -04:00
ACK-J
6756ceac1e - restructure of ML files
- Added keep alive to postgresql queries
- code cleanup
2022-07-30 22:05:24 -04:00
Jack
801f69261f Update README.md 2022-07-29 20:05:57 -04:00
Jack
873fd2018d Update README.md 2022-07-29 20:04:08 -04:00
Jack
a1e9e5d2af Update README.md 2022-07-29 20:03:43 -04:00
Jack
af740fc16c Update README.md 2022-07-28 23:31:51 -04:00
Jack
0c987f2042 Update README.md 2022-07-28 21:35:17 -04:00
ACK-J
87b418daf2 - small clean up of ML files 2022-07-17 21:53:02 -04:00
ACK-J
b455246a1d - added comments
- fixed bug in undersampling rename
- improved error checking
- drops undersampled columns which provide no value to ML
2022-07-08 12:54:47 -04:00
ACK-J
8a412c500a - fixed csv detections for errors
- added params for database
2022-07-07 14:47:14 -04:00
ACK-J
faea9efe17 - Removed old commented out decoy code
- Fixed SQL query and added new decoy features to dataset
- added more error checks
2022-07-06 20:25:37 -04:00
ACK-J
67fab266c6 - Commented out irrelevant processing for future decoys which would have no impact on past spends
- Added a SQL query to Neptune's database to find relevant decoys
- Split ML into multiple files
2022-06-29 21:45:05 -04:00
ACK-J
5cd0542d68 - removed output future decoys since they were not used
- Added option to not undersample if predicting
- Fixed undersample validation
2022-06-21 22:01:51 -04:00
ACK-J
fda45cf1ca - added mainnet option to collect.sh
- cleaned up create_dataset.py comments
- fixed data integrity bug of undersampled dataset
2022-06-18 15:14:58 -04:00
ACK-J
d8a9da8deb - fixed two critical bugs which processed the data incorrectly and messed up the order of the records
- added two integrity checks throughout the processing to ensure data stays in the correct order
- improved memory management
2022-06-16 20:42:29 -04:00
ACK-J
94b4141ae2 - rewrote undersampling to use pandas series instead of single row dataframes.
- Chunked the enumerated_X into 10 files to reduce RAM usage.
2022-06-14 14:12:24 -04:00
ACK-J
9a81f11b31 minor bug fix 2022-06-13 21:45:09 -04:00
ACK-J
fe7372bb8e major efficiency improvements 2022-06-13 21:43:06 -04:00
ACK-J
fd60b3af54 Added dataset export to CSV 2022-06-10 00:02:06 -04:00
ACK-J
f67ed53150 Added multiprocessing of the undersampling process 2022-06-08 19:23:29 -04:00
ACK-J
78a338f115 Removed caching 2022-06-07 10:58:09 -04:00
ACK-J
8bca257b4b Pandas concat used 4 bytes of RAM for each byte of data... this did not scale well. After a ton of debugging I found a slightly more memory-efficient way that wouldn't crash the server. 2022-06-07 00:12:42 -04:00