Compare commits


661 Commits

Author SHA1 Message Date
Evgeny Medvedev
fbd57fc079 Merge pull request #523 from GarmashAlex/fix/broken-link
docs: fix broken link
2025-08-27 09:07:52 +07:00
GarmashAlex
8204c0827d docs: fix broken link 2025-08-26 19:09:29 +03:00
Evgeny Medvedev
46b91a9ff2 Merge pull request #522 from mdqst/patch-1
docs: fix broken video link
2025-08-22 20:49:33 +07:00
Dmitry
b5fd64bdca docs: fix broken video link 2025-08-22 14:50:56 +03:00
Evgeny Medvedev
d8547e9c7c Merge pull request #521 from Galoretka/fix/broken-link
fix(docs): update Ethereum JSON-RPC links in ETL jobs
2025-08-21 14:54:21 +07:00
Galoretka
7ef53859c1 fix:broken link 2025-08-21 10:51:51 +03:00
Galoretka
e38d1c1f2f fix: broken link 2025-08-21 10:51:22 +03:00
Evgeny Medvedev
43fe6b49b3 Merge pull request #519 from blockchain-etl/medvedev1088-patch-1
Remove gitter link in README.md
2025-04-30 15:26:44 +07:00
Evgeny Medvedev
db274c8a85 Update README.md 2025-04-30 15:24:40 +07:00
Evgeny Medvedev
69247042a4 Merge pull request #518 from oksanaphmn/patch-1
docs: add license badge
2025-04-30 15:23:15 +07:00
oksanaphmn
218e1e4356 Update README.md 2025-04-27 13:16:05 +03:00
Evgeny Medvedev
5e0fc8cc75 Merge pull request #516 from gap-editor/develop
deleted link to discord from 'contact.md'
2025-04-05 09:13:55 +07:00
Maximilian Hubert
77efda5106 Update contact.md 2025-04-04 20:35:34 +02:00
Evgeny Medvedev
ece0b7f422 Merge pull request #515 from VolodymyrBg/bg
docs: extension of documentation in index.md with the addition of adv…
2025-04-04 21:34:37 +07:00
VolodymyrBg
b31b76a73a Update index.md 2025-04-04 17:33:03 +03:00
VolodymyrBg
0cb7eb60b5 docs: extension of documentation in index.md with the addition of advanced features and new projects 2025-04-02 20:02:14 +03:00
Evgeny Medvedev
02943f7caf Merge pull request #514 from blockchain-etl/medvedev1088-patch-1
Update exporting-the-blockchain.md
2025-04-01 09:23:59 +07:00
Evgeny Medvedev
b844b95868 Update exporting-the-blockchain.md 2025-04-01 09:22:53 +07:00
Evgeny Medvedev
4d305a284f Merge pull request #513 from Hopium21/patch-1
remove broken link to D5.ai
2025-04-01 09:22:15 +07:00
Hopium
e161e6ef13 Update exporting-the-blockchain.md 2025-03-31 20:28:23 +02:00
Evgeny Medvedev
9b917b8ddd Update README.md 2025-03-04 19:15:39 +07:00
Evgeny Medvedev
383caf8331 Merge pull request #511 from Radovenchyk/patch-2
docs: removed the discord link
2025-03-04 19:13:25 +07:00
Radovenchyk
c61e91235f Update README.md 2025-03-04 11:36:05 +02:00
Evgeny Medvedev
0e4b4a894b Merge pull request #510 from Radovenchyk/patch-1
docs: added shield and twitter link
2025-03-03 21:50:15 +07:00
Radovenchyk
d58c1ebda7 Update README.md 2025-03-03 16:37:36 +02:00
Evgeny Medvedev
f0bf07e60c Merge pull request #509 from maximevtush/patch-1
Update LICENSE
2025-01-30 18:16:00 +08:00
Maxim Evtush
efe7acdc13 Update LICENSE 2025-01-30 11:07:04 +01:00
Evgeny Medvedev
20404eca9e Merge pull request #506 from romashka-btc/code/fix
typos/fix
2024-12-19 11:33:47 +08:00
Romashka
435cbe0a74 typo-Update exporters.py 2024-12-18 20:36:43 +02:00
Romashka
b606e22cd5 typo-Update exporters.py 2024-12-18 20:36:18 +02:00
Evgeny Medvedev
4943b0b795 Merge pull request #505 from XxAlex74xX/patch-1
typo README.md
2024-12-18 15:51:25 +08:00
XxAlex74xX
eed2068def Update README.md 2024-12-18 07:38:38 +01:00
Evgeny Medvedev
313b4b1237 Merge pull request #503 from Guayaba221/develop
docs fix spelling issues
2024-12-15 21:00:53 +08:00
planetBoy
ad6149155e Update exporting-the-blockchain.md 2024-12-15 10:11:54 +01:00
Evgeny Medvedev
c55c0f68dc Merge pull request #502 from futreall/develop
Fix significant typo in documentation
2024-12-15 11:43:33 +08:00
futreall
b031b04bc7 Update google-bigquery.md 2024-12-14 20:40:47 +02:00
Evgeny Medvedev
b314f1ed0c Merge pull request #501 from vtjl10/develop
fix: typos in documentation files
2024-12-15 00:21:31 +08:00
fuder.eth
61eb2e6e21 Update README.md 2024-12-14 13:27:01 +01:00
Evgeny Medvedev
9f62e7ecea Merge pull request #492 from nnsW3/docs-improvement
Docs improvement
2024-06-26 09:41:09 +08:00
Elias Rad
4da7e7b23f fix README.md 2024-06-25 20:06:41 +03:00
Elias Rad
de72ba3511 fix origin.py 2024-06-25 20:04:51 +03:00
Elias Rad
3aabf9aa54 fix schema.md 2024-06-25 20:02:55 +03:00
Elias Rad
284755bafc fix limitations.md 2024-06-25 20:02:26 +03:00
Elias Rad
23133594e8 fix index.md 2024-06-25 20:02:14 +03:00
evgeny
ca54ef6c4b Bump version 2024-04-11 19:42:39 +07:00
Evgeny Medvedev
836f30e198 Merge pull request #488 from blockchain-etl/add_dencun_fields_to_postgres_tables
Add Dencun fields to postgres_tables.py
2024-04-11 20:41:49 +08:00
evgeny
1c6508f15d Add Dencun fields to postgres_tables.py 2024-04-11 19:38:27 +07:00
Evgeny Medvedev
a4d6f8fcb1 Merge pull request #487 from blockchain-etl/add_readthedocs_yaml
Add .readthedocs.yaml
2024-04-11 10:58:07 +08:00
evgeny
bc79d7d9bf Add .readthedocs.yaml 2024-04-11 09:56:49 +07:00
medvedev1088
7fdcf0f7b7 Bump version 2024-04-03 12:42:38 +08:00
medvedev1088
d3330f7ddc Bump version 2024-04-03 12:21:40 +08:00
Evgeny Medvedev
1066ec9025 Merge pull request #484 from blockchain-etl/dencun_upgrade
Add EIP-4844 (Dencun) columns
2024-04-03 12:20:11 +08:00
medvedev1088
2a92ecbf31 Revert column width in schema.md 2024-03-29 14:38:23 +08:00
medvedev1088
c238e8b57b Add withdrawals_root and withdrawals to schema.md 2024-03-29 14:17:31 +08:00
medvedev1088
a27d2427e1 Trigger build 2024-03-29 13:47:54 +08:00
medvedev1088
c18f78506c Add tests for Dencun transactions 2024-03-29 13:37:18 +08:00
medvedev1088
23bad940db Fix slow tests 2024-03-29 13:11:35 +08:00
medvedev1088
0a52db4b8a Fix tests 2024-03-29 13:06:02 +08:00
medvedev1088
9fd1f906f2 Add EIP-4844 fields to blocks, fix missing comma, updated enrich.py, update docs 2024-03-29 12:44:44 +08:00
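A hedged sketch of what block-level EIP-4844 enrichment like this typically adds. The snake_case output names and the camelCase RPC keys follow the EIP-4844 spec; the project's exact mapping may differ.

```
def enrich_block_with_dencun_fields(block):
    # EIP-4844 (Dencun) block-level fields returned by post-upgrade nodes
    # via eth_getBlockByNumber; absent on pre-Dencun blocks.
    return {
        **block,
        'blob_gas_used': block.get('blobGasUsed'),
        'excess_blob_gas': block.get('excessBlobGas'),
    }
```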
Evgeny Medvedev
f08f93ddfe Merge pull request #474 from haiyanghe/develop
Dencun transaction and transaction receipt fields
2024-03-29 11:55:58 +08:00
Evgeny Medvedev
9e51c3b8d4 Merge pull request #478 from blockchain-etl/delete_funding_yaml
Delete FUNDING.yml as the link is broken
2024-03-12 12:24:00 +08:00
medvedev1088
79d341ea45 Delete FUNDING.yml as the link is broken 2024-03-12 12:23:27 +08:00
haiyanghe
9db1ff104a Dencun transaction and transaction receipt fields 2024-02-07 20:19:49 +00:00
medvedev1088
952a49ba4b Bump version 2023-08-25 17:14:23 +07:00
Allen Day
aab122ebf3 Merge pull request #456 from sfsf9797/fix-datatype
Fix datatype
2023-08-23 14:02:50 +08:00
sfsf9797
438c9af751 fix error msg 2023-08-18 00:41:16 +08:00
sfsf9797
3ec2af25e1 fix 2023-08-18 00:40:18 +08:00
medvedev1088
84101407c1 Bump version 2023-08-02 20:17:29 +08:00
Evgeny Medvedev
97a0275ced Merge pull request #446 from MSilb7/develop
Add Optimism / OP Stack Transaction Receipt Fields
2023-08-02 20:11:52 +08:00
Michael Silberling
7cbfd0e533 Update receipts.sql
fee scalar to decimal
2023-07-06 17:19:27 -04:00
Michael Silberling
94ebd3f3e9 mod scalar to be a float 2023-06-02 16:33:51 -04:00
Michael Silberling
c0fd158211 add to streaming 2023-06-01 15:22:59 -04:00
Michael Silberling
7529c43f4e update 2023-06-01 11:59:55 -04:00
Michael Silberling
ce906f0af1 Merge branch 'blockchain-etl:develop' into develop 2023-06-01 11:54:06 -04:00
Michael Silberling
eaf4bf0bf2 add tests 2023-06-01 11:53:33 -04:00
Evgeny Medvedev
1a0a8cf0f8 Merge pull request #449 from vinhloc30796/fix/minor-error-msg
Message: "start_timestamp must be less than end_timestamp"
2023-06-01 16:08:43 +08:00
Loc Nguyen
f0e4302423 Fix again 2023-06-01 14:36:20 +07:00
Loc Nguyen
fb35431aa7 start_timestamp must be lesser or equal to end_timestamp 2023-06-01 14:13:57 +07:00
Evgeny Medvedev
87b1669434 Merge pull request #447 from blockchain-etl/fix_build_wrong_ssl_version
Restrict urllib3 v2 as it breaks the build
2023-05-29 18:00:46 +08:00
medvedev1088
9678bb91c7 Add version restriction for urllib3 as it breaks Travis build 2023-05-29 17:55:14 +08:00
medvedev1088
f4e2b57463 Install latest libssl-dev to fix build error in Travis CI 2023-05-29 17:36:05 +08:00
Michael Silberling
6599a438a0 add to sql 2023-05-26 13:05:22 -04:00
Michael Silberling
f8a5f25376 rm comma 2023-05-25 18:15:21 -04:00
Michael Silberling
de96e394ee rm l1 gas paid 2023-05-25 18:02:38 -04:00
Michael Silberling
a58fe4585d Revert "Update README.md"
This reverts commit aae968cd4b.
2023-05-25 18:01:25 -04:00
Michael Silberling
f8878ff320 Revert "Update README.md"
This reverts commit 84518f70ae.
2023-05-25 18:01:22 -04:00
Michael Silberling
993ebe67c8 Revert "Update README.md"
This reverts commit af2ef17832.
2023-05-25 18:01:12 -04:00
Michael Silberling
f967d73a95 Revert "Update README.md"
This reverts commit e8b0447a63.
2023-05-25 18:01:06 -04:00
Michael Silberling
e8b0447a63 Update README.md 2023-05-19 15:29:48 -04:00
Michael Silberling
af2ef17832 Update README.md 2023-05-19 15:29:39 -04:00
Michael Silberling
161aa6e472 add error check for l1_gas_used_paid 2023-05-19 15:26:17 -04:00
Michael Silberling
7c80c09500 Merge branch 'develop' of https://github.com/MSilb7/optimism-etl into develop 2023-05-19 15:19:09 -04:00
Michael Silberling
3affbadac3 comma 2023-05-19 15:19:01 -04:00
Michael Silberling
84518f70ae Update README.md 2023-05-19 15:14:37 -04:00
Michael Silberling
aae968cd4b Update README.md 2023-05-19 15:14:09 -04:00
Michael Silberling
6f44daf023 add l1 fields 2023-05-19 15:12:47 -04:00
TimNooren
2da9d050f4 Relax Click requirement (#444)
* Update post-shanghai test cases

Cases were based on Goerli while awaiting the mainnet upgrade.

* Relax Click requirement
2023-05-02 20:54:10 +08:00
TimNooren
2939c0afbf Update Github Actions runner image (#435) 2023-04-04 20:11:33 +08:00
TimNooren
2678a2a2e3 Add withdrawals (EIP-4895) (#434)
* Update IPFS gateway

* Add format parameter to test_export_blocks_job

* Add withdrawals field to block model

* Bump package version
2023-04-04 19:15:43 +08:00
Evgeny Medvedev
d801da96dd Merge pull request #432 from blockchain-etl/update_docs_nansen_link
Update link to Nansen Query in docs
2023-03-17 15:05:26 +07:00
medvedev1088
b876f2059e Update link to Nansen Query in docs 2023-03-17 16:02:18 +08:00
Evgeny Medvedev
204bcb65f6 Merge pull request #431 from blockchain-etl/update_readme_nansen_link
Update link to Nansen Query
2023-03-17 14:59:35 +07:00
medvedev1088
92c07982c4 Update link to Nansen Query 2023-03-17 15:55:53 +08:00
Maxim Razhev
b6dbf07dbf Bump package version (#420) 2022-12-09 18:11:14 +07:00
sleepy-tiger
f0732961f5 Upgrade eth-abi lower bound to >=2.2.0 (#419) 2022-12-09 18:03:32 +07:00
medvedev1088
8498a775da Update link to Telegram group 2022-10-14 18:33:28 +08:00
medvedev1088
f0e98871a2 Bump version 2022-10-14 18:21:00 +08:00
Evgeny Medvedev
f7f192510b Merge pull request #302 from blockjoe/develop
Fallback to `web3.eth.getLogs` when calling to nodes without `eth_newFilter`
2022-10-14 17:19:57 +07:00
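A minimal sketch of the fallback this PR title describes, assuming web3.py v5-era method names: try a server-side filter first, then fall back to eth_getLogs on clients that don't implement eth_newFilter.

```
def get_logs_with_fallback(web3, filter_params):
    try:
        # eth_newFilter-based path: create a filter and drain it.
        event_filter = web3.eth.filter(filter_params)
        return event_filter.get_all_entries()
    except ValueError:
        # Clients without eth_newFilter reject the RPC (surfaced by web3 v5
        # as ValueError); fall back to a plain eth_getLogs call.
        return web3.eth.getLogs(filter_params)
```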
Evgeny Medvedev
b1acfa3be7 Merge pull request #343 from sfsf9797/set_csv_limit
fix for #306 'field larger than field limit' error
2022-10-14 17:17:03 +07:00
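The "field larger than field limit" error comes from Python's csv module, whose default field size limit is 128 KB. A common fix, sketched here for illustration rather than as the exact project code, is to raise that limit before reading exported files:

```
import csv
import sys

# Raise csv's default field size limit so long hex fields (contract
# bytecode, log data) can be read back; step down if the platform
# rejects sys.maxsize as larger than a C long.
limit = sys.maxsize
while True:
    try:
        csv.field_size_limit(limit)
        break
    except OverflowError:
        limit //= 2
```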
Evgeny Medvedev
372bf2cb16 Merge pull request #365 from ezioruan/patch-1
Update schema.md
2022-10-14 17:16:20 +07:00
Evgeny Medvedev
45a089fe0c Merge pull request #383 from blockchain-etl/update_docs
Add --txlookuplimit 0 in commands.md
2022-10-14 17:15:08 +07:00
Evgeny Medvedev
688ecdfa3f Merge pull request #399 from m0ssc0de/develop
set max field size limit for export token transfers
2022-10-14 17:14:17 +07:00
medvedev1088
0f6234ade3 Bump version 2022-10-14 14:22:22 +08:00
Evgeny Medvedev
47308f4891 Merge pull request #359 from CoinStatsHQ/pr/aws-kinesis-support
Added support for AWS Kinesis
2022-10-14 13:19:43 +07:00
Moss
2c91a31061 set max field size limit for export token transfers 2022-10-02 22:56:57 +00:00
Evgeny Medvedev
956695b77b Merge pull request #396 from blockchain-intel/kafka-export-rm-print
Change print(data) in Kafka exporter to instead log at the debug level
2022-09-26 22:25:48 +07:00
Evgeny Medvedev
533f516296 Merge pull request #395 from blockchain-etl/feat/remove-quickfix-traces
Remove TempFix for insufficient funds since this has been resolved on the node
2022-09-23 11:23:03 +07:00
Max Cruz
d34b28e4bf rm print data in kafka exporter 2022-09-22 10:39:23 -04:00
Akshay
3ed8b8bc3e remove traces quickfix 2022-09-21 17:48:22 +08:00
medvedev1088
e1f658bc36 Bump version 2022-09-16 03:06:22 +08:00
Evgeny Medvedev
aae2edb20b Merge pull request #393 from blockchain-etl/bump_pubsub_version
Bump google-cloud-pubsub version
2022-09-16 02:05:29 +07:00
medvedev1088
12851c17a5 Bump google-cloud-pubsub version 2022-09-16 03:02:38 +08:00
medvedev1088
f5115547a3 Bump version 2022-09-16 00:27:05 +08:00
Evgeny Medvedev
58f5d9020c Merge pull request #392 from blockchain-etl/path_insufficient_funds_error_on_erigon
Temporary fix for the insufficient funds error when tracing a block
2022-09-15 23:26:21 +07:00
medvedev1088
f5fa89a916 Temporary fix for the insufficient funds error https://github.com/ledgerwatch/erigon/issues/5284 2022-09-16 00:05:49 +08:00
medvedev1088
262e5f65f1 Bump version 2022-08-16 22:27:40 +08:00
Evgeny Medvedev
6b64c2338b Merge pull request #371 from FeSens/bugfix/contract-logic-error
Bugfix: Ignore ContractLogicError raised by Web3.py
2022-08-16 21:26:24 +07:00
Evgeny Medvedev
be64a901ab Merge pull request #372 from yongchand/develop
Fix typo in export receipts and logs
2022-08-15 14:43:47 +07:00
medvedev1088
97e2749f2a Add --txlookuplimit 0 in commands.md 2022-08-13 16:50:06 +08:00
Evgeny Medvedev
ca9eb6696b Merge pull request #382 from blockchain-etl/update_docs
Add sudo apt-get install python-dev in .travis.yml
2022-08-12 19:55:22 +07:00
medvedev1088
6c3a0694a3 Lock grpcio version 2022-08-12 20:50:50 +08:00
medvedev1088
837c324448 Add sudo apt-get install python-dev in .travis.yml 2022-08-12 20:42:31 +08:00
medvedev1088
7ef53acee0 Remove unused file 2022-08-12 20:08:05 +08:00
Evgeny Medvedev
119a54fca1 Merge pull request #379 from blockchain-etl/update_docs
Remove --ipcapi debug from geth as it was deprecated
2022-08-11 12:12:43 +07:00
medvedev1088
cb0f955c27 Update Discord invite 2022-08-11 13:11:21 +08:00
medvedev1088
9725ff9122 Remove --ipcapi debug from geth as it was deprecated 2022-08-10 20:41:49 +08:00
yongchand
a142542ef9 Fix typo in export receipts and logs 2022-07-28 13:22:26 +09:00
FeSens
342c5df3bb ignore ContractLogicError 2022-07-25 19:15:22 -03:00
ezio ruan
d189e7a344 Update schema.md
add block_number to token schema
2022-07-08 17:48:01 +08:00
Evgeny Medvedev
f8f22f93a1 Merge pull request #358 from CoinStatsHQ/pr/aws-schemas-updated
AWS Athena schemas update to JSON
2022-06-19 01:56:01 +07:00
Anton Bryzgalov @ CoinStats
f4403a7e3f Added support for AWS Kinesis
Sponsored by CoinStats.app
2022-06-17 22:21:08 +04:00
Anton Bryzgalov @ CoinStats
4ee070627c schemas/aws: removed debug queries
Sponsored by CoinStats.app
2022-06-10 14:36:42 +04:00
Anton Bryzgalov @ CoinStats
7a337e724a schemas: aws_partition_by_date -> aws
Sponsored by CoinStats.app
2022-06-10 13:56:43 +04:00
Anton Bryzgalov @ CoinStats
ac812a0f36 schemas/aws_partition_by_date: schemas updated to JSON format
Sponsored by CoinStats.app
2022-06-10 13:56:02 +04:00
medvedev1088
1711d2e809 Bump version 2022-05-24 15:45:34 +08:00
Evgeny Medvedev
d251f21b04 Merge pull request #352 from ytrezq/patch-1
Transfer extractor : fix string compare bug by switching to case insensitive string comparison.
2022-05-24 15:43:47 +08:00
ytrezq
dcdc776c1b Fix string compare bug by switching to case insensitive string comparison.
Some nodes-as-a-service don't return the result in lowercase but use the ᴇɪᴘ-55 checksum format or uppercase.
This results in some transfers being rejected even though the topic matches.
2022-05-23 21:06:25 +02:00
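A minimal sketch of the normalization this fix describes: lowercase both sides before comparing, since hex addresses and topics are case-insensitive and EIP-55 only mixes case as a checksum. The helper name is hypothetical.

```
def addresses_equal(a, b):
    # Lowercasing both sides makes the comparison robust to checksum-cased
    # (EIP-55) or uppercase values returned by some hosted nodes.
    return (a or '').lower() == (b or '').lower()
```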
medvedev1088
59ddb23f45 Fix token addresses in commands.md 2022-05-15 22:37:01 +08:00
medvedev1088
64adeb77a8 Bump version 2022-05-09 00:49:56 +08:00
Evgeny Medvedev
caff3065f7 Merge pull request #346 from blockchain-etl/bump_pg8000_version
Bump pg8000 version
2022-05-09 00:48:41 +08:00
medvedev1088
d5567bf343 Bump pg8000 version to fix https://github.com/blockchain-etl/ethereum-etl/issues/345 2022-05-09 00:36:00 +08:00
medvedev1088
26e940224b Add notes about Apple M1 chip to README 2022-05-06 19:43:06 +08:00
medvedev1088
5efa6e0eb9 Bump version 2022-05-06 19:34:32 +08:00
Evgeny Medvedev
53c1b59c84 Merge pull request #339 from dbfreem/develop
web3 upgrade
2022-05-06 19:30:29 +08:00
sfsf9797
8c9d6a62cc set max field size limit 2022-05-05 00:08:33 +08:00
medvedev1088
d085d5a5a4 Bump version 2022-05-04 16:57:45 +08:00
Evgeny Medvedev
43227e54b2 Merge pull request #316 from ninjascant/fix/update-click
Update click version; update package version
2022-05-04 16:56:27 +08:00
Maxim Razhev
00e63d2b83 Update version at cli 2022-05-04 13:40:44 +05:00
Maxim Razhev
d58e72974a Resolve conflicts 2022-04-28 17:41:54 +05:00
Maxim Razhev
817660199c Resolve conflicts 2022-04-28 17:37:39 +05:00
DB
50925fc94d 3.7.2 in travis 2022-04-24 13:24:58 -04:00
DB
e63e703390 testing travix 2022-04-24 13:19:15 -04:00
DB
8a87ba85e3 trying to setup travix to run in 3.7.2 2022-04-24 11:54:14 -04:00
DB
15ff2a2ecb set min version to 3.7.2 in setup.py 2022-04-24 11:51:13 -04:00
DB
e511dac818 travis remove 3.6 2022-04-19 05:19:17 -04:00
DB
64d16f581b upgraded web3,py and eth-abi
removed python 3.6
2022-04-17 22:02:48 -04:00
Evgeny Medvedev
898ce3f3bf Merge pull request #326 from alexleventer/docs-improvements
Docs improvements
2022-04-11 15:01:26 +08:00
Evgeny Medvedev
da6cc6f653 Merge pull request #325 from alexleventer/links
GitHub Edit Links
2022-04-11 15:00:15 +08:00
Alex Leventer
53c74e9996 Convert text to a link 2022-04-10 12:35:38 -07:00
Alex Leventer
67e27a6536 add missing comma 2022-04-10 12:34:28 -07:00
Alex Leventer
3a28eb116d Various, small docs improvements 2022-04-10 12:33:53 -07:00
Alex Leventer
b80eac42a6 add trailing slash 2022-04-10 12:28:48 -07:00
Alex Leventer
72dcfd4979 Fix broken edit links 2022-04-10 12:27:52 -07:00
medvedev1088
4bfa3e6ba4 Bump version 2022-04-01 21:20:41 +07:00
Evgeny Medvedev
1883a01e3f Merge pull request #293 from bsh98/develop
Adds contract and token support for PostgreSQL when streaming
2022-04-01 21:17:16 +07:00
ninjascant
1883e5cdac Fix/test mocks (#323)
* Fix import in eth service test

* Fix mocks and expected values in export blocks test: set proper tx type values

* Fix mocks and expected values in export receipts tests: set proper effectiveGasPrice values

* Fix mocks and expected values in stream tests
2022-03-13 19:22:08 +08:00
bsh98
8a49edcae3 Merge branch 'develop' into develop 2022-03-05 11:14:03 -08:00
Maxim Razhev
ce2ce23ccd Fix import in test 2022-02-22 18:49:55 +05:00
Maxim Razhev
d1189ad721 Update click version; update package version 2022-02-22 17:47:19 +05:00
medvedev1088
c135afc4bc Bump version 2022-02-11 23:16:06 +08:00
Evgeny Medvedev
65feed595a Merge pull request #313 from blockchain-etl/lib_version_upgrade
Limit python-dateutil major version in case of breaking changes
2022-02-11 22:15:00 +07:00
medvedev1088
e82a86ca7f Limit python-dateutil major version in case of breaking changes 2022-02-11 23:03:12 +08:00
Evgeny Medvedev
ed31940391 Merge pull request #311 from emlazzarin/develop
bump python-dateutil
2022-02-11 21:58:47 +07:00
Eddy Lazzarin
a0689730e4 bumpb python-dateutil 2022-02-10 18:12:38 -08:00
Evgeny Medvedev
0beebb139d Merge pull request #303 from blockchain-etl/fix_tests2
Lock version of libcst to fix build and tests
2022-01-20 19:13:38 +07:00
medvedev1088
5dea830c16 Move kafka dependency to extras 2022-01-20 20:01:13 +08:00
medvedev1088
37d89e9c9d Fix broken build 2022-01-20 19:55:37 +08:00
medvedev1088
baa79e74c9 Add pip install --upgrade pip to travis 2022-01-17 14:50:38 +08:00
blockjoe
db590188d1 Added client-side log filtering for calling to ETH clients that don't support eth_newFilter 2022-01-14 14:09:11 -05:00
medvedev1088
87f5e45d17 Update docs 2022-01-12 17:26:24 +08:00
medvedev1088
b772ec7fd7 Update error message for tracing 2022-01-12 14:25:10 +08:00
medvedev1088
69bb6f9bb3 Update error message for tracing 2022-01-12 14:24:25 +08:00
medvedev1088
2a9e468c1e Bump version 2022-01-07 03:54:24 +08:00
Evgeny Medvedev
be1892dffa Merge pull request #299 from blockchain-etl/poa_support
Add POA support
2022-01-07 02:52:32 +07:00
medvedev1088
31fb4efc48 Add POA support 2022-01-07 03:33:43 +08:00
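The usual way to support POA chains in web3.py of that era (v5) is the geth_poa_middleware, which relaxes the 32-byte extraData check; a hedged sketch under that assumption, not necessarily the project's exact change:

```
from web3 import Web3, HTTPProvider
from web3.middleware import geth_poa_middleware

def build_poa_web3(provider_uri):
    web3 = Web3(HTTPProvider(provider_uri))
    # POA chains put validator signatures in extraData, so blocks exceed the
    # mainnet-style 32-byte limit; the middleware lets web3 parse them.
    web3.middleware_onion.inject(geth_poa_middleware, layer=0)
    return web3
```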
Evgeny Medvedev
167b38b6bc Merge pull request #271 from numonedad/bugfix/poachain
adds support for non-mainnet in etl stream
2022-01-07 02:15:06 +07:00
Evgeny Medvedev
7d47dd34d6 Merge pull request #295 from blockchain-etl/fix_timeout_travisci
Fix travis ci timeout
2021-12-27 13:32:58 +07:00
medvedev1088
c6fbd10ef3 Fix travis ci timeout 2021-12-27 14:20:02 +08:00
medvedev1088
114cd60b5a Fix travis ci timeout 2021-12-27 14:10:40 +08:00
bsh98
1a0bac2e2c blknum and addr composite pk for contracts, tokens 2021-12-24 11:13:42 -08:00
Evgeny Medvedev
2a17fb67ad Merge pull request #294 from blockchain-etl/add_python39_to_tests
Add python 3.9 to tests
2021-12-24 22:17:54 +07:00
medvedev1088
dba7adf8f1 Add python 3.9 to tests 2021-12-24 18:07:44 +08:00
medvedev1088
75847dd6ba Bump version 2021-12-24 18:00:46 +08:00
medvedev1088
e3b83639c2 Update docs 2021-12-24 17:59:37 +08:00
Evgeny Medvedev
6bb0fffd38 Merge pull request #291 from ayush3298/develop
Added exporter for kafka
2021-12-24 16:55:49 +07:00
bsh98
b62a2f1b30 adds support for python3.6 2021-12-23 20:33:26 -08:00
bsh98
9d9c383ab8 tokens, contracts support for postgresql 2021-12-23 19:14:25 -08:00
bsh98
79ad41aad9 postgres support 2021-12-23 16:19:03 -08:00
Evgeny Medvedev
38c2c1beec Merge pull request #292 from blockchain-etl/fix_tests
Fix tests
2021-12-23 22:37:55 +07:00
medvedev1088
a582f73cd2 Remove Python 3.5 support 2021-12-23 20:55:56 +08:00
deq
257da16c48 Fixed file name typo and used exporters 2021-12-23 18:10:57 +05:30
medvedev1088
1b9c07862c Remove Python 3.5 support 2021-12-23 20:29:27 +08:00
medvedev1088
0667b68cb6 Fix tests 2021-12-23 20:23:34 +08:00
deq
28acabe45e Made kafka generic for output, now it can be in format of kafka/127.0.0.1:9092 2021-12-23 13:13:43 +05:30
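A small sketch of the output-string convention this commit describes, where an output like kafka/127.0.0.1:9092 selects the Kafka exporter and carries the broker address after the slash. The helper name is hypothetical.

```
def parse_kafka_output(output):
    # 'kafka/127.0.0.1:9092' -> '127.0.0.1:9092'; anything else is not Kafka.
    prefix = 'kafka/'
    if output and output.startswith(prefix):
        return output[len(prefix):]
    return None
```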
deq
f593053af3 Added param helper for kafka 2021-12-22 18:58:58 +05:30
deq
8df7d901ee Resolved Conflicts 2021-12-22 13:02:02 +05:30
medvedev1088
a2b678167b Bump version 2021-12-20 01:45:45 +08:00
Evgeny Medvedev
c4c9207474 Merge pull request #290 from blockchain-etl/feature/pubsub_message_ordering
GCS exporter plus Pub/Sub message ordering
2021-12-20 00:43:36 +07:00
medvedev1088
289b9005a0 Update docs 2021-12-20 01:40:25 +08:00
medvedev1088
eefffb0aa6 Parameterize pubsub item exporter for batch params 2021-12-20 01:23:28 +08:00
medvedev1088
967c1ad37a Allow path in GCS item exporter 2021-12-20 01:22:26 +08:00
medvedev1088
b0408582db Fix output validation in stream command 2021-12-20 01:22:11 +08:00
medvedev1088
8f93376232 Merge branch 'develop' into feature/pubsub_message_ordering
# Conflicts:
#	docs/dockerhub.md
2021-12-20 01:07:22 +08:00
deq
de4380fb89 Added exporter for kafka 2021-12-17 17:45:05 +05:30
medvedev1088
e0ca8f9a8c Merge remote-tracking branch 'origin/develop' into develop 2021-11-26 14:25:26 +08:00
medvedev1088
589cb06ef0 Add note about states to docs 2021-11-26 14:25:21 +08:00
medvedev1088
54d9220130 Bump version 2021-11-13 00:49:25 +08:00
Evgeny Medvedev
c2f24c6d18 Merge pull request #283 from kunalmodi/export_contracts_param
Export Contracts: Fix cli args
2021-11-13 00:48:08 +08:00
Kunal Modi
fedf6e60a4 Export Contracts: Fix cli args 2021-11-12 07:03:24 -08:00
medvedev1088
629aed5bc8 Update link to Travis CI 2021-09-27 02:53:51 +08:00
Evgeny Medvedev
25fc768f39 Merge pull request #269 from psych0xpomp/eip1559_columns
Add EIP-1559 columns
2021-08-15 20:54:30 +07:00
Drew Wells
42b96bcf7b adds support for non-mainnet in etl stream
relates #178
2021-08-12 19:33:05 -05:00
psych0xpomp
cf80415fcf Add EIP-1559 columns
Enable streaming of EIP-1559 related columns to blocks and transactions tables.
2021-08-09 16:22:06 +10:00
medvedev1088
104576d5eb Bump version 2021-08-08 15:08:41 +07:00
Evgeny Medvedev
135a475d46 Merge pull request #268 from blockchain-etl/change_log_level_in_export_tokens
Change log level to debug in eth_token_service.py
2021-08-08 15:07:19 +07:00
medvedev1088
90afaabce6 Suppress warning Symbolic Execution not available: No module named 'mythril' 2021-08-08 14:40:42 +07:00
medvedev1088
55a9371b2b Change log level to debug in eth_token_service.py 2021-08-08 14:17:00 +07:00
medvedev1088
1a8ac0630f Bump version 2021-08-04 23:01:08 +07:00
Evgeny Medvedev
3d79a22370 Merge pull request #266 from blockchain-etl/fix_utf8_decoding_of_token_data
Fix UnicodeDecodeError thrown when token returns undecodeable symbol
2021-08-04 23:00:07 +07:00
medvedev1088
d2b84bd643 Fix UnicodeDecodeError thrown when token returns undecodeable symbol or name 2021-08-04 22:48:30 +07:00
medvedev1088
1a212405ed Bump version 2021-08-02 22:39:14 +07:00
medvedev1088
a808330950 Fix receipt_effective_gas_price in streaming 2021-08-02 22:37:58 +07:00
medvedev1088
9ff51f993c Bump version 2021-08-02 18:48:56 +07:00
Evgeny Medvedev
f2f88e64c5 Merge pull request #263 from blockchain-etl/eip1559-fields
EIP-1559 fields
2021-08-02 18:10:41 +07:00
medvedev1088
7ee3497431 Fix tests 2021-08-02 16:34:14 +07:00
medvedev1088
170e7979fe Add option to convert values to strings in JSON output for extact_token_transfers and extract_tokens command. Needed for ethereum-etl-airflow 2021-08-02 16:14:37 +07:00
medvedev1088
5dd95554ef Merge branch 'develop' into eip1559-fields
# Conflicts:
#	setup.py
2021-08-01 22:19:13 +07:00
medvedev1088
45c3baffe6 Refactor 2021-07-30 00:54:58 +07:00
medvedev1088
86bb20e9d1 Fix tests 2021-07-29 23:46:53 +07:00
medvedev1088
8aa076bfb7 Bump version 2021-07-29 23:17:45 +07:00
medvedev1088
d9378e7d17 Add bytes32 support for symbol and name in ERC20 tokens 2021-07-29 23:17:01 +07:00
medvedev1088
55332cde00 Bump python-dateutil version 2021-07-29 22:35:53 +07:00
medvedev1088
eaf6a8f9b6 Bump version 2021-07-25 16:04:59 +07:00
Evgeny Medvedev
040849c66b Merge pull request #261 from blockchain-etl/fix_dependencies
Fix dependencies
2021-07-25 16:03:47 +07:00
medvedev1088
c2a878e175 Update link to travis ci 2021-07-25 14:36:13 +07:00
medvedev1088
083cbd6891 Trigger build 2021-07-25 14:32:36 +07:00
medvedev1088
c7ffffa5a8 Fix eth-utils version, bump click to 7.1.2 2021-07-25 14:20:54 +07:00
medvedev1088
240982bac1 Fix slow tests 2021-07-25 14:20:28 +07:00
Evgeny Medvedev
53fa461001 Merge pull request #256 from ninjascant/feature/eip1559-fields
EIP1559 fields
2021-07-19 19:37:59 +07:00
Maxim Razhev
efeeb297df Add missing transaction_type field to streaming tx enrichment 2021-07-05 01:12:31 +05:00
Maxim Razhev
1e00335b71 Add new fields to streamer tx enrichment 2021-07-01 17:00:17 +05:00
Maxim Razhev
e70698e8b5 Fix tx fee per gas fields type 2021-07-01 12:59:30 +05:00
Maxim Razhev
5f41b1ef15 Add effective_gas_price field to receipts 2021-07-01 12:58:55 +05:00
Maxim Razhev
926c0afad1 Fix schema 2021-07-01 11:28:34 +05:00
Maxim Razhev
47049e0697 Fix field names 2021-07-01 10:56:51 +05:00
Maxim Razhev
1bacd89423 Fix field name case 2021-06-30 16:17:31 +05:00
Maxim Razhev
686107b313 Fix new field export/ 2021-06-28 17:03:23 +05:00
Maxim Razhev
4dba6a1e8c Add baseFeePerGas for block export 2021-06-24 17:44:58 +05:00
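Taken together, these commits add the EIP-1559 fields to export and streaming enrichment; a hedged sketch of the mapping they describe, using the standard camelCase RPC keys rather than the project's exact enrichment code:

```
def enrich_with_eip1559_fields(block, tx, receipt):
    # London-fork (EIP-1559) fields surfaced by post-upgrade nodes.
    block['base_fee_per_gas'] = block.get('baseFeePerGas')
    tx['max_fee_per_gas'] = tx.get('maxFeePerGas')
    tx['max_priority_fee_per_gas'] = tx.get('maxPriorityFeePerGas')
    tx['transaction_type'] = tx.get('type')
    tx['receipt_effective_gas_price'] = receipt.get('effectiveGasPrice')
    return block, tx
```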
medvedev1088
ecc4484034 Enable message ordering if topic name contains sorted 2021-06-22 18:58:26 +07:00
medvedev1088
b568101c9c Bump version 2021-06-08 16:53:14 +07:00
Evgeny Medvedev
d25bd078f3 Merge pull request #251 from blockchain-etl/fix_typo_postgres_exporter
Fix typo in item_exporter_creator.py
2021-06-08 16:52:08 +07:00
medvedev1088
cb5dcac8c0 Fix typo in item_exporter_creator.py 2021-06-08 16:49:52 +07:00
medvedev1088
e79c32e422 Merge remote-tracking branch 'origin/develop' into develop 2021-06-04 19:24:44 +07:00
medvedev1088
479d8ece72 Add link to Public datasets in BigQuery 2021-06-04 19:24:36 +07:00
Evgeny Medvedev
b4a385e915 Merge pull request #249 from a6b8/develop
Update README.md
2021-05-29 02:33:56 +07:00
Andreas Banholzer
0e11db80f0 Update README.md 2021-05-28 19:53:10 +02:00
medvedev1088
dbb7248206 Simplify README 2021-05-15 21:40:42 +07:00
medvedev1088
de2a9ed5aa Update link in readme 2021-04-28 22:02:48 +07:00
medvedev1088
b3fab3c089 Bump version 2021-03-05 22:07:51 +07:00
Evgeny Medvedev
895bf818a2 Merge pull request #238 from blockchain-etl/erc20_functions_coverage
Add NAME, SYMBOL, DECIMALS to erc20_abi.py and eth_token_service.py
2021-03-05 22:05:59 +07:00
medvedev1088
d83fcd4307 Add NAME, SYMBOL, DECIMALS to erc20_abi.py and eth_token_service.py 2021-03-05 21:50:30 +07:00
Evgeny Medvedev
d7283ba301 Merge pull request #233 from blockchain-etl/bug/converters_module_broken
Fix broken converters module
2021-01-09 19:55:39 +07:00
medvedev1088
111633874a Bump version 2021-01-09 19:39:04 +07:00
medvedev1088
3c6291a873 Fix missing converters module 2021-01-09 19:38:16 +07:00
medvedev1088
48f11fc9e1 Add export to GCS and message ordering in pubsub 2021-01-09 19:34:53 +07:00
medvedev1088
511b60ecfa Enable message ordering for pubsub exporter 2020-12-06 19:42:29 +07:00
medvedev1088
fcf576f6bc Add link to Ethereum 2.0 ETL to README 2020-10-26 19:13:03 +07:00
medvedev1088
15b0f683b9 Add Programming Language Python 3.8 to setup.py 2020-10-09 17:17:42 +07:00
medvedev1088
742e78b7f7 Bump version 2020-10-09 17:14:57 +07:00
Evgeny Medvedev
68f6bec10b Merge pull request #225 from blockchain-etl/feature/python38
Add py38 in setup.py and tests
2020-10-09 17:13:31 +07:00
medvedev1088
04b179aadf Add py38 setup.py and tests 2020-10-09 17:02:00 +07:00
medvedev1088
8d159a58c0 Update docs 2020-10-03 18:27:13 +07:00
medvedev1088
10087aecbb Remove latest tag from dockerhub workflow 2020-08-21 20:42:25 +07:00
medvedev1088
e340074ce6 Bump the version 2020-08-21 20:05:04 +07:00
Evgeny Medvedev
a74f53f351 Merge pull request #222 from blockchain-etl/bug/tokens_param_recognizes_single_value
Fix --tokens in export_token_transfers.py recognizes only 1 parameter
2020-08-21 20:03:08 +07:00
medvedev1088
e61248e798 Fix --tokens in export_token_transfers.py recognizes only 1 parameter 2020-08-21 19:43:19 +07:00
medvedev1088
e78a856438 Update Infura id 2020-08-14 19:35:39 +07:00
medvedev1088
40b98215b6 Update citing.md 2020-07-22 15:01:57 +07:00
medvedev1088
c19bdf053f Fix extra comma in citing.md 2020-07-22 00:10:13 +07:00
medvedev1088
8ccb6dfe77 Merge remote-tracking branch 'origin/develop' into develop 2020-07-22 00:08:40 +07:00
medvedev1088
4ce02de2e0 Add Citing section to the docs 2020-07-22 00:08:28 +07:00
Evgeny Medvedev
56d232781a Merge pull request #214 from franckc/originprotocol
Add support for extracting Origin Protocol data
2020-06-14 19:54:21 +07:00
Franck Chastagnol
c5a67b0fd4 Use null rather than empty string as default for shop product fields 2020-06-08 09:39:20 -07:00
Franck Chastagnol
2498bf5560 Fix unit tests 2020-06-07 23:21:07 -07:00
Franck Chastagnol
4c0a06fc36 Minor fixes 2020-06-07 22:45:31 -07:00
Franck Chastagnol
101f0dbd67 Add dependency on requests package 2020-06-07 22:09:32 -07:00
Franck Chastagnol
bc40a13ec6 Merge branch 'develop' into originprotocol 2020-06-07 22:01:42 -07:00
Franck Chastagnol
1bca49b31f Clean up, Add unit tests 2020-06-07 21:55:07 -07:00
Evgeny Medvedev
8df8407137 Merge pull request #217 from blockchain-etl/fix/update-nansen-url
Updated Nansen link in docs
2020-05-25 23:40:51 +07:00
askeluv
4958c1e264 Updated Nansen link in docs 2020-05-25 18:00:01 +02:00
medvedev1088
60f5340754 Add a link to Ivan on Tech video 2020-05-23 16:59:00 +07:00
Franck Chastagnol
c84a6d1195 Extract origin protocol data 2020-05-15 19:11:52 -07:00
Evgeny Medvedev
04bc4a888b Merge pull request #212 from blockchain-etl/fix/update-project-links
Updated Nansen link + added projects to README.md
2020-05-05 21:01:24 +07:00
askeluv
84886c7f48 Updated Nansen link + added projects to README.md 2020-05-05 15:44:44 +02:00
Evgeny Medvedev
c1e5691d1d Merge pull request #210 from blockchain-etl/feature/publish_to_dockerhub_workflow
Add publish-to-dockerhub.yml
2020-04-21 21:55:43 +07:00
medvedev1088
16dfcb24ed Fix organization name in publish-to-dockerhub.yml 2020-04-16 23:58:41 +07:00
medvedev1088
8164ee105d Add publish-to-dockerhub.yml 2020-04-16 23:49:12 +07:00
medvedev1088
ac866f6459 Update README 2020-04-16 23:25:30 +07:00
medvedev1088
90c4982a6b Add Infura project id to commands in docs 2020-04-16 23:23:17 +07:00
medvedev1088
ae131baa0e Update docs 2020-04-16 23:09:32 +07:00
medvedev1088
cb3ee69123 Bump the version 2020-04-16 23:06:04 +07:00
Evgeny Medvedev
81374dea00 Merge pull request #209 from blockchain-etl/feature/publish_to_pypi_workflow
Feature/publish to pypi workflow
2020-04-16 23:04:45 +07:00
medvedev1088
71364a4fea Run pypi workflow only on tags push 2020-04-16 23:04:26 +07:00
medvedev1088
d612ba40b8 Add publishing to pypi 2020-04-16 23:00:57 +07:00
medvedev1088
e853d4fd19 Fix file formatting 2020-04-16 22:47:01 +07:00
medvedev1088
82045cc21c Add publish to PyPi workflow 2020-04-16 22:44:41 +07:00
Evgeny Medvedev
eeabd57b98 Merge pull request #208 from obsh/develop
Add block_timestamp attribute to exported PubSub message
2020-04-16 21:59:13 +07:00
oleksandr.bushkovskyi
dec070e812 Update streaming tests to take into account item_timestamp attribute 2020-04-16 00:31:18 +03:00
oleksandr.bushkovskyi
156b603cb0 Add item_timestamp attribute in RFC 3339 format to exported PubSub messages 2020-04-15 23:48:23 +03:00
oleksandr.bushkovskyi
141c82005a Add block_timestamp attribute to exported PubSub message 2020-04-09 00:20:09 +03:00
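A hedged sketch of publishing with message attributes as these commits describe: google-cloud-pubsub turns extra keyword arguments into Pub/Sub attributes. The project and topic names below are placeholders.

```
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'ethereum-blocks')  # placeholders

def publish_item(data_bytes, item_timestamp_rfc3339):
    # Extra keyword arguments become message attributes, so consumers can
    # filter or order on item_timestamp without decoding the payload.
    return publisher.publish(topic_path, data=data_bytes,
                             item_timestamp=item_timestamp_rfc3339)
```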
medvedev1088
e0636bbb31 Bump version 2020-04-04 00:06:08 +07:00
Evgeny Medvedev
b65f37af7b Merge pull request #207 from blockchain-etl/feature/update_dependency_versions
Update eth-utils and eth-abi versions
2020-04-04 00:00:58 +07:00
medvedev1088
1a6c417ab0 Update eth-utils and eth-abi versions 2020-04-03 23:58:00 +07:00
medvedev1088
5a09102eb2 Update schema.md 2020-03-17 23:55:44 +07:00
medvedev1088
f05ce47b95 Update commands.md 2020-03-13 22:31:08 +07:00
Evgeny Medvedev
51927defc7 Merge pull request #203 from blockchain-etl/feature/postgres
Postgres support for stream command
2020-03-13 22:13:42 +07:00
medvedev1088
4b92d7b670 Bump the version 2020-03-13 21:42:31 +07:00
medvedev1088
0c4342fe11 Add trace_id to traces postgres table 2020-03-13 21:37:45 +07:00
medvedev1088
41f20435a2 Update docs 2020-03-09 20:37:38 +07:00
medvedev1088
676dfb22c5 Update commands.md 2020-03-09 17:00:00 +07:00
medvedev1088
7cf0f34785 Validate entity types for streaming 2020-03-09 16:59:49 +07:00
medvedev1088
d46528ba24 Rename ListFieldItemConverter 2020-03-09 14:55:54 +07:00
medvedev1088
87e6b57024 Add transactions, logs, traces support for postgres exporter 2020-03-06 16:42:10 +07:00
medvedev1088
70db781856 Add BLOCKS table schema 2020-03-05 22:31:21 +07:00
medvedev1088
f5836345cd Remove Dockerfile_with_streaming from README 2020-03-05 22:15:38 +07:00
medvedev1088
d7ac8fb758 Add PostgresItemExporter 2020-03-05 21:50:55 +07:00
medvedev1088
ded7a6a007 Merge branch 'develop' into feature/postgres 2020-03-05 19:10:44 +07:00
medvedev1088
dd8d2bdc38 Update links to docs 2020-03-05 13:10:01 +07:00
medvedev1088
093fe56dde Add postgres dependencies 2020-03-05 13:07:58 +07:00
Evgeny Medvedev
68fce399a8 Merge pull request #202 from blockchain-etl/feature/add-docs
Add docs folder using mkdocs
2020-03-05 12:52:45 +07:00
medvedev1088
438b911b0f Remove Tests and Running in Docker from Useful Links 2020-03-04 19:59:41 +07:00
medvedev1088
dae8deff36 Add Documentation to Useful Links 2020-03-04 19:58:12 +07:00
medvedev1088
cb84071680 Add link to Awesome BigQuery Views 2020-03-04 19:56:39 +07:00
medvedev1088
d882c64671 Add more Media links 2020-03-04 19:53:16 +07:00
medvedev1088
477eb35a39 Merge branch 'develop' into feature/add-docs
# Conflicts:
#	README.md
2020-03-04 19:47:09 +07:00
medvedev1088
ab7fd89774 Update README 2020-03-04 19:45:39 +07:00
medvedev1088
064353a993 Move Running Tests and Running in Docker to README as they depend on repo contents 2020-03-04 19:42:32 +07:00
medvedev1088
94b7ce8a4c Add Useful Links to README 2020-03-04 19:34:00 +07:00
askeluv
c8e4c840d5 Minor tweaks to documentation 2020-02-26 16:52:11 +01:00
askeluv
136ed3232a Fixed internal linking 2020-02-26 16:30:44 +01:00
askeluv
cf8c6edfb7 Testing different way to do internal linking 2020-02-26 16:25:24 +01:00
askeluv
e90e70e94f Fix broken link to /commands 2020-02-26 16:22:57 +01:00
askeluv
64614c2670 Fixed broken link 2020-02-25 17:06:13 +01:00
askeluv
1ffb592771 Minor tweaks; brought back README.md quickstart 2020-02-25 17:03:04 +01:00
askeluv
030c460f36 Fixed links 2020-02-25 16:11:51 +01:00
askeluv
92db79b8a7 Moved documentation from README.md into mkdocs docs/ folder 2020-02-25 15:48:07 +01:00
medvedev1088
2d37486970 Add link to ConsenSys Grants announcement 2020-02-23 18:00:48 +07:00
Evgeny Medvedev
499596ad3e Merge pull request #198 from blockchain-etl/feature/remove_dockerfile_with_streaming
Remove Dockerfile_with_streaming to avoid confusion
2020-02-14 14:26:08 +07:00
medvedev1088
106de42844 Remove -streaming suffix from docker tags 2020-02-14 14:25:48 +07:00
medvedev1088
aa106467b8 Remove Dockerfile_with_streaming to avoid confusion 2020-02-14 14:21:00 +07:00
Evgeny Medvedev
e53dbe13f9 Merge pull request #197 from blockchain-etl/feature/show_defaults_in_help
Show default values for click commands when using --help
2020-02-09 17:42:00 +07:00
medvedev1088
38752a557a Show default values for click commands when using --help 2020-02-09 17:36:28 +07:00
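Click supports this directly via show_default; a minimal sketch with an illustrative option, not the project's actual CLI:

```
import click

@click.command()
@click.option('--batch-size', default=100, show_default=True,
              help='Number of items to process per request.')
def export(batch_size):
    """With show_default=True, --help renders '[default: 100]' for the option."""
    click.echo(batch_size)
```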
medvedev1088
e8b6fe742e Update FUNDING.yaml 2020-01-09 16:22:57 +07:00
medvedev1088
d1e2f83071 Update README 2019-12-16 11:04:57 +07:00
Evgeny Medvedev
69c64e048e Update README.md 2019-12-15 00:03:35 +07:00
medvedev1088
1d4aa94d81 Add link to Discord 2019-12-14 23:52:32 +07:00
Evgeny Medvedev
2b23e08a64 Merge pull request #189 from blockchain-etl/feature/log_token_address_on_exception
Add function name and contract address in log message
2019-10-23 19:05:37 +07:00
medvedev1088
7434d149bb Add function name and contract address in log message when function call failed. Related to https://github.com/blockchain-etl/ethereum-etl/issues/159#issuecomment-526910436 2019-09-01 20:01:01 +07:00
Evgeny Medvedev
eab288d507 Add link to Kaggle dataset 2019-08-09 19:42:56 +07:00
Evgeny Medvedev
091c7edd60 Add link to Snowflake tutorial 2019-08-09 19:35:43 +07:00
Evgeny Medvedev
0373f48956 Update link to useful queries 2019-07-27 20:34:33 +07:00
Evgeny Medvedev
32eae84170 Add link to useful queries 2019-07-27 20:24:27 +07:00
Evgeny Medvedev
359fe17ac3 Merge pull request #176 from blockchain-etl/funding
Create FUNDING.yml
2019-07-25 19:24:12 +07:00
Evgeny Medvedev
19daa86e52 Merge pull request #185 from blockchain-etl/develop
develop to master
2019-07-25 19:22:36 +07:00
Evgeny Medvedev
e428bead6d Update FUNDING.yml 2019-07-25 19:21:38 +07:00
Evgeny Medvedev
ee5de4b465 Merge branch 'develop' into funding 2019-07-25 19:20:17 +07:00
Evgeny Medvedev
ee8c68d215 Merge branch 'master' into develop 2019-07-25 19:18:42 +07:00
Evgeny Medvedev
76cdec4a5c Merge branch 'develop' into funding 2019-07-25 19:14:56 +07:00
Evgeny Medvedev
7d9892de85 Merge pull request #184 from blockchain-etl/feature/item_id
Add possibility to specify multiple provider uris for streaming
2019-07-25 19:09:20 +07:00
Evgeny Medvedev
faffca21ef Create FUNDING.yml 2019-06-13 16:41:18 +07:00
Evgeny Medvedev
a74ab02563 Merge pull request #170 from blockchain-etl/bug/153
Fix https://github.com/blockchain-etl/ethereum-etl/issues/153
2019-06-05 19:02:29 +07:00
Evgeny Medvedev
8daa06d007 Rename constant in batch_work_executor.py 2019-06-05 19:01:56 +07:00
Evgeny Medvedev
2ab3b7e9bf Fix https://github.com/blockchain-etl/ethereum-etl/issues/153 2019-06-05 18:57:26 +07:00
Evgeny Medvedev
3234f64c45 Add possibility to specify multiple provider uris for streaming 2019-05-10 19:18:37 +07:00
Evgeny Medvedev
437718083e Merge pull request #172 from blockchain-etl/feature/item_id
Add item_id to stream output
2019-05-08 20:20:01 +07:00
Evgeny Medvedev
0f28aee915 Bump version 2019-05-08 20:14:48 +07:00
Evgeny Medvedev
5e311b87da Fix google_pubsub_item_exporter.py 2019-05-08 20:12:16 +07:00
Evgeny Medvedev
fdea8ca36e Add item_id to streamer messages 2019-05-07 21:05:08 +07:00
Evgeny Medvedev
ca8cd55223 Fix https://github.com/blockchain-etl/ethereum-etl/issues/153 2019-04-25 21:06:23 +07:00
Evgeny Medvedev
f4586b1501 Update README 2019-04-22 22:14:07 +07:00
Evgeny Medvedev
f49b46363e Update README 2019-04-22 22:11:37 +07:00
Evgeny Medvedev
40d4cf374c Fix variable name 2019-04-19 20:09:56 +07:00
Evgeny Medvedev
031c5acedf Update README 2019-04-17 18:47:18 +07:00
Evgeny Medvedev
f4718a6cb9 Added link to D5 2019-04-17 18:36:35 +07:00
Evgeny Medvedev
f35b4ecde4 Update README 2019-04-16 01:12:15 +07:00
Evgeny Medvedev
8257c4bde5 Update README 2019-04-16 00:57:48 +07:00
Evgeny Medvedev
8b21e34250 Update README 2019-04-16 00:29:04 +07:00
Evgeny Medvedev
e8ea43067a Update README 2019-04-16 00:22:35 +07:00
Evgeny Medvedev
e695c55704 Merge pull request #160 from blockchain-etl/feature/streaming
Feature/streaming
2019-04-15 20:18:02 +07:00
Evgeny Medvedev
5c941a403e Bump version 2019-04-15 20:10:57 +07:00
Evgeny Medvedev
67b9ef1728 Refactor dockerhub.md 2019-04-15 20:10:44 +07:00
Evgeny Medvedev
3d5c5a3c73 Update README 2019-04-15 20:10:32 +07:00
Evgeny Medvedev
fa81a41ae5 Refactoring 2019-04-15 19:02:30 +07:00
Evgeny Medvedev
fcd963ced6 Update README 2019-04-15 18:38:34 +07:00
Evgeny Medvedev
e69148ca9e Update README 2019-04-15 18:08:45 +07:00
Evgeny Medvedev
143f59018f Merge branch 'develop' into feature/streaming
# Conflicts:
#	tests/ethereumetl/job/mock_web3_provider.py
2019-04-13 21:57:11 +07:00
Evgeny Medvedev
b46717bf2b Revert changing test file names 2019-04-13 21:56:53 +07:00
Evgeny Medvedev
66971c82e8 Revert using traceFilter https://github.com/blockchain-etl/ethereum-etl/pull/164#issuecomment-482814833 2019-04-13 21:55:48 +07:00
Evgeny Medvedev
040a42dba5 Change block enrichment in eth_streamer_adapter.py 2019-04-13 21:18:48 +07:00
Evgeny Medvedev
2e0b59553c Fix test file names 2019-04-13 21:15:41 +07:00
Evgeny Medvedev
26bcb6c9d8 Merge branch 'develop' into feature/streaming
# Conflicts:
#	tests/ethereumetl/job/mock_web3_provider.py
2019-04-13 21:10:14 +07:00
Evgeny Medvedev
e82618d1c2 Change default value for --batch-size in export_traces.py 2019-04-13 21:09:28 +07:00
Evgeny Medvedev
e6c055c3fa Merge pull request #164 from t2y/use-trace-filter
Use traceFilter instead of traceBlock
2019-04-13 21:07:08 +07:00
Evgeny Medvedev
925471b064 Change default value for block_timestamp in transaction_mapper.py 2019-04-13 21:02:52 +07:00
Evgeny Medvedev
af72640c37 Merge pull request #163 from t2y/add-block-timestamp-to-transaction
Add block timestamp to transactions.csv
2019-04-13 20:55:29 +07:00
Tetsuya Morimoto
a44637f430 change block_timestamp column position to last column to minimize breaking compatibility 2019-04-13 19:14:48 +09:00
Tetsuya Morimoto
a446b55453 add block timestamp to transactions.csv 2019-04-12 20:48:15 +09:00
Evgeny Medvedev
9072abf55d Fix filename capitalization 2 2019-04-09 22:35:18 +07:00
Evgeny Medvedev
c6118be5a5 Fix filename capitalization 1 2019-04-09 22:33:18 +07:00
Evgeny Medvedev
4ed17d4980 Refactor mock file naming 2019-04-09 22:25:22 +07:00
Evgeny Medvedev
1bf2553aed Fix tests 2019-04-09 21:58:18 +07:00
Evgeny Medvedev
04b34c5dd5 Add link to stackoverflow question 2019-04-09 19:13:00 +07:00
Evgeny Medvedev
9614aeba7f Fix exception when only log specified for -e option 2019-04-09 14:55:34 +07:00
Tetsuya Morimoto
eba4e4e58e applied a reverse patch from 0b3f4d6 since it seems paritytech/parity-ethereum/issues/9822 was fixed 2019-04-09 08:59:54 +09:00
Evgeny Medvedev
c5d155b617 Fix trace status calculation 2019-04-07 21:58:18 +07:00
Evgeny Medvedev
418b7a83d3 Fix timeout error handling 2019-04-07 12:04:46 +07:00
Evgeny Medvedev
4fccd2c181 Fix trace status 2019-04-07 12:04:25 +07:00
Evgeny Medvedev
f07752907a Add extract_tokens command 2019-04-06 20:32:26 +07:00
Evgeny Medvedev
140af3e649 Fix csv max field size in extract_contracts 2019-04-06 15:09:19 +07:00
Evgeny Medvedev
c9fa2a1873 Add pid file to streamer 2019-04-05 14:20:32 +07:00
Evgeny Medvedev
7214d771b9 Increase timeout 2019-04-02 18:08:30 +07:00
Evgeny Medvedev
a2a48f9642 Fix timeout in pubsub exporter 2019-04-02 17:58:08 +07:00
Evgeny Medvedev
ad8fda002e Merge branch 'develop' into feature/streaming 2019-04-02 14:02:55 +07:00
Evgeny Medvedev
99803a772e Disable slow tests in tox 2019-04-01 17:31:29 +07:00
Evgeny Medvedev
1defa289e5 Use comma-separated list for --entity-types option 2019-04-01 14:08:06 +07:00
Evgeny Medvedev
7f725182aa Merge pull request #161 from SteveVitali/patch-1
Update export_all.sh with ethereumetl commands
2019-04-01 14:06:40 +07:00
Steven Vitali
7afe6093b0 Update export_all.sh with ethereumetl commands 2019-04-01 02:58:04 -04:00
Evgeny Medvedev
4465222622 Refactor blockchainetl package 2019-03-30 15:20:26 +07:00
Evgeny Medvedev
2f8d901829 Refactor streamer 2019-03-30 15:12:34 +07:00
Evgeny Medvedev
e27b5c28fd Fix the tests 2019-03-28 17:35:01 +07:00
Evgeny Medvedev
47bd5957d4 Fix tests 2019-03-28 01:12:52 +07:00
Evgeny Medvedev
edc3211544 Fix extract_contracts job 2019-03-28 01:07:45 +07:00
Evgeny Medvedev
a9ee19f871 Update README 2019-03-27 23:33:35 +07:00
Evgeny Medvedev
c5ea25a200 Add timeout for sync cycle 2019-03-27 23:20:15 +07:00
Evgeny Medvedev
81033022b9 Update pubsub exporter 2019-03-27 22:21:28 +07:00
Evgeny Medvedev
ac60502f72 Configure logging 2019-03-27 22:07:16 +07:00
Evgeny Medvedev
9dfff1261d Add extract_contracts command 2019-03-27 21:23:21 +07:00
Evgeny Medvedev
69cc8a70c0 Add trace status calculation 2019-03-27 21:11:54 +07:00
Evgeny Medvedev
ba60c906f5 Add tests for streaming traces 2019-03-27 16:00:28 +07:00
Evgeny Medvedev
751f9b57ac Add entity types 2019-03-27 13:36:46 +07:00
Evgeny Medvedev
a9672ac9c1 Refactor Streamer 2019-03-26 22:09:05 +07:00
Evgeny Medvedev
ea6d0e87da Add streaming tests 2019-03-26 18:05:48 +07:00
Evgeny Medvedev
22e6795789 Remove unused file 2019-03-26 14:33:37 +07:00
Evgeny Medvedev
302fbc9947 Update dependencies versions 2019-03-26 14:30:46 +07:00
Evgeny Medvedev
3483d77aa4 Merge branch 'develop' into feature/streaming 2019-03-26 13:48:21 +07:00
Evgeny Medvedev
871af57840 Update README 2019-03-22 00:35:51 +07:00
Evgeny Medvedev
c76d25bf3f Update README 2019-03-12 21:34:35 +07:00
Evgeny Medvedev
2c3ece7010 Merge pull request #158 from blockchain-etl/develop
Updates to README, fix dependencies conflict, add timeout to export_traces, refactor file utils
2019-03-03 14:54:26 +07:00
Evgeny Medvedev
930efe5a0e Bump version 2019-03-03 14:48:23 +07:00
Evgeny Medvedev
aac00bf7d0 Add documentation to commands 2019-03-03 14:48:10 +07:00
Evgeny Medvedev
6f19ff0756 Merge pull request #157 from blockchain-etl/feature/remove-legacy-scripts
Remove legacy files
2019-03-03 14:38:20 +07:00
Evgeny Medvedev
f18f303fa9 Merge pull request #156 from tpmccallum/patch-3
Update setup.py
2019-03-03 14:37:25 +07:00
Evgeny Medvedev
b5e290e2c1 Remove legacy files 2019-03-03 14:34:34 +07:00
Timothy McCallum
a10fb2fac9 Update eth utils in setup.py
Updated from eth-utils>=1.2.0 to eth-utils==1.3.0

Also ran all of the installation and tests again and everything passed!
2019-03-03 17:34:23 +10:00
Evgeny Medvedev
83a7b5383f Merge pull request #155 from tpmccallum/patch-2
Changing python and pip -> python3 and pip3
2019-03-03 14:12:11 +07:00
Timothy McCallum
978513efc0 Update setup.py
I was getting this error
```
eth-keys 0.2.1 has requirement eth-utils<2.0.0,>=1.3.0, but you'll have eth-utils 1.2.0 which is incompatible.
```
It relates to the following issue
https://github.com/blockchain-etl/ethereum-etl/issues/141
which has the following fix
bde116ad06
I just tested it and also created this PR which you can now merge.
2019-03-03 13:48:24 +10:00
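The change described here amounts to a dependency pin in setup.py; an illustrative excerpt consistent with the eth-utils bump mentioned a few commits above, not the full dependency list:

```
# setup.py (illustrative excerpt): pin eth-utils to 1.3.0 so eth-keys'
# requirement of eth-utils>=1.3.0,<2.0.0 resolves cleanly instead of
# pulling in the incompatible 1.2.0.
install_requires = [
    'eth-utils==1.3.0',
]
```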
Timothy McCallum
65f5de1df1 Changing python and pip -> python3 and pip3 2019-03-03 13:41:49 +10:00
Evgeny Medvedev
df10702486 Add links to articles 2019-02-27 20:00:06 +07:00
Evgeny Medvedev
a288b51b73 Merge pull request #152 from blockchain-etl/feature/export_traces_timeout_option
Add timeout option to export_traces
2019-02-27 19:55:51 +07:00
Evgeny Medvedev
a6337d0817 Add timeout option to export_traces 2019-02-21 18:29:13 +07:00
Evgeny Medvedev
d63713ece1 Update Docker image tag for streaming 2019-02-18 17:45:59 +07:00
Evgeny Medvedev
ed2466d16d Update Docker image tag for streaming 2019-02-18 17:45:09 +07:00
Evgeny Medvedev
aab657da9b Add comments 2019-02-15 21:55:04 +07:00
Evgeny Medvedev
79b9a46bae Remove unused file 2019-02-15 18:03:28 +07:00
Evgeny Medvedev
cac7305f53 Refactor streaming 2019-02-15 17:10:06 +07:00
Evgeny Medvedev
80cd37bdde Remove requirements.txt 2019-02-15 16:15:42 +07:00
Evgeny Medvedev
ff4218c0b8 Merge branch 'develop' into feature/streaming 2019-02-15 16:15:15 +07:00
Evgeny Medvedev
f50cc7253b Merge branch 'master' into feature/streaming
# Conflicts:
#	.dockerignore
#	Dockerfile
#	requirements.txt
2019-02-15 16:11:22 +07:00
Evgeny Medvedev
4fc495342b Update LICENSE 2019-02-04 15:44:15 +07:00
Evgeny Medvedev
b0a5e02dd5 Update README 2019-02-04 15:25:52 +07:00
Evgeny Medvedev
f7af95d6c7 Update the version 2019-02-04 15:25:30 +07:00
Evgeny Medvedev
706eb8a9c9 Refactor misc utils 2019-01-30 16:04:23 +07:00
medvedev1088
e30e58f032 Merge pull request #144 from blockchain-etl/develop
Ethereum Classic Support, Python 3.5 support, Bug fixes
2019-01-19 01:00:53 +07:00
Evgeny Medvedev
3b866f4f32 Provide default value for chain argument for cli command callbacks 2019-01-17 17:48:20 +07:00
Evgeny Medvedev
d437f58eb9 Add exception logging to EthTokenService 2019-01-17 17:46:56 +07:00
medvedev1088
ecea237187 Merge pull request #115 from blockchain-etl/bug/value-error-export-tokens
Fix ValueError when exporting contracts
2019-01-17 13:52:35 +07:00
Evgeny Medvedev
aa1a0ee32a Update README 2018-12-13 16:29:59 +07:00
Evgeny Medvedev
4c3d67d442 Update README 2018-12-13 16:29:06 +07:00
Evgeny Medvedev
061f131919 Retry requests when node is not synced 2018-11-18 23:43:23 +07:00
Evgeny Medvedev
1e793f3d48 Add Ethereum Classic Support to table of contents 2018-11-18 23:05:45 +07:00
Evgeny Medvedev
3876957917 Update README 2018-11-18 23:03:37 +07:00
Evgeny Medvedev
76879e593d Fix typo 2018-11-18 23:01:34 +07:00
medvedev1088
f9b353d803 Merge pull request #131 from YazzyYaz/develop
Add chain command line argument for EVM Agnostic
2018-11-18 23:00:31 +07:00
Yaz Khoury
fb2c7fb149 feat: Default classic chain to https://ethereumclassic.network if not specified 2018-11-16 14:48:26 -05:00
Yaz Khoury
21808fb1c8 doc: Update README with classic info 2018-11-16 14:05:49 -05:00
Yaz Khoury
a4a15cb534 refactor: Move classic infura check function into utils.py 2018-11-16 14:01:50 -05:00
Yaz Khoury
04aa34dca4 feat: Add cli arg for chain network type 2018-11-16 12:33:24 -05:00
Evgeny Medvedev
5c98d95a5a Bump version and update README 2018-11-16 01:51:36 +07:00
medvedev1088
49faafa3e0 Merge pull request #127 from evgeniuz/develop
Python 3.5/3.7 compatibility and tox integration
2018-11-16 00:52:17 +07:00
medvedev1088
eb69307ddb Merge pull request #128 from blockchain-etl/develop
Traces
2018-11-14 13:10:12 +07:00
Evgeny Medvedev
c8202d9533 Change export_all command description 2018-11-14 13:05:18 +07:00
Evgeniy Filatov
01c1792ca5 replaced f-strings with str.format 2018-11-12 16:12:06 +02:00
Evgeniy Filatov
32e7f593be python 3.7 is not available on trusty 2018-11-12 15:53:12 +02:00
Evgeniy Filatov
538d841906 fixed typo 2018-11-12 15:49:37 +02:00
Evgeniy Filatov
3050f50893 updated travis to use tox 2018-11-12 15:47:32 +02:00
Evgeniy Filatov
49c6f042d7 added comment about minimum python version 2018-11-12 15:30:43 +02:00
Evgeniy Filatov
320f592e51 added tox tests, fixed some incompatibilities with python 3.5 2018-11-12 15:25:45 +02:00
medvedev1088
c0c8fd5845 Merge pull request #126 from blockchain-etl/feature/genesis-allocations
Feature/genesis allocations
2018-11-10 23:30:31 +07:00
Evgeny Medvedev
7b9276c5a2 Add licence header 2018-11-10 23:26:01 +07:00
Evgeny Medvedev
e5e15b262d Added daofork traces 2018-11-10 23:15:25 +07:00
Evgeny Medvedev
4092ce92b9 Add genesis traces support 2018-11-10 21:37:13 +07:00
Evgeny Medvedev
819f26e09e Print deprecation warning to stderr 2018-11-09 17:02:38 +07:00
medvedev1088
b500542437 Merge pull request #125 from blockchain-etl/cli-changes
Prepare for pip
2018-11-09 16:33:08 +07:00
Evgeny Medvedev
652193a2f2 Update Dockerfile 2018-11-09 16:23:10 +07:00
Evgeny Medvedev
01d7ece2f0 Add __main__ and change project description 2018-11-09 01:54:01 +07:00
Evgeny Medvedev
d93fcbcbf7 Refactor cli names 2018-11-08 00:55:45 +07:00
Evgeny Medvedev
b94c5ff9e2 Add documentation 2018-11-08 00:47:44 +07:00
Evgeny Medvedev
379dfce791 Change Telegram link 2018-11-07 21:15:36 +07:00
Evgeny Medvedev
459a9f1950 Change version 2018-11-07 21:01:50 +07:00
Evgeny Medvedev
8fb5e3f15f Merge branch 'develop' into cli-changes 2018-11-07 20:36:38 +07:00
Evgeny Medvedev
51fc2cf86e Fix Exception: missing bytecode or contract address 2018-11-07 20:36:18 +07:00
Evgeny Medvedev
436b64cee3 Add required=True for required fields 2018-11-07 20:35:35 +07:00
Evgeny Medvedev
d7f9056e3c Add validation for export_blocks_and_transactions command 2018-11-07 18:29:59 +07:00
Evgeny Medvedev
f516cbed57 Move remaining scripts to cli 2018-11-07 18:26:30 +07:00
Evgeny Medvedev
5f0f111b36 Shorten command descriptions 2018-11-07 18:11:27 +07:00
Evgeny Medvedev
2453db08f0 Add Telegram group 2018-11-07 13:59:11 +07:00
Evgeny Medvedev
336161b484 Update README 2018-11-06 20:43:00 +07:00
Evgeny Medvedev
59754c3598 Add __init__.py to packages 2018-11-06 17:54:07 +07:00
Evgeny Medvedev
c36e2acbbe Remove contract_address field in traces 2018-11-06 17:04:13 +07:00
Evgeny Medvedev
69511bc21c Update README 2018-11-06 16:51:30 +07:00
Evgeny Medvedev
6ab4ffd0c7 Add deprecation warning 2018-11-06 13:15:09 +07:00
Evgeny Medvedev
ba9cac62b5 Add export_geth_traces and extract_geth_traces to cli 2018-11-06 13:08:23 +07:00
Evgeny Medvedev
d920e2ea68 Remove export_geth_traces from above-the-fold 2018-11-05 16:47:59 +07:00
Evgeny Medvedev
3da713742f Add cli command group 2018-11-05 16:40:07 +07:00
Evgeny Medvedev
59e33dec92 Reformat code 2018-11-05 16:07:05 +07:00
Evgeny Medvedev
acd50af6fa Include ethereum-dasm via pip. Remove ethereum-dasm from source code 2018-11-05 14:18:05 +07:00
Evgeny Medvedev
6c67a49c92 Update traces schema in README 2018-11-05 13:31:12 +07:00
Evgeny Medvedev
98f349e236 Update README 2018-11-05 13:29:28 +07:00
Evgeny Medvedev
7439eb611d Remove unnecessary None in dictionary.get() 2018-11-04 23:13:45 +07:00
medvedev1088
b90c8688d8 Merge pull request #116 from evgeniuz/develop
implementation of geth traces exporter and extractor
2018-11-04 22:51:22 +07:00
medvedev1088
79394d72bc Merge pull request #122 from elim0322/pip
ethereumetl.cli function name changes & setup.py
2018-11-01 18:38:23 +07:00
elim0322
9b472d6f64 ethereumetl.cli function name changes & setup.py
The original function names in ethereumetl.cli.* scripts are all
changed to cli for `entry_points` in setup.py
2018-10-30 23:49:56 +13:00
Evgeniy Filatov
eb38bde281 updated readme 2018-10-30 12:31:25 +02:00
Evgeniy Filatov
b027892ed6 added transaction index field both to parity and geth traces 2018-10-30 11:11:20 +02:00
Evgeniy Filatov
616c905fd1 improved format to more closely follow parity traces 2018-10-30 10:46:20 +02:00
Evgeniy Filatov
f4666947f2 added tests for geth traces exporter, refactored a bit 2018-10-30 10:05:11 +02:00
Evgeniy Filatov
ae246fec43 implemented crude version of extract_geth_traces.py 2018-10-30 09:56:31 +02:00
Evgeniy Filatov
b602be97fd started implementation of geth traces exporter 2018-10-30 09:56:31 +02:00
medvedev1088
a04187df93 Merge pull request #121 from blockchain-etl/feature/traces-export-changes
Add reward_type and call_type to traces, use traceBlock instead of traceFilter
2018-10-30 13:41:40 +07:00
Evgeny Medvedev
481e764107 Remove to_address in traces of type create for consistency with transactions 2018-10-30 13:32:59 +07:00
Evgeny Medvedev
8cd6c89aa9 Use code field as output for traces of create type 2018-10-30 13:30:50 +07:00
Evgeny Medvedev
0b3f4d6be1 Use traceBlock instead of traceFilter due to bug in parity https://github.com/paritytech/parity-ethereum/issues/9822 2018-10-30 13:25:52 +07:00
Evgeny Medvedev
cdbc554e77 Add call_type and reward_type fields 2018-10-30 13:18:39 +07:00
medvedev1088
6b9af1e8df Merge pull request #120 from blockchain-etl/feature/traces-export-changes
Add output field to traces
2018-10-28 21:08:33 +07:00
Evgeny Medvedev
8fe091d8f3 Add output field to traces 2018-10-28 20:59:51 +07:00
Evgeny Medvedev
9cc0743a25 Remove brackets from traces.trace_address 2018-10-28 20:50:43 +07:00
medvedev1088
7a79f42a9a Merge pull request #119 from blockchain-etl/bug/key-error-export-contracts
Handle KeyError in export_contracts_job.py
2018-10-28 00:43:53 +07:00
Evgeny Medvedev
3f803cf88e Handle KeyError in export_contracts_job.py 2018-10-28 00:41:38 +07:00
medvedev1088
978ccb219d Merge pull request #112 from elim0322/click
Implement CLI with Click package
2018-10-25 21:00:45 +07:00
elim0322
d442e462e1 ethereumetl/cli package
(1) original scripts in root directory are moved to `ethereumetl/cli`
(2) `main()` is renamed to corresponding script names
(3) root scripts invoke functions in `ethereumetl/cli`
2018-10-25 17:17:22 +13:00
Evgeny Medvedev
7ecdfa4fb7 Fix ValueError when exporting contracts https://github.com/blockchain-etl/ethereum-etl/issues/113 2018-10-23 23:32:54 +07:00
medvedev1088
ae7337cd6d Merge pull request #111 from elim0322/develop
Output cache files to .tmp folder
2018-10-21 22:54:06 +07:00
elim0322
2f0d3bff35 Implement CLI with Click package
(1) Click package is implemented to replace argparse,
(2) README.md is updated and (3) click is added to requirements
2018-10-20 20:40:33 +13:00
elim0322
572a42ba12 Output all cache files to .tmp and clean up .tmp/* 2018-10-18 23:04:50 +13:00
elim0322
138c7e3ce6 Merge branch 'develop' of https://github.com/blockchain-etl/ethereum-etl into develop 2018-10-18 21:05:15 +13:00
Evgeny Medvedev
10e95f19d0 Update .dockerignore 2018-10-14 20:38:59 +07:00
Eric
6ddb96dd36 Merge pull request #109 from elim0322/develop
Remove cached files directory
2018-10-13 06:52:04 +13:00
Eric Lim
c6ad3c355e Merge branch 'master' of https://github.com/blockchain-etl/ethereum-etl into develop 2018-10-12 16:21:11 +13:00
Eric Lim
b26c6a31dc Remove cached files directory
transaction_hashes, contract_addresses and token_addresses directories
are removed at every iteration to limit overall disk usage, using the
Python standard library
2018-10-12 07:40:18 +13:00
Evgeny Medvedev
d560d0b69b Add executable bit to export_all.sh 2018-10-11 19:43:21 +07:00
Evgeny Medvedev
1db56b7a69 Update link to Travis 2018-10-11 13:26:54 +07:00
medvedev1088
9b315926d4 Merge pull request #104 from evgeniuz/master
implementation of internal transaction exporter
2018-10-11 12:52:08 +07:00
medvedev1088
6f729bbac9 Merge pull request #106 from blockchain-etl/develop
Rewrite export_all.sh in Python and add Dockerfile
2018-10-11 12:50:28 +07:00
Evgeniy Filatov
dc5488803c renamed internal txs to traces, improved fields, added tests 2018-10-10 23:08:31 +03:00
Evgeniy Filatov
48a1056238 implemented basic version of internal transaction exporter 2018-10-10 23:08:31 +03:00
Evgeniy Filatov
1fdde0c7c0 started implementation of internal transaction exporter 2018-10-10 23:08:31 +03:00
Evgeny Medvedev
8af607aedc Remove test.txt 2018-10-11 00:16:55 +07:00
Evgeny Medvedev
e6a2cb208c Add test.txt 2018-10-11 00:14:35 +07:00
Evgeny Medvedev
8dcd343e67 Put back export_all.sh until Dockerfile supports IPC 2018-10-10 23:20:48 +07:00
Evgeny Medvedev
c239e70509 Reformatting and refactoring 2018-10-10 23:19:20 +07:00
Evgeny Medvedev
da68fe948b Upload last_synced_block.txt 2018-10-10 22:32:20 +07:00
Evgeny Medvedev
cc3ed86f3b Download last_synced_block_file.txt from GCS bucket 2018-10-10 21:26:25 +07:00
Evgeny Medvedev
60017a5abe Add initialization with start block 2018-10-10 20:22:52 +07:00
Evgeny Medvedev
8cc869694d Update kube.yml 2018-10-10 20:22:39 +07:00
medvedev1088
256d6e1cae Merge pull request #105 from elim0322/develop
Merging of export_all_partition_by_date.py and Dockerfile
2018-10-10 15:54:04 +07:00
Eric Lim
edf607f807 Updated README.md with Docker instructions under Exporting the Blockchain. 2018-10-10 19:04:34 +13:00
Eric Lim
6054d31492 Merged export_all_partition_by_date.py into export_all.py
Changes are:
(1) <start_block> is changed to the more generic <start>, which can
take either a block number, a date or a Unix timestamp. The type is changed
to str and the default value is removed (which matches the
original usage message).
(2) Some helper functions are defined to check the argument type.
(3) The rest of the change checks the argument types and
computes appropriate partitions for export_all()
2018-10-08 14:25:23 +13:00
Eric Lim
bc3e5c964a Pulled out batch_size as -B --export-batch-size with default=100. 2018-10-08 09:51:58 +13:00
Eric Lim
9460c9e158 Added Dockerfile. Tested with Infura.io 2018-10-07 15:50:12 +13:00
medvedev1088
451c19f1ff Merge pull request #101 from elim0322/develop
Fixed an issue where range() returns nothing
2018-10-05 13:04:47 +07:00
Eric Lim
a94f6cba6d Fixed an issue where range() returns nothing to break the loop when start_block and end_block are the same. Removed now redundant export_all.sh 2018-10-05 17:55:15 +13:00
Evgeny Medvedev
624f5aff25 Refactor 2018-10-04 16:00:21 +07:00
Evgeny Medvedev
acf9b1b0d3 Extract common part for exporting all 2018-10-04 15:53:22 +07:00
Evgeny Medvedev
d63b4778e6 Minor fixes 2018-10-04 13:48:39 +07:00
medvedev1088
0daaa80a42 Merge pull request #100 from elim0322/issue#8/rewrite_export_all
Issue#8/rewrite export all
2018-10-04 13:23:27 +07:00
Eric Lim
ecdfe60811 Fixed a mistake of specifying the wrong output file, and removed the no-longer-used datetime loading 2018-10-04 14:51:06 +13:00
Eric Lim
4038856618 Added to be consistent with the other script 2018-10-04 12:53:07 +13:00
Eric Lim
ea3813427d Fixed 3 issues: (1) fixed 'usage', (2) added logging module and (3) changed function name to . Plus a couple of mistakes (removing duplicated lines and adding an equal sign) 2018-10-04 12:41:55 +13:00
Eric Lim
f41a3901e8 Rewrote export_all in python. Notable changes are (1) including functions from extract_csv_column.py and get_block_range_for_date.py, (2) adding a condition for tokens and token_transfers to skip when using infura and (3) extract_csv_column() only writes unique rows 2018-10-03 15:39:05 +13:00
Eric Lim
121d86d958 Rewrote export_all.sh in python 2018-10-02 17:16:37 +13:00
Evgeny Medvedev
3fbf70fb4f Add type when joining 2018-09-28 13:12:04 +07:00
Evgeny Medvedev
f7e7e55441 Fix if condition 2018-09-28 00:05:21 +07:00
Evgeny Medvedev
d677d442bd Add enrichment to streaming.py 2018-09-27 23:49:36 +07:00
Evgeny Medvedev
7a47d93d9e Add docker configs 2018-09-27 16:54:08 +07:00
Evgeny Medvedev
e102f76631 Add pubsub_publish_test.py 2018-09-27 16:54:01 +07:00
Evgeny Medvedev
dbc9be25d0 Update README 2018-09-14 16:16:19 +07:00
Evgeny Medvedev
9bd9d4347b Improve logging 2018-09-13 00:15:59 +07:00
Evgeny Medvedev
54494aef6c Optimize publishing to PubSub 2018-09-13 00:13:17 +07:00
Evgeny Medvedev
c4c3ccc79a Add streaming with Google PubSub 2018-09-12 23:50:49 +07:00
Evgeny Medvedev
47573f3eab Fix broken link 2018-08-31 15:58:00 +07:00
Evgeny Medvedev
e69c2ef74b Add details to README 2018-08-31 01:02:36 +07:00
Evgeny Medvedev
96848936dc Fix version of eth-abi to fix ModuleNotFoundError: No module named 'eth_utils.toolz' 2018-08-30 15:51:03 +07:00
Evgeny Medvedev
663246ee03 Fix versions in requirements.txt 2018-08-30 15:32:26 +07:00
Evgeny Medvedev
d23586e8b2 Add LIMITATIONS to README 2018-08-30 15:27:42 +07:00
Evgeny Medvedev
bf1d7cf152 Update comments in graph_operations.py 2018-08-28 22:01:33 +07:00
Evgeny Medvedev
27913b2610 Update comments in graph_operations.py 2018-08-28 16:43:54 +07:00
Evgeny Medvedev
99f0a3afeb Add reference for get_keccak_hash.py 2018-08-28 15:31:38 +07:00
medvedev1088
1c53815eec Merge pull request #96 from medvedev1088/develop
Move GCP schemas and instructions to ethereum-etl-airflow
2018-08-28 15:28:26 +07:00
Evgeny Medvedev
d7fd26b7b8 Update README 2018-08-23 19:04:13 +07:00
Evgeny Medvedev
65fbc686ce Update README - move GCP instructions to another repository 2018-08-22 13:31:08 +07:00
Evgeny Medvedev
ff1068208a Remove GCP schemas 2018-08-17 23:00:13 +07:00
medvedev1088
32ed837de8 Merge pull request #89 from medvedev1088/develop
Update table and column descriptions
2018-08-14 23:02:16 +07:00
Evgeny Medvedev
0dcbe6b00c Update table and column descriptions 2018-08-14 15:44:40 +07:00
medvedev1088
cf73ccb581 Merge pull request #88 from medvedev1088/develop
Add enrich contracts sql, add date-partitioned version of AWS schemas
2018-08-14 15:22:37 +07:00
Evgeny Medvedev
5402611739 Add enrich contracts sql 2018-08-14 15:19:27 +07:00
medvedev1088
f1e8e93f56 Merge pull request #87 from tokusyu/develop
I made AWS schemas partitioned by date.
2018-08-14 12:18:53 +07:00
tokusyu
9a19eab23f delete mistakenly made schemas 2018-08-14 05:51:42 +09:00
tokusyu
190384fe50 Add date-partitioned version of AWS schemas 2018-08-14 05:49:11 +09:00
medvedev1088
30682aa68d Merge pull request #85 from medvedev1088/develop
Add scripts for enriching tables in BigQuery
2018-08-14 02:17:16 +07:00
medvedev1088
4d9b68dd97 Merge pull request #84 from medvedev1088/feature/enrich-tables
Add scripts for enriching tables in BigQuery
2018-08-14 02:06:19 +07:00
Evgeny Medvedev
dc67caacbc Remove unnecessary parameter 2018-08-14 02:05:13 +07:00
Evgeny Medvedev
1dd2d810d9 Add scripts for enriching tables in BigQuery 2018-08-14 02:02:46 +07:00
medvedev1088
c309af3f02 Merge pull request #83 from tokusyu/develop
I have made an "export_all_partition_by_date.sh"
2018-08-13 22:03:46 +07:00
Yoshinori Seki
01a77ab127 log messages are changed. 2018-08-13 14:51:21 +00:00
Yoshinori Seki
fb0b69be93 Add export_all_partition_by_date. 2018-08-13 14:23:09 +00:00
Evgeny Medvedev
a2a6cc660f Use STRING type for tokens.total_supply 2018-08-12 15:30:01 +07:00
medvedev1088
ff21a7b3bc Merge pull request #79 from medvedev1088/develop
Add get_block_range_for_timestamps.py
2018-08-12 01:05:18 +07:00
Evgeny Medvedev
a630f5515e Add get_block_range_for_timestamps.py 2018-08-12 00:47:08 +07:00
Evgeny Medvedev
63da0941de Add comments 2018-08-11 23:13:20 +07:00
Evgeny Medvedev
c0ff6c4679 Update README 2018-08-11 00:40:38 +07:00
medvedev1088
c9bf46bd7f Merge pull request #77 from medvedev1088/develop
Update README and column descriptions
2018-08-09 15:09:39 +07:00
Evgeny Medvedev
dda14145aa Update comment 2018-08-07 23:28:01 +07:00
Evgeny Medvedev
66ecaabd0c Update column descriptions 2018-08-07 12:59:38 +07:00
Evgeny Medvedev
3c9e7f9cbb Update README 2018-08-07 12:59:09 +07:00
medvedev1088
d777070441 Merge pull request #76 from medvedev1088/develop
Add extract_field.py and filter_items.py scripts
2018-08-06 23:43:56 +07:00
Evgeny Medvedev
d7f9a24826 Add extract_field.py and filter_items.py scripts 2018-08-06 23:26:09 +07:00
medvedev1088
453e36c5a9 Merge pull request #75 from medvedev1088/develop
Refactor table and column names
2018-08-05 02:08:16 +07:00
medvedev1088
6df6b03d39 Merge pull request #73 from medvedev1088/develop
Add Infura tests
2018-08-04 21:56:43 +07:00
medvedev1088
12b4abb545 Merge pull request #71 from medvedev1088/develop
Fix bug in ContractWrapper
2018-07-30 20:37:11 +07:00
medvedev1088
7e9332304d Merge pull request #70 from medvedev1088/develop
Add function_sighashes, is_erc20, is_erc721 to contracts.csv
2018-07-29 23:54:25 +07:00
medvedev1088
9145d40809 Merge pull request #68 from medvedev1088/develop
Change default log steps
2018-07-26 02:59:49 +07:00
311 changed files with 23641 additions and 2344 deletions

4
.dockerignore Normal file

@@ -0,0 +1,4 @@
.*
last_synced_block.txt
pid.txt
output


@@ -0,0 +1,20 @@
name: Publish DockerHub
on:
push:
tags:
- '*'
jobs:
build:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@master
- name: Publish to DockerHub
if: startsWith(github.event.ref, 'refs/tags/v')
uses: elgohr/Publish-Docker-Github-Action@master
with:
name: blockchainetl/ethereum-etl
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
tag_semver: true

30
.github/workflows/publish-to-pypi.yml vendored Normal file

@@ -0,0 +1,30 @@
name: Publish to PyPI and TestPyPI
on:
push:
tags:
- '*'
jobs:
build-n-publish:
name: Build and publish to PyPI and TestPyPI
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@master
- name: Set up Python 3.7
uses: actions/setup-python@v1
with:
python-version: 3.7
- name: Build a binary wheel and a source tarball
run: python setup.py sdist
- name: Publish distribution to Test PyPI
if: startsWith(github.event.ref, 'refs/tags/v')
uses: pypa/gh-action-pypi-publish@master
with:
password: ${{ secrets.test_pypi_password }}
repository_url: https://test.pypi.org/legacy/
- name: Publish distribution to PyPI
if: startsWith(github.event.ref, 'refs/tags/v')
uses: pypa/gh-action-pypi-publish@master
with:
password: ${{ secrets.pypi_password }}

3
.gitignore vendored

@@ -47,3 +47,6 @@ coverage.xml
.venv
venv/
ENV/
# etl
/last_synced_block.txt

14
.readthedocs.yaml Normal file

@@ -0,0 +1,14 @@
# Read the Docs configuration file for MkDocs projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Set the version of Python and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.12"
mkdocs:
configuration: mkdocs.yml


@@ -1,7 +1,14 @@
language: python
python:
- "3.6"
dist: xenial
matrix:
include:
- python: "3.7.2"
env: TOX_POSARGS="-e py37"
- python: "3.8"
env: TOX_POSARGS="-e py38"
- python: "3.9"
env: TOX_POSARGS="-e py39"
install:
- travis_retry pip install -r requirements.txt
- travis_retry pip install tox
script:
- pytest -vv
- travis_wait tox $TOX_POSARGS

15
Dockerfile Normal file

@@ -0,0 +1,15 @@
FROM python:3.7
MAINTAINER Evgeny Medvedev <evge.medvedev@gmail.com>
ENV PROJECT_DIR=ethereum-etl
RUN mkdir /$PROJECT_DIR
WORKDIR /$PROJECT_DIR
COPY . .
RUN pip install --upgrade pip && pip install -e /$PROJECT_DIR/[streaming]
# Add Tini
ENV TINI_VERSION v0.18.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "--", "python", "ethereumetl"]


@@ -1,6 +1,6 @@
MIT License
Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
Copyright (c) 2018-2025 Evgeny Medvedev, evge.medvedev@gmail.com, https://twitter.com/EvgeMedvedev
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
@@ -18,4 +18,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
SOFTWARE.

529
README.md

@@ -1,482 +1,123 @@
# Ethereum ETL
[![Join the chat at https://gitter.im/ethereum-eth](https://badges.gitter.im/ethereum-etl.svg)](https://gitter.im/ethereum-etl/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Build Status](https://travis-ci.org/medvedev1088/ethereum-etl.png)](https://travis-ci.org/medvedev1088/ethereum-etl)
[![Build Status](https://app.travis-ci.com/blockchain-etl/ethereum-etl.svg?branch=develop)](https://travis-ci.com/github/blockchain-etl/ethereum-etl)
[![License](https://img.shields.io/github/license/blockchain-etl/ethereum-etl)](https://github.com/blockchain-etl/ethereum-etl/blob/develop/LICENSE)
[![Telegram](https://img.shields.io/badge/telegram-join%20chat-blue.svg)](https://t.me/BlockchainETL)
[![Twitter](https://img.shields.io/twitter/follow/EthereumETL)](https://x.com/EthereumETL)
Export blocks and transactions ([Reference](#export_blocks_and_transactionspy)):
Ethereum ETL lets you convert blockchain data into convenient formats like CSVs and relational databases.
*Do you just want to query Ethereum data right away? Use the [public dataset in BigQuery](https://console.cloud.google.com/marketplace/details/ethereum/crypto-ethereum-blockchain).*
[Full documentation available here](http://ethereum-etl.readthedocs.io/).
## Quickstart
Install Ethereum ETL:
```bash
> python export_blocks_and_transactions.py --start-block 0 --end-block 500000 \
--provider-uri https://mainnet.infura.io --blocks-output blocks.csv --transactions-output transactions.csv
pip3 install ethereum-etl
```
Export ERC20 and ERC721 transfers ([Reference](#export_token_transferspy)):
Export blocks and transactions ([Schema](docs/schema.md#blockscsv), [Reference](docs/commands.md#export_blocks_and_transactions)):
```bash
> python export_token_transfers.py --start-block 0 --end-block 500000 \
> ethereumetl export_blocks_and_transactions --start-block 0 --end-block 500000 \
--blocks-output blocks.csv --transactions-output transactions.csv \
--provider-uri https://mainnet.infura.io/v3/7aef3f0cd1f64408b163814b22cc643c
```
Export ERC20 and ERC721 transfers ([Schema](docs/schema.md#token_transferscsv), [Reference](docs/commands.md#export_token_transfers)):
```bash
> ethereumetl export_token_transfers --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output token_transfers.csv
```
Export receipts and logs ([Reference](#export_receipts_and_logspy)):
Export traces ([Schema](docs/schema.md#tracescsv), [Reference](docs/commands.md#export_traces)):
```bash
> python export_receipts_and_logs.py --transaction-hashes transaction_hashes.csv \
--provider-uri https://mainnet.infura.io --receipts-output receipts.csv --logs-output logs.csv
> ethereumetl export_traces --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/parity.ipc --output traces.csv
```
Export ERC20 and ERC721 token details ([Reference](#export_tokenspy)):
---
Stream blocks, transactions, logs, token_transfers continually to console ([Reference](docs/commands.md#stream)):
```bash
> python export_tokens.py --token-addresses token_addresses.csv \
--provider-uri https://mainnet.infura.io --output tokens.csv
> pip3 install ethereum-etl[streaming]
> ethereumetl stream --start-block 500000 -e block,transaction,log,token_transfer --log-file log.txt \
--provider-uri https://mainnet.infura.io/v3/7aef3f0cd1f64408b163814b22cc643c
```
Read this article https://medium.com/@medvedev1088/exporting-and-analyzing-ethereum-blockchain-f5353414a94e
## Table of Contents
- [Schema](#schema)
- [blocks.csv](#blockscsv)
- [transactions.csv](#transactionscsv)
- [token_transfers.csv](#token_transferscsv)
- [receipts.csv](#receiptscsv)
- [logs.csv](#logscsv)
- [contracts.csv](#contractscsv)
- [tokens.csv](#tokenscsv)
- [Exporting the Blockchain](#exporting-the-blockchain)
- [Export in 2 Hours](#export-in-2-hours)
- [Command Reference](#command-reference)
- [Querying in Amazon Athena](#querying-in-amazon-athena)
- [Querying in Google BigQuery](#querying-in-google-bigquery)
- [Public Dataset](#public-dataset)
## Schema
### blocks.csv
Column | Type |
------------------------|--------------------|
number | bigint |
hash | hex_string |
parent_hash | hex_string |
nonce | hex_string |
sha3_uncles | hex_string |
logs_bloom | hex_string |
transactions_root | hex_string |
state_root | hex_string |
receipts_root | hex_string |
miner | address |
difficulty | numeric |
total_difficulty | numeric |
size | bigint |
extra_data | hex_string |
gas_limit | bigint |
gas_used | bigint |
timestamp | bigint |
transaction_count | bigint |
### transactions.csv
Column | Type |
--------------------|-------------|
hash | hex_string |
nonce | bigint |
block_hash | hex_string |
block_number | bigint |
transaction_index| bigint |
from_address | address |
to_address | address |
value | numeric |
gas | bigint |
gas_price | bigint |
input | hex_string |
### token_transfers.csv
Column | Type |
--------------------|-------------|
token_address | address |
from_address | address |
to_address | address |
value | numeric |
transaction_hash | hex_string |
log_index | bigint |
block_number | bigint |
### receipts.csv
Column | Type |
-----------------------------|-------------|
transaction_hash | hex_string |
transaction_index | bigint |
block_hash | hex_string |
block_number | bigint |
cumulative_gas_used | bigint |
gas_used | bigint |
contract_address | address |
root | hex_string |
status | bigint |
### logs.csv
Column | Type |
-----------------------------|-------------|
log_index | bigint |
transaction_hash | hex_string |
transaction_index | bigint |
block_hash | hex_string |
block_number | bigint |
address | address |
data | hex_string |
topics | string |
### contracts.csv
Column | Type |
-----------------------------|-------------|
address | address |
bytecode | hex_string |
function_sighashes | string |
is_erc20 | boolean |
is_erc721 | boolean |
### tokens.csv
Column | Type |
-----------------------------|-------------|
address | address |
symbol | string |
name | string |
decimals | bigint |
total_supply | numeric |
You can find column descriptions in [schemas/gcp](schemas/gcp)
Note: `symbol`, `name`, `decimals`, `total_supply`
columns in `tokens.csv` can have empty values in case the contract doesn't implement the corresponding methods
or implements them incorrectly (e.g. a wrong return type).
Note: for the `address` type all hex characters are lower-cased.
`boolean` type can have 2 values: `True` or `False`.
## Exporting the Blockchain
1. Install python 3.5 or 3.6 https://www.python.org/downloads/
1. You can use Infura if you don't need ERC20 transfers (Infura doesn't support eth_getFilterLogs JSON RPC method).
For that use `-p https://mainnet.infura.io` option for the commands below. If you need ERC20 transfers or want to
export the data ~40 times faster, you will need to set up a local Ethereum node:
1. Install geth https://github.com/ethereum/go-ethereum/wiki/Installing-Geth
1. Start geth.
Make sure it downloaded the blocks that you need by executing `eth.syncing` in the JS console.
You can export blocks below `currentBlock`,
there is no need to wait until the full sync as the state is not needed (unless you also need contracts bytecode
and token details).
You can export blocks below `currentBlock`,
there is no need to wait until the full sync as the state is not needed.
1. Clone Ethereum ETL and install the dependencies:
```bash
> git clone https://github.com/medvedev1088/ethereum-etl.git
> cd ethereum-etl
> pip install -r requirements.txt
```
1. Export all:
```bash
> ./export_all.sh -h
Usage: ./export_all.sh -s <start_block> -e <end_block> -b <batch_size> -p <provider_uri> [-o <output_dir>]
> ./export_all.sh -s 0 -e 5499999 -b 100000 -p file://$HOME/Library/Ethereum/geth.ipc -o output
```
The result will be in the `output` subdirectory, partitioned in Hive style:
```bash
output/blocks/start_block=00000000/end_block=00099999/blocks_00000000_00099999.csv
output/blocks/start_block=00100000/end_block=00199999/blocks_00100000_00199999.csv
...
output/transactions/start_block=00000000/end_block=00099999/transactions_00000000_00099999.csv
...
output/token_transfers/start_block=00000000/end_block=00099999/token_transfers_00000000_00099999.csv
...
```
Should work with geth and parity, on Linux, Mac, Windows.
Tested with Python 3.6, geth 1.8.7, Ubuntu 16.04.4
If you see weird behavior, e.g. wrong number of rows in the CSV files or corrupted files,
check this issue: https://github.com/medvedev1088/ethereum-etl/issues/28
#### Export in 2 Hours
You can use AWS Auto Scaling and Data Pipeline to reduce the exporting time to a few hours.
Read this article for details https://medium.com/@medvedev1088/how-to-export-the-entire-ethereum-blockchain-to-csv-in-2-hours-for-10-69fef511e9a2
#### Running in Windows
Additional steps:
1. Install Visual C++ Build Tools https://landinghub.visualstudio.com/visual-cpp-build-tools
1. Install Git Bash with Git for Windows https://git-scm.com/download/win
1. Run in Git Bash:
```bash
> ./export_all.sh -s 0 -e 999999 -b 100000 -p 'file:\\\\.\pipe\geth.ipc' -o output
```
#### Command Reference
- [export_blocks_and_transactions.py](#export_blocks_and_transactionspy)
- [export_token_transfers.py](#export_token_transferspy)
- [extract_token_transfers.py](#extract_token_transferspy)
- [export_receipts_and_logs.py](#export_receipts_and_logspy)
- [export_contracts.py](#export_contractspy)
- [export_tokens.py](#export_tokenspy)
- [get_block_range_for_date.py](#get_block_range_for_datepy)
All the commands accept `-h` parameter for help, e.g.:
Find other commands [here](https://ethereum-etl.readthedocs.io/en/latest/commands/).
For the latest version, check out the repo and call
```bash
> python export_blocks_and_transactions.py -h
usage: export_blocks_and_transactions.py [-h] [-s START_BLOCK] -e END_BLOCK
[-b BATCH_SIZE] --provider-uri PROVIDER_URI
[-w MAX_WORKERS]
[--blocks-output BLOCKS_OUTPUT]
[--transactions-output TRANSACTIONS_OUTPUT]
Export blocks and transactions.
> pip3 install -e .
> python3 ethereumetl.py
```
For the `--output` parameters the supported types are csv and json. The format type is inferred from the output file name.
## Useful Links
##### export_blocks_and_transactions.py
```bash
> python export_blocks_and_transactions.py --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --blocks-output blocks.csv --transactions-output transactions.csv
```
Omit `--blocks-output` or `--transactions-output` options if you want to export only transactions/blocks.
You can tune `--batch-size`, `--max-workers` for performance.
##### export_token_transfers.py
The API used in this command is not supported by Infura, so you will need a local node.
If you want to use Infura for exporting ERC20 transfers refer to [extract_token_transfers.py](#extract_token_transferspy)
```bash
> python export_token_transfers.py --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --batch-size 100 --output token_transfers.csv
```
Include `--tokens <token1> <token2>` to filter only certain tokens, e.g.
```bash
> python export_token_transfers.py --start-block 0 --end-block 500000 --provider-uri file://$HOME/Library/Ethereum/geth.ipc \
--output token_transfers.csv --tokens 0x86fa049857e0209aa7d9e616f7eb3b3b78ecfdb0 0x06012c8cf97bead5deae237070f9587f8e7a266d
```
You can tune `--batch-size`, `--max-workers` for performance.
##### export_receipts_and_logs.py
First extract transaction hashes from `transactions.csv`
(Exported with [export_blocks_and_transactions.py](#export_blocks_and_transactionspy)):
```bash
> python extract_csv_column.py --input transactions.csv --column transaction_hash --output transaction_hashes.csv
```
Then export receipts and logs:
```bash
> python export_receipts_and_logs.py --transaction-hashes transaction_hashes.csv \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --receipts-output receipts.csv --logs-output logs.csv
```
Omit `--receipts-output` or `--logs-output` options if you want to export only logs/receipts.
You can tune `--batch-size`, `--max-workers` for performance.
Upvote this feature request https://github.com/paritytech/parity/issues/9075,
it will make receipts and logs export much faster.
##### extract_token_transfers.py
First export receipt logs with [export_receipts_and_logs.py](#export_receipts_and_logspy).
Then extract transfers from the logs.csv file:
```bash
> python extract_token_transfers.py --logs logs.csv --output token_transfers.csv
```
You can tune `--batch-size`, `--max-workers` for performance.
##### export_contracts.py
First extract contract addresses from `receipts.csv`
(Exported with [export_receipts_and_logs.py](#export_receipts_and_logspy)):
```bash
> python extract_csv_column.py --input receipts.csv --column contract_address --output contract_addresses.csv
```
Then export contracts:
```bash
> python export_contracts.py --contract-addresses contract_addresses.csv \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output contracts.csv
```
You can tune `--batch-size`, `--max-workers` for performance.
##### export_tokens.py
First extract token addresses from `token_transfers.csv`
(Exported with [export_token_transfers.py](#export_token_transferspy)):
```bash
> python extract_csv_column.py -i token_transfers.csv -c token_address -o - | sort | uniq > token_addresses.csv
```
Then export ERC20 tokens:
```bash
> python export_tokens.py --token-addresses token_addresses.csv \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output tokens.csv
```
You can tune `--max-workers` for performance.
Note that there will be duplicate tokens across different partitions,
which need to be deduplicated (see Querying in Google BigQuery section).
Upvote this pull request to make tokens export faster
https://github.com/ethereum/web3.py/pull/944#issuecomment-403957468
##### get_block_range_for_date.py
```bash
> python get_block_range_for_date.py --provider-uri=https://mainnet.infura.io --date 2018-01-01
4832686,4838611
```
#### Running Tests
- [Schema](https://ethereum-etl.readthedocs.io/en/latest/schema/)
- [Command Reference](https://ethereum-etl.readthedocs.io/en/latest/commands/)
- [Documentation](https://ethereum-etl.readthedocs.io/)
- [Public Datasets in BigQuery](https://github.com/blockchain-etl/public-datasets)
- [Exporting the Blockchain](https://ethereum-etl.readthedocs.io/en/latest/exporting-the-blockchain/)
- [Querying in Amazon Athena](https://ethereum-etl.readthedocs.io/en/latest/amazon-athena/)
- [Querying in Google BigQuery](https://ethereum-etl.readthedocs.io/en/latest/google-bigquery/)
- [Querying in Kaggle](https://www.kaggle.com/bigquery/ethereum-blockchain)
- [Airflow DAGs](https://github.com/blockchain-etl/ethereum-etl-airflow)
- [Postgres ETL](https://github.com/blockchain-etl/ethereum-etl-postgresql)
- [Ethereum 2.0 ETL](https://github.com/blockchain-etl/ethereum2-etl)
## Running Tests
```bash
> pip3 install -e .[dev,streaming]
> export ETHEREUM_ETL_RUN_SLOW_TESTS=True
> export PROVIDER_URL=<your_provider_uri>
> pytest -vv
```
```
## Querying in Amazon Athena
- Upload the files to S3:
### Running Tox Tests
```bash
> cd output
> aws s3 sync . s3://<your_bucket>/ethereumetl/export --region ap-southeast-1
> pip3 install tox
> tox
```
- Sign in to Athena https://console.aws.amazon.com/athena/home
## Running in Docker
- Create a database:
1. Install Docker: https://docs.docker.com/get-docker/
```sql
CREATE DATABASE ethereumetl;
2. Build a docker image
> docker build -t ethereum-etl:latest .
> docker image ls
3. Run a container out of the image
> docker run -v $HOME/output:/ethereum-etl/output ethereum-etl:latest export_all -s 0 -e 5499999 -b 100000 -p https://mainnet.infura.io
> docker run -v $HOME/output:/ethereum-etl/output ethereum-etl:latest export_all -s 2018-01-01 -e 2018-01-01 -p https://mainnet.infura.io
4. Run streaming to console or Pub/Sub
> docker build -t ethereum-etl:latest .
> echo "Stream to console"
> docker run ethereum-etl:latest stream --start-block 500000 --log-file log.txt
> echo "Stream to Pub/Sub"
> docker run -v /path_to_credentials_file/:/ethereum-etl/ --env GOOGLE_APPLICATION_CREDENTIALS=/ethereum-etl/credentials_file.json ethereum-etl:latest stream --start-block 500000 --output projects/<your_project>/topics/crypto_ethereum
If running on an Apple M1 chip add the `--platform linux/x86_64` option to the `build` and `run` commands e.g.:
```
docker build --platform linux/x86_64 -t ethereum-etl:latest .
docker run --platform linux/x86_64 ethereum-etl:latest stream --start-block 500000
```
- Create the tables:
- blocks: [schemas/aws/blocks.sql](schemas/aws/blocks.sql)
- transactions: [schemas/aws/transactions.sql](schemas/aws/transactions.sql)
- token_transfers: [schemas/aws/token_transfers.sql](schemas/aws/token_transfers.sql)
- contracts: [schemas/aws/contracts.sql](schemas/aws/contracts.sql)
- receipts: [schemas/aws/receipts.sql](schemas/aws/receipts.sql)
- logs: [schemas/aws/logs.sql](schemas/aws/logs.sql)
- tokens: [schemas/aws/tokens.sql](schemas/aws/tokens.sql)
### Tables for Parquet Files
Read this article on how to convert CSVs to Parquet https://medium.com/@medvedev1088/converting-ethereum-etl-files-to-parquet-399e048ddd30
- Create the tables:
- parquet_blocks: [schemas/aws/parquet/parquet_blocks.sql](schemas/aws/parquet/parquet_blocks.sql)
- parquet_transactions: [schemas/aws/parquet/parquet_transactions.sql](schemas/aws/parquet/parquet_transactions.sql)
- parquet_token_transfers: [schemas/aws/parquet/parquet_token_transfers.sql](schemas/aws/parquet/parquet_token_transfers.sql)
Note that DECIMAL type is limited to 38 digits in Hive https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-decimal
so values with more than 38 digits will be null.
## Querying in Google BigQuery
To upload CSVs to BigQuery:
- Install Google Cloud SDK https://cloud.google.com/sdk/docs/quickstart-debian-ubuntu
- Create a new Google Storage bucket https://console.cloud.google.com/storage/browser
- Upload the files:
```bash
> cd output
> gsutil -m rsync -r . gs://<your_bucket>/ethereumetl/export
```
- Sign in to BigQuery https://bigquery.cloud.google.com/
- Create a new dataset called `ethereum`
- Load the files from the bucket to BigQuery:
```bash
> cd ethereum-etl
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 ethereum.blocks gs://<your_bucket>/ethereumetl/export/blocks/*.csv ./schemas/gcp/blocks.json
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 ethereum.transactions gs://<your_bucket>/ethereumetl/export/transactions/*.csv ./schemas/gcp/transactions.json
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 ethereum.token_transfers gs://<your_bucket>/ethereumetl/export/token_transfers/*.csv ./schemas/gcp/token_transfers.json
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 ethereum.receipts gs://<your_bucket>/ethereumetl/export/receipts/*.csv ./schemas/gcp/receipts.json
> bq --location=US load --replace --source_format=NEWLINE_DELIMITED_JSON ethereum.logs gs://<your_bucket>/ethereumetl/export/logs/*.json ./schemas/gcp/logs.json
> bq --location=US load --replace --source_format=NEWLINE_DELIMITED_JSON ethereum.contracts gs://<your_bucket>/ethereumetl/export/contracts/*.json ./schemas/gcp/contracts.json
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 --allow_quoted_newlines ethereum.tokens_duplicates gs://<your_bucket>/ethereumetl/export/tokens/*.csv ./schemas/gcp/tokens.json
```
Note that NEWLINE_DELIMITED_JSON is used to support REPEATED mode for the columns with lists.
Join `transactions` and `receipts`:
```bash
> bq mk --table --description "Exported using https://github.com/medvedev1088/ethereum-etl" --time_partitioning_field timestamp_partition ethereum.transactions_join_receipts ./schemas/gcp/transactions_join_receipts.json
> bq --location=US query --replace --destination_table ethereum.transactions_join_receipts --use_legacy_sql=false "$(cat ./schemas/gcp/transactions_join_receipts.sql | tr '\n' ' ')"
```
Deduplicate `tokens`:
```bash
> bq mk --table --description "Exported using https://github.com/medvedev1088/ethereum-etl" ethereum.tokens ./schemas/gcp/tokens.json
> bq --location=US query --replace --destination_table ethereum.tokens --use_legacy_sql=false "$(cat ./schemas/gcp/tokens_deduplicate.sql | tr '\n' ' ')"
```
### Public Dataset
You can query the data that I exported in the public BigQuery dataset
https://medium.com/@medvedev1088/ethereum-blockchain-on-google-bigquery-283fb300f579
### SQL for Blockchain
I'm currently working on a SaaS solution for analysts and developers. The MVP will have the following:
- Built on top of AWS, cost efficient
- Can provide access to raw CSV data if needed
- Support for internal transactions in the future
- Support for Bitcoin and other blockchains in the future
- ERC20 token metrics in the future
Contact me if you would like to contribute evge.medvedev@gmail.com
## Projects using Ethereum ETL
* [Google](https://goo.gl/oY5BCQ) - Public BigQuery Ethereum datasets
* [Nansen](https://nansen.ai/query?ref=ethereumetl) - Analytics platform for Ethereum


@@ -0,0 +1,35 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import itertools
# https://stackoverflow.com/a/27062830/1580227
class AtomicCounter:
def __init__(self):
self._counter = itertools.count()
# init to 0
next(self._counter)
def increment(self, increment=1):
assert increment > 0
return [next(self._counter) for _ in range(0, increment)][-1]
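
A minimal usage sketch (my own illustration, not part of the diff) of the AtomicCounter above; the import path matches the one used by the exporter changes later in this changeset. Note that `increment()` returns the counter value *after* incrementing, which is why callers subtract 1 when they only want to read the total.

```python
from blockchainetl.atomic_counter import AtomicCounter

counter = AtomicCounter()
print(counter.increment())      # 1 -- value after the first increment
print(counter.increment(5))     # 6 -- bumps the counter five times, returns the last value
print(counter.increment() - 1)  # 6 -- reading the total costs one extra increment
```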


@@ -0,0 +1,42 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
# https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072
import sys
import csv
def set_max_field_size_limit():
max_int = sys.maxsize
decrement = True
while decrement:
# decrease the maxInt value by factor 10
# as long as the OverflowError occurs.
decrement = False
try:
csv.field_size_limit(max_int)
except OverflowError:
max_int = int(max_int / 10)
decrement = True
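
A short sketch of why this helper exists: Python's csv module rejects fields longer than 131072 characters by default, and columns such as contract bytecode or log data easily exceed that. The module path for the helper is collapsed in this diff, so the sketch restates the function to stay runnable.

```python
import csv
import io
import sys

def set_max_field_size_limit():  # restated from the hunk above; its module path is not named in this diff
    max_int = sys.maxsize
    decrement = True
    while decrement:
        decrement = False
        try:
            csv.field_size_limit(max_int)
        except OverflowError:
            max_int = int(max_int / 10)
            decrement = True

huge_field = "0x" + "ab" * 200_000               # ~400 KB hex field, e.g. contract bytecode
set_max_field_size_limit()                       # without this, csv.reader raises "field larger than field limit"
row = next(csv.reader(io.StringIO(huge_field)))
print(len(row[0]))                               # 400002
```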

220
blockchainetl/exporters.py Normal file

@@ -0,0 +1,220 @@
# Copyright (c) Scrapy developers.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions, and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions, and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. Neither the name of Scrapy nor the names of its contributors may be used
# to endorse or promote products derived from this software without
# specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
Item Exporters are used to export/serialize items into different formats.
"""
import csv
import io
import threading
from json import JSONEncoder
import decimal
import six
class BaseItemExporter(object):
def __init__(self, **kwargs):
self._configure(kwargs)
def _configure(self, options, dont_fail=False):
"""Configure the exporter by popping options from the ``options`` dict.
If dont_fail is set, it won't raise an exception on unexpected options
(useful for using with keyword arguments in subclasses constructors)
"""
self.encoding = options.pop('encoding', None)
self.fields_to_export = options.pop('fields_to_export', None)
self.export_empty_fields = options.pop('export_empty_fields', False)
self.indent = options.pop('indent', None)
if not dont_fail and options:
raise TypeError("Unexpected options: %s" % ', '.join(options.keys()))
def export_item(self, item):
raise NotImplementedError
def serialize_field(self, field, name, value):
serializer = field.get('serializer', lambda x: x)
return serializer(value)
def start_exporting(self):
pass
def finish_exporting(self):
pass
def _get_serialized_fields(self, item, default_value=None, include_empty=None):
"""Return the fields to export as an iterable of tuples
(name, serialized_value)
"""
if include_empty is None:
include_empty = self.export_empty_fields
if self.fields_to_export is None:
if include_empty and not isinstance(item, dict):
field_iter = six.iterkeys(item.fields)
else:
field_iter = six.iterkeys(item)
else:
if include_empty:
field_iter = self.fields_to_export
else:
field_iter = (x for x in self.fields_to_export if x in item)
for field_name in field_iter:
if field_name in item:
field = {} if isinstance(item, dict) else item.fields[field_name]
value = self.serialize_field(field, field_name, item[field_name])
else:
value = default_value
yield field_name, value
class CsvItemExporter(BaseItemExporter):
def __init__(self, file, include_headers_line=True, join_multivalued=',', **kwargs):
self._configure(kwargs, dont_fail=True)
if not self.encoding:
self.encoding = 'utf-8'
self.include_headers_line = include_headers_line
self.stream = io.TextIOWrapper(
file,
line_buffering=False,
write_through=True,
encoding=self.encoding
) if six.PY3 else file
self.csv_writer = csv.writer(self.stream, **kwargs)
self._headers_not_written = True
self._join_multivalued = join_multivalued
self._write_headers_lock = threading.Lock()
def serialize_field(self, field, name, value):
serializer = field.get('serializer', self._join_if_needed)
return serializer(value)
def _join_if_needed(self, value):
def to_string(x):
if isinstance(x, dict):
# Separators without whitespace for compact format.
return JSONEncoder(separators=(',', ':')).encode(x)
else:
return str(x)
if isinstance(value, (list, tuple)):
try:
return self._join_multivalued.join(to_string(x) for x in value)
except TypeError: # list in value may not contain strings
pass
return value
def export_item(self, item):
# Double-checked locking (safe in Python because of GIL) https://en.wikipedia.org/wiki/Double-checked_locking
if self._headers_not_written:
with self._write_headers_lock:
if self._headers_not_written:
self._write_headers_and_set_fields_to_export(item)
self._headers_not_written = False
fields = self._get_serialized_fields(item, default_value='',
include_empty=True)
values = list(self._build_row(x for _, x in fields))
self.csv_writer.writerow(values)
def _build_row(self, values):
for s in values:
try:
yield to_native_str(s, self.encoding)
except TypeError:
yield s
def _write_headers_and_set_fields_to_export(self, item):
if self.include_headers_line:
if not self.fields_to_export:
if isinstance(item, dict):
# for dicts try using fields of the first item
self.fields_to_export = list(item.keys())
else:
# use fields declared in Item
self.fields_to_export = list(item.fields.keys())
row = list(self._build_row(self.fields_to_export))
self.csv_writer.writerow(row)
def EncodeDecimal(o):
if isinstance(o, decimal.Decimal):
return float(round(o, 8))
raise TypeError(repr(o) + " is not JSON serializable")
class JsonLinesItemExporter(BaseItemExporter):
def __init__(self, file, **kwargs):
self._configure(kwargs, dont_fail=True)
self.file = file
kwargs.setdefault('ensure_ascii', not self.encoding)
# kwargs.setdefault('default', EncodeDecimal)
self.encoder = JSONEncoder(default=EncodeDecimal, **kwargs)
def export_item(self, item):
itemdict = dict(self._get_serialized_fields(item))
data = self.encoder.encode(itemdict) + '\n'
self.file.write(to_bytes(data, self.encoding))
def to_native_str(text, encoding=None, errors='strict'):
""" Return str representation of `text`
(bytes in Python 2.x and unicode in Python 3.x). """
if six.PY2:
return to_bytes(text, encoding, errors)
else:
return to_unicode(text, encoding, errors)
def to_bytes(text, encoding=None, errors='strict'):
"""Return the binary representation of `text`. If `text`
is already a bytes object, return it as-is."""
if isinstance(text, bytes):
return text
if not isinstance(text, six.string_types):
raise TypeError('to_bytes must receive a unicode, str or bytes '
'object, got %s' % type(text).__name__)
if encoding is None:
encoding = 'utf-8'
return text.encode(encoding, errors)
def to_unicode(text, encoding=None, errors='strict'):
"""Return the unicode representation of a bytes object `text`. If `text`
is already a unicode object, return it as-is."""
if isinstance(text, six.text_type):
return text
if not isinstance(text, (bytes, six.text_type)):
raise TypeError('to_unicode must receive a bytes, str or unicode '
'object, got %s' % type(text).__name__)
if encoding is None:
encoding = 'utf-8'
return text.decode(encoding, errors)
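
A minimal sketch (mine, not from the diff) of driving the two exporters above with plain dict items; both expect a binary file handle, since CsvItemExporter wraps it in a TextIOWrapper and JsonLinesItemExporter writes encoded bytes.

```python
from blockchainetl.exporters import CsvItemExporter, JsonLinesItemExporter

block = {'type': 'block', 'number': 500000, 'gas_used': 6123456}

with open('blocks.csv', 'wb') as csv_file:       # binary mode: the exporter wraps the stream itself
    exporter = CsvItemExporter(csv_file)
    exporter.export_item(block)                  # the first item also writes the header row

with open('blocks.json', 'wb') as json_file:     # one JSON object per line
    JsonLinesItemExporter(json_file).export_item(block)
```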


@@ -21,26 +21,29 @@
# SOFTWARE.
import logging
from ethereumetl.atomic_counter import AtomicCounter
from ethereumetl.exporters import CsvItemExporter, JsonLinesItemExporter
from ethereumetl.file_utils import get_file_handle, close_silently
from blockchainetl.atomic_counter import AtomicCounter
from blockchainetl.exporters import CsvItemExporter, JsonLinesItemExporter
from blockchainetl.file_utils import get_file_handle, close_silently
from blockchainetl.jobs.exporters.converters.composite_item_converter import CompositeItemConverter
class CompositeItemExporter:
def __init__(self, filename_mapping, field_mapping):
def __init__(self, filename_mapping, field_mapping=None, converters=()):
self.filename_mapping = filename_mapping
self.field_mapping = field_mapping
self.field_mapping = field_mapping or {}
self.file_mapping = {}
self.exporter_mapping = {}
self.counter_mapping = {}
self.converter = CompositeItemConverter(converters)
self.logger = logging.getLogger('CompositeItemExporter')
def open(self):
for item_type, filename in self.filename_mapping.items():
file = get_file_handle(filename, binary=True)
fields = self.field_mapping[item_type]
fields = self.field_mapping.get(item_type)
self.file_mapping[item_type] = file
if str(filename).endswith('.json'):
item_exporter = JsonLinesItemExporter(file, fields_to_export=fields)
@@ -50,17 +53,21 @@ class CompositeItemExporter:
self.counter_mapping[item_type] = AtomicCounter()
def export_item(self, item):
item_type = item.get('type', None)
if item_type is None:
raise ValueError('type key is not found in item {}'.format(repr(item)))
def export_items(self, items):
for item in items:
self.export_item(item)
exporter = self.exporter_mapping[item_type]
def export_item(self, item):
item_type = item.get('type')
if item_type is None:
raise ValueError('"type" key is not found in item {}'.format(repr(item)))
exporter = self.exporter_mapping.get(item_type)
if exporter is None:
raise ValueError('Exporter for item type {} not found'.format(item_type))
exporter.export_item(item)
exporter.export_item(self.converter.convert_item(item))
counter = self.counter_mapping[item_type]
counter = self.counter_mapping.get(item_type)
if counter is not None:
counter.increment()
@@ -68,4 +75,5 @@ class CompositeItemExporter:
for item_type, file in self.file_mapping.items():
close_silently(file)
counter = self.counter_mapping[item_type]
self.logger.info('{} items exported: {}'.format(item_type, counter.increment() - 1))
if counter is not None:
self.logger.info('{} items exported: {}'.format(item_type, counter.increment() - 1))
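
A sketch of the updated constructor: field_mapping is now optional and a tuple of converters is applied to every item before it is written. The module paths below are my assumption based on the imports in this hunk; only the converters package path is shown explicitly.

```python
# Assumed module paths -- only the converters package name appears in this diff.
from blockchainetl.jobs.exporters.composite_item_exporter import CompositeItemExporter
from blockchainetl.jobs.exporters.converters.int_to_string_item_converter import IntToStringItemConverter

exporter = CompositeItemExporter(
    filename_mapping={'block': 'blocks.csv', 'transaction': 'transactions.csv'},
    converters=(IntToStringItemConverter(keys=['value']),),  # stringify wei values before writing
)
exporter.open()
exporter.export_items([
    {'type': 'block', 'number': 500000},
    {'type': 'transaction', 'hash': '0xabc', 'value': 10 ** 21},
])
exporter.close()   # logs '<type> items exported: <count>' per output file
```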


@@ -0,0 +1,38 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import json
class ConsoleItemExporter:
def open(self):
pass
def export_items(self, items):
for item in items:
self.export_item(item)
def export_item(self, item):
print(json.dumps(item))
def close(self):
pass
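
ConsoleItemExporter simply prints one JSON line per item; a tiny sketch follows (the module path is collapsed in this diff, so the import is assumed).

```python
from blockchainetl.jobs.exporters.console_item_exporter import ConsoleItemExporter  # assumed path

exporter = ConsoleItemExporter()
exporter.open()
exporter.export_items([{'type': 'block', 'number': 500000, 'transaction_count': 2}])
exporter.close()
# prints: {"type": "block", "number": 500000, "transaction_count": 2}
```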


@@ -0,0 +1,45 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
# MIT License
#
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
class CompositeItemConverter:
def __init__(self, converters=()):
self.converters = converters
def convert_item(self, item):
if self.converters is None:
return item
for converter in self.converters:
item = converter.convert_item(item)
return item
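
A sketch (with two hypothetical converters of my own) showing how CompositeItemConverter chains conversions: each converter's output feeds the next, in order.

```python
from blockchainetl.jobs.exporters.converters.composite_item_converter import CompositeItemConverter

class MarkEnriched:                      # hypothetical converter, for illustration only
    def convert_item(self, item):
        return {**item, 'type': item['type'] + '_enriched'}

class DropNones:                         # hypothetical converter, for illustration only
    def convert_item(self, item):
        return {k: v for k, v in item.items() if v is not None}

chain = CompositeItemConverter(converters=(MarkEnriched(), DropNones()))
print(chain.convert_item({'type': 'block', 'number': 500000, 'extra_data': None}))
# {'type': 'block_enriched', 'number': 500000}
```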


@@ -0,0 +1,47 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
# MIT License
#
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
#
from decimal import Decimal
from blockchainetl.jobs.exporters.converters.simple_item_converter import SimpleItemConverter
# Large ints are not handled correctly by pg8000 so we use Decimal instead:
# https://github.com/mfenniak/pg8000/blob/412eace074514ada824e7a102765e37e2cda8eaa/pg8000/core.py#L1703
class IntToDecimalItemConverter(SimpleItemConverter):
def convert_field(self, key, value):
if isinstance(value, int):
return Decimal(value)
else:
return value
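
A sketch of the converter above in action: wei-sized integers are turned into Decimal so that pg8000 can bind them, as the comment in the hunk explains (the module file name in the import is my assumption).

```python
from blockchainetl.jobs.exporters.converters.int_to_decimal_item_converter import IntToDecimalItemConverter  # assumed file name

converter = IntToDecimalItemConverter()
print(converter.convert_item({'type': 'transaction', 'hash': '0xabc', 'value': 10 ** 21}))
# {'type': 'transaction', 'hash': '0xabc', 'value': Decimal('1000000000000000000000')}
```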


@@ -0,0 +1,46 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
# MIT License
#
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
from blockchainetl.jobs.exporters.converters.simple_item_converter import SimpleItemConverter
class IntToStringItemConverter(SimpleItemConverter):
def __init__(self, keys=None):
self.keys = set(keys) if keys else None
def convert_field(self, key, value):
if isinstance(value, int) and (self.keys is None or key in self.keys):
return str(value)
else:
return value
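
A sketch of the key-restricted variant: only the listed keys are stringified, so small integers such as `decimals` stay numeric while 256-bit values survive CSV/JSON round-trips as strings (the module file name in the import is my assumption).

```python
from blockchainetl.jobs.exporters.converters.int_to_string_item_converter import IntToStringItemConverter  # assumed file name

converter = IntToStringItemConverter(keys=['total_supply'])
print(converter.convert_item({'type': 'token', 'decimals': 18, 'total_supply': 2 ** 200}))
# decimals stays an int, total_supply becomes a decimal string
```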


@@ -0,0 +1,56 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
class ListFieldItemConverter:
def __init__(self, field, new_field_prefix, fill=0, fill_with=None):
self.field = field
self.new_field_prefix = new_field_prefix
self.fill = fill
self.fill_with = fill_with
def convert_item(self, item):
if not item:
return item
lst = item.get(self.field)
result = item
if lst is not None and isinstance(lst, list):
result = item.copy()
del result[self.field]
for lst_item_index, lst_item in enumerate(lst):
result[self.new_field_prefix + str(lst_item_index)] = lst_item
if len(lst) < self.fill:
for i in range(len(lst), self.fill):
result[self.new_field_prefix + str(i)] = self.fill_with
return result
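
A minimal sketch of how a list field is flattened into numbered columns, with illustrative values and an assumed module path:

```python
# Assumed module path, mirroring the other converters.
from blockchainetl.jobs.exporters.converters.list_field_item_converter import ListFieldItemConverter

converter = ListFieldItemConverter('topics', 'topic', fill=4)
item = {'type': 'log', 'log_index': 0, 'topics': ['0xaaa', '0xbbb']}
converted = converter.convert_item(item)

# The original list field is removed; each element becomes topic0, topic1, ...
# and missing positions up to `fill` are padded with `fill_with` (None by default).
assert 'topics' not in converted
assert converted['topic0'] == '0xaaa' and converted['topic1'] == '0xbbb'
assert converted['topic2'] is None and converted['topic3'] is None
```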

View File

@@ -0,0 +1,47 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
class SimpleItemConverter:
def __init__(self, field_converters=None):
self.field_converters = field_converters
def convert_item(self, item):
return {
key: self.convert_field(key, value) for key, value in item.items()
}
def convert_field(self, key, value):
if self.field_converters is not None and key in self.field_converters:
return self.field_converters[key](value)
else:
return value
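
SimpleItemConverter can also be used directly with a `field_converters` mapping instead of subclassing. A minimal sketch with illustrative converters:

```python
from blockchainetl.jobs.exporters.converters.simple_item_converter import SimpleItemConverter

converter = SimpleItemConverter(field_converters={'hash': str.lower, 'gas': float})
item = {'hash': '0xABCDEF', 'gas': 21000, 'type': 'transaction'}
converted = converter.convert_item(item)

assert converted == {'hash': '0xabcdef', 'gas': 21000.0, 'type': 'transaction'}
```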

View File

@@ -0,0 +1,41 @@
# MIT License
#
# Copyright (c) 2020 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
from datetime import datetime
from blockchainetl.jobs.exporters.converters.simple_item_converter import SimpleItemConverter
class UnixTimestampItemConverter(SimpleItemConverter):
def convert_field(self, key, value):
if key is not None and key.endswith('timestamp'):
return to_timestamp(value)
else:
return value
def to_timestamp(value):
if isinstance(value, int):
return datetime.utcfromtimestamp(value).strftime('%Y-%m-%d %H:%M:%S')
else:
return value
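
A minimal sketch: any field whose name ends with `timestamp` is rendered as a UTC datetime string, other fields pass through unchanged. The module path is assumed to mirror the other converters:

```python
# Assumed module path, mirroring the other converters.
from blockchainetl.jobs.exporters.converters.unix_timestamp_item_converter import UnixTimestampItemConverter

converter = UnixTimestampItemConverter()
item = {'type': 'block', 'number': 1, 'timestamp': 1500000000}
converted = converter.convert_item(item)

assert converted['timestamp'] == '2017-07-14 02:40:00'  # rendered in UTC
assert converted['number'] == 1                         # left unchanged
```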

View File

@@ -0,0 +1,111 @@
# MIT License
#
# Copyright (c) 2020 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import json
import logging
from collections import defaultdict
from google.cloud import storage
def build_block_bundles(items):
blocks = defaultdict(list)
transactions = defaultdict(list)
logs = defaultdict(list)
token_transfers = defaultdict(list)
traces = defaultdict(list)
for item in items:
item_type = item.get('type')
if item_type == 'block':
blocks[item.get('number')].append(item)
elif item_type == 'transaction':
transactions[item.get('block_number')].append(item)
elif item_type == 'log':
logs[item.get('block_number')].append(item)
elif item_type == 'token_transfer':
token_transfers[item.get('block_number')].append(item)
elif item_type == 'trace':
traces[item.get('block_number')].append(item)
else:
logging.info(f'Skipping item with type {item_type}')
block_bundles = []
for block_number in sorted(blocks.keys()):
if len(blocks[block_number]) != 1:
raise ValueError(f'There must be a single block for a given block number, was {len(blocks[block_number])} for block number {block_number}')
block_bundles.append({
'block': blocks[block_number][0],
'transactions': transactions[block_number],
'logs': logs[block_number],
'token_transfers': token_transfers[block_number],
'traces': traces[block_number],
})
return block_bundles
class GcsItemExporter:
def __init__(
self,
bucket,
path='blocks',
build_block_bundles_func=build_block_bundles):
self.bucket = bucket
self.path = normalize_path(path)
self.build_block_bundles_func = build_block_bundles_func
self.storage_client = storage.Client()
def open(self):
pass
def export_items(self, items):
block_bundles = self.build_block_bundles_func(items)
for block_bundle in block_bundles:
block = block_bundle.get('block')
if block is None:
raise ValueError('block_bundle must include the block field')
block_number = block.get('number')
if block_number is None:
raise ValueError('block_bundle must include the block.number field')
destination_blob_name = f'{self.path}/{block_number}.json'
bucket = self.storage_client.bucket(self.bucket)
blob = bucket.blob(destination_blob_name)
blob.upload_from_string(json.dumps(block_bundle))
logging.info(f'Uploaded file gs://{self.bucket}/{destination_blob_name}')
def close(self):
pass
def normalize_path(p):
if p is None:
p = ''
if p.startswith('/'):
p = p[1:]
if p.endswith('/'):
p = p[:len(p) - 1]
return p
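
A minimal sketch of `build_block_bundles` on a hand-made item stream (illustrative values, assumed import path); `GcsItemExporter` then writes each bundle to `gs://<bucket>/<path>/<block_number>.json`:

```python
# Assumed module path for the exporter file above.
from blockchainetl.jobs.exporters.gcs_item_exporter import build_block_bundles

items = [
    {'type': 'block', 'number': 100},
    {'type': 'transaction', 'block_number': 100, 'hash': '0x01'},
    {'type': 'log', 'block_number': 100, 'log_index': 0},
]
bundles = build_block_bundles(items)

assert len(bundles) == 1
assert bundles[0]['block'] == {'type': 'block', 'number': 100}
assert len(bundles[0]['transactions']) == 1 and len(bundles[0]['logs']) == 1
assert bundles[0]['token_transfers'] == [] and bundles[0]['traces'] == []  # empty lists for absent types
```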

View File

@@ -0,0 +1,105 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import json
import logging
from google.cloud import pubsub_v1
from timeout_decorator import timeout_decorator
class GooglePubSubItemExporter:
def __init__(self, item_type_to_topic_mapping, message_attributes=(),
batch_max_bytes=1024 * 5, batch_max_latency=1, batch_max_messages=1000,
enable_message_ordering=False):
self.item_type_to_topic_mapping = item_type_to_topic_mapping
self.batch_max_bytes = batch_max_bytes
self.batch_max_latency = batch_max_latency
self.batch_max_messages = batch_max_messages
self.enable_message_ordering = enable_message_ordering
self.publisher = self.create_publisher()
self.message_attributes = message_attributes
def open(self):
pass
def export_items(self, items):
try:
self._export_items_with_timeout(items)
except timeout_decorator.TimeoutError as e:
# A bug in the Pub/Sub publisher makes it stall after running for some time.
# Exception in thread Thread-CommitBatchPublisher:
# details = "channel is in state TRANSIENT_FAILURE"
# https://stackoverflow.com/questions/55552606/how-can-one-catch-exceptions-in-python-pubsub-subscriber-that-are-happening-in-i?noredirect=1#comment97849067_55552606
logging.info('Recreating Pub/Sub publisher.')
self.publisher = self.create_publisher()
raise e
@timeout_decorator.timeout(300)
def _export_items_with_timeout(self, items):
futures = []
for item in items:
message_future = self.export_item(item)
futures.append(message_future)
for future in futures:
# result() blocks until the message is published.
future.result()
def export_item(self, item):
item_type = item.get('type')
if item_type is not None and item_type in self.item_type_to_topic_mapping:
topic_path = self.item_type_to_topic_mapping.get(item_type)
data = json.dumps(item).encode('utf-8')
ordering_key = 'all' if self.enable_message_ordering else ''
message_future = self.publisher.publish(topic_path, data=data, ordering_key=ordering_key, **self.get_message_attributes(item))
return message_future
else:
logging.warning('Topic for item type "{}" is not configured.'.format(item_type))
def get_message_attributes(self, item):
attributes = {}
for attr_name in self.message_attributes:
if item.get(attr_name) is not None:
attributes[attr_name] = str(item.get(attr_name))
return attributes
def create_publisher(self):
batch_settings = pubsub_v1.types.BatchSettings(
max_bytes=self.batch_max_bytes,
max_latency=self.batch_max_latency,
max_messages=self.batch_max_messages,
)
publisher_options = pubsub_v1.types.PublisherOptions(enable_message_ordering=self.enable_message_ordering)
return pubsub_v1.PublisherClient(batch_settings=batch_settings, publisher_options=publisher_options)
def close(self):
pass

View File

@@ -0,0 +1,44 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
class InMemoryItemExporter:
def __init__(self, item_types):
self.item_types = item_types
self.items = {}
def open(self):
for item_type in self.item_types:
self.items[item_type] = []
def export_item(self, item):
item_type = item.get('type', None)
if item_type is None:
raise ValueError('type key is not found in item {}'.format(repr(item)))
self.items[item_type].append(item)
def close(self):
pass
def get_items(self, item_type):
return self.items[item_type]
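
The in-memory exporter is handy in tests; a minimal, fully runnable round trip (assumed module path):

```python
# Assumed module path for the exporter above.
from blockchainetl.jobs.exporters.in_memory_item_exporter import InMemoryItemExporter

exporter = InMemoryItemExporter(item_types=['block', 'transaction'])
exporter.open()
exporter.export_item({'type': 'block', 'number': 123})
exporter.export_item({'type': 'transaction', 'hash': '0xabc', 'block_number': 123})
exporter.close()

assert exporter.get_items('block') == [{'type': 'block', 'number': 123}]
assert len(exporter.get_items('transaction')) == 1
```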

View File

@@ -0,0 +1,54 @@
import collections
import json
import logging
from kafka import KafkaProducer
from blockchainetl.jobs.exporters.converters.composite_item_converter import CompositeItemConverter
class KafkaItemExporter:
def __init__(self, output, item_type_to_topic_mapping, converters=()):
self.item_type_to_topic_mapping = item_type_to_topic_mapping
self.converter = CompositeItemConverter(converters)
self.connection_url = self.get_connection_url(output)
logging.info(self.connection_url)
self.producer = KafkaProducer(bootstrap_servers=self.connection_url)
def get_connection_url(self, output):
try:
return output.split('/')[1]
except IndexError:
raise Exception('Invalid kafka output param. It should be in the format "kafka/127.0.0.1:9092"')
def open(self):
pass
def export_items(self, items):
for item in items:
self.export_item(item)
def export_item(self, item):
item_type = item.get('type')
if item_type is not None and item_type in self.item_type_to_topic_mapping:
data = json.dumps(item).encode('utf-8')
logging.debug(data)
return self.producer.send(self.item_type_to_topic_mapping[item_type], value=data)
else:
logging.warning('Topic for item type "{}" is not configured.'.format(item_type))
def convert_items(self, items):
for item in items:
yield self.converter.convert_item(item)
def close(self):
pass
def group_by_item_type(items):
result = collections.defaultdict(list)
for item in items:
result[item.get('type')].append(item)
return result

View File

@@ -0,0 +1,82 @@
# MIT License
#
# Copyright (c) 2022 CoinStats LLC
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import json
import typing as t
import uuid
from itertools import zip_longest
import boto3
_KINESIS_BATCH_LIMIT = 500
def _uuid_partition_key(_: dict) -> str:
return uuid.uuid4().hex
class KinesisItemExporter:
def __init__(
self,
stream_name: str,
partition_key_callable: t.Callable[[dict], str] = _uuid_partition_key,
):
self._stream_name = stream_name
self._partition_key_callable = partition_key_callable
self._kinesis_client = None # initialized in .open
def open(self) -> None:
self._kinesis_client = boto3.client('kinesis')
def export_items(self, items: t.Iterable[dict]) -> None:
sentinel = object()
chunks = zip_longest(
*(iter(items),) * _KINESIS_BATCH_LIMIT,
fillvalue=sentinel,
)
for chunk in chunks:
self._kinesis_client.put_records(
StreamName=self._stream_name,
Records=[
{
'Data': _serialize_item(item),
'PartitionKey': self._partition_key_callable(item),
}
for item in chunk
if item is not sentinel
],
)
def export_item(self, item: dict) -> None:
self._kinesis_client.put_record(
StreamName=self._stream_name,
Data=_serialize_item(item),
PartitionKey=self._partition_key_callable(item),
)
def close(self):
pass
def _serialize_item(item: dict) -> bytes:
return json.dumps(item).encode()
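
The `zip_longest(*(iter(items),) * N, fillvalue=sentinel)` expression above chunks an arbitrary iterable into fixed-size batches of at most `_KINESIS_BATCH_LIMIT` records. A minimal standalone sketch of the same trick with a batch size of 3:

```python
from itertools import zip_longest

items = list(range(7))
sentinel = object()
# A single iterator repeated 3 times: zip pulls from it round-robin,
# so each tuple is one chunk, padded with the sentinel at the end.
chunks = zip_longest(*(iter(items),) * 3, fillvalue=sentinel)

batches = [[i for i in chunk if i is not sentinel] for chunk in chunks]
assert batches == [[0, 1, 2], [3, 4, 5], [6]]
```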

View File

@@ -0,0 +1,42 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
class MultiItemExporter:
def __init__(self, item_exporters):
self.item_exporters = item_exporters
def open(self):
for exporter in self.item_exporters:
exporter.open()
def export_items(self, items):
for exporter in self.item_exporters:
exporter.export_items(items)
def export_item(self, item):
for exporter in self.item_exporters:
exporter.export_item(item)
def close(self):
for exporter in self.item_exporters:
exporter.close()

View File

@@ -0,0 +1,70 @@
# MIT License
#
# Copyright (c) 2020 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import collections
from sqlalchemy import create_engine
from blockchainetl.jobs.exporters.converters.composite_item_converter import CompositeItemConverter
class PostgresItemExporter:
def __init__(self, connection_url, item_type_to_insert_stmt_mapping, converters=(), print_sql=True):
self.connection_url = connection_url
self.item_type_to_insert_stmt_mapping = item_type_to_insert_stmt_mapping
self.converter = CompositeItemConverter(converters)
self.print_sql = print_sql
self.engine = self.create_engine()
def open(self):
pass
def export_items(self, items):
items_grouped_by_type = group_by_item_type(items)
for item_type, insert_stmt in self.item_type_to_insert_stmt_mapping.items():
item_group = items_grouped_by_type.get(item_type)
if item_group:
connection = self.engine.connect()
converted_items = list(self.convert_items(item_group))
connection.execute(insert_stmt, converted_items)
def convert_items(self, items):
for item in items:
yield self.converter.convert_item(item)
def create_engine(self):
engine = create_engine(self.connection_url, echo=self.print_sql, pool_recycle=3600)
return engine
def close(self):
pass
def group_by_item_type(items):
result = collections.defaultdict(list)
for item in items:
result[item.get('type')].append(item)
return result
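
A minimal sketch of `group_by_item_type`, the module-level helper defined above that drives the per-type inserts (illustrative items):

```python
items = [
    {'type': 'block', 'number': 1},
    {'type': 'transaction', 'hash': '0xaa'},
    {'type': 'block', 'number': 2},
]
grouped = group_by_item_type(items)

assert [b['number'] for b in grouped['block']] == [1, 2]
assert len(grouped['transaction']) == 1
```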

View File

@@ -0,0 +1,11 @@
import logging
def logging_basic_config(filename=None):
format = '%(asctime)s - %(name)s [%(levelname)s] - %(message)s'
if filename is not None:
logging.basicConfig(level=logging.INFO, format=format, filename=filename)
else:
logging.basicConfig(level=logging.INFO, format=format)
logging.getLogger('ethereum_dasm.evmdasm').setLevel(logging.ERROR)

View File

@@ -0,0 +1,23 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

View File

@@ -0,0 +1,16 @@
from sqlalchemy.dialects.postgresql import insert
def create_insert_statement_for_table(table):
insert_stmt = insert(table)
primary_key_fields = [column.name for column in table.columns if column.primary_key]
if primary_key_fields:
insert_stmt = insert_stmt.on_conflict_do_update(
index_elements=primary_key_fields,
set_={
column.name: insert_stmt.excluded[column.name] for column in table.columns if not column.primary_key
}
)
return insert_stmt
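
A minimal sketch with an illustrative table, showing the upsert statement this helper produces; such statements are what `PostgresItemExporter` expects as values in `item_type_to_insert_stmt_mapping`:

```python
from sqlalchemy import Table, Column, MetaData, BigInteger, String

metadata = MetaData()
# Illustrative table definition, not the project's actual schema module.
blocks = Table('blocks', metadata,
               Column('number', BigInteger, primary_key=True),
               Column('hash', String),
               Column('miner', String))

stmt = create_insert_statement_for_table(blocks)  # helper defined above
# Against the PostgreSQL dialect this renders roughly as:
#   INSERT INTO blocks (number, hash, miner) VALUES (...)
#   ON CONFLICT (number) DO UPDATE SET hash = excluded.hash, miner = excluded.miner
```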

View File

@@ -0,0 +1,139 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import logging
import os
import time
from blockchainetl.streaming.streamer_adapter_stub import StreamerAdapterStub
from blockchainetl.file_utils import smart_open
class Streamer:
def __init__(
self,
blockchain_streamer_adapter=StreamerAdapterStub(),
last_synced_block_file='last_synced_block.txt',
lag=0,
start_block=None,
end_block=None,
period_seconds=10,
block_batch_size=10,
retry_errors=True,
pid_file=None):
self.blockchain_streamer_adapter = blockchain_streamer_adapter
self.last_synced_block_file = last_synced_block_file
self.lag = lag
self.start_block = start_block
self.end_block = end_block
self.period_seconds = period_seconds
self.block_batch_size = block_batch_size
self.retry_errors = retry_errors
self.pid_file = pid_file
if self.start_block is not None or not os.path.isfile(self.last_synced_block_file):
init_last_synced_block_file((self.start_block or 0) - 1, self.last_synced_block_file)
self.last_synced_block = read_last_synced_block(self.last_synced_block_file)
def stream(self):
try:
if self.pid_file is not None:
logging.info('Creating pid file {}'.format(self.pid_file))
write_to_file(self.pid_file, str(os.getpid()))
self.blockchain_streamer_adapter.open()
self._do_stream()
finally:
self.blockchain_streamer_adapter.close()
if self.pid_file is not None:
logging.info('Deleting pid file {}'.format(self.pid_file))
delete_file(self.pid_file)
def _do_stream(self):
while self.end_block is None or self.last_synced_block < self.end_block:
synced_blocks = 0
try:
synced_blocks = self._sync_cycle()
except Exception as e:
# https://stackoverflow.com/a/4992124/1580227
logging.exception('An exception occurred while syncing block data.')
if not self.retry_errors:
raise e
if synced_blocks <= 0:
logging.info('Nothing to sync. Sleeping for {} seconds...'.format(self.period_seconds))
time.sleep(self.period_seconds)
def _sync_cycle(self):
current_block = self.blockchain_streamer_adapter.get_current_block_number()
target_block = self._calculate_target_block(current_block, self.last_synced_block)
blocks_to_sync = max(target_block - self.last_synced_block, 0)
logging.info('Current block {}, target block {}, last synced block {}, blocks to sync {}'.format(
current_block, target_block, self.last_synced_block, blocks_to_sync))
if blocks_to_sync != 0:
self.blockchain_streamer_adapter.export_all(self.last_synced_block + 1, target_block)
logging.info('Writing last synced block {}'.format(target_block))
write_last_synced_block(self.last_synced_block_file, target_block)
self.last_synced_block = target_block
return blocks_to_sync
def _calculate_target_block(self, current_block, last_synced_block):
target_block = current_block - self.lag
target_block = min(target_block, last_synced_block + self.block_batch_size)
target_block = min(target_block, self.end_block) if self.end_block is not None else target_block
return target_block
def delete_file(file):
try:
os.remove(file)
except OSError:
pass
def write_last_synced_block(file, last_synced_block):
write_to_file(file, str(last_synced_block) + '\n')
def init_last_synced_block_file(start_block, last_synced_block_file):
if os.path.isfile(last_synced_block_file):
raise ValueError(
'{} should not exist if --start-block option is specified. '
'Either remove the {} file or the --start-block option.'
.format(last_synced_block_file, last_synced_block_file))
write_last_synced_block(last_synced_block_file, start_block)
def read_last_synced_block(file):
with smart_open(file, 'r') as last_synced_block_file:
return int(last_synced_block_file.read())
def write_to_file(file, content):
with smart_open(file, 'w') as file_handle:
file_handle.write(content)
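
A worked example of the target block calculation above, assuming `lag=6`, `block_batch_size=10`, a last synced block of 1000 and a chain head at 1020 (no `end_block`):

```python
current_block = 1020
lag = 6
last_synced_block = 1000
block_batch_size = 10

target_block = current_block - lag                                      # 1014: stay 6 blocks behind the head
target_block = min(target_block, last_synced_block + block_batch_size)  # 1010: cap the batch at 10 blocks
blocks_to_sync = max(target_block - last_synced_block, 0)               # 10 blocks: 1001..1010 are exported
```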

View File

@@ -0,0 +1,13 @@
class StreamerAdapterStub:
def open(self):
pass
def get_current_block_number(self):
return 0
def export_all(self, start_block, end_block):
pass
def close(self):
pass

View File

@@ -0,0 +1,19 @@
import logging
import signal
import sys
from blockchainetl.logging_utils import logging_basic_config
def configure_signals():
def sigterm_handler(_signo, _stack_frame):
# Raises SystemExit(0):
sys.exit(0)
signal.signal(signal.SIGTERM, sigterm_handler)
def configure_logging(filename):
for handler in logging.root.handlers[:]:
logging.root.removeHandler(handler)
logging_basic_config(filename=filename)

42
docs/amazon-athena.md Normal file
View File

@@ -0,0 +1,42 @@
# Amazon Athena
## Querying in Amazon Athena
- Upload the files to S3:
```bash
> cd output
> aws s3 sync . s3://<your_bucket>/ethereumetl/export --region ap-southeast-1
```
- Sign in to Athena https://console.aws.amazon.com/athena/home
- Create a database:
```sql
CREATE DATABASE ethereumetl;
```
- Create the tables:
- blocks: [schemas/aws/blocks.sql](https://github.com/blockchain-etl/ethereum-etl/blob/master/schemas/aws/blocks.sql)
- transactions: [schemas/aws/transactions.sql](https://github.com/blockchain-etl/ethereum-etl/blob/master/schemas/aws/transactions.sql)
- token_transfers: [schemas/aws/token_transfers.sql](https://github.com/blockchain-etl/ethereum-etl/blob/master/schemas/aws/token_transfers.sql)
- contracts: [schemas/aws/contracts.sql](https://github.com/blockchain-etl/ethereum-etl/blob/master/schemas/aws/contracts.sql)
- receipts: [schemas/aws/receipts.sql](https://github.com/blockchain-etl/ethereum-etl/blob/master/schemas/aws/receipts.sql)
- logs: [schemas/aws/logs.sql](https://github.com/blockchain-etl/ethereum-etl/blob/master/schemas/aws/logs.sql)
- tokens: [schemas/aws/tokens.sql](https://github.com/blockchain-etl/ethereum-etl/blob/master/schemas/aws/tokens.sql)
## Airflow DAGs
Refer to https://github.com/medvedev1088/ethereum-etl-airflow for the instructions.
## Tables for Parquet Files
Read [this article](https://medium.com/@medvedev1088/converting-ethereum-etl-files-to-parquet-399e048ddd30) on how to convert CSVs to Parquet.
- Create the tables:
- parquet_blocks: [schemas/aws/parquet/parquet_blocks.sql](https://github.com/blockchain-etl/ethereum-etl/blob/master/schemas/aws/parquet/parquet_blocks.sql)
- parquet_transactions: [schemas/aws/parquet/parquet_transactions.sql](https://github.com/blockchain-etl/ethereum-etl/blob/master/schemas/aws/parquet/parquet_transactions.sql)
- parquet_token_transfers: [schemas/aws/parquet/parquet_token_transfers.sql](https://github.com/blockchain-etl/ethereum-etl/blob/master/schemas/aws/parquet/parquet_token_transfers.sql)
Note that [DECIMAL type is limited to 38 digits in Hive](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-decimal) so values with more than 38 digits will be null.

10
docs/citing.md Normal file
View File

@@ -0,0 +1,10 @@
## How to Cite
```
@misc{ethereumetl,
author = {Evgeny Medvedev and the D5 team},
title = {Ethereum ETL},
year = {2018},
url = {https://github.com/blockchain-etl/ethereum-etl}
}
```

246
docs/commands.md Normal file
View File

@@ -0,0 +1,246 @@
# Commands
All the commands accept `-h` parameter for help, e.g.:
```bash
> ethereumetl export_blocks_and_transactions -h
Usage: ethereumetl export_blocks_and_transactions [OPTIONS]
Export blocks and transactions.
Options:
-s, --start-block INTEGER Start block
-e, --end-block INTEGER End block [required]
-b, --batch-size INTEGER The number of blocks to export at a time.
-p, --provider-uri TEXT The URI of the web3 provider e.g.
file://$HOME/Library/Ethereum/geth.ipc or
https://mainnet.infura.io
-w, --max-workers INTEGER The maximum number of workers.
--blocks-output TEXT The output file for blocks. If not provided
blocks will not be exported. Use "-" for stdout
--transactions-output TEXT The output file for transactions. If not
provided transactions will not be exported. Use
"-" for stdout
-h, --help Show this message and exit.
```
For the `--output` parameters, the supported formats are csv and json. The format is inferred from the output file name.
#### export_blocks_and_transactions
```bash
> ethereumetl export_blocks_and_transactions --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc \
--blocks-output blocks.csv --transactions-output transactions.csv
```
Omit `--blocks-output` or `--transactions-output` options if you want to export only transactions/blocks.
You can tune `--batch-size`, `--max-workers` for performance.
[Blocks and transactions schema](schema.md#blockscsv).
#### export_token_transfers
The API used in this command is not supported by Infura, so you will need a local node.
If you want to use Infura for exporting ERC20 transfers, refer to [extract_token_transfers](#extract_token_transfers).
```bash
> ethereumetl export_token_transfers --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --batch-size 100 --output token_transfers.csv
```
Include `--tokens <token1> --tokens <token2>` to filter only certain tokens, e.g.
```bash
> ethereumetl export_token_transfers --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output token_transfers.csv \
--tokens 0x1F573D6Fb3F13d689FF844B4cE37794d79a7FF1C --tokens 0x80fB784B7eD66730e8b1DBd9820aFD29931aab03
```
You can tune `--batch-size`, `--max-workers` for performance.
[Token transfers schema](schema.md#token_transferscsv).
#### export_receipts_and_logs
First extract transaction hashes from `transactions.csv`
(Exported with [export_blocks_and_transactions](#export_blocks_and_transactions)):
```bash
> ethereumetl extract_csv_column --input transactions.csv --column hash --output transaction_hashes.txt
```
Then export receipts and logs:
```bash
> ethereumetl export_receipts_and_logs --transaction-hashes transaction_hashes.txt \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --receipts-output receipts.csv --logs-output logs.csv
```
Omit `--receipts-output` or `--logs-output` options if you want to export only logs/receipts.
You can tune `--batch-size`, `--max-workers` for performance.
Upvote this feature request https://github.com/paritytech/parity/issues/9075;
it will make receipts and logs export much faster.
[Receipts and logs schema](schema.md#receiptscsv).
#### extract_token_transfers
First export receipt logs with [export_receipts_and_logs](#export_receipts_and_logs).
Then extract transfers from the logs.csv file:
```bash
> ethereumetl extract_token_transfers --logs logs.csv --output token_transfers.csv
```
You can tune `--batch-size`, `--max-workers` for performance.
[Token transfers schema](schema.md#token_transferscsv).
#### export_contracts
First extract contract addresses from `receipts.csv`
(Exported with [export_receipts_and_logs](#export_receipts_and_logs)):
```bash
> ethereumetl extract_csv_column --input receipts.csv --column contract_address --output contract_addresses.txt
```
Then export contracts:
```bash
> ethereumetl export_contracts --contract-addresses contract_addresses.txt \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output contracts.csv
```
You can tune `--batch-size`, `--max-workers` for performance.
[Contracts schema](schema.md#contractscsv).
#### export_tokens
First extract token addresses from `contracts.json`
(Exported with [export_contracts](#export_contracts)):
```bash
> ethereumetl filter_items -i contracts.json -p "item['is_erc20'] or item['is_erc721']" | \
ethereumetl extract_field -f address -o token_addresses.txt
```
Then export ERC20 / ERC721 tokens:
```bash
> ethereumetl export_tokens --token-addresses token_addresses.txt \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output tokens.csv
```
You can tune `--max-workers` for performance.
[Tokens schema](schema.md#tokenscsv).
#### export_traces
Also called internal transactions.
The API used in this command is not supported by Infura,
so you will need a local Parity archive node (`parity --tracing on`).
Make sure your node has at least 8GB of memory, or else you will face timeout errors.
See [this issue](https://github.com/blockchain-etl/ethereum-etl/issues/137).
```bash
> ethereumetl export_traces --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/parity.ipc --batch-size 100 --output traces.csv
```
You can tune `--batch-size`, `--max-workers` for performance.
[Traces schema](schema.md#tracescsv).
#### export_geth_traces
Read [Differences between geth and parity traces.csv](schema.md#differences-between-geth-and-parity-tracescsv)
The API used in this command is not supported by Infura,
so you will need a local Geth archive node (`geth --gcmode archive --syncmode full --txlookuplimit 0`).
When using rpc, add `--rpc --rpcapi debug` options.
```bash
> ethereumetl export_geth_traces --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --batch-size 100 --output geth_traces.json
```
You can tune `--batch-size`, `--max-workers` for performance.
#### extract_geth_traces
```bash
> ethereumetl extract_geth_traces --input geth_traces.json --output traces.csv
```
You can tune `--batch-size`, `--max-workers` for performance.
#### get_block_range_for_date
```bash
> ethereumetl get_block_range_for_date --provider-uri=https://mainnet.infura.io/v3/7aef3f0cd1f64408b163814b22cc643c --date 2018-01-01
4832686,4838611
```
#### get_keccak_hash
```bash
> ethereumetl get_keccak_hash -i "transfer(address,uint256)"
0xa9059cbb2ab09eb219583f4a59a5d0623ade346d962bcd4e46b11da047c9049b
```
#### stream
```bash
> pip3 install ethereum-etl[streaming]
> ethereumetl stream --provider-uri https://mainnet.infura.io/v3/7aef3f0cd1f64408b163814b22cc643c --start-block 500000
```
- This command outputs blocks, transactions, logs, token_transfers to the console by default.
- Entity types can be specified with the `-e` option,
e.g. `-e block,transaction,log,token_transfer,trace,contract,token`.
- Use the `--output` option to specify where to publish blockchain data: a Google Pub/Sub topic, Postgres database, GCS bucket, or Kafka broker.
- For Google PubSub: `--output=projects/<your-project>/topics/crypto_ethereum`.
Data will be pushed to `projects/<your-project>/topics/crypto_ethereum.blocks`, `projects/<your-project>/topics/crypto_ethereum.transactions` etc. topics.
- For Postgres: `--output=postgresql+pg8000://<user>:<password>@<host>:<port>/<database_name>`,
e.g. `--output=postgresql+pg8000://postgres:admin@127.0.0.1:5432/ethereum`.
- For GCS: `--output=gs://<bucket_name>`. Make sure to install and initialize `gcloud` cli.
- For Kafka: `--output=kafka/<host>:<port>`, e.g. `--output=kafka/127.0.0.1:9092`
- Those output types can be combined with a comma e.g. `--output=gs://<bucket_name>,projects/<your-project>/topics/crypto_ethereum`
The [schema](https://github.com/blockchain-etl/ethereum-etl-postgres/tree/master/schema)
and [indexes](https://github.com/blockchain-etl/ethereum-etl-postgres/tree/master/indexes) can be found in this
repo [ethereum-etl-postgres](https://github.com/blockchain-etl/ethereum-etl-postgres).
- The command saves its state to the `last_synced_block.txt` file, where the last synced block number is written periodically.
- Specify either the `--start-block` or the `--last-synced-block-file` option. `--last-synced-block-file` should point to the
file containing the block number from which to start streaming the blockchain data.
- Use the `--lag` option to specify how many blocks to lag behind the head of the blockchain. It's the simplest way to
handle chain reorganizations: the further a block is from the head, the less likely it is to be reorganized.
- You can tune `--period-seconds`, `--batch-size`, `--block-batch-size`, `--max-workers` for performance.
- Refer to [blockchain-etl-streaming](https://github.com/blockchain-etl/blockchain-etl-streaming) for
instructions on deploying it to Kubernetes.
Stream blockchain data continually to Google Pub/Sub:
```bash
> export GOOGLE_APPLICATION_CREDENTIALS=/path_to_credentials_file.json
> ethereumetl stream --start-block 500000 --output projects/<your-project>/topics/crypto_ethereum
```
Stream blockchain data to a Postgres database:
```bash
ethereumetl stream --start-block 500000 --output postgresql+pg8000://<user>:<password>@<host>:5432/<database>
```
The [schema](https://github.com/blockchain-etl/ethereum-etl-postgres/tree/master/schema)
and [indexes](https://github.com/blockchain-etl/ethereum-etl-postgres/tree/master/indexes) can be found in this
repo [ethereum-etl-postgres](https://github.com/blockchain-etl/ethereum-etl-postgres).

3
docs/contact.md Normal file
View File

@@ -0,0 +1,3 @@
# Contact
- [Telegram Group](https://t.me/joinchat/GsMpbA3mv1OJ6YMp3T5ORQ)

11
docs/dockerhub.md Normal file
View File

@@ -0,0 +1,11 @@
# Uploading to Docker Hub
```bash
ETHEREUMETL_VERSION=1.11.0
docker build -t ethereum-etl:${ETHEREUMETL_VERSION} -f Dockerfile .
docker tag ethereum-etl:${ETHEREUMETL_VERSION} blockchainetl/ethereum-etl:${ETHEREUMETL_VERSION}
docker push blockchainetl/ethereum-etl:${ETHEREUMETL_VERSION}
docker tag ethereum-etl:${ETHEREUMETL_VERSION} blockchainetl/ethereum-etl:latest
docker push blockchainetl/ethereum-etl:latest
```

4
docs/ethereum-classic.md Normal file
View File

@@ -0,0 +1,4 @@
# Ethereum Classic
To get ETC CSV files, make sure you pass the `--chain classic` param to the export scripts that require it.
The export won't run if your `--provider-uri` is Infura: it will print a warning and switch the provider URI to `https://ethereumclassic.network` instead. For better performance, run a local client for Classic instead, such as Parity (`parity --chain classic`) or Geth Classic.

View File

@@ -0,0 +1,51 @@
## Exporting the Blockchain
1. Install python 3.5.3+: [https://www.python.org/downloads/](https://www.python.org/downloads/)
1. You can use Infura if you don't need ERC20 transfers (Infura doesn't support the eth_getFilterLogs JSON RPC method).
For that, use the `-p https://mainnet.infura.io` option for the commands below. If you need ERC20 transfers or want to
export the data ~40 times faster, you will need to set up a local Ethereum node:
1. Install geth: [https://github.com/ethereum/go-ethereum/wiki/Installing-Geth](https://github.com/ethereum/go-ethereum/wiki/Installing-Geth)
1. Start geth.
Make sure it downloaded the blocks that you need by executing `eth.syncing` in the JS console.
You can export blocks below `currentBlock`;
there is no need to wait for the full sync, as the state is not needed (unless you also need contract bytecode
and token details; for those you need to wait for the full sync). Note that you may need to wait for another day or
two for the node to download the states. See this issue [https://github.com/blockchain-etl/ethereum-etl/issues/265#issuecomment-970451522](https://github.com/blockchain-etl/ethereum-etl/issues/265#issuecomment-970451522).
Make sure to set `--txlookuplimit 0` if you use geth.
1. Install Ethereum ETL: `> pip3 install ethereum-etl`
1. Export all:
```bash
> ethereumetl export_all --help
> ethereumetl export_all -s 0 -e 5999999 -b 100000 -p file://$HOME/Library/Ethereum/geth.ipc -o output
```
In case `ethereumetl` command is not available in PATH, use `python3 -m ethereumetl` instead.
The result will be in the `output` subdirectory, partitioned in Hive style:
```bash
output/blocks/start_block=00000000/end_block=00099999/blocks_00000000_00099999.csv
output/blocks/start_block=00100000/end_block=00199999/blocks_00100000_00199999.csv
...
output/transactions/start_block=00000000/end_block=00099999/transactions_00000000_00099999.csv
...
output/token_transfers/start_block=00000000/end_block=00099999/token_transfers_00000000_00099999.csv
...
```
Should work with geth and parity, on Linux, Mac, Windows.
If you use Parity, you should disable warp mode with the `--no-warp` option, because warp mode
does not place all of the block or receipt data into the database: [https://wiki.parity.io/Getting-Synced](https://wiki.parity.io/Getting-Synced)
If you see weird behavior, e.g. wrong number of rows in the CSV files or corrupted files,
check out this issue: https://github.com/medvedev1088/ethereum-etl/issues/28
### Export in 2 Hours
You can use AWS Auto Scaling and Data Pipeline to reduce the exporting time to a few hours.
Read [this article](https://medium.com/@medvedev1088/how-to-export-the-entire-ethereum-blockchain-to-csv-in-2-hours-for-10-69fef511e9a2) for details.

19
docs/google-bigquery.md Normal file
View File

@@ -0,0 +1,19 @@
# Google BigQuery
## Querying in BigQuery
If you'd rather not export the blockchain data yourself, we publish all tables as a public dataset in [BigQuery](https://medium.com/@medvedev1088/ethereum-blockchain-on-google-bigquery-283fb300f579).
Data is updated near real-time (~4-minute delay to account for block finality).
### How to Query Balances for all Ethereum Addresses
Read [this article](https://medium.com/google-cloud/how-to-query-balances-for-all-ethereum-addresses-in-bigquery-fb594e4034a7).
### Building Token Recommender in Google Cloud Platform
Read [this article](https://medium.com/google-cloud/building-token-recommender-in-google-cloud-platform-1be5a54698eb).
### Awesome BigQuery Views
[https://github.com/blockchain-etl/awesome-bigquery-views](https://github.com/blockchain-etl/awesome-bigquery-views)

47
docs/index.md Normal file
View File

@@ -0,0 +1,47 @@
# Overview
Ethereum ETL lets you convert blockchain data into convenient formats like CSVs and relational databases.
With 1,700+ stars on GitHub, Ethereum ETL is the most popular open-source project for Ethereum data.
Data is available for you to query right away in [Google BigQuery](https://goo.gl/oY5BCQ).
## Features
Easily export:
* Blocks
* Transactions
* ERC20 / ERC721 tokens
* Token transfers
* Receipts
* Logs
* Contracts
* Internal transactions (traces)
## Advanced Features
* Stream blockchain data to Pub/Sub, Postgres, or other destinations in real-time
* Filter and transform data using flexible command-line options
* Support for multiple Ethereum node providers (Geth, Parity, Infura, etc.)
* Handles chain reorganizations through configurable lag
* Export data by block range or by date
* Scalable architecture with configurable batch sizes and worker counts
## Use Cases
* Data analysis and visualization
* Machine learning on blockchain data
* Building analytics dashboards
* Market research and token analysis
* Compliance and audit reporting
* Academic research on blockchain economics
## Projects using Ethereum ETL
* [Google](https://goo.gl/oY5BCQ) - Public BigQuery Ethereum datasets
* [Nansen](https://nansen.ai/query?ref=ethereumetl) - Analytics platform for Ethereum
* [Ethereum Blockchain ETL on GCP](https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-public-dataset-smart-contract-analytics) - Official Google Cloud reference architecture
## Getting Started
Check the [Quickstart](quickstart.md) guide to begin using Ethereum ETL or explore the [Commands](commands.md) page for detailed usage instructions.

15
docs/limitations.md Normal file
View File

@@ -0,0 +1,15 @@
# Limitations
- If a contract is a proxy that forwards all calls to a delegate, interface detection doesn't work,
which means `is_erc20` and `is_erc721` will always be false for proxy contracts and they will be missing from the `tokens`
table.
- The metadata methods (`symbol`, `name`, `decimals`, `total_supply`) for ERC20 are optional, so around 10% of the
contracts are missing this data. Also, some contracts (e.g. EOS) implement these methods but with the wrong return type,
so the metadata columns are missing in this case as well.
- `token_transfers.value`, `tokens.decimals` and `tokens.total_supply` have type `STRING` in BigQuery tables,
because numeric types there can't handle 32-byte integers. You should use
`cast(value as FLOAT64)` (possible loss of precision) or
`safe_cast(value as NUMERIC)` (possible overflow) to convert to numbers.
- Contracts that don't implement the `decimals()` function but have a
[fallback function](https://solidity.readthedocs.io/en/v0.4.21/contracts.html#fallback-function) that returns a `boolean`
will have `0` or `1` in the `decimals` column in the CSVs.

10
docs/media.md Normal file
View File

@@ -0,0 +1,10 @@
## Ethereum ETL in the Media
- [A Technical Breakdown Of Google's New Blockchain Search Tools](https://www.forbes.com/sites/michaeldelcastillo/2019/02/05/google-launches-search-for-bitcoin-ethereum-bitcoin-cash-dash-dogecoin-ethereum-classic-litecoin-and-zcash/#394fc868c789)
- [Navigating Bitcoin, Ethereum, XRP: How Google Is Quietly Making Blockchains Searchable](https://www.forbes.com/sites/michaeldelcastillo/2019/02/04/navigating-bitcoin-ethereum-xrp-how-google-is-quietly-making-blockchains-searchable/?ss=crypto-blockchain#49e111da4248)
- [Ethereum in BigQuery: a Public Dataset for smart contract analytics](https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-public-dataset-smart-contract-analytics)
- [Ethereum in BigQuery:how we built this dataset](https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-how-we-built-dataset)
- [Introducing six new cryptocurrencies in BigQuery Public Datasets—and how to analyze them](https://cloud.google.com/blog/products/data-analytics/introducing-six-new-cryptocurrencies-in-bigquery-public-datasets-and-how-to-analyze-them)
- [Querying the Ethereum Blockchain in Snowflake](https://community.snowflake.com/s/article/Querying-the-Ethereum-Blockchain-in-Snowflake)
- [ConsenSys Grants funds third cohort of projects to benefit the Ethereum ecosystem](https://www.cryptoninjas.net/2020/02/17/consensys-grants-funds-third-cohort-of-projects-to-benefit-the-ethereum-ecosystem/)
- [Unlocking the Power of Google BigQuery (Cloud Next '19)](https://youtu.be/KL_i5XZIaJg?t=131)

29
docs/pypi.md Normal file
View File

@@ -0,0 +1,29 @@
# Uploading to PYPI
Create `$HOME/.pypirc` with the following content:
```
[distutils]
index-servers=
testpypi
pypi
[testpypi]
repository = https://test.pypi.org/legacy/
username = <username>
password = <password>
[pypi]
repository = https://upload.pypi.org/legacy/
username = <username>
password = <password>
```
Then run:
```bash
> python setup.py sdist
> twine upload dist/* -r testpypi
> pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple ethereum-etl
```

45
docs/quickstart.md Normal file
View File

@@ -0,0 +1,45 @@
# Quickstart
Install Ethereum ETL:
```bash
pip3 install ethereum-etl
```
Export blocks and transactions:
```bash
> ethereumetl export_blocks_and_transactions --start-block 0 --end-block 500000 \
--provider-uri https://mainnet.infura.io/v3/7aef3f0cd1f64408b163814b22cc643c --blocks-output blocks.csv --transactions-output transactions.csv
```
Export ERC20 and ERC721 transfers:
```bash
> ethereumetl export_token_transfers --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output token_transfers.csv
```
Export traces:
```bash
> ethereumetl export_traces --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/parity.ipc --output traces.csv
```
Stream blocks, transactions, logs, token_transfers continually to console:
```bash
> pip3 install ethereum-etl[streaming]
> ethereumetl stream --start-block 500000 -e block,transaction,log,token_transfer --log-file log.txt
```
Find all commands [here](commands.md).
---
To run the latest version of Ethereum ETL, check out the repo and call
```bash
> pip3 install -e .
> python3 ethereumetl.py
```

167
docs/schema.md Normal file
View File

@@ -0,0 +1,167 @@
# Schema
## blocks.csv
Column | Type |
------------------|--------------------|
number | bigint |
hash | hex_string |
parent_hash | hex_string |
nonce | hex_string |
sha3_uncles | hex_string |
logs_bloom | hex_string |
transactions_root | hex_string |
state_root | hex_string |
receipts_root | hex_string |
miner | address |
difficulty | numeric |
total_difficulty | numeric |
size | bigint |
extra_data | hex_string |
gas_limit | bigint |
gas_used | bigint |
timestamp | bigint |
transaction_count | bigint |
base_fee_per_gas | bigint |
withdrawals_root | string |
withdrawals | string |
blob_gas_used | bigint |
excess_blob_gas | bigint |
---
## transactions.csv
Column | Type |
-----------------|-------------|
hash | hex_string |
nonce | bigint |
block_hash | hex_string |
block_number | bigint |
transaction_index| bigint |
from_address | address |
to_address | address |
value | numeric |
gas | bigint |
gas_price | bigint |
input | hex_string |
block_timestamp | bigint |
max_fee_per_gas | bigint |
max_priority_fee_per_gas | bigint |
transaction_type | bigint |
max_fee_per_blob_gas | bigint |
blob_versioned_hashes | string |
---
## token_transfers.csv
Column | Type |
--------------------|-------------|
token_address | address |
from_address | address |
to_address | address |
value | numeric |
transaction_hash | hex_string |
log_index | bigint |
block_number | bigint |
---
## receipts.csv
Column | Type |
-----------------------------|-------------|
transaction_hash | hex_string |
transaction_index | bigint |
block_hash | hex_string |
block_number | bigint |
cumulative_gas_used | bigint |
gas_used | bigint |
contract_address | address |
root | hex_string |
status | bigint |
effective_gas_price | bigint |
blob_gas_price | bigint |
blob_gas_used | bigint |
---
## logs.csv
Column | Type |
-------------------------|-------------|
log_index | bigint |
transaction_hash | hex_string |
transaction_index | bigint |
block_hash | hex_string |
block_number | bigint |
address | address |
data | hex_string |
topics | string |
---
## contracts.csv
Column | Type |
-----------------------------|-------------|
address | address |
bytecode | hex_string |
function_sighashes | string |
is_erc20 | boolean |
is_erc721 | boolean |
block_number | bigint |
---
## tokens.csv
Column | Type |
-----------------------------|-------------|
address | address |
symbol | string |
name | string |
decimals | bigint |
total_supply | numeric |
block_number | bigint |
---
## traces.csv
Column | Type |
-----------------------------|-------------|
block_number | bigint |
transaction_hash | hex_string |
transaction_index | bigint |
from_address | address |
to_address | address |
value | numeric |
input | hex_string |
output | hex_string |
trace_type | string |
call_type | string |
reward_type | string |
gas | bigint |
gas_used | bigint |
subtraces | bigint |
trace_address | string |
error | string |
status | bigint |
trace_id | string |
### Differences between geth and parity traces.csv
- the `to_address` field differs for `callcode` traces (geth seems to return the correct value, while in parity `to_address` is the same as the `to_address` of the parent call);
- geth output doesn't have `reward` traces;
- geth output doesn't have `to_address`, `from_address`, `value` for `suicide` traces;
- the `error` field contains a human-readable error message, which may differ between geth and parity output;
- geth output doesn't have `transaction_hash`;
- `gas_used` is 0 on traces with an error in geth, empty in parity;
- zero output of subcalls is `0x000...` in geth, `0x` in parity.
You can find column descriptions in [https://github.com/medvedev1088/ethereum-etl-airflow](https://github.com/medvedev1088/ethereum-etl-airflow/tree/master/dags/resources/stages/raw/schemas)
Note: for the `address` type all hex characters are lower-cased.
`boolean` type can have 2 values: `True` or `False`.

View File

@@ -1 +0,0 @@
from . import evmdasm

View File

@@ -1,582 +0,0 @@
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# Author : <github.com/tintinweb>
# from __future__ import print_function
"""
Verbose EthereumVM Disassembler
OPCODES taken from:
https://github.com/ethereum/go-ethereum/blob/master/core/vm/opcodes.go
https://github.com/ethereum/yellowpaper/blob/master/Paper.tex
"""
import logging
import sys
import os
import itertools
import time
import requests
try:
import ethereum_input_decoder
except ImportError:
ethereum_input_decoder = None
logger = logging.getLogger(__name__)
def hex_decode(s):
try:
return bytes.fromhex(s).decode('ascii')
except (NameError, AttributeError):
return s.decode("hex")
except (UnicodeDecodeError):
return '' #invalid
def is_ascii_subsequence(s, min_percent=0.51):
if len(s) == 0:
return False
return [128 > ord(c) > 0x20 for c in s].count(True) / float(len(s)) >= min_percent
cache_lookup_function_signature = {}  # memcache for lookup_function_signature
def lookup_function_signature(sighash):
if not ethereum_input_decoder:
return []
cache_hit = cache_lookup_function_signature.get(sighash)
if cache_hit:
return cache_hit
cache_lookup_function_signature[sighash] = list(ethereum_input_decoder.decoder.FourByteDirectory.lookup_signatures(sighash))
return cache_lookup_function_signature[sighash]
class EthJsonRpc(object):
def __init__(self, url):
self.url = url
self.id = 1
self.session = requests.session()
def call(self, method, params=None):
params = params or []
data = {
'jsonrpc': '2.0',
'method': method,
'params': params,
'id': self.id,
}
headers = {'Content-Type': 'application/json'}
resp = self.session.post(self.url, headers=headers, json=data)
self.id += 1
return resp.json()
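# Example (illustrative, not part of the original module):
#   api = EthJsonRpc("http://localhost:8545")  # placeholder endpoint
#   latest_block_hex = api.call("eth_blockNumber")["result"]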
class BasicBlock(object):
def __init__(self, address=None, name=None, instructions=None):
self.instructions = instructions or []
self.address = address
self.name = name
def __repr__(self):
return "<BasicBlock 0x%x instructions:%d>" % (self.address, len(self.instructions))
class Instruction(object):
""" Base Instruction class
doubly linked
"""
def __init__(self, opcode, name, length_of_operand=0, description=None):
self.opcode, self.name, self.length_of_operand = opcode, name, length_of_operand
self.operand = ''
self.description = description
self.address = None
self.next = None
self.previous = None
self.xrefs = set([])
self.jumpto = None
self.basicblock = None
def __repr__(self):
return "<%s name=%s address=%s size=%d>" % (self.__class__.__name__, self.name, hex(self.address), self.size())
def __str__(self):
return "%s %s" % (self.name, "0x%s" % self.operand if self.operand else '')
def size(self):
return 1 + len(self.operand) // 2 # opcode + operand
def consume(self, bytecode):
# clone
m = Instruction(opcode=self.opcode,
name=self.name,
length_of_operand=self.length_of_operand,
description=self.description)
# consume
m.operand = ''.join('%0.2x' % _ for _ in itertools.islice(bytecode, m.length_of_operand))
return m
def serialize(self):
return '%0.2x' % self.opcode + self.operand
def describe_operand(self, resolve_funcsig=False):
if not self.operand:
str_operand = ''
elif resolve_funcsig and len(self.operand) == 8 and self.address < 0x100:
# speed improvement: it's very unlikely that there will be funcsigs after address 0x100
# 4 bytes, could be a func-sig
pot_funcsigs = lookup_function_signature(self.operand)
if len(pot_funcsigs) == 0:
ascii = ''
elif len(pot_funcsigs) == 1:
ascii = ' (\'function %s\')' % pot_funcsigs[0]
else:
ascii = ' (*ambiguous* \'function %s\')' % pot_funcsigs[0]
str_operand = "0x%s%s" % (self.operand, ascii)
elif len(self.operand) > 8:
ascii = ' (%r)' % hex_decode(self.operand) \
if self.operand and is_ascii_subsequence(hex_decode(self.operand)) else ''
str_operand = "0x%s%s" % (self.operand, ascii)
else:
ascii = ''
str_operand = "0x%s%s" % (self.operand, ascii)
extra = "@%s" % hex(self.jumpto) if self.jumpto else ''
return "%s%s" % (str_operand, extra)
OPCODES = [
# Stop and Arithmetic Operations
Instruction(opcode=0x00, name='STOP', description="Halts execution."),
Instruction(opcode=0x01, name='ADD', description="Addition operation."),
Instruction(opcode=0x02, name='MUL', description="Multiplication operation."),
Instruction(opcode=0x03, name='SUB', description="Subtraction operation."),
Instruction(opcode=0x04, name='DIV', description="Integer division operation."),
Instruction(opcode=0x05, name='SDIV', description="Signed integer division operation."),
Instruction(opcode=0x06, name='MOD', description="Modulo remainder operation."),
Instruction(opcode=0x07, name='SMOD', description="Signed modulo remainder operation."),
Instruction(opcode=0x08, name='ADDMOD', description="Modulo addition operation."),
Instruction(opcode=0x09, name='MULMOD', description="Modulo multiplication operation."),
Instruction(opcode=0x0a, name='EXP', description="Exponential operation."),
Instruction(opcode=0x0b, name='SIGNEXTEND', description="Extend length of twos complement signed integer."),
# Comparison & Bitwise Logic Operations
Instruction(opcode=0x10, name='LT', description="Lesser-than comparison"),
Instruction(opcode=0x11, name='GT', description="Greater-than comparison"),
Instruction(opcode=0x12, name='SLT', description="Signed less-than comparison"),
Instruction(opcode=0x13, name='SGT', description="Signed greater-than comparison"),
Instruction(opcode=0x14, name='EQ', description="Equality comparison"),
Instruction(opcode=0x15, name='ISZERO', description="Simple not operator"),
Instruction(opcode=0x16, name='AND', description="Bitwise AND operation."),
Instruction(opcode=0x17, name='OR', description="Bitwise OR operation."),
Instruction(opcode=0x18, name='XOR', description="Bitwise XOR operation."),
Instruction(opcode=0x19, name='NOT', description="Bitwise NOT operation."),
Instruction(opcode=0x1a, name='BYTE', description="Retrieve single byte from word"),
# SHA3
Instruction(opcode=0x20, name='SHA3', description="Compute Keccak-256 hash."),
# Environmental Information
Instruction(opcode=0x30, name='ADDRESS', description="Get address of currently executing account."),
Instruction(opcode=0x31, name='BALANCE', description="Get balance of the given account."),
Instruction(opcode=0x32, name='ORIGIN', description="Get execution origination address."),
Instruction(opcode=0x33, name='CALLER',
description="Get caller address.This is the address of the account that is directly responsible for this execution."),
Instruction(opcode=0x34, name='CALLVALUE',
description="Get deposited value by the instruction/transaction responsible for this execution."),
Instruction(opcode=0x35, name='CALLDATALOAD', description="Get input data of current environment."),
Instruction(opcode=0x36, name='CALLDATASIZE', description="Get size of input data in current environment."),
Instruction(opcode=0x37, name='CALLDATACOPY',
description="Copy input data in current environment to memory. This pertains to the input data passed with the message call instruction or transaction."),
Instruction(opcode=0x38, name='CODESIZE', description="Get size of code running in current environment."),
Instruction(opcode=0x39, name='CODECOPY', description="Copy code running in current environment to memory."),
Instruction(opcode=0x3a, name='GASPRICE', description="Get price of gas in current environment."),
Instruction(opcode=0x3b, name='EXTCODESIZE', description="Get size of an account's code."),
Instruction(opcode=0x3c, name='EXTCODECOPY', description="Copy an account's code to memory."),
Instruction(opcode=0x3d, name='RETURNDATASIZE',
description="Push the size of the return data buffer onto the stack."),
Instruction(opcode=0x3e, name='RETURNDATACOPY', description="Copy data from the return data buffer."),
# Block Information
Instruction(opcode=0x40, name='BLOCKHASH',
description="Get the hash of one of the 256 most recent complete blocks."),
Instruction(opcode=0x41, name='COINBASE', description="Get the block's beneficiary address."),
Instruction(opcode=0x42, name='TIMESTAMP', description="Get the block's timestamp."),
Instruction(opcode=0x43, name='NUMBER', description="Get the block's number."),
Instruction(opcode=0x44, name='DIFFICULTY', description="Get the block's difficulty."),
Instruction(opcode=0x45, name='GASLIMIT', description="Get the block's gas limit."),
# Stack, Memory, Storage and Flow Operations
Instruction(opcode=0x50, name='POP', description="Remove item from stack."),
Instruction(opcode=0x51, name='MLOAD', description="Load word from memory."),
Instruction(opcode=0x52, name='MSTORE', description="Save word to memory."),
Instruction(opcode=0x53, name='MSTORE8', description="Save byte to memory."),
Instruction(opcode=0x54, name='SLOAD', description="Load word from storage."),
Instruction(opcode=0x55, name='SSTORE', description="Save word to storage."),
Instruction(opcode=0x56, name='JUMP', description="Alter the program counter."),
Instruction(opcode=0x57, name='JUMPI', description="Conditionally alter the program counter."),
Instruction(opcode=0x58, name='PC', description="Get the value of the program counter prior to the increment."),
Instruction(opcode=0x59, name='MSIZE', description="Get the size of active memory in bytes."),
Instruction(opcode=0x5a, name='GAS',
description="Get the amount of available gas, including the corresponding reduction"),
Instruction(opcode=0x5b, name='JUMPDEST', description="Mark a valid destination for jumps."),
# Stack Push Operations
Instruction(opcode=0x60, name='PUSH1', length_of_operand=0x1, description="Place 1 byte item on stack."),
Instruction(opcode=0x61, name='PUSH2', length_of_operand=0x2, description="Place 2-byte item on stack."),
Instruction(opcode=0x62, name='PUSH3', length_of_operand=0x3, description="Place 3-byte item on stack."),
Instruction(opcode=0x63, name='PUSH4', length_of_operand=0x4, description="Place 4-byte item on stack."),
Instruction(opcode=0x64, name='PUSH5', length_of_operand=0x5, description="Place 5-byte item on stack."),
Instruction(opcode=0x65, name='PUSH6', length_of_operand=0x6, description="Place 6-byte item on stack."),
Instruction(opcode=0x66, name='PUSH7', length_of_operand=0x7, description="Place 7-byte item on stack."),
Instruction(opcode=0x67, name='PUSH8', length_of_operand=0x8, description="Place 8-byte item on stack."),
Instruction(opcode=0x68, name='PUSH9', length_of_operand=0x9, description="Place 9-byte item on stack."),
Instruction(opcode=0x69, name='PUSH10', length_of_operand=0xa, description="Place 10-byte item on stack."),
Instruction(opcode=0x6a, name='PUSH11', length_of_operand=0xb, description="Place 11-byte item on stack."),
Instruction(opcode=0x6b, name='PUSH12', length_of_operand=0xc, description="Place 12-byte item on stack."),
Instruction(opcode=0x6c, name='PUSH13', length_of_operand=0xd, description="Place 13-byte item on stack."),
Instruction(opcode=0x6d, name='PUSH14', length_of_operand=0xe, description="Place 14-byte item on stack."),
Instruction(opcode=0x6e, name='PUSH15', length_of_operand=0xf, description="Place 15-byte item on stack."),
Instruction(opcode=0x6f, name='PUSH16', length_of_operand=0x10, description="Place 16-byte item on stack."),
Instruction(opcode=0x70, name='PUSH17', length_of_operand=0x11, description="Place 17-byte item on stack."),
Instruction(opcode=0x71, name='PUSH18', length_of_operand=0x12, description="Place 18-byte item on stack."),
Instruction(opcode=0x72, name='PUSH19', length_of_operand=0x13, description="Place 19-byte item on stack."),
Instruction(opcode=0x73, name='PUSH20', length_of_operand=0x14, description="Place 20-byte item on stack."),
Instruction(opcode=0x74, name='PUSH21', length_of_operand=0x15, description="Place 21-byte item on stack."),
Instruction(opcode=0x75, name='PUSH22', length_of_operand=0x16, description="Place 22-byte item on stack."),
Instruction(opcode=0x76, name='PUSH23', length_of_operand=0x17, description="Place 23-byte item on stack."),
Instruction(opcode=0x77, name='PUSH24', length_of_operand=0x18, description="Place 24-byte item on stack."),
Instruction(opcode=0x78, name='PUSH25', length_of_operand=0x19, description="Place 25-byte item on stack."),
Instruction(opcode=0x79, name='PUSH26', length_of_operand=0x1a, description="Place 26-byte item on stack."),
Instruction(opcode=0x7a, name='PUSH27', length_of_operand=0x1b, description="Place 27-byte item on stack."),
Instruction(opcode=0x7b, name='PUSH28', length_of_operand=0x1c, description="Place 28-byte item on stack."),
Instruction(opcode=0x7c, name='PUSH29', length_of_operand=0x1d, description="Place 29-byte item on stack."),
Instruction(opcode=0x7d, name='PUSH30', length_of_operand=0x1e, description="Place 30-byte item on stack."),
Instruction(opcode=0x7e, name='PUSH31', length_of_operand=0x1f, description="Place 31-byte item on stack."),
Instruction(opcode=0x7f, name='PUSH32', length_of_operand=0x20,
description="Place 32-byte (full word) item on stack."),
# Duplication Operations
Instruction(opcode=0x80, name='DUP1', description="Duplicate 1st stack item."),
Instruction(opcode=0x81, name='DUP2', description="Duplicate 2nd stack item."),
Instruction(opcode=0x82, name='DUP3', description="Duplicate 3rd stack item."),
Instruction(opcode=0x83, name='DUP4', description="Duplicate 4th stack item."),
Instruction(opcode=0x84, name='DUP5', description="Duplicate 5th stack item."),
Instruction(opcode=0x85, name='DUP6', description="Duplicate 6th stack item."),
Instruction(opcode=0x86, name='DUP7', description="Duplicate 7th stack item."),
Instruction(opcode=0x87, name='DUP8', description="Duplicate 8th stack item."),
Instruction(opcode=0x88, name='DUP9', description="Duplicate 9th stack item."),
Instruction(opcode=0x89, name='DUP10', description="Duplicate 10th stack item."),
Instruction(opcode=0x8a, name='DUP11', description="Duplicate 11th stack item."),
Instruction(opcode=0x8b, name='DUP12', description="Duplicate 12th stack item."),
Instruction(opcode=0x8c, name='DUP13', description="Duplicate 13th stack item."),
Instruction(opcode=0x8d, name='DUP14', description="Duplicate 14th stack item."),
Instruction(opcode=0x8e, name='DUP15', description="Duplicate 15th stack item."),
Instruction(opcode=0x8f, name='DUP16', description="Duplicate 16th stack item."),
# Exchange Operations
Instruction(opcode=0x90, name='SWAP1', description="Exchange 1st and 2nd stack items."),
Instruction(opcode=0x91, name='SWAP2', description="Exchange 1st and 3rd stack items."),
Instruction(opcode=0x92, name='SWAP3', description="Exchange 1st and 4th stack items."),
Instruction(opcode=0x93, name='SWAP4', description="Exchange 1st and 5th stack items."),
Instruction(opcode=0x94, name='SWAP5', description="Exchange 1st and 6th stack items."),
Instruction(opcode=0x95, name='SWAP6', description="Exchange 1st and 7th stack items."),
Instruction(opcode=0x96, name='SWAP7', description="Exchange 1st and 8th stack items."),
Instruction(opcode=0x97, name='SWAP8', description="Exchange 1st and 9th stack items."),
Instruction(opcode=0x98, name='SWAP9', description="Exchange 1st and 10th stack items."),
Instruction(opcode=0x99, name='SWAP10', description="Exchange 1st and 11th stack items."),
Instruction(opcode=0x9a, name='SWAP11', description="Exchange 1st and 12th stack items."),
Instruction(opcode=0x9b, name='SWAP12', description="Exchange 1st and 13th stack items."),
Instruction(opcode=0x9c, name='SWAP13', description="Exchange 1st and 14th stack items."),
Instruction(opcode=0x9d, name='SWAP14', description="Exchange 1st and 15th stack items."),
Instruction(opcode=0x9e, name='SWAP15', description="Exchange 1st and 16th stack items."),
Instruction(opcode=0x9f, name='SWAP16', description="Exchange 1st and 17th stack items."),
# Logging Operations
Instruction(opcode=0xa0, name='LOG0', description="Append log record with no topics."),
Instruction(opcode=0xa1, name='LOG1', description="Append log record with one topic."),
Instruction(opcode=0xa2, name='LOG2', description="Append log record with two topics."),
Instruction(opcode=0xa3, name='LOG3', description="Append log record with three topics."),
Instruction(opcode=0xa4, name='LOG4', description="Append log record with four topics."),
# System Operations
Instruction(opcode=0xf0, name='CREATE', description="Create a new account with associated code."),
Instruction(opcode=0xf1, name='CALL', description="Message-call into an account."),
Instruction(opcode=0xf2, name='CALLCODE',
description="Message-call into this account with alternative accounts code."),
Instruction(opcode=0xf3, name='RETURN', description="Halt execution returning output data."),
# Newer opcode
Instruction(opcode=0xfd, name='REVERT', description="Halt execution reverting state changes but returning data and remaining gas."),
# Halt Execution, Mark for deletion
Instruction(opcode=0xff, name='SUICIDE', description="Halt execution and register account for later deletion."), ]
OPCODE_MARKS_BASICBLOCK_END = ['JUMP', 'JUMPI', 'STOP', 'RETURN']
class EVMCode(object):
def __init__(self, debug=False):
self.dis = EVMDisAssembler(debug=debug)
self.first = None
self.last = None
self.duration = None
self.instruction_at = {} # address:instruction
self.name_for_address = {} # address:name
self.xrefs = {}  # address:set(ref instruction, ref instruction)
def assemble(self, instructions):
return '0x' + ''.join(inst.serialize() for inst in instructions)
def _iter(self, first=None):
current = first or self.first
yield current
while current.next:
current = current.next
yield current
def disassemble(self, bytecode=None):
"""
for inst in self.dis.disassemble(bytecode):
# return them as we process them
yield inst
"""
if bytecode:
t_start = time.time()
disasm = list(self.dis.disassemble(bytecode))
self.first = disasm[0]
self.last = disasm[-1]
self._update_address_space(self.first)
self._update_xrefs()
self.duration = time.time() - t_start
# current = self.first
return self._iter()
def _update_address_space(self, first):
for instruction in self._iter(first):
self.instruction_at[instruction.address] = instruction
def _update_xrefs(self):
# find all JUMP, JUMPI's
for loc, instruction in ((l, i) for l, i in self.instruction_at.items() if i.name in ("JUMP", "JUMPI")):
if instruction.previous and instruction.previous.name.startswith("PUSH"):
instruction.jumpto = int(instruction.previous.operand, 16)
target_instruction = self.instruction_at.get(instruction.jumpto)
if target_instruction and target_instruction.name == "JUMPDEST":
# valid address, valid target
self.xrefs.setdefault(instruction.jumpto, set()).add(instruction)
target_instruction.xrefs.add(instruction)
def basicblocks(self, disasm):
# listify it in order to resolve xrefs, jumps
current_basicblock = BasicBlock(address=0, name="init")
for i, nm in enumerate(disasm):
if nm.name == "JUMPDEST":
# jumpdest belongs to the new basicblock (it marks the start)
yield current_basicblock
current_basicblock = BasicBlock(address=nm.address, name="loc_%s" % hex(nm.address))
# add to current basicblock
current_basicblock.instructions.append(nm)
nm.basicblock = current_basicblock
# yield the last basicblock
yield current_basicblock
class EVMDisAssembler(object):
OPCODE_TABLE = dict((obj.opcode, obj) for obj in OPCODES)
def __init__(self, debug=False):
self.errors = []
self.debug = debug
def disassemble(self, bytecode):
""" Disassemble evm bytecode to a Instruction objects """
def iterbytes(bytecode):
iter_bytecode = (b for b in bytecode if b in '1234567890abcdefABCDEFx') # 0x will bail below.
for b in zip(iter_bytecode, iter_bytecode):
b = ''.join(b)
try:
yield int(b, 16)
except ValueError:
logger.warning("skipping invalid byte: %s" % repr(b))
pc = 0
previous = None
iter_bytecode = iterbytes(bytecode)
# disassemble
seen_stop = False
for opcode in iter_bytecode:
logger.debug(opcode)
try:
instruction = self.OPCODE_TABLE[opcode].consume(iter_bytecode)
except KeyError as ke:
instruction = Instruction(opcode=opcode,
name="UNKNOWN_%s" % hex(opcode),
description="Invalid opcode")
if not seen_stop:
msg = "error: byte at address %d (%s) is not a valid operator" % (pc, hex(opcode))
if self.debug:
logger.exception(msg)
self.errors.append("%s; %r" % (msg, ke))
if instruction.name == 'STOP' and not seen_stop:
seen_stop = True
instruction.address = pc
pc += instruction.size()
# doubly link
instruction.previous = previous
if previous:
previous.next = instruction
# current is previous
previous = instruction
yield instruction
def assemble(self, instructions):
""" Assemble a list of Instruction() objects to evm bytecode"""
for instruction in instructions:
yield instruction.serialize()
class EVMDasmPrinter:
""" utility class for different output formats
"""
@staticmethod
def listing(disasm):
for i, nm in enumerate(disasm):
print("%s %s" % (nm.name, nm.operand))
@staticmethod
def detailed(disasm, resolve_funcsig=False):
print("%-3s %-4s %-3s %-15s %-36s %-30s %s" % (
"Inst", "addr", " hex ", "mnemonic", "operand", "xrefs", "description"))
print("-" * 150)
# listify it in order to resolve xrefs, jumps
for i, nm in enumerate(disasm):
if nm.name == "JUMPDEST":
print(":loc_%s" % hex(nm.address))
try:
operand = ','.join('%s@%s' % (x.name, hex(x.address)) for x in nm.xrefs) if nm.xrefs else ''
print("%4d [%3d 0x%0.3x] %-15s %-36s %-30s # %s" % (i, nm.address, nm.address, nm.name,
nm.describe_operand(resolve_funcsig=resolve_funcsig),
operand,
nm.description))
except Exception as e:
print(e)
if nm.name in OPCODE_MARKS_BASICBLOCK_END:
print("")
@staticmethod
def basicblocks_detailed(basicblocks, resolve_funcsig=False):
print("%-3s %-4s %-3s %-15s %-36s %-30s %s" % (
"Inst", "addr", " hex ", "mnemonic", "operand", "xrefs", "description"))
print("-" * 150)
i = 0
for bb in basicblocks:
# every basicblock
print(":loc_%s" % hex(bb.address))
for nm in bb.instructions:
try:
operand = ','.join('%s@%s' % (x.name, hex(x.address)) for x in nm.xrefs) if nm.xrefs else ''
print("%4d [%3d 0x%0.3x] %-15s %-36s %-30s # %s" % (i, nm.address, nm.address, nm.name,
nm.describe_operand(
resolve_funcsig=resolve_funcsig),
operand,
nm.description))
except Exception as e:
print(e)
i += 1
if nm.name in OPCODE_MARKS_BASICBLOCK_END:
print("")
def main():
logging.basicConfig(format="%(levelname)-7s - %(message)s")
from optparse import OptionParser
usage = """usage: %prog [options]
example: %prog [-L -F -v] <file_or_bytecode>
%prog [-L -F -v] # read from stdin
%prog [-L -F -a <address>] # fetch contract code from infura.io
"""
parser = OptionParser(usage=usage)
loglevels = ['CRITICAL', 'FATAL', 'ERROR', 'WARNING', 'WARN', 'INFO', 'DEBUG', 'NOTSET']
parser.add_option("-v", "--verbosity", default="critical",
help="available loglevels: %s [default: %%default]" % ','.join(l.lower() for l in loglevels))
parser.add_option("-L", "--listing", action="store_true", dest="listing",
help="disables table mode, outputs assembly only")
parser.add_option("-F", "--no-online-lookup", action="store_false", default=True, dest="function_signature_lookup",
help="disable online function signature lookup")
parser.add_option("-a", "--address",
help="fetch contract bytecode from address")
# parse args
(options, args) = parser.parse_args()
if options.verbosity.upper() in loglevels:
options.verbosity = getattr(logging, options.verbosity.upper())
logger.setLevel(options.verbosity)
else:
parser.error("invalid verbosity selected. please check --help")
if options.function_signature_lookup and not ethereum_input_decoder:
logger.warning("ethereum_input_decoder package not installed. function signature lookup not available.(pip install ethereum-input-decoder)")
# get bytecode from stdin, or arg:file or arg:bytecode
if options.address:
api = EthJsonRpc("https://mainnet.infura.io/")
evmcode = api.call(method="eth_getCode", params=[options.address, "latest"])["result"]
elif not args:
evmcode = sys.stdin.read()
else:
if os.path.isfile(args[0]):
evmcode = open(args[0], 'r').read()
else:
evmcode = args[0]
# init analyzer
evm_dasm = EVMCode(debug=options.verbosity)
logger.debug(EVMDisAssembler.OPCODE_TABLE)
# print disassembly
if options.listing:
EVMDasmPrinter.listing(evm_dasm.disassemble(evmcode))
else:
EVMDasmPrinter.basicblocks_detailed(evm_dasm.basicblocks(evm_dasm.disassemble(evmcode)), resolve_funcsig=options.function_signature_lookup)
#EVMDasmPrinter.detailed(evm_dasm.disassemble(evmcode), resolve_funcsig=options.function_signature_lookup)
logger.info("finished in %0.3f seconds." % evm_dasm.duration)
# post a notification that disassembly might be incorrect due to errors
if evm_dasm.dis.errors:
logger.warning("disassembly finished with %d errors" % len(evm_dasm.dis.errors))
if options.verbosity >= 30:
logger.warning("use -v INFO to see the errors")
else:
for e in evm_dasm.dis.errors:
logger.info(e)
# quick check
logger.debug("assemble(disassemble(evmcode))==",
evmcode.strip() == ''.join(evm_dasm.assemble(evm_dasm.disassemble())))
sys.exit(len(evm_dasm.dis.errors))
if __name__ == "__main__":
main()
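For reference, a short usage sketch of the disassembler removed above. The import path `evmdasm` is an assumption (it matches the deleted `from . import evmdasm` line), and the bytecode string is a placeholder rather than a real contract.

```python
from evmdasm import EVMCode, EVMDasmPrinter  # assumed import path

runtime_bytecode = "6080604052600080fd"  # placeholder bytecode

evm = EVMCode()
instructions = list(evm.disassemble(runtime_bytecode))

# One mnemonic and operand per line.
EVMDasmPrinter.listing(instructions)

# Re-assembling the instructions yields the same bytes, with a "0x" prefix added.
print(evm.assemble(instructions))
```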

ethereumetl.py

@@ -0,0 +1,26 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
from ethereumetl.cli import cli
cli()

ethereumetl/__main__.py

@@ -0,0 +1,26 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
from ethereumetl.cli import cli
cli()


@@ -0,0 +1,81 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
from blockchainetl.logging_utils import logging_basic_config
logging_basic_config()
import click
from ethereumetl.cli.export_all import export_all
from ethereumetl.cli.export_blocks_and_transactions import export_blocks_and_transactions
from ethereumetl.cli.export_contracts import export_contracts
from ethereumetl.cli.export_geth_traces import export_geth_traces
from ethereumetl.cli.export_origin import export_origin
from ethereumetl.cli.export_receipts_and_logs import export_receipts_and_logs
from ethereumetl.cli.export_token_transfers import export_token_transfers
from ethereumetl.cli.export_tokens import export_tokens
from ethereumetl.cli.export_traces import export_traces
from ethereumetl.cli.extract_contracts import extract_contracts
from ethereumetl.cli.extract_csv_column import extract_csv_column
from ethereumetl.cli.extract_field import extract_field
from ethereumetl.cli.extract_geth_traces import extract_geth_traces
from ethereumetl.cli.extract_token_transfers import extract_token_transfers
from ethereumetl.cli.extract_tokens import extract_tokens
from ethereumetl.cli.filter_items import filter_items
from ethereumetl.cli.get_block_range_for_date import get_block_range_for_date
from ethereumetl.cli.get_block_range_for_timestamps import get_block_range_for_timestamps
from ethereumetl.cli.get_keccak_hash import get_keccak_hash
from ethereumetl.cli.stream import stream
@click.group()
@click.version_option(version='2.4.2')
@click.pass_context
def cli(ctx):
pass
# export
cli.add_command(export_all, "export_all")
cli.add_command(export_blocks_and_transactions, "export_blocks_and_transactions")
cli.add_command(export_origin, "export_origin")
cli.add_command(export_receipts_and_logs, "export_receipts_and_logs")
cli.add_command(export_token_transfers, "export_token_transfers")
cli.add_command(extract_token_transfers, "extract_token_transfers")
cli.add_command(export_contracts, "export_contracts")
cli.add_command(export_tokens, "export_tokens")
cli.add_command(export_traces, "export_traces")
cli.add_command(export_geth_traces, "export_geth_traces")
cli.add_command(extract_geth_traces, "extract_geth_traces")
cli.add_command(extract_contracts, "extract_contracts")
cli.add_command(extract_tokens, "extract_tokens")
# streaming
cli.add_command(stream, "stream")
# utils
cli.add_command(get_block_range_for_date, "get_block_range_for_date")
cli.add_command(get_block_range_for_timestamps, "get_block_range_for_timestamps")
cli.add_command(get_keccak_hash, "get_keccak_hash")
cli.add_command(extract_csv_column, "extract_csv_column")
cli.add_command(filter_items, "filter_items")
cli.add_command(extract_field, "extract_field")
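The command group above can be exercised in-process with click's test runner. A hedged sketch follows; the block range, provider URI, and output paths are placeholders, and the invocation performs real JSON-RPC requests against the provider.

```python
from click.testing import CliRunner
from ethereumetl.cli import cli

runner = CliRunner()
result = runner.invoke(cli, [
    "export_blocks_and_transactions",
    "--start-block", "0",
    "--end-block", "99",
    "--provider-uri", "https://mainnet.infura.io",
    "--blocks-output", "blocks.csv",
    "--transactions-output", "transactions.csv",
])
print(result.exit_code, result.output)
```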


@@ -0,0 +1,124 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
import re
from datetime import datetime, timedelta
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.web3_utils import build_web3
from ethereumetl.jobs.export_all_common import export_all_common
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.service.eth_service import EthService
from ethereumetl.utils import check_classic_provider_uri
logging_basic_config()
def is_date_range(start, end):
"""Checks for YYYY-MM-DD date format."""
return bool(re.match('^2[0-9]{3}-[0-9]{2}-[0-9]{2}$', start) and
re.match('^2[0-9]{3}-[0-9]{2}-[0-9]{2}$', end))
def is_unix_time_range(start, end):
"""Checks for Unix timestamp format."""
return bool(re.match("^[0-9]{10}$|^[0-9]{13}$", start) and
re.match("^[0-9]{10}$|^[0-9]{13}$", end))
def is_block_range(start, end):
"""Checks for a valid block number."""
return (start.isdigit() and 0 <= int(start) <= 99999999 and
end.isdigit() and 0 <= int(end) <= 99999999)
def get_partitions(start, end, partition_batch_size, provider_uri):
"""Yield partitions based on input data type."""
if is_date_range(start, end) or is_unix_time_range(start, end):
if is_date_range(start, end):
start_date = datetime.strptime(start, '%Y-%m-%d').date()
end_date = datetime.strptime(end, '%Y-%m-%d').date()
elif is_unix_time_range(start, end):
if len(start) == 10 and len(end) == 10:
start_date = datetime.utcfromtimestamp(int(start)).date()
end_date = datetime.utcfromtimestamp(int(end)).date()
elif len(start) == 13 and len(end) == 13:
start_date = datetime.utcfromtimestamp(int(start) / 1e3).date()
end_date = datetime.utcfromtimestamp(int(end) / 1e3).date()
day = timedelta(days=1)
provider = get_provider_from_uri(provider_uri)
web3 = build_web3(provider)
eth_service = EthService(web3)
while start_date <= end_date:
batch_start_block, batch_end_block = eth_service.get_block_range_for_date(start_date)
partition_dir = '/date={start_date!s}/'.format(start_date=start_date)
yield batch_start_block, batch_end_block, partition_dir
start_date += day
elif is_block_range(start, end):
start_block = int(start)
end_block = int(end)
for batch_start_block in range(start_block, end_block + 1, partition_batch_size):
batch_end_block = batch_start_block + partition_batch_size - 1
if batch_end_block > end_block:
batch_end_block = end_block
padded_batch_start_block = str(batch_start_block).zfill(8)
padded_batch_end_block = str(batch_end_block).zfill(8)
partition_dir = '/start_block={padded_batch_start_block}/end_block={padded_batch_end_block}'.format(
padded_batch_start_block=padded_batch_start_block,
padded_batch_end_block=padded_batch_end_block,
)
yield batch_start_block, batch_end_block, partition_dir
else:
raise ValueError('start and end must be either block numbers or ISO dates or Unix times')
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-s', '--start', required=True, type=str, help='Start block/ISO date/Unix time')
@click.option('-e', '--end', required=True, type=str, help='End block/ISO date/Unix time')
@click.option('-b', '--partition-batch-size', default=10000, show_default=True, type=int,
help='The number of blocks to export in partition.')
@click.option('-p', '--provider-uri', default='https://mainnet.infura.io', show_default=True, type=str,
help='The URI of the web3 provider e.g. '
'file://$HOME/Library/Ethereum/geth.ipc or https://mainnet.infura.io')
@click.option('-o', '--output-dir', default='output', show_default=True, type=str, help='Output directory, partitioned in Hive style.')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
@click.option('-B', '--export-batch-size', default=100, show_default=True, type=int, help='The number of requests in JSON RPC batches.')
@click.option('-c', '--chain', default='ethereum', show_default=True, type=str, help='The chain network to connect to.')
def export_all(start, end, partition_batch_size, provider_uri, output_dir, max_workers, export_batch_size,
chain='ethereum'):
"""Exports all data for a range of blocks."""
provider_uri = check_classic_provider_uri(chain, provider_uri)
export_all_common(get_partitions(start, end, partition_batch_size, provider_uri),
output_dir, provider_uri, max_workers, export_batch_size)
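To make the partitioning concrete, a small sketch of the block-range branch of `get_partitions` above; the block range and batch size are arbitrary example values, and the provider is not touched on this code path.

```python
from ethereumetl.cli.export_all import get_partitions

for start_block, end_block, partition_dir in get_partitions("0", "29999", 10000, provider_uri=None):
    print(start_block, end_block, partition_dir)

# 0 9999 /start_block=00000000/end_block=00009999
# 10000 19999 /start_block=00010000/end_block=00019999
# 20000 29999 /start_block=00020000/end_block=00029999
```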


@@ -0,0 +1,66 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
from ethereumetl.jobs.export_blocks_job import ExportBlocksJob
from ethereumetl.jobs.exporters.blocks_and_transactions_item_exporter import blocks_and_transactions_item_exporter
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.thread_local_proxy import ThreadLocalProxy
from ethereumetl.utils import check_classic_provider_uri
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-s', '--start-block', default=0, show_default=True, type=int, help='Start block')
@click.option('-e', '--end-block', required=True, type=int, help='End block')
@click.option('-b', '--batch-size', default=100, show_default=True, type=int, help='The number of blocks to export at a time.')
@click.option('-p', '--provider-uri', default='https://mainnet.infura.io', show_default=True, type=str,
help='The URI of the web3 provider e.g. '
'file://$HOME/Library/Ethereum/geth.ipc or https://mainnet.infura.io')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
@click.option('--blocks-output', default=None, show_default=True, type=str,
help='The output file for blocks. If not provided blocks will not be exported. Use "-" for stdout')
@click.option('--transactions-output', default=None, show_default=True, type=str,
help='The output file for transactions. '
'If not provided transactions will not be exported. Use "-" for stdout')
@click.option('-c', '--chain', default='ethereum', show_default=True, type=str, help='The chain network to connect to.')
def export_blocks_and_transactions(start_block, end_block, batch_size, provider_uri, max_workers, blocks_output,
transactions_output, chain='ethereum'):
"""Exports blocks and transactions."""
provider_uri = check_classic_provider_uri(chain, provider_uri)
if blocks_output is None and transactions_output is None:
raise ValueError('Either --blocks-output or --transactions-output options must be provided')
job = ExportBlocksJob(
start_block=start_block,
end_block=end_block,
batch_size=batch_size,
batch_web3_provider=ThreadLocalProxy(lambda: get_provider_from_uri(provider_uri, batch=True)),
max_workers=max_workers,
item_exporter=blocks_and_transactions_item_exporter(blocks_output, transactions_output),
export_blocks=blocks_output is not None,
export_transactions=transactions_output is not None)
job.run()


@@ -0,0 +1,60 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
from blockchainetl.file_utils import smart_open
from ethereumetl.jobs.export_contracts_job import ExportContractsJob
from ethereumetl.jobs.exporters.contracts_item_exporter import contracts_item_exporter
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.thread_local_proxy import ThreadLocalProxy
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.utils import check_classic_provider_uri
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-b', '--batch-size', default=100, show_default=True, type=int, help='The number of blocks to filter at a time.')
@click.option('-ca', '--contract-addresses', required=True, type=str,
help='The file containing contract addresses, one per line.')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
@click.option('-p', '--provider-uri', default='https://mainnet.infura.io', show_default=True, type=str,
help='The URI of the web3 provider e.g. '
'file://$HOME/Library/Ethereum/geth.ipc or https://mainnet.infura.io')
@click.option('-c', '--chain', default='ethereum', show_default=True, type=str, help='The chain network to connect to.')
def export_contracts(batch_size, contract_addresses, output, max_workers, provider_uri, chain='ethereum'):
"""Exports contracts bytecode and sighashes."""
check_classic_provider_uri(chain, provider_uri)
with smart_open(contract_addresses, 'r') as contract_addresses_file:
contract_addresses = (contract_address.strip() for contract_address in contract_addresses_file
if contract_address.strip())
job = ExportContractsJob(
contract_addresses_iterable=contract_addresses,
batch_size=batch_size,
batch_web3_provider=ThreadLocalProxy(lambda: get_provider_from_uri(provider_uri, batch=True)),
item_exporter=contracts_item_exporter(output),
max_workers=max_workers)
job.run()


@@ -0,0 +1,55 @@
# MIT License
#
# Copyright (c) 2018 Evgeniy Filatov, evgeniyfilatov@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
from ethereumetl.jobs.export_geth_traces_job import ExportGethTracesJob
from ethereumetl.jobs.exporters.geth_traces_item_exporter import geth_traces_item_exporter
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.thread_local_proxy import ThreadLocalProxy
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-s', '--start-block', default=0, show_default=True, type=int, help='Start block')
@click.option('-e', '--end-block', required=True, type=int, help='End block')
@click.option('-b', '--batch-size', default=100, show_default=True, type=int, help='The number of blocks to process at a time.')
@click.option('-o', '--output', default='-', show_default=True, type=str,
help='The output file for geth traces. If not specified stdout is used.')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
@click.option('-p', '--provider-uri', required=True, type=str,
help='The URI of the web3 provider e.g. '
'file://$HOME/Library/Ethereum/geth.ipc or http://localhost:8545/')
def export_geth_traces(start_block, end_block, batch_size, output, max_workers, provider_uri):
"""Exports traces from geth node."""
job = ExportGethTracesJob(
start_block=start_block,
end_block=end_block,
batch_size=batch_size,
batch_web3_provider=ThreadLocalProxy(lambda: get_provider_from_uri(provider_uri, batch=True)),
max_workers=max_workers,
item_exporter=geth_traces_item_exporter(output))
job.run()


@@ -0,0 +1,56 @@
# A job to export data from Origin Protocol.
#
# Origin Protocol is an open source platform for implementing blockchain e-commerce.
# More details at https://www.originprotocol.com
#
# The core of the platform is the marketplace smart contract:
# - Code: https://github.com/OriginProtocol/origin/blob/master/packages/contracts/contracts/marketplace/V01_Marketplace.sol
# - Address: https://etherscan.io/address/0x698ff47b84837d3971118a369c570172ee7e54c2
#
# Transactional data is stored on-chain, while associated metadata is stored in IPFS (https://ipfs.io).
#
# Given a range of block numbers, the job queries the blockchain for events emitted by the contract.
# Every event includes a hash pointing to a marketplace listing metadata stored as a JSON file on IPFS.
# A marketplace listing can either be a single self-contained listing, or the entry point for the entire
# catalog of products from a shop.
#
# The job generates 2 data sets:
# - Marketplace listings
# - Shop products.
#
import click
from ethereumetl.web3_utils import build_web3
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.jobs.export_origin_job import ExportOriginJob
from ethereumetl.jobs.exporters.origin_exporter import origin_marketplace_listing_item_exporter, origin_shop_product_item_exporter
from ethereumetl.ipfs.origin import get_origin_ipfs_client
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.thread_local_proxy import ThreadLocalProxy
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-s', '--start-block', default=0, show_default=True, type=int, help='Start block')
@click.option('-e', '--end-block', required=True, type=int, help='End block')
@click.option('-b', '--batch-size', default=100, show_default=True, type=int, help='The number of blocks to filter at a time.')
@click.option('--marketplace-output', default='-', show_default=True, type=str, help='The output file for marketplace data. If not specified stdout is used.')
@click.option('--shop-output', default='-', show_default=True, type=str, help='The output file for shop data. If not specified stdout is used.')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
@click.option('-p', '--provider-uri', required=True, type=str,
help='The URI of the web3 provider e.g. file://$HOME/Library/Ethereum/geth.ipc or http://localhost:8545/')
def export_origin(start_block, end_block, batch_size, marketplace_output, shop_output, max_workers, provider_uri):
"""Exports Origin Protocol data."""
job = ExportOriginJob(
start_block=start_block,
end_block=end_block,
batch_size=batch_size,
web3=ThreadLocalProxy(lambda: build_web3(get_provider_from_uri(provider_uri))),
ipfs_client=get_origin_ipfs_client(),
marketplace_listing_exporter=origin_marketplace_listing_item_exporter(marketplace_output),
shop_product_exporter=origin_shop_product_item_exporter(shop_output),
max_workers=max_workers)
job.run()


@@ -0,0 +1,65 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
from blockchainetl.file_utils import smart_open
from ethereumetl.jobs.export_receipts_job import ExportReceiptsJob
from ethereumetl.jobs.exporters.receipts_and_logs_item_exporter import receipts_and_logs_item_exporter
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.thread_local_proxy import ThreadLocalProxy
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.utils import check_classic_provider_uri
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-b', '--batch-size', default=100, show_default=True, type=int, help='The number of receipts to export at a time.')
@click.option('-t', '--transaction-hashes', required=True, type=str,
help='The file containing transaction hashes, one per line.')
@click.option('-p', '--provider-uri', default='https://mainnet.infura.io', show_default=True, type=str,
help='The URI of the web3 provider e.g. '
'file://$HOME/Library/Ethereum/geth.ipc or https://mainnet.infura.io')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
@click.option('--receipts-output', default=None, show_default=True, type=str,
help='The output file for receipts. If not provided receipts will not be exported. Use "-" for stdout')
@click.option('--logs-output', default=None, show_default=True, type=str,
help='The output file for receipt logs. '
'If not provided receipt logs will not be exported. Use "-" for stdout')
@click.option('-c', '--chain', default='ethereum', show_default=True, type=str, help='The chain network to connect to.')
def export_receipts_and_logs(batch_size, transaction_hashes, provider_uri, max_workers, receipts_output, logs_output,
chain='ethereum'):
"""Exports receipts and logs."""
provider_uri = check_classic_provider_uri(chain, provider_uri)
with smart_open(transaction_hashes, 'r') as transaction_hashes_file:
job = ExportReceiptsJob(
transaction_hashes_iterable=(transaction_hash.strip() for transaction_hash in transaction_hashes_file),
batch_size=batch_size,
batch_web3_provider=ThreadLocalProxy(lambda: get_provider_from_uri(provider_uri, batch=True)),
max_workers=max_workers,
item_exporter=receipts_and_logs_item_exporter(receipts_output, logs_output),
export_receipts=receipts_output is not None,
export_logs=logs_output is not None)
job.run()
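A small sketch of how the commands above can be chained: building the transaction-hashes input for `export_receipts_and_logs` from a previously exported `transactions.csv`. The column name `hash` is an assumption about the transactions schema, and the file names are examples; the result can then be passed via `-t transaction_hashes.txt`.

```python
import csv

# Write one transaction hash per line, as expected by --transaction-hashes.
with open("transactions.csv", newline="") as src, open("transaction_hashes.txt", "w") as dst:
    for row in csv.DictReader(src):
        dst.write(row["hash"] + "\n")  # assumes the column is named `hash`
```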


@@ -0,0 +1,58 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
from ethereumetl.web3_utils import build_web3
from ethereumetl.csv_utils import set_max_field_size_limit
from ethereumetl.jobs.export_token_transfers_job import ExportTokenTransfersJob
from ethereumetl.jobs.exporters.token_transfers_item_exporter import token_transfers_item_exporter
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.thread_local_proxy import ThreadLocalProxy
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-s', '--start-block', default=0, show_default=True, type=int, help='Start block')
@click.option('-e', '--end-block', required=True, type=int, help='End block')
@click.option('-b', '--batch-size', default=100, show_default=True, type=int, help='The number of blocks to filter at a time.')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
@click.option('-p', '--provider-uri', required=True, type=str,
help='The URI of the web3 provider e.g. file://$HOME/Library/Ethereum/geth.ipc or http://localhost:8545/')
@click.option('-t', '--tokens', default=None, show_default=True, type=str, multiple=True, help='The list of token addresses to filter by.')
def export_token_transfers(start_block, end_block, batch_size, output, max_workers, provider_uri, tokens):
"""Exports ERC20/ERC721 transfers."""
set_max_field_size_limit()
job = ExportTokenTransfersJob(
start_block=start_block,
end_block=end_block,
batch_size=batch_size,
web3=ThreadLocalProxy(lambda: build_web3(get_provider_from_uri(provider_uri))),
item_exporter=token_transfers_item_exporter(output),
max_workers=max_workers,
tokens=tokens)
job.run()


@@ -0,0 +1,58 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
from ethereumetl.web3_utils import build_web3
from blockchainetl.file_utils import smart_open
from ethereumetl.jobs.export_tokens_job import ExportTokensJob
from ethereumetl.jobs.exporters.tokens_item_exporter import tokens_item_exporter
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.thread_local_proxy import ThreadLocalProxy
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.utils import check_classic_provider_uri
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-t', '--token-addresses', required=True, type=str,
help='The file containing token addresses, one per line.')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
@click.option('-p', '--provider-uri', default='https://mainnet.infura.io', show_default=True, type=str,
help='The URI of the web3 provider e.g. '
'file://$HOME/Library/Ethereum/geth.ipc or https://mainnet.infura.io')
@click.option('-c', '--chain', default='ethereum', show_default=True, type=str, help='The chain network to connect to.')
def export_tokens(token_addresses, output, max_workers, provider_uri, chain='ethereum'):
"""Exports ERC20/ERC721 tokens."""
provider_uri = check_classic_provider_uri(chain, provider_uri)
with smart_open(token_addresses, 'r') as token_addresses_file:
job = ExportTokensJob(
token_addresses_iterable=(token_address.strip() for token_address in token_addresses_file),
web3=ThreadLocalProxy(lambda: build_web3(get_provider_from_uri(provider_uri))),
item_exporter=tokens_item_exporter(output),
max_workers=max_workers)
job.run()
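As a hedged aside, ExportTokensJob as wired above accepts any iterable of addresses, so the same export can be run without an address file; the address, endpoint and output path below are placeholders:

from ethereumetl.jobs.export_tokens_job import ExportTokensJob
from ethereumetl.jobs.exporters.tokens_item_exporter import tokens_item_exporter
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.thread_local_proxy import ThreadLocalProxy
from ethereumetl.web3_utils import build_web3

# Placeholder token address and provider endpoint; output goes to tokens.csv.
addresses = ['0x0000000000000000000000000000000000000000']
job = ExportTokensJob(
    token_addresses_iterable=addresses,
    web3=ThreadLocalProxy(lambda: build_web3(get_provider_from_uri('http://localhost:8545'))),
    item_exporter=tokens_item_exporter('tokens.csv'),
    max_workers=5)
job.run()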

View File

@@ -0,0 +1,66 @@
# MIT License
#
# Copyright (c) 2018 Evgeniy Filatov, evgeniyfilatov@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
from ethereumetl.web3_utils import build_web3
from ethereumetl.jobs.export_traces_job import ExportTracesJob
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.thread_local_proxy import ThreadLocalProxy
from ethereumetl.jobs.exporters.traces_item_exporter import traces_item_exporter
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-s', '--start-block', default=0, show_default=True, type=int, help='Start block')
@click.option('-e', '--end-block', required=True, type=int, help='End block')
@click.option('-b', '--batch-size', default=5, show_default=True, type=int, help='The number of blocks to filter at a time.')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
@click.option('-p', '--provider-uri', required=True, type=str,
help='The URI of the web3 provider e.g. '
'file://$HOME/.local/share/io.parity.ethereum/jsonrpc.ipc or http://localhost:8545/')
@click.option('--genesis-traces/--no-genesis-traces', default=False, show_default=True, help='Whether to include genesis traces')
@click.option('--daofork-traces/--no-daofork-traces', default=False, show_default=True, help='Whether to include daofork traces')
@click.option('-t', '--timeout', default=60, show_default=True, type=int, help='IPC or HTTP request timeout.')
@click.option('-c', '--chain', default='ethereum', show_default=True, type=str, help='The chain network to connect to.')
def export_traces(start_block, end_block, batch_size, output, max_workers, provider_uri,
genesis_traces, daofork_traces, timeout=60, chain='ethereum'):
"""Exports traces from parity node."""
if chain == 'classic' and daofork_traces == True:
raise ValueError(
'Classic chain does not include daofork traces. Disable daofork traces with --no-daofork-traces option.')
job = ExportTracesJob(
start_block=start_block,
end_block=end_block,
batch_size=batch_size,
web3=ThreadLocalProxy(lambda: build_web3(get_provider_from_uri(provider_uri, timeout=timeout))),
item_exporter=traces_item_exporter(output),
max_workers=max_workers,
include_genesis_traces=genesis_traces,
include_daofork_traces=daofork_traces)
job.run()

View File

@@ -0,0 +1,58 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import csv
import json
import click
from blockchainetl.csv_utils import set_max_field_size_limit
from blockchainetl.file_utils import smart_open
from ethereumetl.jobs.exporters.contracts_item_exporter import contracts_item_exporter
from ethereumetl.jobs.extract_contracts_job import ExtractContractsJob
from blockchainetl.logging_utils import logging_basic_config
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-t', '--traces', type=str, required=True, help='The CSV file containing traces.')
@click.option('-b', '--batch-size', default=100, show_default=True, type=int, help='The number of blocks to filter at a time.')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
def extract_contracts(traces, batch_size, output, max_workers):
"""Extracts contracts from traces file."""
set_max_field_size_limit()
with smart_open(traces, 'r') as traces_file:
if traces.endswith('.json'):
traces_iterable = (json.loads(line) for line in traces_file)
else:
traces_iterable = csv.DictReader(traces_file)
job = ExtractContractsJob(
traces_iterable=traces_iterable,
batch_size=batch_size,
max_workers=max_workers,
item_exporter=contracts_item_exporter(output))
job.run()

View File

@@ -21,22 +21,22 @@
# SOFTWARE.
-import argparse
+import click
 import csv
 from ethereumetl.csv_utils import set_max_field_size_limit
-from ethereumetl.file_utils import smart_open
+from blockchainetl.file_utils import smart_open
-parser = argparse.ArgumentParser(description='Extracts a single column from a given csv file.')
-parser.add_argument('-i', '--input', default='-', type=str, help='The input file. If not specified stdin is used.')
-parser.add_argument('-o', '--output', default='-', type=str, help='The output file. If not specified stdout is used.')
-parser.add_argument('-c', '--column', required=True, type=str, help='The csv column name to extract.')
-args = parser.parse_args()
+@click.command(context_settings=dict(help_option_names=['-h', '--help']))
+@click.option('-i', '--input', default='-', show_default=True, type=str, help='The input file. If not specified stdin is used.')
+@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
+@click.option('-c', '--column', required=True, type=str, help='The csv column name to extract.')
+def extract_csv_column(input, output, column):
+    """Extracts column from given CSV file. Deprecated - use extract_field."""
-set_max_field_size_limit()
+    set_max_field_size_limit()
-with smart_open(args.input, 'r') as input_file, smart_open(args.output, 'w') as output_file:
-    reader = csv.DictReader(input_file)
-    for row in reader:
-        output_file.write(row[args.column] + '\n')
+    with smart_open(input, 'r') as input_file, smart_open(output, 'w') as output_file:
+        reader = csv.DictReader(input_file)
+        for row in reader:
+            output_file.write(row[column] + '\n')

View File

@@ -0,0 +1,35 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
from ethereumetl import misc_utils
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-i', '--input', default='-', show_default=True, type=str, help='The input file. If not specified stdin is used.')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-f', '--field', required=True, type=str, help='The field name to extract.')
def extract_field(input, output, field):
"""Extracts field from given CSV or JSON newline-delimited file."""
misc_utils.extract_field(input, output, field)

View File

@@ -0,0 +1,53 @@
# MIT License
#
# Copyright (c) 2018 Evgeniy Filatov, evgeniyfilatov@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import csv
import json
import click
from blockchainetl.file_utils import smart_open
from ethereumetl.jobs.exporters.traces_item_exporter import traces_item_exporter
from ethereumetl.jobs.extract_geth_traces_job import ExtractGethTracesJob
from blockchainetl.logging_utils import logging_basic_config
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-i', '--input', required=True, type=str, help='The JSON file containing geth traces.')
@click.option('-b', '--batch-size', default=100, show_default=True, type=int, help='The number of blocks to filter at a time.')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
def extract_geth_traces(input, batch_size, output, max_workers):
"""Extracts geth traces from JSON lines file."""
with smart_open(input, 'r') as geth_traces_file:
if input.endswith('.json'):
traces_iterable = (json.loads(line) for line in geth_traces_file)
else:
traces_iterable = (trace for trace in csv.DictReader(geth_traces_file))
job = ExtractGethTracesJob(
traces_iterable=traces_iterable,
batch_size=batch_size,
max_workers=max_workers,
item_exporter=traces_item_exporter(output))
job.run()

View File

@@ -0,0 +1,59 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
import csv
import json
from ethereumetl.csv_utils import set_max_field_size_limit
from blockchainetl.file_utils import smart_open
from blockchainetl.jobs.exporters.converters.int_to_string_item_converter import IntToStringItemConverter
from ethereumetl.jobs.exporters.token_transfers_item_exporter import token_transfers_item_exporter
from ethereumetl.jobs.extract_token_transfers_job import ExtractTokenTransfersJob
from blockchainetl.logging_utils import logging_basic_config
logging_basic_config()
set_max_field_size_limit()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-l', '--logs', type=str, required=True, help='The CSV file containing receipt logs.')
@click.option('-b', '--batch-size', default=100, show_default=True, type=int, help='The number of blocks to filter at a time.')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
@click.option('--values-as-strings', default=False, show_default=True, is_flag=True, help='Whether to convert values to strings.')
def extract_token_transfers(logs, batch_size, output, max_workers, values_as_strings=False):
"""Extracts ERC20/ERC721 transfers from logs file."""
with smart_open(logs, 'r') as logs_file:
if logs.endswith('.json'):
logs_reader = (json.loads(line) for line in logs_file)
else:
logs_reader = csv.DictReader(logs_file)
converters = [IntToStringItemConverter(keys=['value'])] if values_as_strings else []
job = ExtractTokenTransfersJob(
logs_iterable=logs_reader,
batch_size=batch_size,
max_workers=max_workers,
item_exporter=token_transfers_item_exporter(output, converters=converters))
job.run()

View File

@@ -0,0 +1,66 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import csv
import json
import click
from blockchainetl.csv_utils import set_max_field_size_limit
from blockchainetl.file_utils import smart_open
from blockchainetl.jobs.exporters.converters.int_to_string_item_converter import IntToStringItemConverter
from ethereumetl.jobs.exporters.tokens_item_exporter import tokens_item_exporter
from ethereumetl.jobs.extract_tokens_job import ExtractTokensJob
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.thread_local_proxy import ThreadLocalProxy
from ethereumetl.web3_utils import build_web3
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-c', '--contracts', type=str, required=True, help='The JSON file containing contracts.')
@click.option('-p', '--provider-uri', default='https://mainnet.infura.io', show_default=True, type=str,
help='The URI of the web3 provider e.g. '
'file://$HOME/Library/Ethereum/geth.ipc or https://mainnet.infura.io')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The maximum number of workers.')
@click.option('--values-as-strings', default=False, show_default=True, is_flag=True, help='Whether to convert values to strings.')
def extract_tokens(contracts, provider_uri, output, max_workers, values_as_strings=False):
"""Extracts tokens from contracts file."""
set_max_field_size_limit()
with smart_open(contracts, 'r') as contracts_file:
if contracts.endswith('.json'):
contracts_iterable = (json.loads(line) for line in contracts_file)
else:
contracts_iterable = csv.DictReader(contracts_file)
converters = [IntToStringItemConverter(keys=['decimals', 'total_supply'])] if values_as_strings else []
job = ExtractTokensJob(
contracts_iterable=contracts_iterable,
web3=ThreadLocalProxy(lambda: build_web3(get_provider_from_uri(provider_uri))),
max_workers=max_workers,
item_exporter=tokens_item_exporter(output, converters))
job.run()

View File

@@ -0,0 +1,37 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
from ethereumetl import misc_utils
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-i', '--input', default='-', show_default=True, type=str, help='The input file. If not specified stdin is used.')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-p', '--predicate', required=True, type=str,
help='Predicate in Python code e.g. "item[\'is_erc20\']".')
def filter_items(input, output, predicate):
"""Filters rows in given CSV or JSON newline-delimited file."""
def evaluated_predicate(item):
return eval(predicate, globals(), {'item': item})
misc_utils.filter_items(input, output, evaluated_predicate)
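Because the predicate string is passed to eval with the current row bound to the name item, any Python expression over item works, and the string should only come from a trusted operator. A small illustration of the evaluation contract (the item dict below is made up):

# Mirrors how evaluated_predicate above treats each row.
predicate = "item['is_erc20']"
item = {'address': '0x0000000000000000000000000000000000000000', 'is_erc20': True, 'is_erc721': False}
print(eval(predicate, globals(), {'item': item}))  # True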

View File

@@ -0,0 +1,56 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
from datetime import datetime
from ethereumetl.web3_utils import build_web3
from blockchainetl.file_utils import smart_open
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.service.eth_service import EthService
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.utils import check_classic_provider_uri
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-p', '--provider-uri', default='https://mainnet.infura.io', show_default=True, type=str,
help='The URI of the web3 provider e.g. '
'file://$HOME/Library/Ethereum/geth.ipc or https://mainnet.infura.io')
@click.option('-d', '--date', required=True, type=lambda d: datetime.strptime(d, '%Y-%m-%d'),
help='The date e.g. 2018-01-01.')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-c', '--chain', default='ethereum', show_default=True, type=str, help='The chain network to connect to.')
def get_block_range_for_date(provider_uri, date, output, chain='ethereum'):
"""Outputs start and end blocks for given date."""
provider_uri = check_classic_provider_uri(chain, provider_uri)
provider = get_provider_from_uri(provider_uri)
web3 = build_web3(provider)
eth_service = EthService(web3)
start_block, end_block = eth_service.get_block_range_for_date(date)
with smart_open(output, 'w') as output_file:
output_file.write('{},{}\n'.format(start_block, end_block))
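The same lookup is available programmatically without the CLI wrapper; a minimal sketch, assuming a synced JSON-RPC endpoint at the placeholder URI:

from datetime import datetime

from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.service.eth_service import EthService
from ethereumetl.web3_utils import build_web3

# Placeholder endpoint; this mirrors the wiring of the command above.
web3 = build_web3(get_provider_from_uri('http://localhost:8545'))
eth_service = EthService(web3)
start_block, end_block = eth_service.get_block_range_for_date(datetime(2018, 1, 1))
print(start_block, end_block)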

View File

@@ -0,0 +1,55 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import click
from ethereumetl.web3_utils import build_web3
from blockchainetl.file_utils import smart_open
from blockchainetl.logging_utils import logging_basic_config
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.service.eth_service import EthService
from ethereumetl.utils import check_classic_provider_uri
logging_basic_config()
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-p', '--provider-uri', default='https://mainnet.infura.io', show_default=True, type=str,
help='The URI of the web3 provider e.g. '
'file://$HOME/Library/Ethereum/geth.ipc or https://mainnet.infura.io')
@click.option('-s', '--start-timestamp', required=True, type=int, help='Start unix timestamp, in seconds.')
@click.option('-e', '--end-timestamp', required=True, type=int, help='End unix timestamp, in seconds.')
@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
@click.option('-c', '--chain', default='ethereum', show_default=True, type=str, help='The chain network to connect to.')
def get_block_range_for_timestamps(provider_uri, start_timestamp, end_timestamp, output, chain='ethereum'):
"""Outputs start and end blocks for given timestamps."""
provider_uri = check_classic_provider_uri(chain, provider_uri)
provider = get_provider_from_uri(provider_uri)
web3 = build_web3(provider)
eth_service = EthService(web3)
start_block, end_block = eth_service.get_block_range_for_timestamps(start_timestamp, end_timestamp)
with smart_open(output, 'w') as output_file:
output_file.write('{},{}\n'.format(start_block, end_block))

View File

@@ -21,23 +21,21 @@
# SOFTWARE.
-import argparse
+import click
 from eth_utils import keccak
-from ethereumetl.file_utils import smart_open
-from ethereumetl.logging_utils import logging_basic_config
+from blockchainetl.file_utils import smart_open
+from blockchainetl.logging_utils import logging_basic_config
 logging_basic_config()
-parser = argparse.ArgumentParser(description='Outputs the 32-byte keccak hash of the given string.')
-parser.add_argument('-i', '--input-string', default='Transfer(address,address,uint256)', type=str,
-                    help='String to hash, e.g. Transfer(address,address,uint256)')
-parser.add_argument('-o', '--output', default='-', type=str, help='The output file. If not specified stdout is used.')
+@click.command(context_settings=dict(help_option_names=['-h', '--help']))
+@click.option('-i', '--input-string', default='Transfer(address,address,uint256)', show_default=True, type=str,
+              help='String to hash, e.g. Transfer(address,address,uint256)')
+@click.option('-o', '--output', default='-', show_default=True, type=str, help='The output file. If not specified stdout is used.')
+def get_keccak_hash(input_string, output):
+    """Outputs 32-byte Keccak hash of given string."""
+    hash = keccak(text=input_string)
-args = parser.parse_args()
-hash = keccak(text=args.input_string)
-with smart_open(args.output, 'w') as output_file:
-    output_file.write('0x{}\n'.format(hash.hex()))
+    with smart_open(output, 'w') as output_file:
+        output_file.write('0x{}\n'.format(hash.hex()))

104
ethereumetl/cli/stream.py Normal file
View File

@@ -0,0 +1,104 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import logging
import random
import click
from blockchainetl.streaming.streaming_utils import configure_signals, configure_logging
from ethereumetl.enumeration.entity_type import EntityType
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.streaming.item_exporter_creator import create_item_exporters
from ethereumetl.thread_local_proxy import ThreadLocalProxy
@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-l', '--last-synced-block-file', default='last_synced_block.txt', show_default=True, type=str, help='The file used to store the number of the last synced block.')
@click.option('--lag', default=0, show_default=True, type=int, help='The number of blocks to lag behind the network.')
@click.option('-p', '--provider-uri', default='https://mainnet.infura.io', show_default=True, type=str,
help='The URI of the web3 provider e.g. '
'file://$HOME/Library/Ethereum/geth.ipc or https://mainnet.infura.io')
@click.option('-o', '--output', type=str,
help='Either Google PubSub topic path e.g. projects/your-project/topics/crypto_ethereum; '
'or Postgres connection url e.g. postgresql+pg8000://postgres:admin@127.0.0.1:5432/ethereum; '
'or GCS bucket e.g. gs://your-bucket-name; '
'or kafka, output name and connection host:port e.g. kafka/127.0.0.1:9092 '
'or Kinesis, e.g. kinesis://your-data-stream-name. '
'If not specified will print to console')
@click.option('-s', '--start-block', default=None, show_default=True, type=int, help='Start block')
@click.option('-e', '--entity-types', default=','.join(EntityType.ALL_FOR_INFURA), show_default=True, type=str,
help='The list of entity types to export.')
@click.option('--period-seconds', default=10, show_default=True, type=int, help='How many seconds to sleep between syncs')
@click.option('-b', '--batch-size', default=10, show_default=True, type=int, help='How many blocks to batch in single request')
@click.option('-B', '--block-batch-size', default=1, show_default=True, type=int, help='How many blocks to batch in single sync round')
@click.option('-w', '--max-workers', default=5, show_default=True, type=int, help='The number of workers')
@click.option('--log-file', default=None, show_default=True, type=str, help='Log file')
@click.option('--pid-file', default=None, show_default=True, type=str, help='pid file')
def stream(last_synced_block_file, lag, provider_uri, output, start_block, entity_types,
period_seconds=10, batch_size=2, block_batch_size=10, max_workers=5, log_file=None, pid_file=None):
"""Streams all data types to console or Google Pub/Sub."""
configure_logging(log_file)
configure_signals()
entity_types = parse_entity_types(entity_types)
from ethereumetl.streaming.eth_streamer_adapter import EthStreamerAdapter
from blockchainetl.streaming.streamer import Streamer
# TODO: Implement fallback mechanism for provider uris instead of picking randomly
provider_uri = pick_random_provider_uri(provider_uri)
logging.info('Using ' + provider_uri)
streamer_adapter = EthStreamerAdapter(
batch_web3_provider=ThreadLocalProxy(lambda: get_provider_from_uri(provider_uri, batch=True)),
item_exporter=create_item_exporters(output),
batch_size=batch_size,
max_workers=max_workers,
entity_types=entity_types
)
streamer = Streamer(
blockchain_streamer_adapter=streamer_adapter,
last_synced_block_file=last_synced_block_file,
lag=lag,
start_block=start_block,
period_seconds=period_seconds,
block_batch_size=block_batch_size,
pid_file=pid_file
)
streamer.stream()
def parse_entity_types(entity_types):
entity_types = [c.strip() for c in entity_types.split(',')]
# validate passed types
for entity_type in entity_types:
if entity_type not in EntityType.ALL_FOR_STREAMING:
raise click.BadOptionUsage(
'--entity-type', '{} is not an available entity type. Supply a comma separated list of types from {}'
.format(entity_type, ','.join(EntityType.ALL_FOR_STREAMING)))
return entity_types
def pick_random_provider_uri(provider_uri):
provider_uris = [uri.strip() for uri in provider_uri.split(',')]
return random.choice(provider_uris)
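A hedged illustration of the two helpers above: entity types are parsed from a comma-separated string and validated against EntityType.ALL_FOR_STREAMING, and one provider URI is chosen at random when several are supplied (the URIs below are placeholders):

# Whitespace is stripped and order is preserved.
print(parse_entity_types('block, transaction, log'))  # ['block', 'transaction', 'log']

# With multiple URIs, one is picked at random at startup.
print(pick_random_provider_uri('http://localhost:8545,http://localhost:8546'))

# Unknown types raise click.BadOptionUsage listing the supported ones.
try:
    parse_entity_types('block,not_a_type')
except Exception as e:
    print(e)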

View File

@@ -40,6 +40,12 @@ class EthBlock(object):
self.gas_limit = None
self.gas_used = None
self.timestamp = None
self.withdrawals_root = None
self.transactions = []
self.transaction_count = 0
self.base_fee_per_gas = 0
self.withdrawals = []
self.blob_gas_used = None
self.excess_blob_gas = None

View File

@@ -28,3 +28,4 @@ class EthContract(object):
self.function_sighashes = []
self.is_erc20 = False
self.is_erc721 = False
self.block_number = None

View File

@@ -0,0 +1,27 @@
# MIT License
#
# Copyright (c) 2018 Evgeniy Filatov, evgeniyfilatov@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
class EthGethTrace(object):
def __init__(self):
self.block_number = None
self.transaction_traces = None

View File

@@ -0,0 +1,32 @@
class OriginMarketplaceListing(object):
def __init__(self):
self.listing_id = None
self.ipfs_hash = None
self.listing_type = None
self.category = None
self.subcategory = None
self.language = None
self.title = None
self.description = None
self.price = None
self.currency = None
self.block_number = None
self.log_index = None
class OriginShopProduct(object):
def __init__(self):
self.listing_id = None
self.product_id = None
self.ipfs_path = None
self.external_id = None
self.parent_external_id = None
self.title = None
self.description = None
self.price = None
self.currency = None
self.image = None
self.option1 = None
self.option2 = None
self.option3 = None
self.block_number = None
self.log_index = None

View File

@@ -33,3 +33,10 @@ class EthReceipt(object):
self.logs = []
self.root = None
self.status = None
self.effective_gas_price = None
self.l1_fee = None
self.l1_gas_used = None
self.l1_gas_price = None
self.l1_fee_scalar = None
self.blob_gas_price = None
self.blob_gas_used = None

View File

@@ -28,3 +28,4 @@ class EthToken(object):
self.name = None
self.decimals = None
self.total_supply = None
self.block_number = None

View File

@@ -0,0 +1,44 @@
# MIT License
#
# Copyright (c) 2018 Evgeniy Filatov, evgeniyfilatov@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
class EthTrace(object):
def __init__(self):
self.block_number = None
self.transaction_hash = None
self.transaction_index = None
self.from_address = None
self.to_address = None
self.value = None
self.input = None
self.output = None
self.trace_type = None
self.call_type = None
self.reward_type = None
self.gas = None
self.gas_used = None
self.subtraces = 0
self.trace_address = None
self.error = None
self.status = None
self.trace_id = None
self.trace_index = None

View File

@@ -34,3 +34,8 @@ class EthTransaction(object):
self.gas = None
self.gas_price = None
self.input = None
self.max_fee_per_gas = None
self.max_priority_fee_per_gas = None
self.transaction_type = None
self.max_fee_per_blob_gas = None
self.blob_versioned_hashes = []

View File

View File

@@ -0,0 +1,12 @@
class EntityType:
BLOCK = 'block'
TRANSACTION = 'transaction'
RECEIPT = 'receipt'
LOG = 'log'
TOKEN_TRANSFER = 'token_transfer'
TRACE = 'trace'
CONTRACT = 'contract'
TOKEN = 'token'
ALL_FOR_STREAMING = [BLOCK, TRANSACTION, LOG, TOKEN_TRANSFER, TRACE, CONTRACT, TOKEN]
ALL_FOR_INFURA = [BLOCK, TRANSACTION, LOG, TOKEN_TRANSFER]

View File

@@ -22,7 +22,6 @@
import json
from copy import deepcopy
ERC20_ABI = json.loads('''
[
@@ -240,19 +239,109 @@ ERC20_ABI = json.loads('''
],
"name": "Approval",
"type": "event"
},
{
"constant": true,
"inputs": [],
"name": "NAME",
"outputs": [
{
"name": "",
"type": "string"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": true,
"inputs": [],
"name": "SYMBOL",
"outputs": [
{
"name": "",
"type": "string"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": true,
"inputs": [],
"name": "DECIMALS",
"outputs": [
{
"name": "",
"type": "uint8"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
}
]
''')
def replace_output_type(func, new_output_type, output_name=''):
copy = deepcopy(func)
outputs = copy['outputs']
for output in outputs:
if output['name'] == output_name:
output['type'] = new_output_type
return copy
BYTES32_ERC20_ABI = [replace_output_type(func, 'bytes32') if func['name'] in ['symbol', 'name'] else func
for func in ERC20_ABI]
ERC20_ABI_ALTERNATIVE_1 = json.loads('''
[
{
"constant": true,
"inputs": [],
"name": "symbol",
"outputs": [
{
"name": "",
"type": "bytes32"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": true,
"inputs": [],
"name": "SYMBOL",
"outputs": [
{
"name": "",
"type": "bytes32"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": true,
"inputs": [],
"name": "name",
"outputs": [
{
"name": "",
"type": "bytes32"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": true,
"inputs": [],
"name": "NAME",
"outputs": [
{
"name": "",
"type": "bytes32"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
}
]
''')
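A small, self-contained illustration of replace_output_type above: it deep-copies a function ABI entry and rewrites the type of the matching output, which is how BYTES32_ERC20_ABI is derived for tokens whose name()/symbol() return bytes32 rather than string. The ABI fragment below is illustrative only:

name_func = {
    'constant': True, 'inputs': [], 'name': 'name',
    'outputs': [{'name': '', 'type': 'string'}],
    'payable': False, 'stateMutability': 'view', 'type': 'function',
}
patched = replace_output_type(name_func, 'bytes32')
print(patched['outputs'][0]['type'])    # bytes32
print(name_func['outputs'][0]['type'])  # string; the original entry is left untouched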

View File

@@ -0,0 +1,23 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

View File

@@ -20,47 +20,93 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
 import logging
+import time
 from requests.exceptions import Timeout as RequestsTimeout, HTTPError, TooManyRedirects
-from web3.utils.threads import Timeout as Web3Timeout
+from web3._utils.threads import Timeout as Web3Timeout
 from ethereumetl.executors.bounded_executor import BoundedExecutor
 from ethereumetl.executors.fail_safe_executor import FailSafeExecutor
+from ethereumetl.misc.retriable_value_error import RetriableValueError
 from ethereumetl.progress_logger import ProgressLogger
 from ethereumetl.utils import dynamic_batch_iterator
-RETRY_EXCEPTIONS = (ConnectionError, HTTPError, RequestsTimeout, TooManyRedirects, Web3Timeout, OSError)
+RETRY_EXCEPTIONS = (ConnectionError, HTTPError, RequestsTimeout, TooManyRedirects, Web3Timeout, OSError,
+                    RetriableValueError)
+BATCH_CHANGE_COOLDOWN_PERIOD_SECONDS = 2 * 60
 # Executes the given work in batches, reducing the batch size exponentially in case of errors.
 class BatchWorkExecutor:
-    def __init__(self, starting_batch_size, max_workers, retry_exceptions=RETRY_EXCEPTIONS):
+    def __init__(self, starting_batch_size, max_workers, retry_exceptions=RETRY_EXCEPTIONS, max_retries=5):
         self.batch_size = starting_batch_size
+        self.max_batch_size = starting_batch_size
+        self.latest_batch_size_change_time = None
         self.max_workers = max_workers
         # Using bounded executor prevents unlimited queue growth
         # and allows monitoring in-progress futures and failing fast in case of errors.
         self.executor = FailSafeExecutor(BoundedExecutor(1, self.max_workers))
         self.retry_exceptions = retry_exceptions
+        self.max_retries = max_retries
         self.progress_logger = ProgressLogger()
         self.logger = logging.getLogger('BatchWorkExecutor')
     def execute(self, work_iterable, work_handler, total_items=None):
         self.progress_logger.start(total_items=total_items)
         for batch in dynamic_batch_iterator(work_iterable, lambda: self.batch_size):
             self.executor.submit(self._fail_safe_execute, work_handler, batch)
-    # Check race conditions
     def _fail_safe_execute(self, work_handler, batch):
         try:
             work_handler(batch)
+            self._try_increase_batch_size(len(batch))
         except self.retry_exceptions:
-            batch_size = self.batch_size
-            # Reduce the batch size. Subsequent batches will be 2 times smaller
-            if batch_size == len(batch) and batch_size > 1:
-                self.batch_size = int(batch_size / 2)
-            # For the failed batch try handling items one by one
+            self.logger.exception('An exception occurred while executing work_handler.')
+            self._try_decrease_batch_size(len(batch))
+            self.logger.info('The batch of size {} will be retried one item at a time.'.format(len(batch)))
             for item in batch:
-                work_handler([item])
+                execute_with_retries(work_handler, [item],
+                                     max_retries=self.max_retries, retry_exceptions=self.retry_exceptions)
         self.progress_logger.track(len(batch))
+    # Some acceptable race conditions are possible
+    def _try_decrease_batch_size(self, current_batch_size):
+        batch_size = self.batch_size
+        if batch_size == current_batch_size and batch_size > 1:
+            new_batch_size = int(current_batch_size / 2)
+            self.logger.info('Reducing batch size to {}.'.format(new_batch_size))
+            self.batch_size = new_batch_size
+            self.latest_batch_size_change_time = time.time()
+    def _try_increase_batch_size(self, current_batch_size):
+        if current_batch_size * 2 <= self.max_batch_size:
+            current_time = time.time()
+            latest_batch_size_change_time = self.latest_batch_size_change_time
+            seconds_since_last_change = current_time - latest_batch_size_change_time \
+                if latest_batch_size_change_time is not None else 0
+            if seconds_since_last_change > BATCH_CHANGE_COOLDOWN_PERIOD_SECONDS:
+                new_batch_size = current_batch_size * 2
+                self.logger.info('Increasing batch size to {}.'.format(new_batch_size))
+                self.batch_size = new_batch_size
+                self.latest_batch_size_change_time = current_time
     def shutdown(self):
         self.executor.shutdown()
         self.progress_logger.finish()
+def execute_with_retries(func, *args, max_retries=5, retry_exceptions=RETRY_EXCEPTIONS, sleep_seconds=1):
+    for i in range(max_retries):
+        try:
+            return func(*args)
+        except retry_exceptions:
+            logging.exception('An exception occurred while executing execute_with_retries. Retry #{}'.format(i))
+            if i < max_retries - 1:
+                logging.info('The request will be retried after {} seconds. Retry #{}'.format(sleep_seconds, i))
+                time.sleep(sleep_seconds)
+                continue
+            else:
+                raise
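A minimal usage sketch of the executor defined above: work is submitted in adaptively sized batches, a failed batch is retried one item at a time through execute_with_retries, and shutdown() waits for completion. The work handler and the import path are assumptions for illustration:

from ethereumetl.executors.batch_work_executor import BatchWorkExecutor  # assumed module path

# Hypothetical work handler; a real one would issue batched JSON-RPC requests.
def export_batch(block_numbers):
    print('exporting blocks', list(block_numbers))

executor = BatchWorkExecutor(starting_batch_size=100, max_workers=5, max_retries=3)
executor.execute(range(1, 1001), export_batch, total_items=1000)
executor.shutdown()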

View File

@@ -44,7 +44,7 @@ class BaseItemExporter(object):
self._configure(kwargs)
def _configure(self, options, dont_fail=False):
"""Configure the exporter by poping options from the ``options`` dict.
"""Configure the exporter by popping options from the ``options`` dict.
If dont_fail is set, it won't raise an exception on unexpected options
(useful for using with keyword arguments in subclasses constructors)
"""
@@ -120,7 +120,7 @@ class CsvItemExporter(BaseItemExporter):
def _join_if_needed(self, value):
if isinstance(value, (list, tuple)):
try:
return self._join_multivalued.join(value)
return self._join_multivalued.join(str(x) for x in value)
except TypeError: # list in value may not contain strings
pass
return value

View File

View File

@@ -0,0 +1,31 @@
import logging
import requests
logger = logging.getLogger('ipfs')
IPFS_TIMEOUT = 5 # Timeout in seconds
IPFS_NUM_ATTEMPTS = 3
# A simple client to fetch content from IPFS gateways.
class IpfsClient:
def __init__(self, gatewayUrls):
self._gatewayUrls = gatewayUrls
def _get(self, path, json):
for i in range(IPFS_NUM_ATTEMPTS):
# Round-robin through the gateways.
gatewayUrl = self._gatewayUrls[i % len(self._gatewayUrls)]
try:
url = "{}/{}".format(gatewayUrl, path)
r = requests.get(url, timeout=IPFS_TIMEOUT)
r.raise_for_status()
return r.json() if json else r.text
except Exception as e:
logger.error("Attempt #{} - Failed downloading {}: {}".format(i + 1, path, e))
raise Exception("IPFS download failure for hash {}".format(path))
def get(self, path):
return self._get(path, False)
def get_json(self, path):
return self._get(path, True)
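A hedged usage sketch of the client above; the gateway URLs and the content hash are placeholders, and after IPFS_NUM_ATTEMPTS failed attempts the generic exception raised in _get surfaces to the caller:

from ethereumetl.ipfs.client import IpfsClient

# Placeholder gateways and CID; any reachable public gateways would do.
client = IpfsClient(['https://gateway.ipfs.io/ipfs', 'https://cf-ipfs.com/ipfs'])
try:
    data = client.get_json('Qm...')  # client.get(...) returns the raw text instead
    print(data)
except Exception as e:
    print('IPFS fetch failed:', e)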

139
ethereumetl/ipfs/origin.py Normal file
View File

@@ -0,0 +1,139 @@
import logging
import re
from ethereumetl.domain.origin import OriginMarketplaceListing, OriginShopProduct
from ethereumetl.ipfs.client import IpfsClient
logger = logging.getLogger('origin')
IPFS_PRIMARY_GATEWAY_URL = 'https://cf-ipfs.com/ipfs'
IPFS_SECONDARY_GATEWAY_URL = 'https://gateway.ipfs.io/ipfs'
# Returns an IPFS client that can be used to fetch Origin Protocol's data.
def get_origin_ipfs_client():
return IpfsClient([IPFS_PRIMARY_GATEWAY_URL, IPFS_SECONDARY_GATEWAY_URL])
# Parses the shop's HTML index page to extract the name of the IPFS directory under
# which all the shop data is located.
def _get_shop_data_dir(shop_index_page):
match = re.search('<link rel="data-dir" href="(.+?)"', shop_index_page)
return match.group(1) if match else None
# Returns the list of products from an Origin Protocol shop.
def _get_origin_shop_products(receipt_log, listing_id, ipfs_client, shop_ipfs_hash):
results = []
shop_index_page = ipfs_client.get(shop_ipfs_hash + "/index.html")
shop_data_dir = _get_shop_data_dir(shop_index_page)
path = "{}/{}".format(shop_ipfs_hash, shop_data_dir) if shop_data_dir else shop_ipfs_hash
logger.debug("Using shop path {}".format(path))
products_path = "{}/{}".format(path, 'products.json')
try:
products = ipfs_client.get_json(products_path)
except Exception as e:
logger.error("Listing {} Failed downloading product {}: {}".format(listing_id, products_path, e))
return results
logger.info("Found {} products in for listing {}".format(len(products), listing_id))
# Go through all the products from the shop.
for product in products:
product_id = product.get('id')
if not product_id:
logger.error('Product entry with missing id in products.json')
continue
logger.info("Processing product {}".format(product_id))
# Fetch the product details to get the variants.
product_base_path = "{}/{}".format(path, product_id)
product_data_path = "{}/{}".format(product_base_path, 'data.json')
try:
product = ipfs_client.get_json(product_data_path)
except Exception as e:
logger.error("Failed downloading {}: {}".format(product_data_path, e))
continue
# Extract the top product.
result = OriginShopProduct()
result.block_number = receipt_log.block_number
result.log_index = receipt_log.log_index
result.listing_id = listing_id
result.product_id = "{}-{}".format(listing_id, product_id)
result.ipfs_path = product_base_path
result.external_id = str(product.get('externalId')) if product.get('externalId') else None
result.parent_external_id = None
result.title = product.get('title')
result.description = product.get('description')
result.price = product.get('price')
result.currency = product.get('currency', 'fiat-USD')
result.option1 = None
result.option2 = None
result.option3 = None
result.image = product.get('image')
results.append(result)
# Extract the variants, if any.
variants = product.get('variants', [])
if len(variants) > 0:
logger.info("Found {} variants".format(len(variants)))
for variant in variants:
result = OriginShopProduct()
result.block_number = receipt_log.block_number
result.log_index = receipt_log.log_index
result.listing_id = listing_id
result.product_id = "{}-{}".format(listing_id, variant.get('id'))
result.ipfs_path = product_base_path
result.external_id = str(variant.get('externalId')) if variant.get('externalId') else None
result.parent_external_id = str(product.get('externalId')) if product.get('externalId') else None
result.title = variant.get('title')
result.description = product.get('description')
result.price = variant.get('price')
result.currency = product.get('currency', 'fiat-USD')
result.option1 = variant.get('option1')
result.option2 = variant.get('option2')
result.option3 = variant.get('option3')
result.image = variant.get('image')
results.append(result)
return results
# Returns a listing from the Origin Protocol marketplace.
def get_origin_marketplace_data(receipt_log, listing_id, ipfs_client, ipfs_hash):
# Load the listing's metadata from IPFS.
try:
listing_data = ipfs_client.get_json(ipfs_hash)
except Exception as e:
logger.error("Extraction failed. Listing {} Listing hash {} - {}".format(listing_id, ipfs_hash, e))
return None, []
# Fill-in an OriginMarketplaceListing object based on the IPFS data.
listing = OriginMarketplaceListing()
listing.block_number = receipt_log.block_number
listing.log_index = receipt_log.log_index
listing.listing_id = str(listing_id)
listing.ipfs_hash = ipfs_hash
listing.listing_type = listing_data.get('listingType', '')
listing.category = listing_data.get('category', '')
listing.subcategory = listing_data.get('subCategory', '')
listing.language = listing_data.get('language', '')
listing.title = listing_data.get('title', '')
listing.description = listing_data.get('description', '')
listing.price = listing_data.get('price', {}).get('amount', '')
listing.currency = listing_data.get('price', {}).get('currency', '')
# If it is a shop listing, also extract all of the shop data.
shop_listings = []
shop_ipfs_hash = listing_data.get('shopIpfsHash')
if shop_ipfs_hash:
try:
shop_listings = _get_origin_shop_products(receipt_log, listing_id, ipfs_client, shop_ipfs_hash)
except Exception as e:
logger.error("Extraction failed. Listing {} Shop hash {} - {}".format(listing_id, shop_ipfs_hash, e))
return listing, shop_listings
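A hedged sketch of driving the extractor above: receipt_log only needs block_number and log_index here, so a simple namespace stands in for the real receipt log object, and the listing id and IPFS hash are placeholders:

from types import SimpleNamespace

from ethereumetl.ipfs.origin import get_origin_ipfs_client, get_origin_marketplace_data

receipt_log = SimpleNamespace(block_number=10000000, log_index=0)  # hypothetical values
ipfs_client = get_origin_ipfs_client()
listing, shop_products = get_origin_marketplace_data(
    receipt_log, listing_id='1-001-1', ipfs_client=ipfs_client, ipfs_hash='Qm...')
if listing is not None:
    print(listing.title, len(shop_products))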

View File

@@ -0,0 +1,23 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

View File

@@ -0,0 +1,287 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import csv
import logging
import os
import shutil
from time import time

from ethereumetl.csv_utils import set_max_field_size_limit
from blockchainetl.file_utils import smart_open
from ethereumetl.jobs.export_blocks_job import ExportBlocksJob
from ethereumetl.jobs.export_contracts_job import ExportContractsJob
from ethereumetl.jobs.export_receipts_job import ExportReceiptsJob
from ethereumetl.jobs.export_token_transfers_job import ExportTokenTransfersJob
from ethereumetl.jobs.export_tokens_job import ExportTokensJob
from ethereumetl.jobs.exporters.blocks_and_transactions_item_exporter import blocks_and_transactions_item_exporter
from ethereumetl.jobs.exporters.contracts_item_exporter import contracts_item_exporter
from ethereumetl.jobs.exporters.receipts_and_logs_item_exporter import receipts_and_logs_item_exporter
from ethereumetl.jobs.exporters.token_transfers_item_exporter import token_transfers_item_exporter
from ethereumetl.jobs.exporters.tokens_item_exporter import tokens_item_exporter
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.thread_local_proxy import ThreadLocalProxy
from ethereumetl.web3_utils import build_web3

logger = logging.getLogger('export_all')


def is_log_filter_supported(provider_uri):
    return 'infura' not in provider_uri


def extract_csv_column_unique(input, output, column):
    set_max_field_size_limit()

    with smart_open(input, 'r') as input_file, smart_open(output, 'w') as output_file:
        reader = csv.DictReader(input_file)
        seen = set()  # set for fast O(1) amortized lookup
        for row in reader:
            if row[column] in seen:
                continue
            seen.add(row[column])
            output_file.write(row[column] + '\n')


def export_all_common(partitions, output_dir, provider_uri, max_workers, batch_size):
    for batch_start_block, batch_end_block, partition_dir in partitions:
        # # # start # # #

        start_time = time()

        padded_batch_start_block = str(batch_start_block).zfill(8)
        padded_batch_end_block = str(batch_end_block).zfill(8)
        block_range = '{padded_batch_start_block}-{padded_batch_end_block}'.format(
            padded_batch_start_block=padded_batch_start_block,
            padded_batch_end_block=padded_batch_end_block,
        )
        file_name_suffix = '{padded_batch_start_block}_{padded_batch_end_block}'.format(
            padded_batch_start_block=padded_batch_start_block,
            padded_batch_end_block=padded_batch_end_block,
        )

        # # # blocks_and_transactions # # #

        blocks_output_dir = '{output_dir}/blocks{partition_dir}'.format(
            output_dir=output_dir,
            partition_dir=partition_dir,
        )
        os.makedirs(os.path.dirname(blocks_output_dir), exist_ok=True)

        transactions_output_dir = '{output_dir}/transactions{partition_dir}'.format(
            output_dir=output_dir,
            partition_dir=partition_dir,
        )
        os.makedirs(os.path.dirname(transactions_output_dir), exist_ok=True)

        blocks_file = '{blocks_output_dir}/blocks_{file_name_suffix}.csv'.format(
            blocks_output_dir=blocks_output_dir,
            file_name_suffix=file_name_suffix,
        )
        transactions_file = '{transactions_output_dir}/transactions_{file_name_suffix}.csv'.format(
            transactions_output_dir=transactions_output_dir,
            file_name_suffix=file_name_suffix,
        )
        logger.info('Exporting blocks {block_range} to {blocks_file}'.format(
            block_range=block_range,
            blocks_file=blocks_file,
        ))
        logger.info('Exporting transactions from blocks {block_range} to {transactions_file}'.format(
            block_range=block_range,
            transactions_file=transactions_file,
        ))

        job = ExportBlocksJob(
            start_block=batch_start_block,
            end_block=batch_end_block,
            batch_size=batch_size,
            batch_web3_provider=ThreadLocalProxy(lambda: get_provider_from_uri(provider_uri, batch=True)),
            max_workers=max_workers,
            item_exporter=blocks_and_transactions_item_exporter(blocks_file, transactions_file),
            export_blocks=blocks_file is not None,
            export_transactions=transactions_file is not None)
        job.run()

        # # # token_transfers # # #

        token_transfers_file = None
        if is_log_filter_supported(provider_uri):
            token_transfers_output_dir = '{output_dir}/token_transfers{partition_dir}'.format(
                output_dir=output_dir,
                partition_dir=partition_dir,
            )
            os.makedirs(os.path.dirname(token_transfers_output_dir), exist_ok=True)

            token_transfers_file = '{token_transfers_output_dir}/token_transfers_{file_name_suffix}.csv'.format(
                token_transfers_output_dir=token_transfers_output_dir,
                file_name_suffix=file_name_suffix,
            )
            logger.info('Exporting ERC20 transfers from blocks {block_range} to {token_transfers_file}'.format(
                block_range=block_range,
                token_transfers_file=token_transfers_file,
            ))

            job = ExportTokenTransfersJob(
                start_block=batch_start_block,
                end_block=batch_end_block,
                batch_size=batch_size,
                web3=ThreadLocalProxy(lambda: build_web3(get_provider_from_uri(provider_uri))),
                item_exporter=token_transfers_item_exporter(token_transfers_file),
                max_workers=max_workers)
            job.run()

        # # # receipts_and_logs # # #

        cache_output_dir = '{output_dir}/.tmp{partition_dir}'.format(
            output_dir=output_dir,
            partition_dir=partition_dir,
        )
        os.makedirs(os.path.dirname(cache_output_dir), exist_ok=True)

        transaction_hashes_file = '{cache_output_dir}/transaction_hashes_{file_name_suffix}.csv'.format(
            cache_output_dir=cache_output_dir,
            file_name_suffix=file_name_suffix,
        )
        logger.info('Extracting hash column from transaction file {transactions_file}'.format(
            transactions_file=transactions_file,
        ))
        extract_csv_column_unique(transactions_file, transaction_hashes_file, 'hash')

        receipts_output_dir = '{output_dir}/receipts{partition_dir}'.format(
            output_dir=output_dir,
            partition_dir=partition_dir,
        )
        os.makedirs(os.path.dirname(receipts_output_dir), exist_ok=True)

        logs_output_dir = '{output_dir}/logs{partition_dir}'.format(
            output_dir=output_dir,
            partition_dir=partition_dir,
        )
        os.makedirs(os.path.dirname(logs_output_dir), exist_ok=True)

        receipts_file = '{receipts_output_dir}/receipts_{file_name_suffix}.csv'.format(
            receipts_output_dir=receipts_output_dir,
            file_name_suffix=file_name_suffix,
        )
        logs_file = '{logs_output_dir}/logs_{file_name_suffix}.csv'.format(
            logs_output_dir=logs_output_dir,
            file_name_suffix=file_name_suffix,
        )
        logger.info('Exporting receipts and logs from blocks {block_range} to {receipts_file} and {logs_file}'.format(
            block_range=block_range,
            receipts_file=receipts_file,
            logs_file=logs_file,
        ))

        with smart_open(transaction_hashes_file, 'r') as transaction_hashes:
            job = ExportReceiptsJob(
                transaction_hashes_iterable=(transaction_hash.strip() for transaction_hash in transaction_hashes),
                batch_size=batch_size,
                batch_web3_provider=ThreadLocalProxy(lambda: get_provider_from_uri(provider_uri, batch=True)),
                max_workers=max_workers,
                item_exporter=receipts_and_logs_item_exporter(receipts_file, logs_file),
                export_receipts=receipts_file is not None,
                export_logs=logs_file is not None)
            job.run()

        # # # contracts # # #

        contract_addresses_file = '{cache_output_dir}/contract_addresses_{file_name_suffix}.csv'.format(
            cache_output_dir=cache_output_dir,
            file_name_suffix=file_name_suffix,
        )
        logger.info('Extracting contract_address from receipt file {receipts_file}'.format(
            receipts_file=receipts_file
        ))
        extract_csv_column_unique(receipts_file, contract_addresses_file, 'contract_address')

        contracts_output_dir = '{output_dir}/contracts{partition_dir}'.format(
            output_dir=output_dir,
            partition_dir=partition_dir,
        )
        os.makedirs(os.path.dirname(contracts_output_dir), exist_ok=True)

        contracts_file = '{contracts_output_dir}/contracts_{file_name_suffix}.csv'.format(
            contracts_output_dir=contracts_output_dir,
            file_name_suffix=file_name_suffix,
        )
        logger.info('Exporting contracts from blocks {block_range} to {contracts_file}'.format(
            block_range=block_range,
            contracts_file=contracts_file,
        ))

        with smart_open(contract_addresses_file, 'r') as contract_addresses_file:
            contract_addresses = (contract_address.strip() for contract_address in contract_addresses_file
                                  if contract_address.strip())
            job = ExportContractsJob(
                contract_addresses_iterable=contract_addresses,
                batch_size=batch_size,
                batch_web3_provider=ThreadLocalProxy(lambda: get_provider_from_uri(provider_uri, batch=True)),
                item_exporter=contracts_item_exporter(contracts_file),
                max_workers=max_workers)
            job.run()

        # # # tokens # # #

        if token_transfers_file is not None:
            token_addresses_file = '{cache_output_dir}/token_addresses_{file_name_suffix}'.format(
                cache_output_dir=cache_output_dir,
                file_name_suffix=file_name_suffix,
            )
            logger.info('Extracting token_address from token_transfers file {token_transfers_file}'.format(
                token_transfers_file=token_transfers_file,
            ))
            extract_csv_column_unique(token_transfers_file, token_addresses_file, 'token_address')

            tokens_output_dir = '{output_dir}/tokens{partition_dir}'.format(
                output_dir=output_dir,
                partition_dir=partition_dir,
            )
            os.makedirs(os.path.dirname(tokens_output_dir), exist_ok=True)

            tokens_file = '{tokens_output_dir}/tokens_{file_name_suffix}.csv'.format(
                tokens_output_dir=tokens_output_dir,
                file_name_suffix=file_name_suffix,
            )
            logger.info('Exporting tokens from blocks {block_range} to {tokens_file}'.format(
                block_range=block_range,
                tokens_file=tokens_file,
            ))

            with smart_open(token_addresses_file, 'r') as token_addresses:
                job = ExportTokensJob(
                    token_addresses_iterable=(token_address.strip() for token_address in token_addresses),
                    web3=ThreadLocalProxy(lambda: build_web3(get_provider_from_uri(provider_uri))),
                    item_exporter=tokens_item_exporter(tokens_file),
                    max_workers=max_workers)
                job.run()

        # # # finish # # #
        shutil.rmtree(os.path.dirname(cache_output_dir))
        end_time = time()
        time_diff = round(end_time - start_time, 5)
        logger.info('Exporting blocks {block_range} took {time_diff} seconds'.format(
            block_range=block_range,
            time_diff=time_diff,
        ))
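
For context, export_all_common consumes an iterable of (start_block, end_block, partition_dir) tuples, matching the unpacking in the loop above. Below is a minimal usage sketch; the module path, partition_dir naming, and provider URI are assumptions for illustration, not taken from the repository's CLI wiring:

# Usage sketch (assumed module path and illustrative values)
from ethereumetl.jobs.export_all_common import export_all_common  # assumed path

partitions = [
    # (batch_start_block, batch_end_block, partition_dir)
    (0, 99999, '/start_block=00000000/end_block=00099999'),  # illustrative partition naming
]

export_all_common(
    partitions=partitions,
    output_dir='output',
    provider_uri='http://localhost:8545',  # an Infura URI would skip token_transfers, per is_log_filter_supported
    max_workers=5,
    batch_size=100,
)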

View File

@@ -24,7 +24,7 @@
 import json
 
 from ethereumetl.executors.batch_work_executor import BatchWorkExecutor
-from ethereumetl.jobs.base_job import BaseJob
+from blockchainetl.jobs.base_job import BaseJob
 from ethereumetl.json_rpc_requests import generate_get_block_by_number_json_rpc
 from ethereumetl.mappers.block_mapper import EthBlockMapper
 from ethereumetl.mappers.transaction_mapper import EthTransactionMapper
@@ -72,7 +72,7 @@ class ExportBlocksJob(BaseJob):
     def _export_batch(self, block_number_batch):
         blocks_rpc = list(generate_get_block_by_number_json_rpc(block_number_batch, self.export_transactions))
-        response = self.batch_web3_provider.make_request(json.dumps(blocks_rpc))
+        response = self.batch_web3_provider.make_batch_request(json.dumps(blocks_rpc))
         results = rpc_response_batch_to_results(response)
         blocks = [self.block_mapper.json_dict_to_block(result) for result in results]
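
The make_request to make_batch_request rename above routes the serialized payload through the batch-capable provider. As a reference point, a sketch of the standard JSON-RPC 2.0 batch shape involved; the field values are illustrative, and the helper internals are not shown in this diff:

# Illustrative batch payload of the kind passed to make_batch_request (values are made up)
import json

blocks_rpc = [
    {'jsonrpc': '2.0', 'method': 'eth_getBlockByNumber', 'params': [hex(n), True], 'id': n}
    for n in range(1000000, 1000003)
]
payload = json.dumps(blocks_rpc)

# A successful batch response is a list of {'jsonrpc': ..., 'id': ..., 'result': ...} items;
# rpc_response_batch_to_results (used above) is expected to pull each 'result' out of that list.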

View File

@@ -24,14 +24,15 @@
 import json
 
 from ethereumetl.executors.batch_work_executor import BatchWorkExecutor
-from ethereumetl.jobs.base_job import BaseJob
+from blockchainetl.jobs.base_job import BaseJob
 from ethereumetl.json_rpc_requests import generate_get_code_json_rpc
 from ethereumetl.mappers.contract_mapper import EthContractMapper
-# Exports contracts bytecode
 from ethereumetl.service.eth_contract_service import EthContractService
+from ethereumetl.utils import rpc_response_to_result
 
 
+# Exports contracts bytecode
 class ExportContractsJob(BaseJob):
     def __init__(
             self,
@@ -57,13 +58,13 @@ class ExportContractsJob(BaseJob):
     def _export_contracts(self, contract_addresses):
         contracts_code_rpc = list(generate_get_code_json_rpc(contract_addresses))
-        response_batch = self.batch_web3_provider.make_request(json.dumps(contracts_code_rpc))
+        response_batch = self.batch_web3_provider.make_batch_request(json.dumps(contracts_code_rpc))
 
         contracts = []
         for response in response_batch:
             # request id is the index of the contract address in contract_addresses list
             request_id = response['id']
-            result = response['result']
+            result = rpc_response_to_result(response)
 
             contract_address = contract_addresses[request_id]
             contract = self._get_contract(contract_address, result)
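
The second change above swaps a direct response['result'] lookup for rpc_response_to_result, so a node error or rate-limit response surfaces as a descriptive exception rather than a bare KeyError. A rough sketch of that pattern; this is not the library's exact implementation, only the idea the change relies on:

# Sketch of the error-surfacing pattern behind rpc_response_to_result (illustrative only)
def rpc_response_to_result_sketch(response):
    result = response.get('result')
    if result is None:
        # Include the whole response so node errors are visible in logs and stack traces.
        raise ValueError('result is None in response {}'.format(response))
    return result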

View File

@@ -0,0 +1,80 @@
# MIT License
#
# Copyright (c) 2018 Evgeniy Filatov, evgeniyfilatov@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import json

from ethereumetl.executors.batch_work_executor import BatchWorkExecutor
from ethereumetl.json_rpc_requests import generate_trace_block_by_number_json_rpc
from blockchainetl.jobs.base_job import BaseJob
from ethereumetl.mappers.geth_trace_mapper import EthGethTraceMapper
from ethereumetl.utils import validate_range, rpc_response_to_result


# Exports geth traces
class ExportGethTracesJob(BaseJob):
    def __init__(
            self,
            start_block,
            end_block,
            batch_size,
            batch_web3_provider,
            max_workers,
            item_exporter):
        validate_range(start_block, end_block)
        self.start_block = start_block
        self.end_block = end_block

        self.batch_web3_provider = batch_web3_provider
        self.batch_work_executor = BatchWorkExecutor(batch_size, max_workers)
        self.item_exporter = item_exporter

        self.geth_trace_mapper = EthGethTraceMapper()

    def _start(self):
        self.item_exporter.open()

    def _export(self):
        self.batch_work_executor.execute(
            range(self.start_block, self.end_block + 1),
            self._export_batch,
            total_items=self.end_block - self.start_block + 1
        )

    def _export_batch(self, block_number_batch):
        trace_block_rpc = list(generate_trace_block_by_number_json_rpc(block_number_batch))
        response = self.batch_web3_provider.make_batch_request(json.dumps(trace_block_rpc))

        for response_item in response:
            block_number = response_item.get('id')
            result = rpc_response_to_result(response_item)

            geth_trace = self.geth_trace_mapper.json_dict_to_geth_trace({
                'block_number': block_number,
                'transaction_traces': [tx_trace.get('result') for tx_trace in result],
            })

            self.item_exporter.export_item(self.geth_trace_mapper.geth_trace_to_dict(geth_trace))

    def _end(self):
        self.batch_work_executor.shutdown()
        self.item_exporter.close()
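
A minimal wiring sketch for the new job, assuming a geth-style node with tracing enabled. The module path and the inline exporter are assumptions for illustration; in the repository the job would be paired with one of the existing item exporters, and ThreadLocalProxy and get_provider_from_uri are the same helpers used in export_all above:

# Wiring sketch (assumed module path; the stand-in exporter only mirrors the
# open/export_item/close contract the job calls above)
from ethereumetl.jobs.export_geth_traces_job import ExportGethTracesJob  # assumed path
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.thread_local_proxy import ThreadLocalProxy


class PrintItemExporter:
    def open(self):
        pass

    def export_item(self, item):
        print(item)

    def close(self):
        pass


job = ExportGethTracesJob(
    start_block=1000000,
    end_block=1000010,
    batch_size=5,
    batch_web3_provider=ThreadLocalProxy(
        lambda: get_provider_from_uri('http://localhost:8545', batch=True)),  # illustrative URI
    max_workers=2,
    item_exporter=PrintItemExporter())
job.run()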

Some files were not shown because too many files have changed in this diff.