Hi Sourav,
This is a knowledge transfer of the lessons I learnt working with Ethereum. I had to figure out a lot of this from scratch, so hopefully this saves you a lot of time.
The current geth node on andromeda is in a local environment so passwords etc will be included in this write-up (see the last page), but please change them as soon as possible for hygiene purposes.
The dataset on andromeda is a copy of Aashish's and Ivica's forked geth node that they used for the MAIAN project.
For this project, we will only be using Geth as an archival node (i.e. to query transaction information, or to get EVM execution traces using debug_traceTransaction), so in theory there is no need to use the forked geth node.
While you can start working with the copy of Geth on andromeda immediately, I would recommend reprovisioning a Geth node and syncing a full copy of the Ethereum blockchain (see the current dataset's limitations below). I found myself spending more time fighting problems with the current dataset (and the lack of sudo access on andromeda) than working on the project itself.
The only issue is that a freshly provisioned Geth node may take up to a week to sync. You can also look into services like Quiknode (https://quiknode.io), but these cost money.
The main limitation is that the dataset is incomplete. Many of the Merkle tries from the genesis block to the 4 millionth block are missing (or have been accidentally pruned). I spent a large amount of time trying to debug an error until I realized that the dataset is only complete from block 4 million onwards.
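One way to map out which parts of the dataset are usable is to probe historical state at sampled block heights. The sketch below is only an illustration: sample_missing_state and has_state are hypothetical names, and has_state is a callable you would supply yourself (for example, a wrapper that asks the node for an account balance at block n and returns False when the node reports an error).

```python
def sample_missing_state(has_state, start, stop, step):
    """Return the sampled block numbers for which has_state(n) is false.

    has_state is a caller-supplied probe (hypothetical); this function only
    walks the range start, start+step, ... (stop excluded) and collects
    the block numbers where the probe fails.
    """
    return [n for n in range(start, stop, step) if not has_state(n)]


# Stand-in probe that mimics the situation described above:
# state is only available from block 4,000,000 onwards.
fake_probe = lambda n: n >= 4000000
missing = sample_missing_state(fake_probe, 3000000, 5000000, 500000)
```

Sampling with a stride does not assume that the missing blocks form one contiguous range, which matters here because only "many" (not necessarily all) of the early tries are missing.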
If I had to do it again, I would have just provisioned an AWS EC2 instance (t2.medium), attached a 3TB EBS volume, and re-synced the entire Ethereum chain on Geth. It would have allowed me to work remotely as well (do you have NUS VPN access?). However, these things cost money so we're stuck with andromeda for now.
Quite frankly, the coding task for this project was straightforward, but the difficulty was in learning the tooling in the Ethereum ecosystem, and where to look for the information needed. If you are already familiar with Ethereum, the EVM, and opcodes then feel free to skip this section.
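At a high level, you will first need to enumerate the changes each transaction makes to the blockchain. These can be divided into three categories: regular transactions, internal transactions, and contract storage reads and writes (each covered in turn below).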
Regular transactions can be obtained from web3.eth.getBlock(...).transactions. You can look up the web3 API to see the methods on an individual transaction, or read my simple Python code.
for curBlockNum in range(startBlock, endBlock):
    # Gets current block
    currentBlock = web3.eth.getBlock(curBlockNum, full_transactions=True)
    print("+++++++ Current block: " + str(curBlockNum) + " +++++++++++")
    # Iterates through current block's transactions
    for txn in currentBlock.transactions:
        # fromAddress details
        fromAddr = txn['from']
        fromAddrInitialBalance = getInitialBalance(fromAddr)
        # Shard
        shard = hash(fromAddr) % 50
        # shard = randint(0,49)
        # toAddress details
        toAddr = txn['to']
        if (toAddr): toAddrInitialBalance = getInitialBalance(toAddr)
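One caveat on the shard line above: on Python 3 the built-in hash() of a string is randomized per process by default, so hash(fromAddr) % 50 can give a different shard for the same address on every run. A minimal sketch of a reproducible alternative, assuming a stable shard index is what is wanted (shard_for and NUM_SHARDS are illustrative names, not part of the original code):

```python
import hashlib

NUM_SHARDS = 50  # same modulus as in the loop above


def shard_for(address):
    """Map an address string to a shard index that is stable across runs."""
    # Lower-case first so checksummed and plain hex forms land in the same shard.
    digest = hashlib.sha256(address.lower().encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

SHA-256 is an arbitrary choice here; any hash that does not depend on the process would do.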
In Ethereum, internal transactions (when a transaction to a smart contract triggers an internal transaction from the contract to another address) are not included in the web3.eth.getBlock(...).transactions data object.
To find these internal transactions, you will need to use Geth (not web3!) to run debug_traceTransaction. In essence, this allows you to "replay" the execution of a contract, so you can see what the EVM is doing. You can read more about this method here: https://github.com/ethereum/go-ethereum/wiki/Management-APIs#debug_tracetransaction.
I could not find a python-geth library that worked well. Instead, we talk to Geth directly over JSON-RPC.
To the best of my knowledge, internal transactions are represented by the CALL opcode. You should refer to the Ethereum yellowpaper (or the much simpler-to-understand beigepaper) to understand how CALL works.
debug_traceTransaction allows you to see the values CALL takes in:
- stack[-1] is the gas amount
- stack[-2] is the address the amount is sent to
- stack[-3] is the amount being sent in wei
You should refer to the yellowpaper for more detail.
# Gets EVM Trace from debug_traceTransaction
params = [txnHash]
payload = {
    "jsonrpc": "2.0",
    "method": "debug_traceTransaction",
    "params": params,
    "id": 1
}
headers = {'Content-type': 'application/json'}
debugTraceTransaction = session.post(
    'http://localhost:' + rpcport,
    json=payload,
    headers=headers
)
transactionTrace = debugTraceTransaction.json()['result']['structLogs']
# Handler for different EVM Opcodes
if (transactionTrace):
    for log in transactionTrace:
        if (log['op'] == 'CALL'):
            txnGas = int(log['stack'][-1], 16)
            internalFromAddr = toAddr
            internalToAddr = '0x' + log['stack'][-2][24:64]  # Turn 64 char string into formatted address TODO: refactor into helper method
            internalTxnValue = int(log['stack'][-3], 16)
            internalFromAddrInitialBalance = getInitialBalance(internalFromAddr) + txnValue  # Note: We add txnValue to cover instances where contract is a "pass through" contract
            internalToAddrInitialBalance = getInitialBalance(internalToAddr)
            # Sanity check for internal transactions
            if (internalFromAddrInitialBalance < internalTxnValue):
                debug_CALL_transactions = True
                debug_transaction = True
            if (debug_CALL_transactions):
                print("====== Hash: " + txnHash)
                print("TxnGas: " + str(txnGas))
                print("Internal fromAddr: " + internalFromAddr)
                print("Internal toAddr: " + internalToAddr)
                print("Internal txnValue: " + str(web3.fromWei(internalTxnValue, 'ether')))
                debug_CALL_transactions = False
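The stack decoding in the CALL handler could be pulled out into a small helper, as the TODO in the code suggests. This is a minimal sketch, not the original implementation: decode_call is a hypothetical name, and it assumes each stack entry is a hex string for one 32-byte word with the top of the stack last, with or without a "0x" prefix.

```python
def decode_call(log):
    """Return (gas, to_address, value_in_wei) for one structLogs entry of a CALL."""
    stack = log["stack"]

    def strip(word):
        # Tolerate both prefixed and unprefixed hex words (an assumption).
        return word[2:] if word.startswith("0x") else word

    gas = int(strip(stack[-1]), 16)
    # An address is the low 20 bytes, i.e. the last 40 hex characters of the word.
    to_addr = "0x" + strip(stack[-2]).rjust(64, "0")[-40:]
    value_wei = int(strip(stack[-3]), 16)
    return gas, to_addr, value_wei
```

With this in place, the handler body would reduce to something like txnGas, internalToAddr, internalTxnValue = decode_call(log).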
Similar to CALL, SLOAD and SSTORE take their arguments from the stack: SLOAD reads a value from a smart contract's storage, while SSTORE writes one. You will need to enumerate how these mutate the chain; SSTORE changes the key-value pairs in the contract's storage (Merkle Patricia) trie.
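As a rough illustration of what enumerating storage accesses might look like, the sketch below walks a structLogs list and collects the keys read by SLOAD and the key/value pairs written by SSTORE. It assumes the same trace format as the CALL handler (top of the stack last); per the yellowpaper, SLOAD takes the key from the top of the stack, and SSTORE takes the key from the top and the new value from the entry below it. storage_accesses is a hypothetical name.

```python
def storage_accesses(struct_logs):
    """Return (reads, writes) found in a list of structLogs entries.

    reads:  list of storage keys loaded by SLOAD
    writes: list of (key, new_value) pairs stored by SSTORE
    Keys and values are left as the hex strings found in the trace.
    """
    reads, writes = [], []
    for log in struct_logs:
        stack = log.get("stack") or []
        if log["op"] == "SLOAD" and len(stack) >= 1:
            reads.append(stack[-1])
        elif log["op"] == "SSTORE" and len(stack) >= 2:
            writes.append((stack[-1], stack[-2]))
    return reads, writes
```

Note that this does not say which contract's storage is being touched. After a nested CALL the executing contract is no longer the transaction's to address, so a fuller version would have to track the call depth reported in each trace entry.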
The code that runs this is untested; one thing I wanted to put more time into was to understand how I could effectively test that my code was picking up the changes correctly.
(Perhaps - checking internal ERC20 token transfers? These are state changes?)
I did not have enough time to look into this, but you could look into other EVM opcodes that can mutate state. One opcode I did not look at was SELFDESTRUCT, which would definitely require some sort of transaction ordering.
In particular, I would focus on the following opcodes:
- DELEGATECALL
- CALLCODE
- SELFDESTRUCT or SUICIDE
- CREATE (???)
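As a starting point for these opcodes, the sketch below records where the operands of interest sit on the stack, following the opcode definitions in the yellowpaper. It is untested against real traces; OPERANDS and decode_operands are hypothetical names, and the trace format is assumed to match the CALL handler (hex strings, top of the stack last).

```python
# Operand positions counted from the top of the stack (1 = top).
OPERANDS = {
    "CALL":         {"gas": 1, "address": 2, "value": 3},
    "CALLCODE":     {"gas": 1, "address": 2, "value": 3},  # address is the code source
    "DELEGATECALL": {"gas": 1, "address": 2},              # no value operand
    "SELFDESTRUCT": {"address": 1},                        # beneficiary of the balance
    "SUICIDE":      {"address": 1},                        # older name for SELFDESTRUCT
    "CREATE":       {"value": 1},                          # endowment of the new contract
}


def decode_operands(log):
    """Decode the operands listed in OPERANDS, or return None for other opcodes."""
    spec = OPERANDS.get(log["op"])
    if spec is None:
        return None
    decoded = {}
    for name, position in spec.items():
        word = log["stack"][-position]
        word = word[2:] if word.startswith("0x") else word
        if name == "address":
            decoded[name] = "0x" + word.rjust(64, "0")[-40:]  # low 20 bytes
        else:
            decoded[name] = int(word, 16)
    return decoded
```

For CALLCODE and DELEGATECALL the decoded address only supplies the code; the storage that changes belongs to the calling contract, so they cannot be treated exactly like CALL.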
I originally approached this problem incorrectly, from the perspective of "slotting" transactions into shards. This was conceptually wrong and, quite frankly, an uninteresting exercise.
Prateek's talk helped to clarify the intent of this exercise, which is basically to create a DAG of all transactions and their dependencies. In short:
Tx(1) -> Tx(3) -> Tx(2) -> Tx(4)
I was not very familiar with the Python ecosystem, but after some time I found the following two libraries that might help with the creation of a DAG-type data structure:
- NetworkX: a Python network library that allows you to store data in nodes. However, given the size of our dataset, I am not sure whether you can hold it in RAM.
- Neo4j: a graph database that will enable you to hold data in nodes and edges and persist it. It has some snazzy visualizations that could help you visualize the transaction dependencies.
user: daniel pass: triangleChicken1234# <- please change this ASAP once you are logged in.
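Before committing to either library, the dependency idea can be tried out with plain dictionaries. The sketch below is one possible reading of "dependence", not necessarily the definition intended for the project: a transaction depends on an earlier one if it reads or writes a key the earlier one wrote, or writes a key the earlier one read. The function name, the input format and the notion of a "key" (an address for balances, or a contract and slot pair for storage) are all assumptions.

```python
from collections import defaultdict


def build_dependency_dag(txns):
    """txns: list of (tx_id, read_keys, write_keys) tuples in block order.

    Returns {tx_id: set of earlier tx_ids that must run before it}.
    Assumes tx_ids are unique.
    """
    last_writer = {}                  # key -> tx that most recently wrote it
    readers_since = defaultdict(set)  # key -> txs that read it since that write
    deps = {}
    for tx_id, reads, writes in txns:
        parents = set()
        for key in reads:
            if key in last_writer:
                parents.add(last_writer[key])   # read after write
        for key in writes:
            if key in last_writer:
                parents.add(last_writer[key])   # write after write
            parents |= readers_since[key]       # write after read
        deps[tx_id] = parents
        for key in reads:
            readers_since[key].add(tx_id)
        for key in writes:
            last_writer[key] = tx_id
            readers_since[key] = set()
    return deps


example = build_dependency_dag([
    ("tx1", {"A"}, {"A", "B"}),
    ("tx2", {"C"}, {"C"}),
    ("tx3", {"B"}, {"D"}),
    ("tx4", {"D"}, {"B"}),
])
```

The resulting dictionary maps each transaction to its direct predecessors, which is the edge list either NetworkX or Neo4j would need as input.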
You may also opt to create a new user on andromeda and cp the geth filestore over to your user's directory.
#!/usr/bin/expect -f
spawn ssh -X [email protected]
expect "assword:"
send "7L6vJ*LGMxf5\r"
interact
#!/bin/bash
/mnt/c/daniel/Test/go-ethereum/build/bin/geth --datadir /mnt/c/daniel/mychainfull/ --rpc --rpcapi "eth,net,web3,debug" --port 9001 --rpcport 9111 --mine --maxpeers 0 --etherbase 0 --ethash.dagdir ethash --latestblock 43340a6d232532c328211d8a8c0fa84af658dbff1f4906ab7a7d4e41f82fe3a3
The sshfs mount script below is useful if you want to use Sublime Text or Atom to edit the files locally. I had been using vim on andromeda, and it was not the most productive experience.
#!/usr/bin/expect -f
spawn sshfs [email protected]:/home/daniel /home/daniel/andromedafs
expect "assword:"
send "7L6vJ*LGMxf5\r"
interact