How to Filter Wallet Transactions Related to Token

How to Filter Wallet Transactions Related to Token

Recently I was asked to generate a list of all transactions related to one token and my company's wallet. I started from checking Etherscan and trying to figure out if I could somehow download the list of transaction from there. I quickly found out they allow to export data I need to a CSV file but the it is limited to 5000 rows. Not hard to guess that the wallet I was interested had significantly higher number of transactions than 5000. What a bummer. It means the only way to get the list I want is writing a script that collects the transactions I needed.

I decided to share my learnings and open source my script because I believe this problem might be common problem enough so you might benefit from it. At the end, all companies in blockchain space have similar issues.

You can find the script and its source code on Github

In this article, I'm using my personal wallet to simplify examples. My personal wallet has only 17 transactions but  the script on my Github works well with significantly hight amounts of transactions, so feel free to use it. I will be working on my transactions related DAI stablecoin. Let's check my wallet history on Etherscan.

We can find out I did a bunch of transfers and a few smart contract interactions (methods such as Deposit, Withdraw, File Order, Swap Exact Token and so on).
I'm mentioning the smart contract interactions because they are a harder to process. Their logs are encoded but following this example it is irrelevant.

Disclaimer

All examples in this article are written in Python. I am using Python v3.10.5 and extensively taking advantage of  library called web3.py. This library allows me to interact with Ethereum, you can install it using PyPI by typing in your terminal:

$ pip install web3

If you can write JavaScript only - don't worry. You can translate my snippets to JavaScript and use web3.js library instead. There are some differences in namespaces and naming but has all necessary methods and some extras.

Let's get started!

The first thing the script needs to do is establishing connection to Ethereum blockchain. To do so, we will need to find an Ethereum API. If you don't run your own node, you can get a free API from Infura, QuickNode or Moralis. Choose one, I'm using Infura for my personal projects (As you can see, I pasted it in line 9).

import asyncio
from web3 import Web3, AsyncHTTPProvider
from web3.eth import AsyncEth, Eth
from web3.net import AsyncNet


async def runner():
    ethereum_api_url = "https://mainnet.infura.io/v3/MY_SUPER_SECRET_TOKEN"

    async_web3 = Web3(
        AsyncHTTPProvider(ethereum_api_url),
        modules={
            "eth": (AsyncEth,),
            "net": (AsyncNet,),
        },
        middlewares=[],
    )
    web3_modules = get_default_modules()
    web3_modules["eth"] = Eth

    block = await async_web3.eth.get_block("latest")
    print("Current block:", block.number)

    # All code goes here
    await asyncio.sleep(0)


if __name__ == "__main__":
    asyncio.run(runner())

I created an asynchronous web3 provider, and I will be using it to interact with blockchain. You might have noticed that on the snippet above I'm fetching the latest block. It isn't require to complete my task, I'm doing it to verify my web3 provider is connected to Ethereum.

When you run this script you will see the following result:

$ python runner.py
Current block: 16436276

Now we are ready to scan the blockchain.

Hunting logs

Let's look into my wallet transaction history again, especially my very first transaction. As you can see, the first transaction is 0xe3d38...cefaa2e and it was dispatched in block 13352962. At the moment of writing this article, Ethereum mined block 16436276; it means my first transaction was executed more than 3 million blocks ago! Obviously, retrieving all transactions by querying every single block isn't really an option. Don't get me wrong - it's possible but extremely time consuming. Requesting a single historical block might take even a few seconds so the whole processing would take at least few hours (or days!). It's too slow.

Let's dig a bit deeper and focus on the Logs tab on Etherscan. As you can see, this transaction has one event - a Transfer event.

Ethereum is indexing logs of every transactions. This indexing means It allows to request particular logs by providing broader block ranges and smart contract addresses. The block range can be broad but the output cannot exceed 10000 logs per call. Isn't it amazing?

Let's return back to our Transfer event. You can spot it has three topics - topic 0 is the Transfer event's signature. It tells us that all Transfer events have signature 0xddf252ad...f523b3ef. This information is crucial, we will be using it to find all transfer to our wallet but I will return to this later. Let's focus on topic 1 and 2 for now.

Most of tokens on Ethereum (excluding NFTs and Ether) are ERC-20 tokens and DAI isn't any different. The easiest way to find out about topics exposed by Transfer event, will be reading the ERC-20 token smart contract. Fortunately, some of them are open source and we can find their source code on Github. We will be looking into interfaces and search for the Transfer event and its definition.

// SPDX-License-Identifier: MIT
// OpenZeppelin Contracts (last updated v4.6.0) (token/ERC20/IERC20.sol)

pragma solidity ^0.8.0;

/**
 * @dev Interface of the ERC20 standard as defined in the EIP.
 */
interface IERC20 {
    /**
     * @dev Emitted when `value` tokens are moved from one account (`from`) to
     * another (`to`).
     *
     * Note that `value` may be zero.
     */
    event Transfer(address indexed from, address indexed to, uint256 value);
    
    ...
    
}

As you can see, the Transfer event has two indexed arguments - from and to - and value which isn't indexed. The indexed arguments are exposed as topics, and we will be able to request logs associated with them.

Now we know that topic 1 is the from address and topic 2 is the to address. Let's request all incoming DAI transactions to my wallet from block 13,352,962 (my first DAI transaction) to block 16,436,276 (the current block).

incoming_logs = await async_web3.eth.get_logs(
    {
        "fromBlock": 13352962,
        "toBlock": 16436276,
        "address": "0x6B175474E89094C44Da98b954EedeAC495271d0F", # DAI
        "topics": [
            "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
            None,
            "0x000000000000000000000000d5e73f9199e67b6ff8dface1767a1bdadf1a7242",
        ],
    }
)

When you look closer into this snippet, you might feel be confused by two things. The first one is the address field. It isn't my wallet address but DAI address (smart contract). If you'd provide the wallet address there you'd find nothing. Wallets don't implement any events with Transfer signature.

The second thing, the wallet address is prefixed with 24 zeros. It is related to EVM - the topic user 32 bytes data types and wallet addresses are stored in 20 bytes. It means I need to fill the "free" space with zeros.

If you run the snippet code, you will see the array of logs that looks like this one:

[
    AttributeDict({
        'address': '0x6B175474E89094C44Da98b954EedeAC495271d0F', 
        'blockHash': HexBytes('0xa67b8ceaeb79ec2592e161ee2efee6fba3fd329c87131d9335ccaa869cc857ec'), 
        'blockNumber': 13352962, 
        'data': '0x0000000000000000000000000000000000000000000000356ea11fcb4975c000', 
        'logIndex': 85, 
        'removed': False, 
        'topics': [
            HexBytes('0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'), 
            HexBytes('0x00000000000000000000000021a31ee1afc51d94c2efccaa2092ad1028285549'), 
            HexBytes('0x000000000000000000000000d5e73f9199e67b6ff8dface1767a1bdadf1a7242')
        ], 
        'transactionHash': HexBytes('0xe3d380b3647abee2f8e2980d8e3bf6e9a43b00a0f4a388765585df13ecefaa2e'), 
        'transactionIndex': 41
    }), 
    ...
]

If you read it carefully, you can spot the sender and receiver of each log (topic 1 and 2). The only missing part is the transaction value, right? Actually, it is stored in the data field but it is encoded. It can be decoded by using the toInt method from the web3.py library.

Let's try it out.

from web3 import Web3

value = Web3.toInt(hexstr="0x0000000000000000000000000000000000000000000000356ea11fcb4975c000")
print(value) # 985649123680000000000

The script returned 985649123680000000000 and this number feels a bit too high, right?

Not really, DAI has 18 decimals - it means you need to divide this number by 10 to the power of 18 (10^18) to get a "human" readable number. If you do so you will find out that the value of this transaction was 985.64912368 DAI. You can confirm this number with transaction on Etherscan.

Conclusion

Processing Ethereum block-by-block can be tedious and not necessary. Often we can search logs to find events related to transactions we are looking for. As I said at the very beginning, you can find a fully-fledged script is on Github with all instructions how to run it. The article contains only the juicy parts.

If you have any question you can start a discussion on Github or ping me on Twitter. Feel free to follow me - I tweet things related to Ethereum and software engineering.

It would mean the world to me if you would share this article on your social media. Thank you!

Subscribe to mateu.sh

Sign up now to get access to the library of members-only issues.
Jamie Larson
Subscribe