Beyond the eye-popping prices of monkey-themed images, the underlying technology of NFTs offers companies a new way to monetize their digital presence directly. Major brands like Adidas, the NBA, and TIME have already begun experimenting with NFTs to explore these revenue streams—and we are still in the early stages of this trend.
As data practitioners, we can deliver valuable insights into these new business models, especially since all transactions are publicly accessible on the blockchain. This article introduces a starter project that uses Python to access and analyze blockchain data and to flag potential fraud.
In this post, we’ll cover:
- The basics of blockchain, NFTs, and network graphs.
- How to extract NFT data using the open-source NFT Analyst Starter Pack from a16z.
- How to interpret Ethereum blockchain data.
- The concept of wash trading in NFT markets.
- How to build a network graph to visualize potential wash trading in the Bored Ape Yacht Club NFT project.
We’ll also include a Frequently Asked Questions section to clarify common points of confusion.
Understanding Blockchain Data
Amid the media buzz around meme coins and million-dollar pixel art, there lies a transformative technology: the blockchain.
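At its core, a blockchain is a cryptographically secured, decentralized, and immutable digital ledger. Unlike a traditional bank ledger, which a single institution controls, a blockchain is maintained by a distributed network of computers that must collectively agree on the validity of each new entry. Each block also records a hash of the block before it, so altering an old entry would invalidate every block that follows. Together, these properties make the ledger transparent and make tampering impractical.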
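To make the tamper-resistance idea concrete, here is a toy sketch of a hash-linked ledger. It is a simplification for illustration only: it leaves out consensus, signatures, and everything else a real network like Ethereum does, and the function names (block_hash, build_chain, is_valid) are made up for this example.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically with SHA-256."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(entries: list) -> list:
    """Store each entry in a block that records the previous block's hash."""
    chain = []
    prev_hash = "0" * 64  # placeholder hash for the first block
    for index, data in enumerate(entries):
        block = {"index": index, "data": data, "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def is_valid(chain: list) -> bool:
    """Check that every block still points at the hash of its predecessor."""
    for prev, current in zip(chain, chain[1:]):
        if current["prev_hash"] != block_hash(prev):
            return False
    return True


chain = build_chain(["A pays B 1 ETH", "B pays C 2 ETH", "C pays A 1 ETH"])
valid_before = is_valid(chain)

# Tampering with an early entry breaks the link stored in the next block.
chain[0]["data"] = "A pays B 100 ETH"
valid_after = is_valid(chain)

print(valid_before, valid_after)
```

In this toy version, a tamperer could simply recompute every later hash. On a real blockchain, the rest of the network would reject that rewritten history, which is where the decentralized agreement comes in.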
For a deeper dive into Ethereum data, consider exploring dedicated Ethereum analytics resources.
One of the most powerful features of blockchain technology is that all data—including transaction logs and metadata—is public and accessible. This openness allows analysts to study transaction behaviors, track asset movement, and identify unusual patterns.
What Are NFTs?
NFT stands for Non-Fungible Token. It is a type of cryptographic asset on a blockchain that represents ownership of a unique digital or physical item. For example, while an ounce of gold is fungible (interchangeable with any other ounce), the original Mona Lisa is non-fungible—there is only one.
Contrary to popular belief, NFTs are not just digital art. They can represent ownership of a wide range of assets, including music, virtual real estate, and collectibles. In this article, we focus on the Bored Ape Yacht Club (BAYC), one of the most well-known NFT art projects.
For a beginner-friendly introduction to NFTs, video explainers from trusted educational sources can be very helpful.
Why Use Network Graphs for Blockchain Data?
Network graphs are an intuitive way to represent relational data. They consist of nodes (entities such as wallet addresses) and edges (connections or transactions between those nodes). Both nodes and edges can store metadata such as timestamps, transaction values, and asset IDs.
Blockchain transactions are inherently relational—every transaction has a from address, a to address, and associated metadata. This makes network graphs a natural fit for analyzing and visualizing transaction behavior.
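As a small illustration of nodes, edges, and edge metadata, the sketch below builds a three-wallet graph by hand with NetworkX. The wallet addresses, attribute names (asset_id, value_eth, timestamp), and values are invented for the example. A MultiDiGraph is used on the assumption that the same pair of wallets may transact more than once.

```python
import networkx as nx

# A directed multigraph keeps direction (from -> to) and allows
# several transactions between the same two wallets.
G = nx.MultiDiGraph()

# Each edge is one transaction; keyword arguments become edge metadata.
G.add_edge("0xaaa", "0xbbb", asset_id=1, value_eth=10.0, timestamp="2021-06-01")
G.add_edge("0xbbb", "0xccc", asset_id=1, value_eth=12.5, timestamp="2021-06-03")
G.add_edge("0xaaa", "0xbbb", asset_id=2, value_eth=3.0, timestamp="2021-06-04")

num_wallets = G.number_of_nodes()
num_transactions = G.number_of_edges()

# Iterate over transactions together with their metadata.
for sender, receiver, attrs in G.edges(data=True):
    print(sender, "->", receiver, attrs)
```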
In this tutorial, we use network graphs to detect wash trading: a form of market manipulation where a trader buys and sells an asset through multiple accounts to create artificial trading volume or inflate the price.
According to industry reports, wash trading is a significant issue in NFT markets, with millions of dollars in estimated profits from such activities in recent years.
Extracting Data from the Ethereum Blockchain
Although blockchain data is public, accessing and preparing it for analysis can be challenging. Common methods include:
- Running your own Ethereum node.
- Using third-party node services.
- Accessing cleaned and aggregated data via commercial APIs.
- Using open-source tools like the NFT Analyst Starter Pack from a16z.
For this project, we use the a16z starter pack, which provides a convenient wrapper around the Alchemy API and outputs easy-to-use CSV files for specified NFT contracts.
Preparing Data and Creating a Network Graph
The NFT Analyst Starter Pack generates three CSV files for the BAYC project:
- BAYC Metadata: Information about individual NFTs, identified by a unique asset_id.
- BAYC Sales: Transaction logs including buyer, seller, sale price, and transaction hash.
- BAYC Transfers: Data on asset transfers, including those without financial transactions.
Key data preparation steps include:
- Merging and deduplicating sales and transfers data.
- Standardizing column names and ensuring consistent data types.
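The preparation steps above might look roughly like the following pandas sketch. The column names (transaction_hash, seller, buyer, from_address, to_address, sale_price) and the inline sample rows are assumptions made for illustration; the actual CSV schema produced by the starter pack may differ, so adjust the names to match your files.

```python
import pandas as pd

# Hypothetical stand-ins for the sales and transfers CSV files.
sales = pd.DataFrame({
    "transaction_hash": ["0x01", "0x02"],
    "seller": ["0xAAA", "0xBBB"],
    "buyer": ["0xBBB", "0xCCC"],
    "asset_id": ["8099", "8099"],      # read in as text
    "sale_price": ["10.0", "12.5"],    # read in as text
})
transfers = pd.DataFrame({
    "transaction_hash": ["0x01", "0x02", "0x02", "0x03"],  # 0x02 is duplicated
    "from_address": ["0xaaa", "0xbbb", "0xbbb", "0xccc"],
    "to_address": ["0xbbb", "0xccc", "0xccc", "0xaaa"],
    "asset_id": [8099, 8099, 8099, 8099],
})

# Standardize column names and data types.
sales = sales.rename(columns={"seller": "from_address", "buyer": "to_address"})
sales["asset_id"] = sales["asset_id"].astype("int64")
sales["sale_price"] = sales["sale_price"].astype("float64")

# The same address can appear in different letter cases, so lowercase both sides.
for frame in (sales, transfers):
    for column in ("from_address", "to_address"):
        frame[column] = frame[column].str.lower()

# Merge: keep every transfer and attach a price where a matching sale exists.
keys = ["transaction_hash", "from_address", "to_address", "asset_id"]
combined = transfers.merge(sales, on=keys, how="left")

# Assumption: a transfer with no matching sale is treated as a 0-value move.
combined["sale_price"] = combined["sale_price"].fillna(0.0)

# Deduplicate repeated rows.
combined = combined.drop_duplicates(subset=keys).reset_index(drop=True)
print(combined)
```

Filling missing prices with 0.0 is a convenience for plotting. Keeping them as missing values is equally reasonable if you want to distinguish free transfers from unknown prices.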
Once the data is cleaned, we use the NetworkX package in Python to construct a network graph. The from_pandas_edgelist function allows us to build a graph from a DataFrame by specifying source and target columns (representing from and to addresses) and attaching metadata to each edge.
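A minimal sketch of that call is shown below. The DataFrame contents and column names are illustrative assumptions, not the real BAYC data. The create_using argument is set to a directed multigraph so that direction and repeated transactions between the same wallets are preserved.

```python
import networkx as nx
import pandas as pd

# Hypothetical cleaned transaction table.
transactions = pd.DataFrame({
    "from_address": ["0xaaa", "0xbbb", "0xccc"],
    "to_address": ["0xbbb", "0xccc", "0xaaa"],
    "asset_id": [1, 1, 2],
    "sale_price": [10.0, 12.5, 0.0],
})

G = nx.from_pandas_edgelist(
    transactions,
    source="from_address",                 # sender column
    target="to_address",                   # receiver column
    edge_attr=["asset_id", "sale_price"],  # columns stored on each edge
    create_using=nx.MultiDiGraph,
)

# In a multigraph, parallel edges are keyed 0, 1, 2, ... per wallet pair.
first_edge = G["0xaaa"]["0xbbb"][0]
print(first_edge["asset_id"], first_edge["sale_price"])
```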
With over 40,000 transactions in the full dataset, visualizing the entire graph is impractical. Instead, we focus on a specific asset to tell a clear and compelling data story.
Visualizing Potential Wash Trading
To demonstrate the process, we examine BAYC token #8099, which has been publicly flagged as potentially involved in wash trading.
We follow these steps:
- Filter the dataset to include only transactions involving asset #8099.
- Anonymize wallet addresses by renaming them to alphabetical labels (e.g., Wallet A, Wallet B).
- Use NetworkX to generate a directed graph from the transaction data.
- Visualize the graph with labels, edge arrows, and a structured layout.
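The four steps above can be sketched as follows. The transaction rows are invented toy data rather than the real history of token #8099, the column names are assumptions, and draw_token_graph is a hypothetical helper that is defined but not called here because it needs matplotlib installed.

```python
import string

import networkx as nx
import pandas as pd

# Hypothetical cleaned transactions covering two different tokens.
transactions = pd.DataFrame({
    "from_address": ["0xaaa", "0xbbb", "0xeee", "0xccc"],
    "to_address": ["0xbbb", "0xccc", "0xfff", "0xaaa"],
    "asset_id": [8099, 8099, 1234, 8099],
    "sale_price": [10.0, 0.0, 5.0, 50.0],
})

# Step 1: keep only transactions for the token of interest.
token = transactions[transactions["asset_id"] == 8099].copy()

# Step 2: relabel addresses in order of first appearance.
# Note: this simple scheme only supports up to 26 distinct wallets.
ordered = token[["from_address", "to_address"]].to_numpy().ravel()
addresses = list(dict.fromkeys(ordered))
labels = {
    address: f"Wallet {string.ascii_uppercase[i]}"
    for i, address in enumerate(addresses)
}
token["from_wallet"] = token["from_address"].map(labels)
token["to_wallet"] = token["to_address"].map(labels)

# Step 3: build a directed graph from the anonymized edge list.
G = nx.from_pandas_edgelist(
    token,
    source="from_wallet",
    target="to_wallet",
    edge_attr=["sale_price"],
    create_using=nx.MultiDiGraph,
)


# Step 4: draw the graph with labels and arrows (requires matplotlib).
def draw_token_graph(graph, path="token_graph.png"):
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    pos = nx.circular_layout(graph)  # structured, repeatable layout
    nx.draw_networkx(graph, pos=pos, with_labels=True, arrows=True)
    plt.axis("off")
    plt.savefig(path)
    plt.close()
```

Calling draw_token_graph(G) writes the figure to token_graph.png. A circular layout is one reasonable choice for a handful of wallets because it keeps any loop in the trades easy to see.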
The resulting graph reveals a circular pattern of transactions between a small set of wallets, accompanied by a sharp increase in sale price. While this does not conclusively prove wash trading, it highlights a pattern worthy of further investigation.
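Circular patterns like this can also be surfaced programmatically instead of by eye. The sketch below uses NetworkX's simple_cycles on a small invented graph and is only one possible heuristic, not the method used in the original analysis. It uses a plain DiGraph, which collapses repeated transactions between the same pair of wallets; that is an acceptable loss when the only question is whether a loop exists.

```python
import networkx as nx

# Toy transaction graph for a single token (invented data).
G = nx.DiGraph()
G.add_edge("Wallet A", "Wallet B", sale_price=10.0)
G.add_edge("Wallet B", "Wallet C", sale_price=0.0)
G.add_edge("Wallet C", "Wallet A", sale_price=50.0)
G.add_edge("Wallet A", "Wallet D", sale_price=55.0)

# Each cycle is a list of wallets through which the token returns
# to where it started.
cycles = list(nx.simple_cycles(G))

# Collect every wallet that takes part in at least one cycle.
wallets_in_cycles = set()
for cycle in cycles:
    wallets_in_cycles.update(cycle)

print(cycles)
print(sorted(wallets_in_cycles))
```

A cycle is a red flag, not a verdict: a legitimate collector can sell a token and later buy it back. Pairing cycle detection with the prices and timing on each edge helps prioritize which cases to look at more closely.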
By examining the transaction history between the involved wallets on Etherscan, analysts can gather additional evidence to assess whether market manipulation occurred.
Frequently Asked Questions
What is wash trading?
Wash trading is a form of market manipulation where an individual or group trades an asset with themselves to create false activity and manipulate perceived value.
Can network graphs prove fraud?
No. Network graphs help visualize transaction patterns and identify red flags. Further investigation using tools like Etherscan is needed to validate suspicions.
Is all blockchain data public?
Yes, on public blockchains like Ethereum, all transaction data is accessible. However, wallet owners are pseudonymous by default.
Do I need to run a node to analyze blockchain data?
Not necessarily. Tools like the NFT Analyst Starter Pack simplify data access by wrapping a third-party API, though running your own node offers the highest level of data autonomy.
What other NFT projects can I analyze this way?
This method can be applied to any NFT project on Ethereum. You only need the project’s contract address to begin extracting data.
How can I learn more about blockchain analytics?
Join online communities focused on Web3 and blockchain data science. Many open-source tools and educational resources are available for beginners.
Next Steps
This tutorial introduced how to access, prepare, and visualize Ethereum NFT data using Python and network analysis. If you’re interested in diving deeper, consider:
- Replicating this analysis with other NFT collections.
- Incorporating time-series analysis to track price manipulation over time.
- Exploring other blockchain networks like Solana or Polygon.
Disclaimer
This content is for educational purposes only and is not financial advice. The author does not hold any financial position in the NFTs analyzed. This analysis highlights potential red flags for further investigation and does not constitute proof of fraud. Always protect your private keys and recovery phrases—never share them with anyone.