The Ethereum Virtual Machine (EVM) is the core runtime environment for executing smart contracts on the Ethereum blockchain. It is a decentralized, Turing-complete virtual machine that enables developers to deploy and run code in a secure, isolated environment. This article provides a comprehensive overview of the EVM's architecture, data structures, and operational mechanics.
Core Components of the Ethereum Blockchain
Ethereum’s architecture consists of multiple layers, including the data layer, consensus layer, and execution layer. The EVM operates within the execution layer, handling smart contract deployment and interaction.
Data Storage and Structure
Ethereum protects the integrity of block contents through hashes stored in the block header. Each header includes root hashes for the block's transactions, the account state, and the transaction receipts (which contain logs). These roots are computed using a Merkle Patricia Tree (MPT), which allows data integrity to be verified efficiently.
- Hash Function: Keccak256 is used for cryptographic hashing.
- Digital Signatures: ECDSA (Elliptic Curve Digital Signature Algorithm) secures transactions.
Most Ethereum clients, like Geth, use LevelDB—a key-value non-relational database based on Log-Structured Merge Trees—to store blockchain data. However, some clients, such as OpenEthereum, use RocksDB for enhanced performance.
Consensus Mechanisms
- Eth1.0: Relies on Ethash Proof-of-Work (PoW).
- Eth2.0: Uses Proof-of-Stake (PoS), with Casper FFG (Friendly Finality Gadget) providing finality on the beacon chain. Before the Merge in September 2022, the PoS beacon chain ran alongside the Ethash PoW main chain; since the Merge, Ethereum runs entirely on PoS.
Fork-choice rules also differ between versions:
- Eth1.0 uses the GHOST protocol (Greedy Heaviest-Observed Sub-Tree).
- Eth2.0 adopts LMD-GHOST (Latest Message Driven GHOST).
Smart Contracts and the EVM
Smart contracts are self-executing contracts with terms directly written into code. They are compiled into bytecode and ABI (Application Binary Interface) before deployment.
- Bytecode: Executed on the EVM.
- ABI: Defines how to interact with the contract via JSON-RPC calls.
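Transactions are the only way to change Ethereum's state: users submit them from externally owned accounts, and contracts act only when a transaction calls them, directly or through another contract. Each transaction contains critical data, such as the recipient address, value, and input data.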
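As a rough illustration of what the ABI specifies, the sketch below hand-encodes the input data for a call to an ERC-20-style transfer(address,uint256) function. The selector a9059cbb is the first four bytes of keccak256("transfer(address,uint256)"); it is hard-coded here because Python's standard library has no Keccak-256. The recipient address and the helper names are made up for the example.

```python
# Selector for transfer(address,uint256): first 4 bytes of
# keccak256("transfer(address,uint256)"). Hard-coded because the standard
# library has no Keccak-256 (hashlib.sha3_256 uses different padding).
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")


def encode_address(addr_hex: str) -> bytes:
    """Left-pad a 20-byte address to a 32-byte ABI word."""
    if addr_hex.startswith("0x"):
        addr_hex = addr_hex[2:]
    raw = bytes.fromhex(addr_hex)
    if len(raw) != 20:
        raise ValueError("address must be 20 bytes")
    return raw.rjust(32, b"\x00")


def encode_uint256(value: int) -> bytes:
    """Encode an unsigned integer as a big-endian 32-byte ABI word."""
    if not 0 <= value < 2**256:
        raise ValueError("value out of range for uint256")
    return value.to_bytes(32, "big")


def encode_transfer(to: str, amount: int) -> bytes:
    """Build input data: 4-byte selector followed by two 32-byte arguments."""
    return TRANSFER_SELECTOR + encode_address(to) + encode_uint256(amount)


# Hypothetical recipient address, used only for illustration.
calldata = encode_transfer("0x" + "11" * 20, 1000)
print(len(calldata), calldata.hex())
```

The resulting 68 bytes are what would go into the transaction's input data field. Dynamic types such as strings and arrays use a more involved offset-based encoding that this sketch does not cover.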
EVM Architecture
The EVM is a stack-based, register-less virtual machine that interprets and executes bytecode. It operates as a sandboxed environment, ensuring code execution does not affect other parts of the system.
- Nodes: Computers running clients (e.g., Geth) that maintain the EVM.
- Distributed System: The EVM is collectively upheld by all nodes in the network.
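To make the stack-based model concrete, here is a minimal sketch of an interpreter for a tiny subset of EVM bytecode. The opcode byte values (STOP, ADD, MUL, PUSH1) match the real EVM, but everything else (the function names, the absence of gas, memory, and storage) is a simplification for illustration and not how any client is actually written.

```python
from typing import Callable, Dict, List

WORD_MASK = 2**256 - 1  # EVM words are 256 bits; arithmetic wraps modulo 2**256


def op_add(stack: List[int]) -> None:
    stack.append((stack.pop() + stack.pop()) & WORD_MASK)


def op_mul(stack: List[int]) -> None:
    stack.append((stack.pop() * stack.pop()) & WORD_MASK)


# Jump table: opcode byte -> handler operating on the stack.
JUMP_TABLE: Dict[int, Callable[[List[int]], None]] = {
    0x01: op_add,  # ADD
    0x02: op_mul,  # MUL
}

STOP, PUSH1 = 0x00, 0x60


def run(code: bytes) -> List[int]:
    """Interpret the bytecode and return the final stack."""
    stack: List[int] = []
    pc = 0  # program counter
    while pc < len(code):
        op = code[pc]
        if op == STOP:
            break
        if op == PUSH1:
            stack.append(code[pc + 1])  # immediate operand follows the opcode
            pc += 2
            continue
        if op not in JUMP_TABLE:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
        JUMP_TABLE[op](stack)
        pc += 1
    return stack


# PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL, STOP  computes (2 + 3) * 4
program = bytes([0x60, 0x02, 0x60, 0x03, 0x01, 0x60, 0x04, 0x02, 0x00])
print(run(program))  # [20]
```

There are no registers: every operation takes its inputs from the top of the stack and pushes its result back.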
Data Storage in EVM: Storage vs. Memory
Storage Layout
State variables are stored compactly in storage. Multiple variables may share the same 32-byte slot if their combined size is ≤32 bytes. Variables are assigned to slots consecutively in declaration order, starting from slot[0].
- Small Variables: Reading/writing variables <32 bytes consumes more gas due to EVM’s 32-byte operation unit.
- Dynamic Arrays & Mappings: Require unpredictable storage space. They occupy a base slot (e.g., slot[p]), but actual values are stored elsewhere using keccak256 hashing for lookup.
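The packing rule for fixed-size variables can be sketched as a small layout calculator. This is a simplified model that assumes only value types of 1 to 32 bytes; structs and arrays, which always begin a new slot, are not handled. The variable names and sizes are a hypothetical contract, not taken from any real one.

```python
from typing import List, Tuple

SLOT_SIZE = 32  # bytes per storage slot


def layout(variables: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Assign (name, slot, byte offset) to variables in declaration order."""
    result = []
    slot, offset = 0, 0
    for name, size in variables:
        if not 0 < size <= SLOT_SIZE:
            raise ValueError(f"{name}: size must be between 1 and 32 bytes")
        if offset + size > SLOT_SIZE:  # does not fit: start a new slot
            slot, offset = slot + 1, 0
        result.append((name, slot, offset))
        offset += size
    return result


# Hypothetical state: uint128 a; uint64 b; uint128 c; uint256 d; bool e;
state = [("a", 16), ("b", 8), ("c", 16), ("d", 32), ("e", 1)]
for name, slot, offset in layout(state):
    print(f"{name}: slot {slot}, offset {offset}")
```

Here a and b share slot 0, while c no longer fits and moves to slot 1. Reordering declarations so that small variables sit next to each other can therefore reduce the number of slots used.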
Examples:
- Dynamic Array Elements: Located at keccak256(p) + index.
- Mapping Values: Stored at keccak256(abi.encode(key, p)).
- Strings/Bytes: Treated as bytes1[]. If ≤31 bytes, length and data are packed in one slot. If ≥32 bytes, data is stored at keccak256(p).
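The slot formulas above can be tried out with the following sketch. Because Python's standard library does not ship Keccak-256 (hashlib.sha3_256 uses different padding), a compact pure-Python implementation is included. The helper names are made up for this example, and the sketch assumes array elements that each fill a whole 32-byte slot and mapping keys that are value types.

```python
MASK64 = (1 << 64) - 1
RATE = 136  # bytes absorbed per permutation for a 256-bit digest

ROUND_CONSTANTS = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]


def _rol(value: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((value << n) | (value >> (64 - n))) & MASK64


def _keccak_f(a):
    """Keccak-f[1600] permutation on a 5x5 array of 64-bit lanes a[x][y]."""
    for rc in ROUND_CONSTANTS:
        # theta
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        cur = a[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, a[x][y] = a[x][y], _rol(cur, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [a[x][y] for x in range(5)]
            for x in range(5):
                a[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota
        a[0][0] ^= rc
    return a


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by Ethereum (original 0x01 padding, not SHA-3's 0x06)."""
    padded = bytearray(data) + bytes(RATE - len(data) % RATE)
    padded[len(data)] ^= 0x01
    padded[-1] ^= 0x80
    state = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), RATE):
        block = padded[off:off + RATE]
        for i in range(RATE // 8):
            state[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        state = _keccak_f(state)
    return b"".join(state[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def word(n: int) -> bytes:
    """Encode an integer as a big-endian 32-byte word."""
    return n.to_bytes(32, "big")


def array_element_slot(p: int, index: int) -> int:
    """Slot of element `index` of a dynamic array declared at slot p."""
    return (int.from_bytes(keccak256(word(p)), "big") + index) % 2**256


def mapping_value_slot(key: int, p: int) -> int:
    """Slot of the value for `key` in a mapping declared at slot p."""
    return int.from_bytes(keccak256(word(key) + word(p)), "big")


print(hex(array_element_slot(0, 2)))
print(hex(mapping_value_slot(0xABC, 1)))
```

Because the hash output is effectively unpredictable, values of different dynamic variables land far apart in the 2^256 slot space and do not collide in practice.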
Memory Layout
Memory (mem) is temporary and less compact than storage. Variables in memory are aligned to 32-byte boundaries, even if they require less space. This keeps reads and writes simple, since the EVM works in 32-byte words, but it increases memory use and gas costs for smaller variables.
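The difference in compactness can be put into numbers with a small sketch. It assumes a fixed-size array of small elements, for example a hypothetical uint8[40], and compares the space it takes in memory (one full word per element) with the slots it takes in storage (elements packed per slot). The function names are illustrative.

```python
WORD = 32  # bytes per memory word and per storage slot


def memory_bytes(num_elements: int) -> int:
    """Each element of a memory array occupies a full 32-byte word."""
    return num_elements * WORD


def storage_slots(element_size: int, num_elements: int) -> int:
    """Storage arrays pack as many whole elements per slot as will fit."""
    per_slot = WORD // element_size
    return -(-num_elements // per_slot)  # ceiling division


# Hypothetical uint8[40]: forty one-byte elements.
print(memory_bytes(40))      # 1280 bytes in memory
print(storage_slots(1, 40))  # 2 slots (64 bytes) in storage
```

The bytes and string types are the exception in memory: their contents are stored tightly packed rather than one element per word.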
EVM Execution Model
The EVM processes transactions or messages from externally owned accounts (EOAs) or contract accounts. Execution results in updated state data and logs.
- Message Object: All operations are converted into a Message object, which the EVM uses to create a Contract object.
Transaction Types:
- Simple Transfers: Update account balances.
- Contract Interactions: Use the data field to deploy or invoke functions.
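The distinction between these transaction types can be sketched with a simplified message model. The Message class and classify function here are hypothetical and only loosely inspired by the description above; they are not the actual types of Geth or any other client. The sketch also ignores that a call with empty data to a contract address still runs that contract's fallback logic, since deciding that requires looking up the account's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    sender: str
    to: Optional[str]   # None means "create a new contract"
    value: int = 0      # amount of wei transferred
    data: bytes = b""   # init code for creation, call data otherwise


def classify(msg: Message) -> str:
    """Classify a message by its recipient and data fields."""
    if msg.to is None:
        return "contract creation"
    if not msg.data:
        return "simple transfer"
    return "contract call"


# Hypothetical addresses and payloads, for illustration only.
print(classify(Message("0xAlice", "0xBob", value=10**18)))
print(classify(Message("0xAlice", None, data=bytes.fromhex("6080"))))
print(classify(Message("0xAlice", "0xToken", data=bytes.fromhex("a9059cbb"))))
```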
Opcodes and Gas Costs
EVM opcodes are 1-byte instructions (e.g., ADD, SSTORE). There are up to 256 opcodes, though not all are used.
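- Function Selection: The contract's bytecode compares the 4-byte function selector at the start of the input data (the first four bytes of the keccak256 hash of the function signature) against the selectors it knows and jumps to the matching entry point.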
- JumpTable: Maps opcodes to operations during execution.
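Function selection can be pictured as a lookup on the first four bytes of the input data. The sketch below models that dispatch in Python rather than in bytecode. The two selectors are the well-known values for the ERC-20 functions transfer(address,uint256) and balanceOf(address); the dispatch function itself and the "fallback" label are illustrative.

```python
# Known 4-byte selectors (first 4 bytes of keccak256 of the signature).
SELECTORS = {
    bytes.fromhex("a9059cbb"): "transfer(address,uint256)",
    bytes.fromhex("70a08231"): "balanceOf(address)",
}


def dispatch(calldata: bytes) -> str:
    """Return the function that the input data selects, or 'fallback'."""
    if len(calldata) < 4:
        return "fallback"  # too short to carry a selector
    return SELECTORS.get(calldata[:4], "fallback")


print(dispatch(bytes.fromhex("70a08231") + bytes(32)))  # balanceOf(address)
print(dispatch(bytes.fromhex("deadbeef")))              # fallback
print(dispatch(b""))                                    # fallback
```

In compiled contracts this lookup is a sequence of comparisons and conditional jumps at the start of the bytecode rather than a dictionary.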
Gas costs vary by opcode. Some have fixed costs; others have dynamic costs based on operation complexity. For detailed gas metrics, refer to evm.codes.
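As a sketch of the difference between fixed and dynamic costs, the snippet below lists a few fixed-cost opcodes and computes two dynamic costs: memory expansion and the EXP opcode. The numbers reflect the gas schedule as commonly documented at the time of writing and may change with future network upgrades, so treat them as assumptions and check evm.codes for current values. The function names are made up for the example.

```python
# A few opcodes with fixed gas costs.
FIXED_GAS = {"ADD": 3, "MUL": 5, "PUSH1": 3, "JUMPDEST": 1}


def memory_cost(words: int) -> int:
    """Total cost of having `words` 32-byte words of memory allocated."""
    return 3 * words + words * words // 512


def expansion_gas(old_bytes: int, new_bytes: int) -> int:
    """Gas charged when memory grows from old_bytes to new_bytes."""
    old_words = (old_bytes + 31) // 32
    new_words = (new_bytes + 31) // 32
    return max(0, memory_cost(new_words) - memory_cost(old_words))


def exp_gas(exponent: int) -> int:
    """EXP costs a base of 10 gas plus 50 gas per byte of the exponent."""
    exponent_bytes = (exponent.bit_length() + 7) // 8
    return 10 + 50 * exponent_bytes


print(FIXED_GAS["ADD"])        # 3
print(expansion_gas(0, 1024))  # 98
print(exp_gas(255))            # 60
print(exp_gas(256))            # 110
```

The quadratic term in the memory cost is negligible for small allocations but makes very large memory use increasingly expensive.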
Frequently Asked Questions
What is the EVM?
The Ethereum Virtual Machine is a decentralized computation platform that executes smart contracts. It ensures code runs exactly as programmed without downtime or third-party interference.
How does EVM storage work?
EVM storage uses 32-byte slots. Variables are packed together to save space, but dynamic types (e.g., mappings) use hashing for storage allocation. This design balances efficiency and accessibility.
Why do small variables cost more gas?
The EVM operates in 32-byte chunks. Reading/writing smaller variables requires extra operations (e.g., masking), increasing gas costs.
What is the difference between storage and memory?
- Storage: Persistent, on-chain data stored in slots.
- Memory: Temporary data that exists only during execution of a call and is never persisted on-chain.
How are contracts executed?
Contracts are compiled to bytecode. The EVM interprets opcodes in this bytecode, updating state based on function calls and transactions.
What is the role of the ABI?
The ABI defines how to encode/decode data for contract interactions. It specifies function names, parameters, and return types for JSON-RPC calls.