How the Ethereum Virtual Machine Works: Architecture and Opcodes Explained


The Ethereum Virtual Machine (EVM) is the core computational engine powering the Ethereum blockchain, executing smart contracts and processing transactions that move billions of dollars in value daily. As smart contracts grow more complex, developers benefit from understanding not just high-level languages like Solidity, but also the low-level mechanics of how the EVM operates. This guide provides a comprehensive look at EVM architecture, its key components, and how opcodes drive contract execution.

Understanding Virtual Machines

A Virtual Machine (VM) is a software-based emulation of a physical computer, providing virtualized hardware such as a processor, memory, and storage. VMs are commonly offered through cloud platforms, letting users worldwide deploy and run applications without managing physical hardware. At the lowest level, a VM executes machine code: binary instructions, each identified by an opcode (with human-readable mnemonics such as ADD, PUSH, and POP), that perform arithmetic, conditional checks, and other fundamental operations.

While most developers today work with high-level programming languages, those using assembly language have direct experience with these foundational opcodes.

What Is the Ethereum Virtual Machine?

The Ethereum Virtual Machine (EVM) is a purpose-built virtual machine that executes smart contract code on the Ethereum blockchain. Its operation is defined in the Ethereum Yellow Paper, which specifies a set of opcodes that include both standard VM instructions and Ethereum-specific codes enabling smart contract functionality.

The EVM runs inside Ethereum execution clients such as Geth, Nethermind, and Reth, each of which ships its own EVM implementation. It processes transaction calldata and EVM bytecode, updating Ethereum's global state. It uses a stack-based architecture and includes several key components: the Stack, Memory, Storage, EVM code, the Program Counter, and gas.

Some components, like the Stack and Memory, are volatile: they are created fresh for each execution context (a transaction or an internal call to another contract) and discarded when it ends. Others, like Storage, are persistent and maintain state across transactions.

Ethereum’s State Transition Function

A critical feature of the EVM is its deterministic state transition function, expressed as Y(S, T) = S'. Given an old state (S) and a new set of transactions (T), the function produces a new state (S'). This ensures that every Ethereum client, following the Yellow Paper specification, computes identical results for the same transaction sequence, enabling network consensus and predictability.
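The idea behind Y(S, T) = S' can be illustrated with a toy model. The sketch below is a simplification, assuming a state that is only a mapping from address to (nonce, balance) and transactions that only transfer value; real clients also charge gas, run EVM code, and commit the state to a trie. The names `Tx`, `apply_tx`, and `state_transition` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """Hypothetical value-transfer transaction (no gas, no code execution)."""
    sender: str
    recipient: str
    value: int
    nonce: int


# Toy state: address -> (nonce, balance). A stand-in for the world state.
State = dict[str, tuple[int, int]]


def apply_tx(state: State, tx: Tx) -> State:
    """Apply one transaction and return a new state; the input is not mutated."""
    new_state = dict(state)
    nonce, balance = new_state.get(tx.sender, (0, 0))
    if tx.nonce != nonce:
        raise ValueError("invalid nonce")
    if tx.value > balance:
        raise ValueError("insufficient balance")
    new_state[tx.sender] = (nonce + 1, balance - tx.value)
    r_nonce, r_balance = new_state.get(tx.recipient, (0, 0))
    new_state[tx.recipient] = (r_nonce, r_balance + tx.value)
    return new_state


def state_transition(state: State, txs: list[Tx]) -> State:
    """Y(S, T) = S': apply the transactions to S strictly in order."""
    for tx in txs:
        state = apply_tx(state, tx)
    return state


S = {"alice": (0, 100), "bob": (0, 5)}
T = [Tx("alice", "bob", 30, 0), Tx("bob", "alice", 10, 0)]
S_new = state_transition(S, T)
print(S_new)  # {'alice': (1, 80), 'bob': (1, 25)}
```

Because `state_transition` depends only on its inputs, any node that runs it on the same S and T obtains the same S', which is the property consensus relies on.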

Core Components of the EVM

The Stack

The Stack is a volatile, last-in, first-out (LIFO) data structure that holds at most 1024 items, each a 256-bit (32-byte) word. It supplies the inputs and receives the outputs of most opcodes, but it cannot hold complex data types like arrays or strings; those are managed via Memory. Typical execution errors include stack overflow (pushing a 1025th item), stack underflow (an opcode needing more items than the Stack holds), and out-of-gas exceptions (running out of the gas allotted to the execution).

Opcodes like PUSH1, ADD, MUL, and POP interact directly with the Stack, while others like MSTORE and SSTORE enable access to Memory and Storage.
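The Stack's rules can be sketched in a few lines of Python. This is a simplified model, assuming only push, pop, ADD, and MUL and ignoring gas; the class name `EvmStack` and its methods are hypothetical. It shows the 1024-item limit, the underflow check, and the fact that arithmetic wraps modulo 2^256 because every item is a 256-bit word.

```python
MAX_STACK_DEPTH = 1024   # maximum number of items on the EVM stack
WORD_MOD = 2**256        # items are 256-bit unsigned words


class StackError(Exception):
    """Raised on stack overflow or underflow (an exceptional halt in the EVM)."""


class EvmStack:
    """Toy model of the EVM stack; gas and all other opcodes are ignored."""

    def __init__(self) -> None:
        self.items: list[int] = []

    def push(self, value: int) -> None:
        if len(self.items) >= MAX_STACK_DEPTH:
            raise StackError("stack overflow")
        self.items.append(value % WORD_MOD)

    def pop(self) -> int:
        if not self.items:
            raise StackError("stack underflow")
        return self.items.pop()

    def add(self) -> None:
        # ADD: pop two words, push their sum modulo 2**256
        self.push(self.pop() + self.pop())

    def mul(self) -> None:
        # MUL: pop two words, push their product modulo 2**256
        self.push(self.pop() * self.pop())


# PUSH1 0x02, PUSH1 0x03, ADD, PUSH1 0x04, MUL  ->  (2 + 3) * 4
s = EvmStack()
s.push(2)
s.push(3)
s.add()
s.push(4)
s.mul()
print(s.items)  # [20]
```

Pushing 2**256 - 1 and then 1 and calling `add()` leaves 0 on the stack, mirroring the EVM's wrap-around arithmetic.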

Memory

EVM Memory is a linear, volatile, byte-addressed array used during execution. It is cheaper than Storage, but its cost grows quadratically as it expands. Memory is expanded in 32-byte words and is accessed via opcodes like MSTORE (write a 32-byte word to memory) and MLOAD (read a 32-byte word from memory). It is suited to temporary data such as arrays and strings that don't fit on the Stack.
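The quadratic growth comes from the memory cost formula in the Yellow Paper, C_mem(a) = 3·a + ⌊a²/512⌋, where a is the number of 32-byte words in use; an instruction that touches a higher address pays the difference between the cost after and before expansion. The sketch below evaluates that formula; the helper names are illustrative, not taken from any client.

```python
GAS_MEMORY = 3       # G_memory: linear cost per 32-byte word
QUAD_DIVISOR = 512   # divisor of the quadratic term


def words_for(size_bytes: int) -> int:
    """Number of 32-byte words needed to cover size_bytes (rounded up)."""
    return (size_bytes + 31) // 32


def memory_cost(words: int) -> int:
    """Total memory cost C_mem(a) = 3a + floor(a^2 / 512)."""
    return GAS_MEMORY * words + words * words // QUAD_DIVISOR


def expansion_cost(old_size_bytes: int, new_size_bytes: int) -> int:
    """Extra gas charged when memory grows from old_size to new_size bytes."""
    old_words = words_for(old_size_bytes)
    new_words = words_for(new_size_bytes)
    if new_words <= old_words:
        return 0  # no expansion, no extra charge
    return memory_cost(new_words) - memory_cost(old_words)


for size in (32, 1024, 32 * 1024, 1024 * 1024):
    print(size, memory_cost(words_for(size)))
# 32 3
# 1024 98
# 32768 5120
# 1048576 2195456
```

The linear term dominates for small buffers, but a single mebibyte of memory already costs over two million gas, which is why contracts keep their memory footprint small.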

Storage

Storage is Ethereum's persistent data store. Each contract account has its own Storage, a key-value store mapping 256-bit keys to 256-bit values. Storage is committed to the world state, where every account is represented by four fields (nonce, balance, storage root, and code hash) and accounts are organized in a modified Merkle Patricia Trie for efficient hashing and verification. Externally Owned Accounts (EOAs) have no code and no storage, so only their balance and nonce carry meaningful data, while contract accounts also use the storage root and code hash.
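The split between account fields and a contract's own storage can be modeled with plain Python objects. This sketch is a deliberate simplification: it stores code and storage directly instead of their hashes, ignores the trie and gas, and the `Account`, `sload`, and `sstore` names are hypothetical. It does capture two real rules: keys and values are 256-bit words, and a slot that was never written reads as zero.

```python
from dataclasses import dataclass, field

WORD_MOD = 2**256  # storage keys and values are 256-bit words


@dataclass
class Account:
    nonce: int = 0
    balance: int = 0
    code: bytes = b""                                      # empty for EOAs
    storage: dict[int, int] = field(default_factory=dict)  # empty for EOAs


def sload(account: Account, key: int) -> int:
    """Read a storage slot; unwritten slots read as zero."""
    return account.storage.get(key % WORD_MOD, 0)


def sstore(account: Account, key: int, value: int) -> None:
    """Write a storage slot; zero is the default, so it is not kept explicitly."""
    key, value = key % WORD_MOD, value % WORD_MOD
    if value == 0:
        account.storage.pop(key, None)
    else:
        account.storage[key] = value


eoa = Account(nonce=3, balance=10**18)
contract = Account(code=bytes.fromhex("6080604052"))  # placeholder code bytes
# SimpleStorage keeps storedData in slot 0, so set(69420) writes 69420 to key 0.
sstore(contract, 0, 69420)
print(sload(contract, 0), sload(contract, 1), sload(eoa, 0))  # 69420 0 0
```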

Storage writes are among the most expensive EVM operations: setting a slot from zero to a nonzero value with SSTORE costs at least 20,000 gas, and the fee paid in ETH also scales with the gas price, which rises with network demand. To reduce costs, developers often keep large data such as NFT metadata in external storage like IPFS.

The maximum size of deployed contract code on Ethereum is 24,576 bytes (set by EIP-170). This limit applies only to code; the amount of data a contract can keep in Storage is bounded by the gas needed to write it, not by the code-size limit.

EVM Code

Smart contracts written in languages like Solidity or Vyper are compiled into EVM bytecode. The runtime bytecode is stored with the contract account, referenced by the account's code hash and kept separate from its Storage, and it cannot be changed after deployment. This bytecode contains all contract logic and is executed by the EVM when the contract is called.

Program Counter

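The Program Counter (PC) holds the byte offset in the bytecode of the next opcode to execute. After most instructions it advances by one byte; after PUSH1 through PUSH32 it also skips the 1 to 32 bytes of data that follow; and JUMP or JUMPI set it directly to a target offset, which must hold a JUMPDEST opcode.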
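Because PUSH1 through PUSH32 (opcodes 0x60 to 0x7f) carry immediate data, a byte equal to 0x5b (JUMPDEST) is only a valid jump target if it is an opcode and not part of PUSH data. The sketch below, with hypothetical function names, walks bytecode the way the PC does and collects those valid destinations. The sample input is the first 16 bytes of the compiled SimpleStorage bytecode, where PUSH1 0x0e sets up a JUMPI to offset 14.

```python
PUSH1, PUSH32, JUMPDEST = 0x60, 0x7F, 0x5B


def next_pc(code: bytes, pc: int) -> int:
    """Offset of the instruction after the one at pc, skipping PUSH immediates."""
    op = code[pc]
    if PUSH1 <= op <= PUSH32:
        return pc + 1 + (op - PUSH1 + 1)  # opcode byte plus 1..32 data bytes
    return pc + 1


def valid_jump_destinations(code: bytes) -> set[int]:
    """Offsets of JUMPDEST bytes that are real opcodes, not PUSH data."""
    dests = set()
    pc = 0
    while pc < len(code):
        if code[pc] == JUMPDEST:
            dests.add(pc)
        pc = next_pc(code, pc)
    return dests


prefix = bytes.fromhex("6080604052348015600e575f80fd5b50")
print(valid_jump_destinations(prefix))                    # {14}
print(valid_jump_destinations(bytes.fromhex("605b5b")))   # {2}
```

In the second call the first 0x5b is the operand of PUSH1, so only offset 2 counts as a jump destination.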

Gas

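Gas is the fuel powering EVM operations: every opcode consumes a specified amount of gas reflecting the computation or storage it requires. Gas costs are defined in the Yellow Paper and adjusted by network upgrades. Because every operation must be paid for and each transaction sets a gas limit, no transaction can run indefinitely, which prevents denial-of-service attacks and encourages efficient code.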
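Gas metering can be sketched as a counter that is charged before each instruction runs. The costs below are the static costs of four simple opcodes (PUSH1 and ADD cost 3, MUL costs 5, POP costs 2); opcodes such as SSTORE or memory-touching instructions add dynamic components that this sketch leaves out. `run_with_gas` and its (opcode, argument) program format are hypothetical.

```python
from typing import Optional

# Static gas costs (Yellow Paper tiers: very low = 3, low = 5, base = 2)
GAS_COST = {"PUSH1": 3, "ADD": 3, "MUL": 5, "POP": 2}
WORD_MOD = 2**256


class OutOfGas(Exception):
    """Raised when the next instruction costs more gas than remains."""


def run_with_gas(program: list[tuple[str, Optional[int]]], gas_limit: int) -> tuple[list[int], int]:
    """Execute (opcode, argument) pairs, charging gas before each one.

    Returns the final stack and the total gas used.
    """
    stack: list[int] = []
    gas_left = gas_limit
    for op, arg in program:
        cost = GAS_COST[op]
        if cost > gas_left:
            raise OutOfGas(f"{op} needs {cost} gas but only {gas_left} is left")
        gas_left -= cost
        if op == "PUSH1":
            stack.append(arg)  # arg is the immediate value for PUSH1
        elif op == "ADD":
            stack.append((stack.pop() + stack.pop()) % WORD_MOD)
        elif op == "MUL":
            stack.append((stack.pop() * stack.pop()) % WORD_MOD)
        elif op == "POP":
            stack.pop()
    return stack, gas_limit - gas_left


program = [("PUSH1", 2), ("PUSH1", 3), ("ADD", None), ("PUSH1", 4), ("MUL", None)]
print(run_with_gas(program, 100))  # ([20], 17)
```

With a limit of 16, the program fails at MUL (4 gas left, 5 needed). In the real EVM, such an exceptional halt reverts the execution context's state changes and consumes all gas given to it.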

EVM Opcodes Explained

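EVM bytecode consists of opcodes that instruct the EVM on how to manipulate data, manage state, and control execution flow. Common opcode categories include arithmetic (ADD, MUL), stack manipulation (PUSH1, POP, DUP1, SWAP1), memory access (MLOAD, MSTORE), storage access (SLOAD, SSTORE), and control flow (JUMP, JUMPI, RETURN, REVERT).

Ethereum also introduces environment-specific opcodes, such as CALLER, CALLVALUE, CALLDATALOAD, and BALANCE, that give contracts information about the caller, the transaction, and accounts on the chain.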

Each opcode has an associated gas cost, with state-modifying operations like SSTORE being more expensive than simple arithmetic opcodes.

From Source Code to Bytecode

Consider a simple Solidity storage contract:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.4.16 <0.9.0;

contract SimpleStorage {
    uint storedData;

    function set(uint x) public {
        storedData = x;
    }

    function get() public view returns (uint) {
        return storedData;
    }
}

Compilation produces bytecode like:

6080604052348015600e575f80fd5b506101438061001c5f395ff3fe608060405234801561000f575f80fd5b5060043610610034575f3560e01c806360fe47b1146100385780636d4ce63c14610054575b5f80fd5b610052600480360381019061004d91906100ba565b610072565b005b61005c61007b565b60405161006991906100f4565b60405180910390f35b805f8190555050565b5f8054905090565b5f80fd5b5f819050919050565b61009981610087565b81146100a3575f80fd5b50565b5f813590506100b481610090565b92915050565b5f602082840312156100cf576100ce610083565b5b5f6100dc848285016100a6565b91505092915050565b6100ee81610087565b82525050565b5f6020820190506101075f8301846100e5565b9291505056fea2646970667358221220fe2a712e6758ca6e067fd552b99e33f169a13afa9b0c54fdd2e92518f3aa766764736f6c63430008190033

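This bytecode translates to opcode sequences executable by the EVM. For example, the initial bytes 6080 correspond to PUSH1 0x80, which pushes the value 0x80 onto the Stack; together with the following 604052 (PUSH1 0x40, MSTORE), this stores 0x80 at memory offset 0x40, Solidity's free memory pointer. Tools like evm.codes let developers disassemble bytecode and step through its execution.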
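The byte-to-opcode translation can be sketched with a lookup table. The table below covers only the opcodes that appear in the first 16 bytes of the compiled SimpleStorage bytecode, so it is a partial sketch rather than a complete opcode map; any other byte is printed as UNKNOWN.

```python
# Partial opcode table: only the opcodes used in the sample below.
OPCODES = {
    0x00: "STOP", 0x15: "ISZERO", 0x34: "CALLVALUE", 0x50: "POP",
    0x52: "MSTORE", 0x57: "JUMPI", 0x5B: "JUMPDEST", 0x5F: "PUSH0",
    0x80: "DUP1", 0xFD: "REVERT",
}


def disassemble(code: bytes) -> list[str]:
    """Return one 'offset: MNEMONIC [data]' line per instruction."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 data bytes
            n = op - 0x5F
            data = code[pc + 1 : pc + 1 + n]
            out.append(f"{pc}: PUSH{n} 0x{data.hex()}")
            pc += 1 + n
        else:
            out.append(f"{pc}: {OPCODES.get(op, f'UNKNOWN 0x{op:02x}')}")
            pc += 1
    return out


for line in disassemble(bytes.fromhex("6080604052348015600e575f80fd5b50")):
    print(line)
```

The output runs from `0: PUSH1 0x80` to `15: POP`. After the free memory pointer setup, CALLVALUE, DUP1, ISZERO, PUSH1 0x0e, and JUMPI jump to the JUMPDEST at offset 14 only if no ether was sent; otherwise PUSH0, DUP1, REVERT abort, because the contract's implicit constructor is not payable.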

How the EVM Executes Transactions

An Ethereum transaction includes fields such as nonce, gas price, gas limit, recipient address, value, and data (calldata). Calldata is especially important for contract interactions, as it encodes function calls and arguments.

A typical contract interaction transaction might look like:

to: 0x6b175474e89094c44da98b954eedeac495271d0f
from: 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
value: 0x0
data: 0x60fe47b10000000000000000000000000000000000000000000000000000000000010f2c
gasPrice: 500000
gasLimit: 210000

The EVM execution flow for such a transaction involves:

  1. Decoding the transaction’s RLP-encoded payload to extract recipient, value, and data.
  2. Recovering the sender's address from the ECDSA signature (v, r, s values) and checking that the transaction's nonce matches the sender's account nonce.
  3. Initializing a clean Memory and Stack context for execution.
  4. Processing opcodes sequentially under the guidance of the Program Counter, updating state as needed.
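Step 1 relies on Recursive Length Prefix (RLP) encoding (for typed transactions, a one-byte type prefix precedes the RLP payload). The sketch below decodes RLP into bytes or nested lists of bytes following the encoding's prefix rules. It is a minimal decoder with hypothetical function names and skips checks that production decoders perform, such as rejecting non-canonical length prefixes.

```python
def _header(data: bytes, pos: int) -> tuple[bool, int, int]:
    """Parse the prefix at pos; returns (is_list, payload_start, payload_length)."""
    prefix = data[pos]
    if prefix <= 0x7F:  # a single byte below 0x80 is its own encoding
        return False, pos, 1
    if prefix <= 0xB7:  # string of 0-55 bytes: length is prefix - 0x80
        return False, pos + 1, prefix - 0x80
    if prefix <= 0xBF:  # longer string: next (prefix - 0xB7) bytes hold the length
        n = prefix - 0xB7
        return False, pos + 1 + n, int.from_bytes(data[pos + 1 : pos + 1 + n], "big")
    if prefix <= 0xF7:  # list whose payload is 0-55 bytes
        return True, pos + 1, prefix - 0xC0
    n = prefix - 0xF7   # longer list: next n bytes hold the payload length
    return True, pos + 1 + n, int.from_bytes(data[pos + 1 : pos + 1 + n], "big")


def _decode_item(data: bytes, pos: int):
    """Decode the item at pos; returns (item, offset just past it)."""
    is_list, start, length = _header(data, pos)
    end = start + length
    if end > len(data):
        raise ValueError("truncated RLP input")
    if not is_list:
        return data[start:end], end
    items, cur = [], start
    while cur < end:
        item, cur = _decode_item(data, cur)
        items.append(item)
    if cur != end:
        raise ValueError("list payload length mismatch")
    return items, end


def rlp_decode(data: bytes):
    """Decode a complete RLP encoding into bytes or nested lists of bytes."""
    item, end = _decode_item(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after RLP item")
    return item


print(rlp_decode(bytes.fromhex("c88363617483646f67")))  # [b'cat', b'dog']
```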

Calldata begins with a four-byte function selector, the first four bytes of the Keccak-256 hash of the canonical function signature, followed by the ABI-encoded arguments. In the example, 0x60fe47b1 is the selector for set(uint256) (Solidity's uint is an alias for uint256), and the remaining bytes encode the argument 0x10f2c (69420 in decimal) as a 32-byte big-endian word.
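The calldata layout can be reproduced for a single static argument such as uint256. The sketch below takes the selector 0x60fe47b1 as given rather than computing it, because Keccak-256 is not in Python's standard library (`hashlib.sha3_256` implements the standardized SHA-3, which pads differently and yields different hashes). The helper names are hypothetical, and only one uint256 argument is handled.

```python
SET_SELECTOR = bytes.fromhex("60fe47b1")  # first 4 bytes of keccak256("set(uint256)")


def encode_set_call(x: int) -> bytes:
    """Calldata for set(uint256): selector followed by x as a 32-byte big-endian word."""
    if not 0 <= x < 2**256:
        raise ValueError("x does not fit in uint256")
    return SET_SELECTOR + x.to_bytes(32, "big")


def decode_uint256_call(calldata: bytes) -> tuple[bytes, int]:
    """Split calldata into (selector, first uint256 argument)."""
    if len(calldata) < 4 + 32:
        raise ValueError("calldata too short")
    return calldata[:4], int.from_bytes(calldata[4:36], "big")


calldata = encode_set_call(69420)
print("0x" + calldata.hex())
selector, arg = decode_uint256_call(calldata)
print(selector.hex(), arg)  # 60fe47b1 69420
```

A contract's dispatcher performs the inverse of `encode_set_call`: it reads the first four bytes, compares them with each known selector, and jumps to the matching function body.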

Frequently Asked Questions

What is the primary purpose of the EVM?
The EVM executes smart contract code and processes transactions on the Ethereum blockchain, ensuring deterministic state changes across all network nodes. It provides a runtime environment for decentralized applications.

How does gas prevent network abuse?
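Gas requires users to pay for the computational resources their transactions consume, discouraging spam and inefficient code. Each opcode has a defined gas cost (fixed for simple operations such as ADD, and dependent on factors like memory expansion or storage access for others), and each transaction sets a gas limit, so no transaction can run indefinitely or consume unbounded resources.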

What is the difference between Memory and Storage?
Memory is volatile and temporary, used only during contract execution. Storage is persistent and stored on the blockchain, modifying the global state. Storage operations are more expensive than Memory operations.

Can the EVM execute code from other blockchains?
The EVM runs only EVM bytecode, so it cannot execute programs compiled for other virtual machines. Many other networks, however, such as Polygon PoS and BNB Smart Chain (BSC), are EVM-compatible and can run the same contracts.

What tools can help analyze EVM bytecode?
Developers use tools like evm.codes for opcode reference, Porosity for decompilation, and blockchain explorers to inspect contract bytecode and transaction execution.

How are smart contracts upgraded if bytecode is immutable?
Smart contracts are immutable once deployed. Upgrades typically use proxy patterns that delegate logic to interchangeable implementation contracts, allowing logic updates while preserving state and address.

Understanding the EVM's architecture and opcodes is essential for developers building secure and efficient smart contracts. By mastering these low-level concepts, you can optimize gas usage, debug complex issues, and gain deeper insight into Ethereum's operational mechanics.