Simplifying 6502 Addressing Modes in Rust Emulation

Introduction

Building a CPU emulator is a fascinating dive into the fundamental workings of computing. At its core, a 6502 CPU emulator, like many others, is a state machine that repeatedly executes instructions read from memory. These instructions, encoded in memory as numeric opcodes, make up a program: they load data into registers, push and pull values on the stack, and read or modify memory locations directly.

Before we look at how to implement such a system efficiently in Rust, and in particular how to handle one of the MOS 6502’s most distinguishing features, its versatile addressing modes, it helps to establish a clear and concise vocabulary.

Let’s define the key terms we’ll be using throughout this guide:

Term	Definition
Instruction	The high-level operation the CPU performs. Think of it as a command, like “Load Accumulator,” “Add with Carry,” or “Jump.” It describes what the CPU is supposed to do. A single instruction can often be performed in multiple ways, depending on where its data comes from.
Opcode	An “operation code” is the specific byte in machine code that represents a particular instruction together with its addressing mode. On the 6502 every opcode is a single byte, optionally followed by one or two operand bytes. It’s the unique numerical code the CPU reads from memory that tells it exactly how to execute an instruction. For example, while “Load Accumulator” is an instruction, `A9` (LDA Immediate) and `AD` (LDA Absolute) are two distinct opcodes for that same instruction.
Addressing Mode	Defines how the CPU finds the data (the operand) that an instruction needs to operate on. It’s the set of rules for calculating the effective memory address of the operand. Different addressing modes offer flexibility in how you access data, whether it’s directly from a register, from a fixed memory location, or from a location calculated relative to another register.

Instructions vs Opcodes

It’s common for beginners to confuse “instruction” and “opcode,” but understanding the distinction is key. An instruction is the conceptual operation (e.g., LDA - LoaD Accumulator). An opcode is the specific byte value that tells the CPU to perform that instruction using a particular addressing mode.

Consider the LDA instruction. It always means “put a value into the Accumulator register.” But which value?

If the opcode is A9, it means “put the next byte in memory into the Accumulator.” (Immediate addressing mode)
If the opcode is AD, it means “put the value from the absolute memory address specified by the next two bytes into the Accumulator.” (Absolute addressing mode)

So, LDA is the instruction, while A9 and AD are just two of its many possible opcodes, each implicitly defining the addressing mode. When implementing an emulator, you’re primarily parsing opcodes and then dispatching to logic that handles the associated instruction using the specified addressing mode.
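The instruction/opcode split can be sketched as a small decoding function. This is only an illustration: the `decode` function and its string labels are hypothetical names chosen for this example, and only a handful of LDA and STA opcodes are listed.

```rust
/// Splits an opcode byte into (instruction mnemonic, addressing mode).
/// Hypothetical helper for illustration; only a few opcodes are covered,
/// and every other byte yields `None`.
fn decode(opcode: u8) -> Option<(&'static str, &'static str)> {
    match opcode {
        0xA9 => Some(("LDA", "Immediate")),
        0xA5 => Some(("LDA", "Zero Page")),
        0xAD => Some(("LDA", "Absolute")),
        0x85 => Some(("STA", "Zero Page")),
        0x8D => Some(("STA", "Absolute")),
        _ => None,
    }
}
```

Here `decode(0xA9)` and `decode(0xAD)` share the same instruction ("LDA") but differ in addressing mode, while `decode(0xA5)` and `decode(0x85)` share an addressing mode but differ in instruction.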

6502 Addressing Modes

The 6502 CPU offers a wide array of addressing modes, each suited to a different memory access pattern. Many instructions have several opcodes, one per supported addressing mode, while others (such as `INX`) have only one. This flexibility is powerful but also adds complexity to emulator implementation. Understanding these modes is essential for fetching operands correctly and for determining the cycle cost of each operation.

Here’s a breakdown of the primary 6502 addressing modes:

Mode (Abbr.)	Description	Example
Implied (IMP)	The instruction operates on an implicit register or memory location. No additional bytes are needed.	`INX` (Increment X register)
Accumulator (ACC)	The instruction operates directly on the Accumulator register. No additional bytes are needed.	`ASL A` (Arithmetic Shift Left Accumulator)
Immediate (IMM)	The operand is the byte immediately following the opcode.	`LDA #$FF` (Load Accumulator with value FF hex)
Zero Page (ZP)	The operand is at an address in the first 256 bytes of memory ($0000-$00FF). Only one byte is needed to specify the address, making these instructions shorter and faster.	`LDA $42` (Load Accumulator from address $0042)
Zero Page, X (ZPX)	The effective address is found by adding the contents of the X register to a Zero Page address.	`LDA $42, X` (Load Accumulator from $0042 + X)
Zero Page, Y (ZPY)	(Only for `LDX` and `STX` instructions) Similar to ZPX, but adds the contents of the Y register to a Zero Page address.	`LDX $42, Y` (Load X register from $0042 + Y)
Absolute (ABS)	The operand is at the full 16-bit address specified by the two bytes following the opcode.	`LDA $C000` (Load Accumulator from address $C000)
Absolute, X (ABSX)	The effective address is found by adding the contents of the X register to an Absolute address.	`LDA $C000, X` (Load Accumulator from $C000 + X)
Absolute, Y (ABSY)	The effective address is found by adding the contents of the Y register to an Absolute address.	`LDA $C000, Y` (Load Accumulator from $C000 + Y)
Indirect (IND)	(Only for `JMP` instruction, and has a bug on real hardware) The two bytes following the opcode specify a 16-bit address that contains the actual 16-bit target address for the jump.	`JMP ($1234)` (Jump to address at $1234)
Indexed Indirect (INDX)	(X-indexed, Indirect) The X register is added to a Zero Page address (wrapping within the zero page), and the two bytes stored at the resulting zero-page location form the 16-bit address of the operand.	`LDA ($42, X)` (Load Accumulator from the address stored at $0042 + X)
Indirect Indexed (INDY)	(Indirect, Y-indexed) The operand is at an address found by taking a 16-bit pointer from a Zero Page address, and then adding the Y register to that pointer.	`LDA ($42), Y` (Load Accumulator from address pointed to by $0042, then add Y)
Relative (REL)	(Only for branch instructions) The operand is a signed 8-bit offset added to the Program Counter to determine the jump target.	`BCC $05` (Branch if Carry Clear to PC + 5)
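One practical consequence of the table above is that the addressing mode alone determines how many operand bytes follow the opcode. The sketch below captures that rule; the `Mode` enum and the helper names are hypothetical and exist only for this example.

```rust
/// Hypothetical enum mirroring the modes in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Implied, Accumulator, Immediate, ZeroPage, ZeroPageX, ZeroPageY,
    Absolute, AbsoluteX, AbsoluteY, Indirect,
    IndexedIndirect, IndirectIndexed, Relative,
}

/// Number of operand bytes that follow the opcode byte for each mode.
fn operand_len(mode: Mode) -> u8 {
    match mode {
        Mode::Implied | Mode::Accumulator => 0,
        Mode::Immediate | Mode::ZeroPage | Mode::ZeroPageX | Mode::ZeroPageY
        | Mode::IndexedIndirect | Mode::IndirectIndexed | Mode::Relative => 1,
        Mode::Absolute | Mode::AbsoluteX | Mode::AbsoluteY | Mode::Indirect => 2,
    }
}

/// Total instruction size: the opcode byte plus its operand bytes.
fn instruction_len(mode: Mode) -> u8 {
    1 + operand_len(mode)
}

/// Combines the two operand bytes of an absolute-style mode.
/// The 6502 is little-endian, so the low byte is stored first.
fn absolute_operand(lo: u8, hi: u8) -> u16 {
    u16::from_le_bytes([lo, hi])
}
```

For example, `LDA $C000` is stored in memory as the three bytes `AD 00 C0`, so `instruction_len(Mode::Absolute)` is 3 and `absolute_operand(0x00, 0xC0)` gives `0xC000`.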

The Problem

Implementing each instruction and its various opcodes individually can lead to a lot of repetitive code. Many instructions share the same underlying logic for fetching their operands, even if the final operation on that operand (e.g., loading, storing, adding) differs.

The LDA Instruction

LDA is an instruction that loads a value into the A (Accumulator) register. That value can come from various sources: it could be an immediate value embedded directly in the instruction, a value read from a zero page address, or a value read from an absolute memory address, among others. Each of these different ways of specifying “where the value comes from” corresponds to a distinct addressing mode, and thus a distinct opcode.

The LDA Opcodes

For instance, the LDA instruction alone has 8 distinct opcodes, each corresponding to a different addressing mode:

Opcode	Addressing Mode	Bytes	Cycles
`A9`	Immediate	2	2
`A5`	Zero Page	2	3
`B5`	Zero Page, X	2	4
`AD`	Absolute	3	4
`BD`	Absolute, X	3	4*
`B9`	Absolute, Y	3	4*
`A1`	Indexed Indirect (X)	2	6
`B1`	Indirect Indexed (Y)	2	5*

*Cycles marked with * indicate that an additional cycle is incurred if a page boundary is crossed during address calculation.
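The page-crossing rule can be sketched as a comparison of the high bytes of the base address and the indexed address. The function names below are hypothetical, and the cycle helper only models LDA Absolute, X using the base count from the table above.

```rust
/// Returns true when adding `index` to `base` lands on a different 256-byte page.
fn crosses_page(base: u16, index: u8) -> bool {
    let effective = base.wrapping_add(index as u16);
    (base & 0xFF00) != (effective & 0xFF00)
}

/// Cycle count for LDA Absolute, X: 4 base cycles plus 1 if a page is crossed.
fn lda_absolute_x_cycles(base: u16, x: u8) -> u8 {
    4 + crosses_page(base, x) as u8
}
```

With a base of `$C0F0` and X = `$20`, the effective address is `$C110`, which lies on a different page than `$C0F0`, so the load takes 5 cycles. With a base of `$C000` the same index stays on page `$C0` and the load takes 4.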

The Challenge

The challenge lies in avoiding a massive match statement (or switch in other languages) where each of these 8 LDA opcodes has its own duplicated address calculation logic. How can we abstract the addressing mode logic so that any instruction can simply request an operand via a specific mode, and the CPU (or a helper) handles the fetching consistently? This separation of concerns is key to building a maintainable and accurate 6502 emulator.

The Naive Solution

A novice embarking on a 6502 CPU emulator might instinctively begin with a traditional fetch-decode-execute loop. At its core, this loop involves reading the next opcode from memory, determining its function, performing the required operations, and advancing the program counter.

struct CPU {
    register_a: u8,
    register_x: u8,
    register_y: u8,
    // ... other registers (stack pointer, status flags, etc.)
    program_counter: u16,
    memory: Vec<u8>, // Simulating memory for demonstration
}

impl CPU {
    // A simplified memory read function
    fn read_byte(&self, address: u16) -> u8 {
        self.memory[address as usize] // Basic direct access for example
    }

    fn run(&mut self) {
        loop {
            let opcode = self.read_byte(self.program_counter);
            self.program_counter += 1;

            match opcode {
                // LDA (Load Accumulator) opcodes
                0xA9 => self.lda_immediate(),     // LDA #$NN
                0xA5 => self.lda_zero_page(),     // LDA $NN
                0xB5 => self.lda_zero_page_x(),   // LDA $NN,X
                0xAD => self.lda_absolute(),      // LDA $NNNN
                0xBD => self.lda_absolute_x(),    // LDA $NNNN,X
                0xB9 => self.lda_absolute_y(),    // LDA $NNNN,Y
                0xA1 => self.lda_indexed_indirect(), // LDA ($NN,X)
                0xB1 => self.lda_indirect_indexed(), // LDA ($NN),Y

                // Other opcodes would go here...
                // A catch-all arm is required: `match` on a u8 must be exhaustive.
                _ => panic!("Unhandled opcode: {:#04X}", opcode),
            }
            // In a real emulator, you'd handle cycles, interrupts, etc. here.
        }
    }
}

This direct-mapping approach then necessitates separate, distinct function implementations for each combination of instruction and addressing mode. As you can see, the core logic for fetching the operand’s value is duplicated across multiple LDA functions:

impl CPU {
    // Handles LDA with Immediate Addressing Mode
    fn lda_immediate(&mut self) {
        let operand = self.read_byte(self.program_counter);
        self.program_counter += 1; // Advance PC after reading operand
        self.register_a = operand;
        // ... (update flags based on operand)
    }

    // Handles LDA with Zero Page Addressing Mode
    fn lda_zero_page(&mut self) {
        let zero_page_address = self.read_byte(self.program_counter) as u16;
        self.program_counter += 1; // Advance PC after reading address byte
        let operand = self.read_byte(zero_page_address);
        self.register_a = operand;
        // ... (update flags)
    }

    // Handles LDA with Zero Page, X Addressing Mode
    fn lda_zero_page_x(&mut self) {
        let base_zero_page_address = self.read_byte(self.program_counter) as u16;
        self.program_counter += 1; // Advance PC
        // Note: Zero Page indexed addressing wraps around 0xFF
        let effective_address = base_zero_page_address.wrapping_add(self.register_x as u16) & 0xFF; // Only lower 8 bits for ZP
        let operand = self.read_byte(effective_address);
        self.register_a = operand;
        // ... (update flags)
    }
    // ... many more lda_* functions for other modes (absolute, indirect, etc.)
}

Imagine extending this pattern to all 56 official 6502 instructions, each with anywhere from one to eight opcodes! You would quickly find yourself writing a bewildering amount of highly repetitive code. The same logic for calculating a “Zero Page, X” address, for example, would be copy-pasted into lda_zero_page_x, sta_zero_page_x, adc_zero_page_x, and so on.

Surely there must be a better, more elegant way to structure this.

Luckily for us… there is!

A Better Solution

The repetitive nature of the naive approach clearly demonstrates the need for a more structured and modular design. The core problem is that addressing mode logic is tightly coupled with instruction execution. By decoupling these concerns, we can achieve significant improvements in code clarity, maintainability, and extensibility.

First, let’s formalize how we model our 6502 components and their relationships.

Modeling Addressing Modes & Opcodes

To begin, we need clear data structures to represent the distinct characteristics of each addressing mode and each opcode. This lays the groundwork for our lookup table.

#[derive(Debug)]
pub enum AddressingMode {
    Immediate,
    ZeroPage,
    ZeroPageX,
    ZeroPageY,
    Absolute,
    AbsoluteX,
    AbsoluteY,
    IndirectX,
    IndirectY,
    Indirect, // Exclusive to JMP opcodes
    Relative, // Exclusive to Branch opcodes
    Implied, // No operand to address (e.g. TAX, NOP, BRK)
}

pub struct Opcode {
    pub value: u8,
    pub name: &'static str, // Instruction mnemonic, e.g., "LDA"
    pub size: u8,           // Number of bytes the opcode occupies (opcode + operand bytes)
    pub cycles: u8,         // Base CPU cycles for this opcode (excluding page cross penalties)
    pub mode: AddressingMode, // The addressing mode used by this opcode
}

impl Opcode {
    pub const fn new(
        value: u8,
        name: &'static str,
        size: u8,
        cycles: u8,
        mode: AddressingMode,
    ) -> Self {
        Self {
            value,
            name,
            cycles,
            size,
            mode,
        }
    }
}

Creating an Opcode Lookup Table

With our Opcode struct defined, we can now create a single, immutable source of truth for all 6502 opcodes. This array will contain every opcode, along with its properties like its value, mnemonic, byte size, base cycles, and, most importantly for our topic, its associated addressing mode.

To enable fast lookups during emulation, we’ll convert this static array into a HashMap mapping opcode byte values (u8) to references to their corresponding Opcode struct. We’ll use the once_cell crate’s Lazy static for efficient, one-time initialization of this map.

use std::collections::HashMap;
use once_cell::sync::Lazy;

const OPCODES: &[Opcode] = &[

    // --- LDA opcodes ---
    Opcode::new(0xA9, "LDA", 2, 2, AddressingMode::Immediate),
    Opcode::new(0xA5, "LDA", 2, 3, AddressingMode::ZeroPage),
    Opcode::new(0xB5, "LDA", 2, 4, AddressingMode::ZeroPageX),
    Opcode::new(0xAD, "LDA", 3, 4, AddressingMode::Absolute),
    Opcode::new(0xBD, "LDA", 3, 4, AddressingMode::AbsoluteX),
    Opcode::new(0xB9, "LDA", 3, 4, AddressingMode::AbsoluteY),
    Opcode::new(0xA1, "LDA", 2, 6, AddressingMode::IndirectX),
    Opcode::new(0xB1, "LDA", 2, 5, AddressingMode::IndirectY),

    // --- Other Instructions (Example) ---
    Opcode::new(0xAA, "TAX", 1, 2, AddressingMode::Implied), // Transfer A to X
    Opcode::new(0x00, "BRK", 1, 7, AddressingMode::Implied), // Break
    Opcode::new(0xEA, "NOP", 1, 2, AddressingMode::Implied), // No Operation
    Opcode::new(0x4C, "JMP", 3, 3, AddressingMode::Absolute), // Jump Absolute
    Opcode::new(0x6C, "JMP", 3, 5, AddressingMode::Indirect), // Jump Indirect (with bug)
    Opcode::new(0x90, "BCC", 2, 2, AddressingMode::Relative), // Branch if Carry Clear

    // ... many, many more opcodes for a full 6502 implementation
];

pub static OPCODES_MAP: Lazy<HashMap<u8, &'static Opcode>> = Lazy::new(|| {
    let mut map = HashMap::new();
    for opcode in OPCODES {
        map.insert(opcode.value, opcode);
    }
    map
});

This static HashMap means that once OPCODES_MAP is accessed for the first time, it’s built and then available for lightning-fast lookups throughout the emulator’s lifetime.
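Because an opcode is a single byte, a 256-entry array indexed by the opcode value is a possible alternative to the HashMap. The sketch below is not the approach used in this article. It assumes a simplified, `Copy` version of the opcode record, and the names `TableMode`, `OpInfo`, `build_table`, `OPCODE_TABLE` and `lookup` are hypothetical. It builds the table at compile time with a `const fn`, so no lazy initialization is needed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TableMode {
    Immediate,
    ZeroPage,
    Absolute,
}

/// Simplified opcode record for this sketch (must be `Copy` to fill the array).
#[derive(Debug, Clone, Copy)]
pub struct OpInfo {
    pub value: u8,
    pub name: &'static str,
    pub mode: TableMode,
}

const OP_LIST: &[OpInfo] = &[
    OpInfo { value: 0xA9, name: "LDA", mode: TableMode::Immediate },
    OpInfo { value: 0xA5, name: "LDA", mode: TableMode::ZeroPage },
    OpInfo { value: 0xAD, name: "LDA", mode: TableMode::Absolute },
];

/// Places each listed opcode at the index equal to its byte value.
/// `while` is used because `for` loops are not allowed in `const fn`.
const fn build_table() -> [Option<OpInfo>; 256] {
    let mut table: [Option<OpInfo>; 256] = [None; 256];
    let mut i = 0;
    while i < OP_LIST.len() {
        let op = OP_LIST[i];
        table[op.value as usize] = Some(op);
        i += 1;
    }
    table
}

pub static OPCODE_TABLE: [Option<OpInfo>; 256] = build_table();

/// Looks up an opcode byte; unlisted bytes return `None`.
pub fn lookup(byte: u8) -> Option<&'static OpInfo> {
    OPCODE_TABLE[byte as usize].as_ref()
}
```

A lookup is then a plain array index with no hashing, at the cost of requiring the record type to be `Copy`. Whether that trade-off matters depends on the emulator; the HashMap version above is equally valid.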

The Magic Sauce: Decoupling Operand Fetching

The core problem with the naive solution was the duplicated logic for calculating the operand’s memory address. The “magic sauce” lies in extracting this address calculation into a single, dedicated function within our CPU implementation. This function takes an AddressingMode as input and returns the 16-bit effective address from which the operand should be read (or to which it should be written), along with a flag indicating whether a page boundary was crossed.

Let’s enhance our CPU structure and add this crucial get_operand_address method:

impl CPU {

    // A helper to read a 16-bit word from memory (Little Endian)
    fn read_word(&self, address: u16) -> u16 {
        let lo = self.read_byte(address) as u16;
        let hi = self.read_byte(address.wrapping_add(1)) as u16;
        (hi << 8) | lo
    }

    /// Calculates the effective memory address based on the provided addressing mode.
    /// Returns the calculated address and a boolean indicating if a page boundary was crossed.
    fn get_operand_address(&mut self, mode: &AddressingMode) -> (u16, bool) {
        match mode {
            AddressingMode::Immediate => {
                // Operand is the byte directly after the opcode.
                // PC already points to this byte before advancement.
                let addr = self.program_counter;
                self.program_counter = self.program_counter.wrapping_add(1); // Advance PC
                (addr, false)
            }
            AddressingMode::ZeroPage => {
                let zero_page_addr = self.read_byte(self.program_counter) as u16;
                self.program_counter = self.program_counter.wrapping_add(1);
                (zero_page_addr, false)
            }
            AddressingMode::ZeroPageX => {
                let zero_page_base = self.read_byte(self.program_counter);
                self.program_counter = self.program_counter.wrapping_add(1);
                let effective_addr = zero_page_base.wrapping_add(self.register_x);
                (effective_addr as u16, false) // Zero page addresses are always 8-bit, so no page crossing
            }
            AddressingMode::ZeroPageY => {
                let zero_page_base = self.read_byte(self.program_counter);
                self.program_counter = self.program_counter.wrapping_add(1);
                let effective_addr = zero_page_base.wrapping_add(self.register_y);
                (effective_addr as u16, false)
            }
            AddressingMode::Absolute => {
                let addr = self.read_word(self.program_counter);
                self.program_counter = self.program_counter.wrapping_add(2); // Read two bytes
                (addr, false)
            }
            AddressingMode::AbsoluteX => {
                let base_addr = self.read_word(self.program_counter);
                self.program_counter = self.program_counter.wrapping_add(2);
                let effective_addr = base_addr.wrapping_add(self.register_x as u16);
                let page_crossed = (base_addr & 0xFF00) != (effective_addr & 0xFF00);
                (effective_addr, page_crossed)
            }
            AddressingMode::AbsoluteY => {
                let base_addr = self.read_word(self.program_counter);
                self.program_counter = self.program_counter.wrapping_add(2);
                let effective_addr = base_addr.wrapping_add(self.register_y as u16);
                let page_crossed = (base_addr & 0xFF00) != (effective_addr & 0xFF00);
                (effective_addr, page_crossed)
            }
            AddressingMode::IndirectX => {
                let zero_page_idx = self.read_byte(self.program_counter);
                self.program_counter = self.program_counter.wrapping_add(1);

                // Add X to the zero-page index, wrapping within the zero page
                let ptr_base = zero_page_idx.wrapping_add(self.register_x);

                // Read the actual 16-bit address from the zero page
                // The HI byte is read from ptr_base + 1, *wrapping within zero page*
                let lo = self.read_byte(ptr_base as u16) as u16;
                let hi = self.read_byte(ptr_base.wrapping_add(1) as u16) as u16; // Zero-page wrap

                let effective_addr = (hi << 8) | lo;
                (effective_addr, false) // IndirectX itself doesn't cause page cross penalty for operand fetch
            }
            AddressingMode::IndirectY => {
                let zero_page_addr = self.read_byte(self.program_counter);
                self.program_counter = self.program_counter.wrapping_add(1);

                // Read the base 16-bit address from the zero page (handles zero-page wrap for high byte)
                let lo = self.read_byte(zero_page_addr as u16) as u16;
                let hi = self.read_byte(zero_page_addr.wrapping_add(1) as u16) as u16; // Zero-page wrap
                let base_ptr = (hi << 8) | lo;

                // Add Y to this 16-bit pointer
                let effective_addr = base_ptr.wrapping_add(self.register_y as u16);
                let page_crossed = (base_ptr & 0xFF00) != (effective_addr & 0xFF00);
                (effective_addr, page_crossed)
            }
            AddressingMode::Indirect => {
                // NOTE: Only used by JMP. This mode has a famous bug on real 6502 hardware
                // if the indirect vector falls on a page boundary (e.g., $xxFF).
                // In such cases, the LSB is fetched from $xxFF, but the MSB is fetched from $xx00.
                let indirect_vec_addr = self.read_word(self.program_counter);
                self.program_counter = self.program_counter.wrapping_add(2);

                let effective_addr = if indirect_vec_addr & 0x00FF == 0x00FF {
                    // Page boundary bug: high byte from $xx00
                    let lo = self.read_byte(indirect_vec_addr) as u16;
                    let hi = self.read_byte(indirect_vec_addr & 0xFF00) as u16;
                    (hi << 8) | lo
                } else {
                    self.read_word(indirect_vec_addr)
                };
                (effective_addr, false) // Indirect JMP doesn't incur page crossing penalty for *itself*
            }
            AddressingMode::Relative => {
                // Used by branch instructions.
                // The offset is a signed 8-bit value, relative to PC + 2 (opcode + 1 byte operand)
                let offset = self.read_byte(self.program_counter) as i8;
                self.program_counter = self.program_counter.wrapping_add(1); // Read offset byte

                // Branch target is calculated relative to the *address of the next instruction*
                let base_pc_for_branch = self.program_counter;
                let addr = base_pc_for_branch.wrapping_add_signed(offset as i16);
                let page_crossed = (base_pc_for_branch & 0xFF00) != (addr & 0xFF00);
                (addr, page_crossed)
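            }
            // Modes without an operand (such as Implied) have no effective address.
            // The catch-all arm also keeps this match exhaustive.
            _ => panic!("Addressing mode {:?} has no operand address", mode),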
        }
    }
    // ...
}
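The indirect JMP quirk is easier to see in isolation. The following sketch assumes memory is a plain byte slice covering the full 64 KiB address space; the function name `jmp_indirect_target` is hypothetical. Instead of branching on the `$xxFF` case, it always keeps the pointer's page and increments only its low byte, which gives the same results as the two-branch version above.

```rust
/// Reads the target of `JMP (pointer)`, reproducing the page-wrap quirk.
/// Assumes `memory` covers the full 64 KiB address space (indexing panics otherwise).
fn jmp_indirect_target(memory: &[u8], pointer: u16) -> u16 {
    let lo = memory[pointer as usize];
    // Only the low byte of the pointer is incremented; the page (high byte) is kept.
    // For a pointer like $12FF, the high byte is therefore read from $1200, not $1300.
    let hi_addr = (pointer & 0xFF00) | (pointer.wrapping_add(1) & 0x00FF);
    let hi = memory[hi_addr as usize];
    u16::from_le_bytes([lo, hi])
}
```

For a pointer of `$2000`, the target bytes come from `$2000` and `$2001` as expected. For a pointer of `$12FF`, they come from `$12FF` and `$1200`.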

With the operand fetching logic extracted, we’ve achieved a significant separation of concerns. Now, any instruction can simply call get_operand_address with its specific mode, and receive the operand’s location (and page cross info!) without needing to know the complex details of how that address was derived. This leads us to our much cleaner instruction implementations.

impl CPU {
    /// Executes the LDA instruction, handling all its addressing modes.
    fn lda(&mut self, opcode: &Opcode) {
        let (address, page_crossed) = self.get_operand_address(&opcode.mode);

        let operand = self.read_byte(address);
        self.register_a = operand;

        // Update CPU flags based on the value loaded into A
        self.update_zero_and_negative_flags(self.register_a);

        // Crucial for cycle accuracy: Add a penalty if a page was crossed
        if page_crossed {
            self.add_cycles(1); // Add 1 cycle for page cross penalty
        }
    }

    // Placeholders
    fn update_zero_and_negative_flags(&mut self, value: u8) { /* ... */ }
    fn add_cycles(&mut self, cycles: u8) { /* ... */ }
}

Ahh, this is a much cleaner implementation! This single lda() function now elegantly supports all 8 LDA opcodes by leveraging the centralized get_operand_address logic, adhering to the DRY (Don’t Repeat Yourself) principle.
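To show that the same helper serves other instructions too, here is a small self-contained sketch. It is deliberately reduced: `MiniCpu`, `MiniMode`, `operand_address`, `set_zn` and `step` are hypothetical names, only two addressing modes are modeled, and cycle counting is left out. The flag logic assumes the usual 6502 status layout, with Zero in bit 1 and Negative in bit 7.

```rust
#[derive(Debug, Clone, Copy)]
enum MiniMode {
    Immediate,
    ZeroPage,
}

const ZERO_FLAG: u8 = 0b0000_0010;
const NEGATIVE_FLAG: u8 = 0b1000_0000;

struct MiniCpu {
    a: u8,
    x: u8,
    status: u8,
    pc: u16,
    memory: Vec<u8>,
}

impl MiniCpu {
    /// Loads `program` at $8000 (an arbitrary choice for this sketch).
    fn new(program: &[u8]) -> Self {
        let mut memory = vec![0u8; 0x10000];
        memory[0x8000..0x8000 + program.len()].copy_from_slice(program);
        Self { a: 0, x: 0, status: 0, pc: 0x8000, memory }
    }

    /// Shared address calculation: every instruction below goes through here.
    fn operand_address(&mut self, mode: MiniMode) -> u16 {
        let addr = match mode {
            MiniMode::Immediate => self.pc,
            MiniMode::ZeroPage => self.memory[self.pc as usize] as u16,
        };
        self.pc = self.pc.wrapping_add(1); // both modes consume one operand byte
        addr
    }

    /// Clears, then sets, the Zero and Negative flags according to `value`.
    fn set_zn(&mut self, value: u8) {
        self.status &= !(ZERO_FLAG | NEGATIVE_FLAG);
        if value == 0 {
            self.status |= ZERO_FLAG;
        }
        if value & 0x80 != 0 {
            self.status |= NEGATIVE_FLAG;
        }
    }

    fn lda(&mut self, mode: MiniMode) {
        let addr = self.operand_address(mode);
        self.a = self.memory[addr as usize];
        self.set_zn(self.a);
    }

    fn ldx(&mut self, mode: MiniMode) {
        let addr = self.operand_address(mode);
        self.x = self.memory[addr as usize];
        self.set_zn(self.x);
    }

    fn sta(&mut self, mode: MiniMode) {
        let addr = self.operand_address(mode);
        self.memory[addr as usize] = self.a; // stores do not touch the flags
    }

    /// Fetches one opcode and dispatches to the matching instruction.
    fn step(&mut self) {
        let opcode = self.memory[self.pc as usize];
        self.pc = self.pc.wrapping_add(1);
        match opcode {
            0xA9 => self.lda(MiniMode::Immediate),
            0xA5 => self.lda(MiniMode::ZeroPage),
            0xA2 => self.ldx(MiniMode::Immediate),
            0xA6 => self.ldx(MiniMode::ZeroPage),
            0x85 => self.sta(MiniMode::ZeroPage),
            other => panic!("opcode {:#04X} is not part of this sketch", other),
        }
    }
}
```

Running the program `A9 80 85 10 A6 10` (`LDA #$80`, `STA $10`, `LDX $10`) through three calls to `step` leaves `$80` in A, in X, and at address `$0010`, with the Negative flag set. Each instruction body is only a few lines because none of them contains any address arithmetic of its own.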

Conclusion

By formalizing our AddressingMode and Opcode structures and creating a central OPCODES_MAP, we establish a robust foundation for a 6502 emulator. The true power of this approach comes from decoupling the operand addressing logic from the instruction execution logic.

This allows our CPU’s main run loop to become much cleaner:

impl CPU {
    // ...
    fn run(&mut self) {
        loop {
            let opcode_value = self.read_byte(self.program_counter);
            self.program_counter += 1; // Advance PC past the opcode byte

            // Look up the opcode's details from our global map
            // `unwrap_or_else` builds the panic message only when the lookup fails.
            let opcode = OPCODES_MAP
                .get(&opcode_value)
                .unwrap_or_else(|| panic!("Unknown opcode: {:#04X}", opcode_value));

            // Execute the instruction based on its name (mnemonic)
            match opcode.name {
                "LDA" => self.lda(opcode),
                "TAX" => self.tax(opcode), // Example for another instruction
                "JMP" => self.jmp(opcode), // Example for JMP with its own mode handler
                "BCC" => self.bcc(opcode), // Example for branch instructions
                // ... many more instruction matches for a full CPU
                _ => panic!("Instruction not yet implemented: {}", opcode.name),
            }

            // After executing the instruction and advancing PC for the operand(s),
            // update total cycles. Page cross penalties are handled within instruction functions.
            self.add_cycles(opcode.cycles);

            // In a full emulator, this loop would also handle
            // interrupts, external events, etc.
        }
    }

    // Placeholder for other instruction implementations
    fn tax(&mut self, opcode: &Opcode) { /* ... */ }
    fn jmp(&mut self, opcode: &Opcode) { /* ... */ }
    fn bcc(&mut self, opcode: &Opcode) { /* ... */ }
}

This final run loop illustrates the simplified dispatch. Instead of an unwieldy match statement with a separate arm for each of the 151 documented opcodes, we now have a match on the instruction’s mnemonic (e.g., “LDA”, “JMP”), with one arm per instruction. Each branch then calls a single, well-defined function (lda, jmp, etc.), passing the entire Opcode struct. This Opcode struct provides all the necessary context, including the AddressingMode that the instruction-specific function (like lda) can then use to retrieve its operand via get_operand_address.

This architecture not only reduces code duplication but also significantly improves maintainability. Adding new instructions or debugging addressing mode issues becomes far more straightforward, since changes are localized to specific, well-defined functions. It is a solid foundation for building a robust and accurate 6502 emulator.
