## How Do Computers Even? *Simulating the Nintendo in 3000 lines of Javascript* ![nes](http://redlinernotes.com/docs/talks/trp/nes.jpg) -- **Or, how to spend hella time on a project that can't possibly make money** --- ## About Me (long ago)
I spent my childhood `"In the year 20xx ..."` --- ## About Me (2007-2011) 15 years later, not the most disciplined young man but in need of a career. -- I got into programming because I wanted to know how computers *work*. -- I self studied SICP. I wrote a lot of lisp. I got my CS degree and a job. --
But on some level, the NES was still **magic**. --
It's only fair I warn you this is a [pet][trp] [issue][tcc] of mine. That software today is remarkably complex and powerful but we're so used to the complexity we hardly strive for understandable software anymore. -- And the low-level foundations beneath our frameworks and libraries are **fuzzy mysteries mostly forgotten**. [trp]: http://redlinernotes.com/docs/talks/trp.html [tcc]: http://blog.redlinernotes.com/posts/Towards-Comprehensible-Computing.html -- *(This probably has something to do with why I'm enamored by Smalltalk and Common Lisp. They date from a time when programming environments had different goals.)* --- ## Well... --
> You think you know when you can learn, are more sure when you can write, even more when you can teach, but certain when you can program. - Alan Perlis --
> Thus, programs must be written for people to read, and only incidentally for machines to execute. - SICP, Preface --- ## The Goal Computers can simulate anything we can describe with sufficient precision. -- Including another computer! -- To better understand how computers work, I set out to simulate a Nintendo specifically to play Mega Man 2. --
(Unstated Thesis: Clearly one of the greatest games ever made.) --
The rest of this talk will walk through the myriad difficulties encountered and lessons learned as I worked on it. I'll try not to get too lost in the weeds. **Please ask questions when you are curious or confused!** --- ## Previously ![famiclom](http://redlinernotes.com/docs/talks/trp/famiclom.png) A wonderful effort to write a [readable emulator][readable] in Common Lisp. [readable]: http://redlinernotes.com/docs/cl-6502.pdf -- Stalled out from a mixture of boredom and life changes. -- But I also made some crucial false assumptions, the death of many programs! -- > The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. […] Hence plan to throw one away; you will, anyhow. - Fred Brooks, The Mythical Man Month --- ## Recently
[Nescavation][nescavation], a more widely deployed (i.e. browser-based) effort. [nescavation]: https://github.com/kingcons/nescavation -- Because life settled down, I'm still curious, and can make better assumptions! --- ## Overview To simulate the Nintendo, we have to simulate its parts. * CPU * Memory * Video Card * Sound Card * Controllers * Cartridges -- I started with the CPU since the CPU is the "brain" and seemed more straightforward than graphics or sound. (Neither of which I knew anything about.) -- Enter the [MOS 6502][mos-6502] and [cl-6502][cl-6502]. [mos-6502]: https://en.wikipedia.org/wiki/MOS_Technology_6502 [cl-6502]: https://github.com/kingcons/cl-6502 --- ## Stage 0 - The CPU in Isolation Conceptually, 8-bit CPUs are pretty straightforward.
*(Things went to hell in the 90s.)* -- * It knows a certain set of instructions, called an [Instruction Set Architecture][isa] or ISA. * It has access to memory. [isa]: https://en.wikipedia.org/wiki/Instruction_set * It has a few registers (read: tiny boxes of storage). * Of particular note is the `Program Counter` which says where in memory the next instruction is. -- ### How a CPU Works! > The CPU reads the memory location from the Program Counter (PC), > gets the instruction there, and runs it. -- > (This results in a change to the PC for the next instruction.) --- ### You mean... ```javascript while (cpu.isOn()) { let instruction = cpu.fetch(); instruction.run(); } ``` This is easy! -- ## Not Exactly -- Enter assembly code... --- ## Stage 0 - Programming a CPU You program CPUs in assembly language. Which is text. Then an "assembler" converts your code into so called "native code" or "machine code" which is just an array of numbers. There are a lot of important things to understand about assembly programming but these are 4 big ones: -- 1. There is no Operating System. This means we control hardware ourselves by working with *certain parts of memory*. -- 2. There is no Garbage Collection. This means we don't really have variables, we have *addresses and registers*. -- 3. There are no Data Types. This means everything is just a "byte", we don't even have *numbers*! -- 4. There are no Functions! This means we can at best say, "*go to this address* in memory and run the code". -- So what *do* you have? Enter the [MOS 6502][mos]. [mos]: https://en.wikipedia.org/wiki/MOS_Technology_6502 --- ## Stage 0 - The MOS 6502 The MOS 6502 is a historic CPU that played a massive role in the microcomputer boom of the 80s. It powered the Apple II, the Commodore 64, the BBC Micro, and the Nintendo. (Fictionally, both Futurama's Bender and the original Terminator were powered by the 6502.) -- ### Features * A 16-bit **Program Counter**, meaning the highest number the NES can count to is 65535 and it can address at most 64 kilobytes of memory. * No CS degree? Not fluent in binary? Don't worry. We'll be back. -- * Three 8-bit general purposes registers: Accumulator, X, Y. * An 8-bit stack pointer. * An 8-bit status register. * The individual bits represented interesting true-false facts about the system. This is commonly called a [bitfield][bitfield]. [bitfield]: https://en.wikipedia.org/wiki/Bit_field -- Let's finally look at some code... --- ## Stage 0 - Example 6502 Code (Actually the boot code for Super Mario Bros 1!) -- (Which is apparently [pretty tricky][smb-is-hard] to emulate.) [smb-is-hard]: http://forums.nesdev.com/viewtopic.php?p=22022#22022 -- ![smb-init](http://redlinernotes.com/images/smb_init.png) --- ## Stage 0 - Example 6502 Code, Takeaways It doesn't look fun to *write*. It's pretty low level. But maybe *running* it wouldn't be so bad... -- We just need a CPU object with registers and methods for the instructions and a big array to represent memory, right? -- Sure. Some questions remain... -- * How does 6502 code get loaded into memory? * How will we do it? * How is data represented? How do numbers work now? * Need to talk binary/hex, two's complement, endianness. --- ## Stage 0 - Representing Numbers In assembly code, we spend a lot of time talking about bits. Rarely do we use decimal numbers, we normally use binary or hex. -- The Decimal number system uses the digits 0-9 and digits represent *powers* of 10. The Binary number system uses the digits 0-1 and digits represent *powers* of 2. Hexadecimal uses the digits 0-f and digits represent *powers* of 16. -- ![binary](https://upload.wikimedia.org/wikipedia/commons/7/75/Binary_counter.gif) -- *(In Javascript Now)* * Decimal: `255`, Hex: `0xff`, Binary: `0b11111111` -- *(In Assembly, syntax varies by assembler)* * Decimal: `255`, Hex: `$ff`, Binary: `%11111111` --- ## Stage 0 - Representing Numbers, Pt. 2 Of course, if we have an 8-bit CPU then we're mostly working with bytes. Because of limited storage, a byte can only represent 256 values, 0-255. -- Fine, but what about negative numbers? What about bigger numbers? -- #### Definitions * *Signed*: A digital number that can be positive or negative. * *Unsigned*: A digital number that is positive. * *Word*: (In the case of the 6502) 16 Bits -- * *Endianness*: How to order bytes in memory. * *(top) Most Significant Bit/Byte*: The highest power bit/byte. * *(bottom) Least Significant Bit/Byte*: The lowest power bit/byte. -- *Little Endian*: The little bits come first. `$2000` would be stored in memory as the bytes: `$00 $20` *Big Endian*: The big bits come first. `$2000` would be stored in memory as the bytes: `$20 $00` -- Which one do you think the MOS 6502 (and your Intel chip) uses? --- ## Stage 0 - Representing Numbers, Pt. 3 So how do negative numbers work? -- #### [Two's Complement][two-cmp] [two-cmp]: https://en.wikipedia.org/wiki/Two%27s_complement Without going too in depth, here are some salient points: * Used for signed numbers on modern computers. * Represents numbers from -(2^(n-1)) to (2^(n-1))-1 * For a byte, that means from -128 to 127 * Top bit 0 means positive. Top bit 1 means negative. * Perhaps surprisingly, `0b11111111` is -1, `0b10000000` is -128. * This works because a 1 (aside from the top bit) still means "add this digit" to -128. * Not subtracting from zero, adding to 128. -- The really good news: * No negative version of zero. Something called "One's Complement" numbers have this. It isn't nice. -- * **Addition, Multiplication, Subtraction** work the same on signed/unsigned numbers (aside from overflow above a byte). --- ## Stage 0 - Loading Code Fine, enough with numbers for a moment! How are we going to load a game? -- Ever heard of ROMs? -- Julia Stiles time! -- ![console-cowboys](http://i.imgur.com/mn2wtrb.gif) -- *(That was [real][ghostwriters])* [ghostwriters]: https://www.youtube.com/watch?v=bLlj_GeKniA -- The 90s were a special time. --- ## Stage 0 - Loading Code Game cartridge data has been read and stored in files called ROMs. The Nintendo ROMs are in a format called iNES (from the iNES emulator). The simplest to understand documentation I've found of the format is [here][ines]. -- tl;dr: * First 16 bytes are metadata about the cartridge. * The rest is the binary data composing the game. -- So we can just load everything after 16 bytes into memory and we're golden! All we need is to be able to read files in binary. -- For now. -- And here's [how I did that][rom-loading] in javascript. tl;dr: Add a `FileReader` to the page, give it a callback to process the data and return a `Cartridge` object. Thanks fancy new [Browser APIs][web-apis]! [ines]: http://fms.komkon.org/EMUL8/NES.html#LABM [rom-loading]: https://github.com/kingcons/nescavation/commit/0d266df394b7ee9e5ad2867a42ed547a82ea3110 [web-apis]: https://developer.mozilla.org/en-US/docs/Web/API -- The one other detail to know is that when the Nintendo turned on or was reset, the CPU's program counter would be set to the word at `0xfffc`. --- ## Stage 0 - A 6502 in Javascript Well, on a simple level we need: * A class to represent the CPU and its registers * Methods on that class for the various "instructions" the CPU supports * A big array of numbers to act as memory We'll make an instance, load the cartridge into memory, and then just go! -- So how many methods do we have to write? --- ## Stage 0 - Emulating the 6502, Instructions The 6502 has 56 instructions it knows. You don't need to learn them all! Let's just skim them to get a general idea. -- We can group them into 5 rough categories: -- * **Arithmetic and Logic** * ROL, ROR (rotate left and rotate right) * ASL, LSR (shift left and shift right) * ADC, SBC (add and subtract with carrying) * INC, INX, INY (increment a number) * DEC, DEX, DEY (decrement a number) * CMP, CPX, CPY (compare numbers ==) * AND, ORA, EOR (logical && and ||, exclusive or/XOR) * BIT (test data with accumulator) -- * **Moving Data** * LDA, LDX, LDY (load data from memory) * STA, STX, STY (store data into memory) * TAX, TAY, TSX, TXA, TXS, TYA (transfer data between registers) --- ## Stage 0 - Emulating the 6502, Instructions Pt. 2 * **Branching and Control Flow** * BCC, BCS (branch if result did or didn't involve a carry) * BEQ, BNE (branch if result was equal or not equal) * BMI, BPL (branch if result was negative or positive) * BVC, BVS (branch if result did or didn't overflow a byte) * JMP, JSR (Jump "directly", Jump to Subroutine) * RTI, RTS (Return from Interrupt, Return from Subroutine) * BRK, NOP (Force break, No Operation / Do nothing) -- * **Status Flags** * CLC, SEC (clear or set carry flag) * CLD, SED (clear or set decimal flag) * CLI, SEI (clear or set interrupt flag) * CLV (clear overflow flag) -- * **Stack Operations** * PHA, PLA (Push and Pull Accumulator from Stack) * PHP, PLP (Push and Pull Status from Stack) --- ## Stage 0 - Emulating the 6502, Addressing Modes Unfortunately, the 6502 has ~150 opcodes. **Opcode**: A specific instance of an instruction that uses a given *Addressing Mode*. -- If you recall from earlier, when programming assembly *we don't have variables*. We have registers and addresses. That means if we want to use an argument, we need to know it's address. -- But while NES registers are 8 bits, addresses are 16 bits, as there is 64K of addressable memory. -- It would be inefficient storage and performance-wise to always require a full 16-bit address. -- Also, there are many common patterns of accessing memory that we might be able to simplify. -- Thus, each instruction supports 1 or more ways to have arguments "passed" to it. Some support as many as 8. These different ways of passing data are called `Addressing Modes`. --- # Let's Look At The SMB Boot Code Again... But in **NEScavation** this time! -- Note how one `LDA` instruction is represented in memory as `$a9` and another is represented as `$bd`. That's because these are 2 *different* LDA instructions. One for the "Immediate" addressing mode, One for "AbsoluteX" mode. --- ## Stage 0 - Emulating the 6502, Addressing Modes Pt. 2 So, what are all the addressing modes? -- 1. Implied mode - The instruction doesn't take an argument. 2. Immediate mode - A constant is stored in the next byte. 3. Accumulator mode - Similar to implied. Used to visually distinguish from implied mode in assembly. 4. ZeroPage, zpX, zpY - Zero Page is the memory range from `$00` - `$ff`. * ZeroPage is very common for frequently accessed variables and is "cheap" since it only takes 1 byte. ZeroPage instructions also take fewer CPU cycles to fetch than the more advanced modes. * X and Y are there to also add the value in the X or Y register as an offset, say for looping through an array. 5. Absolute, absoluteX, absoluteY - Same idea but for 2-byte/16-bit addresses covering `$0000` - `$ffff`. 6. Indirect, indirectX, indirectY - Used for "pointer chasing". I.e. "This memory address, holds another memory address, get the data in that one." 7. Relative - Used exclusively for branches to nearby code/conditionals. * Because you can only store a one byte offset to go forward or back 128 bytes. * If you need to go to an entirely different region of code, you can use JMP/JSR which support Absolute and Indirect addressing. -- One final tricky bit is that some instructions use *the address* but most instructions use the **byte stored at the address**. --- ## Stage 0 - Overall Design Okay, so our model has to change a little from before... -- * A class to represent the CPU and its registers * Methods on that class for the various "instructions" the CPU supports * A big array of numbers to act as memory * Methods for each of the addressing modes that "compute" the argument * Some way for the addressing modes to know whether to use the address or the byte stored there -- So how do most people do it? Not to be rude but ... [jsnes][jsnes] [jsnes]: https://github.com/bfirsh/jsnes/blob/master/source/cpu.js#L116 -- Okay, how are we gonna do it? (Note to self: show step, init, memory) With closures~!!!eleventy * [Addressing Modes](https://github.com/kingcons/nescavation/blob/master/src/js/memory/memory.js) * [Opcodes](https://github.com/kingcons/nescavation/blob/master/src/js/cpu/cpu.js) * [Init](https://github.com/kingcons/nescavation/blob/master/src/js/cpu/init.js) --- ## Stage 0 - Correctness? How do you know what the hardware should do? * **COPIOUS REFERENCES** * Especially nesdev wiki. * **COPIOUS TECHNICAL DOCUMENTATION** * Linked to from nesdev wiki. * Other people's source code is good too but a mixed bag. -- How do you actually test your emulator code? * Unit tests, Blind hope, etc? -- Boy, it sure would be nice. For several reasons, this doesn't seem prevalent. Most importantly, it's normally a combination of several (usually *many*) instructions that lead to the important test cases. -- E.g. "If an NMI was sent by the video card, and an indirect jmp is done that crosses a page boundary, are bits x and y in the status register set?" -- [Test ROMs][test-roms] are more common. Programs written in 6502 assembly that will infinite loop or otherwise behave incorrectly if your emulator is busted. -- How to read the results/output varies by ROM usually. Not exactly like you have a "command line" for the CPU available. [test-roms]: https://wiki.nesdev.com/w/index.php/Emulator_tests --- ## Stage 0 - Tying It All Together In traditional languages, an infinite loop is common for a Game's main loop. -- ```javascript while (true) { update(); draw(); } ``` -- Why doesn't this work in Javascript? Well, one reason is that we can't actually *make* the browser draw, it's in charge of rendering. So if we use a `while (true)` loop, we'll just hang the browser. -- Additionally, we should consider that an infinite loop to "grab the next instruction and run it" may be *faster* than it should be. Our JS Nintendo CPU may run faster than the **actual** Nintendo. So we need to slow it down. We need a clock! --- ## Stage 0 - Tying It All Together, Pt. 2 Your next thought might be "setInterval" lets us run something every couple milliseconds. That's a clock! -- But `requestAnimationFrame` is better. It syncs us with the browsers refresh/render loop for best performance and provides the callback it is given with *High Resolution Timestamps*. -- Using those timestamps we can compute how many CPU cycles "should" have passed and run the CPU until that many cycles have passed. -- Ask me during Q&A "How exact to we have to be with the cycles?" --- ## For Next Time... * Stage 1 - Memory Mapping * Stage 2 - Pictures * Stage 3 - Audio * Stage 4 - Integration * Stage N - Tooling --- ## Stage N - Oh the Places We'll Go Well, see [Frodo Redpill][frodo]... -- Next for me... * Graphics support! * Audio support! * MMC3 so we can play Super Mario Bros 3. And TMNT 2, obvi. -- * A Proper Assembler * Tools for Inspecting Hardware States * Name Table / Pattern Table / Palettes * Watch all writes to memory mapper / Level loading * Song Recorder/Editor? Separate NES synth/tracker? * Tools for exploring the game code? Graphing the execution paths? [frodo]: https://www.youtube.com/watch?v=tjcvR5McmSg --- ## Conclusions * You too can be a systems programmer or hardware hacker! * Just more time than usual reading docs, maybe more whiteboard time. -- Some things are easier at low-level: * No framework to be sneaky. No magic anywhere! * Your brain knows the answer, you just have to work it out. -- Some things are harder at low-level: * Specifically, not many debugging tools for your half-working "hardware". * How to test is somewhat tricky. --- ## Further Resources * [I Am Error - MIT Press][i-error] * [Reverse Engineering the 6502][reverse6502] - CCC lecture * A major inspiration to begin this project. SO COOL TO ME! * [Visual6502][visual6502] * [Easy6502][easy6502] by skilldrick - Get your assembly on! * [Statically Compiling NES Games][static-nes] by Andrew Kelley * [Nesdev reference guide][nesdev] [i-error]: https://www.amazon.com/Am-Error-Nintendo-Computer-Entertainment/dp/0262028778 [visual6502]: http://www.visual6502.org/ [reverse6502]: https://media.ccc.de/v/27c3-4159-en-reverse_engineering_mos_6502 [easy6502]: https://skilldrick.github.io/easy6502/ [static-nes]: http://andrewkelley.me/post/jamulator.html [nesdev]: http://wiki.nesdev.com/w/index.php/NES_reference_guide --- #### Thanks for letting me share my passions with you. <3 ![it-me](http://redlinernotes.com/images/XO_Excitement.jpg) --- # (Bonus Slides) ## Stage 1 - Memory Mapping --- ## Stage 2 - Pictures Tell The Story * Screen resolution output by the NES was 256x240! * The PPU actually renders left to right, top to bottom, in "scanlines" working as fast as it can. * Interestingly, the PPU is clocked to do 262 scanlines a second, even though only 240 actually draw/render pixels. * This is done to give the CPU a chance to update the data for the next frame. After scanline 240, a NMI interrupt is sent to the CPU that jumps to a programmer specified loop. * This is called "Vblank" or "vertical blank". * Only 2260 CPU cycles per frame that can be used to access video memory (during VBlank). PPU is fetching data, dragons lurk here. --- ## Stage 3 - Tying It Together with Clocks & Loops --- ## Stage 3 - On Accuracy and Performance * byuu / [higan][higan] * michael steil / visual 6502 * [Accuracy vs Compatibility][accuracy] * [How important is Cycle Accuracy?][cycles] * "But does my clock *really* have to be right? Will the trains run on time?" [higan]: http://byuu.org/emulation/higan/ [cycles]: http://forums.nesdev.com/viewtopic.php?f=3&t=14359 [accuracy]: http://wiki.nesdev.com/w/index.php/Accuracy -- Preservation or Modeling? Is our goal perfect simulation of hardware? * Why and how did it actually work? Or usability, fun, and nostalgia? * "Good enough".