
Building a homebrew CPU from scratch takes a large number of logic chips. That is understandable: implementing the registers, program counter, ALU, and other CPU components in TTL or CMOS logic requires a substantial number of chips. But how many, exactly?

I tried to optimize my homebrew CPU for the fewest possible logic chips and to answer one question:
How few ICs are required for a Turing-complete CPU without a CPU?

My answer is a 16-bit serial CPU built from only 8 ICs, including memory and clock. It has 128 kB of SRAM and 768 kB of FLASH, and can be clocked at up to 10 MHz. It contains only a 1-bit ALU, but most of its 52 instructions operate on 16-bit values (serially). At its maximum speed, it executes roughly 12k instructions per second (0.012 MIPS) and, among other things, can stream video to a PCD8544-based (Nokia 5110) LCD at ~10 FPS.

Depending on where you draw the line between a state machine and a CPU, my 16-bit system might actually be the CPU with the fewest ICs. Other contenders include Jeff Laughton’s 1-bit computer, with 1 instruction and 1 bit of memory, and Daniel Thornburgh’s Simple CPU, with a single byte-byte-jump instruction and memory simulated on a Raspberry Pi.

Hardware:
The architecture is inspired by other CPU builds such as James Sharman’s JAM-1, Ben Eater’s SAP-1, Warren’s 4-bit Crazy Small CPU, its 8-bit version, and others. All of them, and many similar designs, use a “control” EEPROM, EPROM, or ROM to generate the control signals for the CPU components. I decided to use such a “control” memory as well, specifically an EPROM, because it is far easier than generating the signals with logic circuits alone and leaves more room for later changes. Unlike the builds mentioned above, I aimed for the lowest possible chip count, so I tried to “squeeze” as much data processing into the memory as possible, to either reduce the demands on the other CPU components or, better yet, eliminate them completely. Here are the key steps taken:

- Completely eliminating the ALU as a separate component and implementing it as a lookup inside the memory. Because most EPROMs have only an 8-bit output and the system also needs other control signals, the data width of the ALU has to be drastically limited. Not to worry, it can be reduced all the way down to a single bit: 1-bit computing is actually all we need.
- To get any meaningful computation done, the output from the 1-bit ALU has to be serialized. That is a perfect use case for a serial SRAM, which also brings other benefits. First, it eliminates the need for registers, since all ALU operations can be performed directly on the data in SRAM. Second, serial SRAMs are also addressed serially, so there is no need to latch the source and destination addresses. Third, an arbitrary data processing width can be achieved just by selecting the number of SRAM clock cycles. I chose 16 bits (16 SRAM clock cycles per 1 ALU operation) as a nice compromise between utility and speed.
- At least 2 serial SRAM chips are required: one has to provide a serialized input to the 1-bit ALU while, at the same time, the other stores the result.
- For ALU operations with 2 operands (like ADD/AND/XOR…), 2 serialized inputs are needed. Adding a third SRAM could certainly be an option (2 for ALU inputs, 1 for result), but there is a better solution. If a serial FLASH memory is used instead of an SRAM, the same benefits remain (already serialized data, serialized address), but the FLASH can be used for storing the instructions/program as well as providing the ALU input.
- It is unnecessary to add any hardware for a program counter, as there is already plenty of space inside the SRAMs where its value can be stored.
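
To make the first two steps more concrete, here is a minimal Python sketch of a 1-bit ALU held in a lookup table and stepped over 16 bits. It is an illustration under assumptions, not the actual EPROM contents: the opcode numbers, the table index layout, the least-significant-bit-first order, and the way the carry is held between steps are all hypothetical. Plain integers stand in for the serial FLASH and SRAM bit streams.

```python
# Hypothetical opcode numbers; the real CPU's encoding is not given in the text.
OP_AND, OP_XOR, OP_ADD = 0, 1, 2


def build_alu_table():
    """Precompute a 1-bit ALU as a lookup table, the way a ROM could hold it.

    Assumed index layout: op << 3 | carry_in << 2 | a << 1 | b.
    Each entry packs two output bits: carry_out << 1 | result.
    """
    table = [0] * (3 << 3)
    for op in (OP_AND, OP_XOR, OP_ADD):
        for carry in (0, 1):
            for a in (0, 1):
                for b in (0, 1):
                    if op == OP_AND:
                        result, carry_out = a & b, 0
                    elif op == OP_XOR:
                        result, carry_out = a ^ b, 0
                    else:  # OP_ADD behaves as a full adder
                        total = a + b + carry
                        result, carry_out = total & 1, total >> 1
                    index = (op << 3) | (carry << 2) | (a << 1) | b
                    table[index] = (carry_out << 1) | result
    return table


ALU_TABLE = build_alu_table()


def serial_alu(op, x, y, width=16):
    """Apply op to x and y one bit per step, least significant bit first."""
    carry = 0  # assumed to be held somewhere between bit steps
    out = 0
    for i in range(width):
        a = (x >> i) & 1  # bit arriving from the first serial source
        b = (y >> i) & 1  # bit arriving from the second serial source
        entry = ALU_TABLE[(op << 3) | (carry << 2) | (a << 1) | b]
        out |= (entry & 1) << i  # result bit goes to the destination
        carry = entry >> 1       # carry feeds the next bit step
    return out


print(hex(serial_alu(OP_ADD, 0x1234, 0x0FFF)))
```

Changing `width` changes the number of bit steps and nothing else, which mirrors the point above that the processing width is set only by the number of SRAM clock cycles.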
