The Ghost in the Machine ~ CSIT 534 OS Study Blog: Part 2: How Microchips are the PC's Brain

Friday, February 25, 2011

Part 2: How Microchips are the PC's Brain

This section deals with the nature of digital information. From vacuum tubes to transistors to any type of microchip, they are all essentially switches that, when arranged in special patterns, accomplish different tasks.

Chapter 5: How Transistors Manipulate Data

As the basic building block from which all microchips are built, the transistor is at the heart of a computer’s operation. This chapter begins with an introduction to binary numbers and Boolean logic, which has its roots in Gottfried Leibniz’s binary arithmetic (1679 CE). Transistors are wired into logic gates, which are in turn combined into arrays of half adders and full adders so that logical operations can be performed on series of bits. The longer the numbers involved, the more transistors are needed to handle the operation. For instance, more than 260 transistors are needed to create a full adder for operations on 16-bit numbers.
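The way gates combine into adders can be sketched in a few lines of Python. This is only an illustration of the logic, not of the transistor count: the function names and the ripple-carry arrangement are my own assumptions, and each gate is modeled with a bitwise operator rather than with transistors.

```python
def xor_gate(a, b):
    return a ^ b

def and_gate(a, b):
    return a & b

def or_gate(a, b):
    return a | b

def half_adder(a, b):
    """Add two bits and return (sum_bit, carry_bit)."""
    return xor_gate(a, b), and_gate(a, b)

def full_adder(a, b, carry_in):
    """Add two bits plus an incoming carry using two half adders and an OR gate."""
    partial_sum, carry_1 = half_adder(a, b)
    sum_bit, carry_2 = half_adder(partial_sum, carry_in)
    return sum_bit, or_gate(carry_1, carry_2)

def ripple_add(x, y, width=16):
    """Chain `width` full adders, lowest bit first; return (result, carry_out)."""
    carry = 0
    result = 0
    for i in range(width):
        bit_x = (x >> i) & 1
        bit_y = (y >> i) & 1
        sum_bit, carry = full_adder(bit_x, bit_y, carry)
        result |= sum_bit << i
    return result, carry

print(ripple_add(1234, 4321))   # (5555, 0)
print(ripple_add(65535, 1))     # (0, 1): the 16-bit result wraps and the carry is set
```

Widening the numbers just means chaining more full adders, which is why the transistor count grows with the length of the numbers.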

Transistors work by using a small electrical current to open a “gate” that allows a stronger current to pass through. Transistors are made in integrated circuits by connecting a positive lead to a strip of polysilicon inside a layer of non-conductive silicon dioxide. When voltage is applied to the polysilicon, a positive charge builds and its electric field causes electrons to gather in the p-type silicon base below the silicon dioxide layer. This collecting of electrons in the p-type silicon base allows current to flow from source to drain via two n-type silicon leads separated by a channel of (now charged) p-type silicon. When no charge is present on the polysilicon gate, the n-type silicon cannot transmit current across the p-type silicon substrate. Set this up in arrays of millions of such transistors and voilà, you have a microchip (or at least a transistor array).

RAM uses these transistor arrays to encode data by controlling the gates via “address lines” and the sources via “data lines” on a grid. Each combination of address line and data line references a single bit of data. Capacitors on the drain side of the transistor store the charge and record the bit: 1 for a charged capacitor and 0 for an uncharged capacitor. I was surprised to learn that this charging is repeated continuously while the computer is on to prevent the capacitors from losing their charge (and thus losing your data). When software wants to read data in RAM, the address line is charged again, which opens the gates along that address line, and each charged capacitor discharges, sending a pulse along its data line that identifies that bit as a 1. The combination of 1s and 0s from each data line along the address line forms a single byte of data.
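Here is a toy model of that grid, written to show why the refresh matters. Everything in it is an assumption made for illustration: the class name, the 30% leak per step and the 0.5 read threshold are invented, not taken from the book or from real DRAM.

```python
class ToyDram:
    """Toy grid of one-bit cells: rows are address lines, columns are data lines."""

    def __init__(self, rows, cols=8):
        # Each cell is a capacitor charge between 0.0 (empty) and 1.0 (full).
        self.charge = [[0.0] * cols for _ in range(rows)]

    def write_row(self, row, bits):
        """Charge the capacitor for every 1 bit and drain it for every 0 bit."""
        for col, bit in enumerate(bits):
            self.charge[row][col] = 1.0 if bit else 0.0

    def leak(self, fraction=0.3):
        """Assumed leak rate: every capacitor loses this fraction of its charge."""
        for row in self.charge:
            for col in range(len(row)):
                row[col] *= 1.0 - fraction

    def read_row(self, row):
        """Sense each cell against an assumed 0.5 threshold, then write the bits back."""
        bits = [1 if level > 0.5 else 0 for level in self.charge[row]]
        self.write_row(row, bits)  # reading drains the cells, so restore them
        return bits

    def refresh(self):
        """Read and rewrite every row so that leaked charge is topped up."""
        for row in range(len(self.charge)):
            self.read_row(row)
```

With these numbers, a row that is refreshed between two leak steps still reads back correctly, while a row that leaks twice without a refresh reads back as all zeros.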

DDR was able to double the speed of SDRAM (which had become a bottleneck to computer speeds) by allowing the memory controller to transfer data twice on a single clock cycle. The book is somewhat vague on this point, but through additional research I discovered that this happens by transferring data on both the rising edge and the falling edge of the clock signal, rather than on the rising edge only, thereby effectively doubling the speed of the DDR RAM.
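The doubling is easy to check with arithmetic: two transfers per clock cycle instead of one. The sketch below assumes a 64-bit memory bus and a 100 MHz clock as example figures; the function names are my own.

```python
def transfers_per_second(clock_hz, transfers_per_cycle):
    """Number of data transfers per second on each data line."""
    return clock_hz * transfers_per_cycle

def peak_bandwidth_bytes(clock_hz, transfers_per_cycle, bus_width_bits=64):
    """Peak bytes per second, assuming a bus that is `bus_width_bits` wide."""
    return transfers_per_second(clock_hz, transfers_per_cycle) * bus_width_bits // 8

clock = 100_000_000  # example 100 MHz memory clock

print(peak_bandwidth_bytes(clock, 1))  # 800000000: one transfer per cycle (SDRAM)
print(peak_bandwidth_bytes(clock, 2))  # 1600000000: two transfers per cycle (DDR)
```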

Similarly, DDR2 RAM ensures that each pulse of the clock is productive by running its I/O bus at twice the speed of the memory cells and fetching four bits per data line on each access. This is separate from dual channel architecture, in which the memory controller uses a second channel as an additional pipeline to supply memory with data. DDR3 RAM expands on the DDR2 concept by fetching eight bits per access to keep up with the capabilities of the memory controller. (See the Guru of 3D blog)

Unlike RAM, which loses data in the absence of ongoing electrical refresh, flash memory retains its data even when disconnected. Word lines run perpendicular to bit lines, and together they address a cell. Each cell is built around two gates: a control gate connected to the word line and a floating gate separated from the control gate by a layer of metal oxide. Using Fowler-Nordheim tunneling, current flows through the bit line through the source to ground, which causes electrons to “boil off” and tunnel through a thin oxide layer onto the floating gate, where they are trapped. A bit sensor detects the difference in charge between the two gates, and if the charge on the control gate is below 50% of that on the floating gate, the cell is considered to be a 0.

Chapter 6: How a Microprocessor Works

Analogous to a set of master switches that control many other switches, the CPU is where the action happens. Using registers to record intermediate data, the microprocessor has specialized computational areas, including the ALU (arithmetic logic unit), which carries out math instructions, and the control unit, which herds instructions and data through the processor. Registers record by their address the meaning of the data they hold; they include the memory data register, the program counter register and the accumulator registers. The chapter next discusses the combinations of logic gates required for computation: XOR, AND, half adders and full adders.

The microprocessor moves data quickly in part by having the L2 cache on the same interface as the processor. Data enters the processor through the BIU (Bus Interface Unit), which sends program code to the Level 1 instruction cache (i-cache) and data to the Level 1 data cache (d-cache). The branch target buffer (BTB) compares i-cache instructions with a separate record to look for branching operations that can be optimized by looking at past precedents (its predictions are right about 90% of the time). Meanwhile, the fetch/decode unit uses three parallel decoders to break complex instructions into micro-ops, again to expedite the work of the processor.

From the decode unit, micro-ops are sent into the reorder buffer, which contains two ALUs to handle integer calculations. The reorder buffer is circular, with a head and a tail, which allows the micro-ops to be ordered as specified by the BTB. The dispatch/execute unit uses speculative execution, executing up to five micro-ops simultaneously. It reads around the buffer and checks whether each micro-op has all the information it needs; when it comes across one that is ready, it executes it, stores the result in the micro-op and marks it complete. If a micro-op needs data from memory, it is skipped and the necessary information is sought first in the L1 cache and then in the L2 cache; meanwhile, the unit continues to move around the buffer checking and executing micro-ops. The ALUs hand off floating point calculations to the floating point math unit.

When delayed micro-ops are finally processed, the result is checked against the BTB prediction. If the prediction fails, the jump execution unit (JEU) moves the end marker ahead of the failed result to signal that the micro-ops after it should be ignored and overwritten by new micro-ops; meanwhile, the BTB is told the prediction was incorrect and stores that data for future predictions. The retirement unit checks the circular buffer from the head for three consecutive completed micro-ops. When it finds them, they are moved to the store buffer as a group of three, where the prediction unit performs a final check before they are sent out to system RAM.
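The walk around the buffer can be sketched as a toy model. This is a loose simplification under my own assumptions: the class and function names are invented, a deque stands in for the circular buffer, and branch prediction and the JEU are left out entirely. It only shows two ideas, out-of-order execution of whichever micro-ops are ready and in-order retirement in groups of three.

```python
from collections import deque

class MicroOp:
    def __init__(self, name, needs_memory=False):
        self.name = name
        self.needs_memory = needs_memory  # True while its data is still being fetched
        self.done = False

def execute_ready(buffer, max_per_pass=5):
    """One pass around the buffer: run up to max_per_pass micro-ops that are ready."""
    executed = []
    for op in buffer:
        if len(executed) == max_per_pass:
            break
        if not op.done and not op.needs_memory:
            op.done = True  # stands in for computing and storing the result
            executed.append(op.name)
    return executed

def retire(buffer, group=3):
    """Retire from the head only when `group` consecutive micro-ops are complete."""
    if len(buffer) >= group and all(buffer[i].done for i in range(group)):
        return [buffer.popleft().name for _ in range(group)]
    return []

buffer = deque([MicroOp("a"), MicroOp("b", needs_memory=True), MicroOp("c"), MicroOp("d")])

print(execute_ready(buffer))  # ['a', 'c', 'd']: "b" is skipped while it waits on memory
print(retire(buffer))         # []: "b" at position two is not complete yet

buffer[1].needs_memory = False  # the cache delivers the data for "b"
print(execute_ready(buffer))  # ['b']
print(retire(buffer))         # ['a', 'b', 'c']: three consecutive completed micro-ops
```

Note that "c" and "d" finish before "b" but cannot leave the buffer ahead of it, which is the point of retiring from the head.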

Multicore processors split operations among multiple cores, and hyper-threading additionally lets a single core work on two threads at once. Threaded or multithreaded software is specifically designed to take advantage of multicore architecture; however, even standard software can benefit from this enhanced architecture, as the operating system can create an affinity between a core and a particular program or function to take advantage of the parallel processing power. A time-staggered queue helps prevent traffic jams in shared resources.
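Affinity can also be requested by a program itself. The sketch below uses Python's `os.sched_getaffinity` and `os.sched_setaffinity`, which exist only on some platforms (Linux, for example), so the helper returns None where they are missing. The helper name and the choice of the lowest-numbered core are my own assumptions for illustration.

```python
import os

def pin_to_one_core(pid=0):
    """Restrict process `pid` (0 means this process) to one of its allowed cores.

    Returns the chosen core number, or None if affinity cannot be set here.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # this platform does not expose the affinity calls
    try:
        allowed = os.sched_getaffinity(pid)
        core = min(allowed)  # arbitrary choice: the lowest-numbered allowed core
        os.sched_setaffinity(pid, {core})
    except OSError:
        return None  # the operating system refused the request
    return core
```

Calling `pin_to_one_core()` asks the operating system to schedule the current process on a single core from then on.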

Another way to enhance performance is to speed up the system clock by overclocking. At the speeds of contemporary computers, a crystal oscillator is already paired with a multiplier to create a frequency higher than the crystal’s fundamental resonance. The operating frequency can be calculated as Front-side bus clock (Hz) x CPU multiplier = Operating frequency (Hz), where the front-side bus clock is itself derived from the crystal’s resonance. Raising either factor creates additional heat, which has to be dissipated through enhanced heat sinks, heat pipes, water cooling, Peltier cooling (though this uses a lot of power) or even cooling by Crisco submersion (or similar electrically insulating, heat-conductive liquids).
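A minimal sketch of the arithmetic, assuming the common convention that the core clock is the front-side bus clock times the CPU multiplier (the bus clock itself being derived from the crystal). The 200 MHz bus, the multiplier of 15 and the function name are example values I made up.

```python
def core_clock_hz(bus_clock_hz, cpu_multiplier):
    """Operating frequency = front-side bus clock x CPU multiplier."""
    return bus_clock_hz * cpu_multiplier

stock = core_clock_hz(200_000_000, 15)        # 200 MHz bus at the stock setting
overclocked = core_clock_hz(220_000_000, 15)  # bus raised by 10%

print(stock)        # 3000000000 (3.0 GHz)
print(overclocked)  # 3300000000 (3.3 GHz)
```

Raising the bus clock speeds up everything tied to it, while raising only the multiplier speeds up the core alone.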
