Search This Blog

Friday, February 25, 2011

Part 2: How Microchips are the PC's Brain

This section deals with the nature of digital information: from vacuum tube to transistor to microchip, all are essentially switches that, when arranged in particular patterns, accomplish different tasks.

Chapter 5: How Transistors Manipulate Data

As the basic building block from which all microchips are built, the transistor is at the heart of a computer’s operation. This chapter begins with an introduction to binary numbers and Boolean logic, which has its roots in Gottfried Leibniz’s binary arithmetic (1679 CE). By constructing different configurations of transistors into logic gates, which are further combined into arrays of half adders and full adders, logical operations can be performed on series of bits. The wider the numbers involved, the more transistors are needed to handle the operation. For instance, more than 260 transistors are needed to create a full adder for operations on 16-bit numbers.
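The gate arithmetic described above can be sketched in a few lines of Python (my own illustration, not from the book): a half adder XORs two bits for the sum and ANDs them for the carry, and chaining full adders ripple-carries across a 16-bit number.

```python
# A half adder XORs two bits for the sum and ANDs them for the carry;
# chaining full adders ripple-carries across wider numbers.

def half_adder(a, b):
    return a ^ b, a & b  # (sum, carry)

def full_adder(a, b, carry_in):
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2   # (sum, carry_out)

def ripple_add(x, y, width=16):
    """Add two numbers bit by bit, as an array of full adders would."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

print(ripple_add(1200, 345))  # 1545
```

Note that a 16-bit ripple adder simply drops the final carry, which is why overflowing sums wrap around.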
Transistors work by using a small electrical current to open a “gate” that allows a stronger current to pass through. In an integrated circuit, a transistor is made by connecting a positive lead to a strip of polysilicon inside a layer of non-conductive silicon dioxide. When current is applied to the polysilicon, a positive charge builds and electrostatically draws electrons into the p-type silicon base below the silicon dioxide layer. This gathering of electrons in the p-type silicon allows current to flow from source to drain via two n-type silicon leads separated by the (now charged) channel of p-type silicon. When no current is present on the polysilicon gate, the n-type silicon cannot transmit current across the p-type substrate. Set this up in arrays of millions of such transistors and voilà, you have a microchip (or at least a transistor array).
RAM uses these transistor arrays to encode data by controlling the gates via “address lines” and the sources via “data lines” arranged in a grid. Each combination of address line and data line references a single bit of data. A capacitor on the drain side of the transistor stores the charge and records the bit: 1 for a charged capacitor, 0 for an uncharged one. I was surprised to learn that this charging is repeated continuously while the computer is on to keep the capacitors from losing their charge (and thus losing your data). When software wants to read data from RAM, the address line is again charged, which opens the gates along that address line, and each charged capacitor discharges, sending a pulse along its data line that identifies that bit as a 1. The combination of 1s and 0s from each data line along the address line forms a single byte of data.
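As a toy illustration of that grid scheme (my own sketch, not the book's), here is RAM modeled as rows of capacitors opened by an address line, including the destructive read that makes constant refresh necessary:

```python
# Toy model of a DRAM grid: charging an address line "opens" every
# transistor gate in that row, and each capacitor's charge (1) or lack
# of it (0) pulses back along its data line.

class ToyDRAM:
    def __init__(self, rows, cols):
        self.caps = [[0] * cols for _ in range(rows)]  # capacitor charges

    def write_byte(self, address_line, bits):
        # Open the gates on one address line and charge the capacitors.
        self.caps[address_line] = list(bits)

    def read_byte(self, address_line):
        # Reading discharges the capacitors, so the row must be rewritten
        # afterwards -- just as real DRAM is continuously refreshed.
        bits = self.caps[address_line]
        self.caps[address_line] = [0] * len(bits)  # destructive read
        self.write_byte(address_line, bits)        # refresh
        return bits

ram = ToyDRAM(rows=4, cols=8)
ram.write_byte(2, [1, 0, 1, 1, 0, 0, 1, 0])
print(ram.read_byte(2))  # [1, 0, 1, 1, 0, 0, 1, 0]
```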
DDR was able to double the speed of SDRAM (which had become a bottleneck to computer speeds) by allowing the memory controller to transfer data twice per clock cycle. The book is somewhat vague on this point, but through additional research I learned that this happens by transferring data on both the rising and the falling edge of the clock signal, thereby effectively doubling the speed of the DDR RAM.
Similarly, DDR2 RAM uses a dual-channel architecture, which creates an additional pipeline to supply memory with data and ensure that each pulse of the clock is productive. I believe DDR3 RAM expands on this concept by moving still more data per clock pulse to keep up with the capabilities of the memory controller. (See the Guru of 3D blog.)
Unlike RAM, which loses data in the absence of ongoing electrical refresh, flash memory retains its data even when disconnected from power. Word lines run perpendicular to bit lines, and together they address a cell. Each cell has two gates: a control gate connected to the word line and a floating gate separated from it by a thin oxide layer. Using Fowler-Nordheim tunneling, current flows through the bit line and the source to ground, which causes electrons to “boil off” through the oxide layer onto the floating gate, where they are trapped and repel further charge. A bit sensor detects the difference in charge, and if the sensed charge is below 50% of the reference level the cell is considered to be a 0.
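A minimal sketch of that read threshold (my own model; the 0.2 and 1.0 charge levels are invented, and real sense conventions vary):

```python
# Toy model of a flash cell: programming uses Fowler-Nordheim tunneling to
# trap electrons on the floating gate, and the stored value persists with
# no power or refresh. The bit sensor compares the sensed charge against a
# 50% reference; following the chapter's convention, below 50% reads as 0.

class FlashCell:
    def __init__(self):
        self.sensed_charge = 1.0   # erased cell: full charge sensed

    def program(self):
        # Trapped electrons on the floating gate repel further charge,
        # so the sensor sees much less than the reference level.
        self.sensed_charge = 0.2

    def read(self):
        return 1 if self.sensed_charge >= 0.5 else 0

cell = FlashCell()
cell.program()
print(cell.read())  # 0 -- and it stays 0 even with the power disconnected
```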

Chapter 6: How a Microprocessor Works

Analogous to a set of master switches controlling many other switches, the CPU is where the action happens. Using registers to record intermediate data, the microprocessor has specialized computational areas, including the ALU, which carries out math instructions, and the control unit, which herds instructions and data through the processor. A register's address determines the meaning of the data it holds; these registers include the memory data register, the program counter register, and the accumulator registers. There is next a discussion of the combinations of logic gates required for computation: XOR, AND, half adders and full adders.
The microprocessor moves data quickly by having the L2 cache on the same interface as the processor. Data enters the processor through the Bus Interface Unit (BIU), which sends program code to the Level 1 instruction cache (i-cache) and data to the Level 1 data cache (d-cache). The branch target buffer (BTB) compares i-cache instructions against a separate record to look for branching operations that can be optimized based on past behavior (with about a 90% success rate). Meanwhile, the fetch/decode unit uses three parallel decoders to break complex instructions into micro-ops, again to expedite the work of the processor.
From the decode unit, micro-ops are sent to the reorder buffer, which feeds two ALUs that handle integer calculations. The reorder buffer is circular, with a head and a tail, allowing micro-ops to be ordered as specified by the BTB. The dispatch/execute unit uses speculative execution, executing up to five micro-ops simultaneously. It reads around the buffer, confirms that each micro-op has all the information it needs, and when it finds one that is ready it executes it, stores the result in the micro-op, and marks it complete. If data from memory is needed, the micro-op is skipped while the necessary data is sought first in the L1 cache and then in the L2 cache; meanwhile, the unit continues around the buffer checking and executing other micro-ops. The ALUs hand off floating-point calculations to the floating-point math unit.
When delayed micro-ops are finally processed, the result is checked against the BTB's prediction. If the prediction fails, the jump execution unit (JEU) moves the end marker ahead of the failed results to signal that they should be ignored and overwritten by new micro-ops, and the BTB is told the prediction was incorrect so it can refine future predictions. The retirement unit checks the circular buffer from the head for three consecutive completed micro-ops; when it finds them, they are moved to the store buffer as a group of three, where the prediction unit performs a final check before they are sent out to system RAM.
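The retirement step lends itself to a small sketch (my own simplification, assuming the buffer is a list of (micro-op, completed) pairs from head to tail):

```python
# Toy sketch of the retirement step: walk the circular buffer from the
# head and retire micro-ops in groups of three consecutive completed ones.

from collections import deque

def retire(buffer):
    """buffer: deque of (name, completed) pairs, head first.
    Returns micro-ops sent to the store buffer, three at a time, in order."""
    retired = []
    while len(buffer) >= 3 and all(done for _, done in list(buffer)[:3]):
        retired.extend(buffer.popleft()[0] for _ in range(3))
    return retired

rob = deque([("u1", True), ("u2", True), ("u3", True),
             ("u4", True), ("u5", False), ("u6", True)])
print(retire(rob))  # ['u1', 'u2', 'u3'] -- u4 waits because u5 is incomplete
```

Retiring strictly from the head is what keeps speculatively executed results from reaching RAM out of order.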
Multicore processors split operations among multiple processing cores; hyper-threading similarly lets a single core work on more than one thread at a time. Threaded or multithreaded software is specifically designed to take advantage of multicore architecture; however, even standard software can benefit, as the operating system can create an affinity between a core and a particular program or function to exploit the parallel processing power. A time-staggered queue helps prevent traffic jams in shared resources.
Another way to enhance performance is to speed up the system clock by overclocking. Given the speed of contemporary computers, a crystal oscillator is already paired with a multiplier to create an overtone frequency higher than the crystal's fundamental resonance. The operating frequency can be calculated as: operating frequency (Hz) = frontside bus speed (Hz) × CPU multiplier, where the frontside bus speed is itself derived from the crystal's resonance. Overclocking creates additional heat, which has to be dissipated through enhanced heat sinks, heat pipes, water cooling, Peltier cooling (though this uses a lot of power), or even submersion in Crisco (or a similar insulating yet heat-conductive medium).
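That clock arithmetic can be checked with a quick calculation (the figures below are illustrative, not from the book):

```python
# The CPU's operating frequency is the frontside bus speed times the CPU
# multiplier; the FSB itself is derived from the crystal's overtone.

crystal_overtone_hz = 66_666_667   # overtone frequency driving the FSB
fsb_multiplier = 3                 # chipset multiplier: FSB = overtone x 3
cpu_multiplier = 12.5              # CPU clock multiplier

fsb_hz = crystal_overtone_hz * fsb_multiplier
operating_hz = fsb_hz * cpu_multiplier
print(f"{operating_hz / 1e9:.2f} GHz")  # 2.50 GHz
```

Overclockers typically raise the FSB speed or the CPU multiplier, which is why either change multiplies through to the operating frequency (and the heat output).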

Friday, February 18, 2011

Part 1: Boot-Up Process

The author starts with an interesting overview of the history of mechanical computing devices, from John Napier's "Napier's Bones" (1614) -- a forerunner of the slide rule -- and William Oughtred's slide rule (1621), which remained the cutting edge of computational technology for nearly 350 years, to Charles Babbage's "Difference Engine" (1822) and concept for the Analytical Engine (1830), and Herman Hollerith's electric tabulating system for the US Census Bureau (1890), which led to his formation of the Tabulating Machine Company (1896) and ultimately became IBM.
Against this backdrop of computing innovation there is a discussion of semiconductors and the evolution from vacuum tubes to transistors to integrated circuits, as designers attempted to add computational power while reducing energy usage to avoid overheating.
The first consumer computer, the "Altair 8800", appeared in 1975. It consisted of a box of switches and LEDs, without a screen, keyboard, or mouse. As it did not come with an operating system, computer hobbyists (as they were then known) had to program the Altair by flipping switches on and off, then decode the resulting answers by charting the changes in the patterns of illuminated LEDs.
It wasn't until two such hobbyists (or hackers), Steve Jobs and Steve Wozniak, put together a computer called the Apple, which came with a display, built-in keyboard, and disk storage, and began selling it at computer clubs that anything approximating the contemporary computer became available. Until that time, computers had been primarily the domain of highly trained technicians who input data and instructions using punch cards or magnetic tape and then interpreted the output. With the Apple, Radio Shack, Commodore, and IBM PC came a new generation of computers that were affordable, available, and usable by the consumer masses.

Chapter 1: Getting to Know the Hardware
Here we get a look inside a standard "Wintel" desktop computer. Components are identified and their purposes explained. Similarly, we get a glimpse inside a Notebook PC to see how the hardware layout changes given the size constraints of a portable computer. Next, we explore Tablet PCs and the technologies behind the touchscreen: resistive, capacitive, surface acoustic wave, optical, and dispersive signal sensing.
Next there is a discussion of handwriting recognition: the ink (the raw handwriting path data) is processed by a character expert that takes a two-pronged approach. First it looks at stroke timing, treating the ink as a sequence of time-separated points. The second approach uses a spatial matrix that compares the ink to a database of shapes. If the ink is continuous script, it is processed through a segmentation expert that identifies how to parse out the letters. Next, a language expert application analyzes the result to contextually verify it and identify any unrecognized letters or words.
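The spatial-matrix idea can be sketched crudely (my own toy version; the grid size and the two templates are invented): rasterize the ink's points onto a small bit grid and pick the template with the best overlap.

```python
# Toy version of the spatial-matrix approach: map the ink's points onto a
# small grid and score each template shape by overlap minus disagreement.

def rasterize(points, size=3):
    """Map (x, y) points in the unit square onto a size x size bit grid."""
    return {(min(int(x * size), size - 1), min(int(y * size), size - 1))
            for x, y in points}

TEMPLATES = {
    "1": rasterize([(0.5, 0.1), (0.5, 0.5), (0.5, 0.9)]),  # vertical stroke
    "-": rasterize([(0.1, 0.5), (0.5, 0.5), (0.9, 0.5)]),  # horizontal stroke
}

def recognize(ink_points):
    ink = rasterize(ink_points)
    # Cells in common count for a template; cells that disagree count against.
    return max(TEMPLATES,
               key=lambda ch: len(ink & TEMPLATES[ch]) - len(ink ^ TEMPLATES[ch]))

print(recognize([(0.5, 0.2), (0.5, 0.5), (0.5, 0.8)]))  # 1
```

A real recognizer uses far finer grids and thousands of shapes, but the matching principle is the same.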

Chapter 2: How Circuits Juggle Data
Here we begin with an overview of circuit boards, traces, pin connectors, and board components such as capacitors and resistors. Next we examine the complete motherboard layout for a big-picture perspective on the role of each component in the processing of data. Extra attention is given to the chip set (the North Bridge, which handles RAM and video, and the South Bridge, which handles I/O from all other ports) that facilitates the operation of the CPU. There is a discussion of PCI-E, which allows point-to-point links through the South Bridge's crossbar switch. PCI-E is scalable and allows channel bonding to increase the bandwidth available to a particular component.

Chapter 3: How a PC Comes Alive 
When a computer is turned on, the BIOS runs code that begins the power-on self-test (POST) and searches drives for an operating system. But first, the BIOS collects device information from CMOS memory (which stores a record of what components are installed on the computer). Next, the BIOS loads device drivers and interrupt handlers into memory. The CPU checks the real-time clock and sends signals over the system bus to make sure all basic components are functioning. The display adapter and video signals are tested, and the display BIOS code is added to the overall system BIOS. Next, the BIOS checks whether it needs to test RAM: on a cold boot (meaning the computer was off), it runs a series of tests to ensure that all RAM is functioning correctly; on a warm boot, or reboot, it skips the rest of POST. Finally, POST completes and the BIOS transfers control to the OS.
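The POST sequence above can be sketched as straight-line pseudocode (the function below and its log messages are my own stand-ins for firmware routines):

```python
# The POST sequence, sketched step by step. `cmos` stands in for the
# CMOS memory holding the list of installed components.

def post(cold_boot, cmos, log=print):
    devices = cmos["installed_components"]       # BIOS reads CMOS first
    log(f"loading drivers and interrupt handlers for {devices}")
    log("checking real-time clock and signaling basic components")
    log("testing display adapter; adding display BIOS to system BIOS")
    if cold_boot:
        log("cold boot: testing all RAM")        # full memory test
    else:
        log("warm boot: skipping RAM test")      # reboot skips the rest of POST
    log("POST complete: transferring control to the OS")

post(cold_boot=True, cmos={"installed_components": ["keyboard", "video", "disk"]})
```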
Some attention is given to the hard disk's boot sector, which contains instructions for loading the OS and for finding all the necessary drivers, registry settings, services, and plug-and-play components.

Chapter 4: How an Operating System Controls Hardware
The operating system was originally designed to communicate with disk drives, but it has taken on a hugely expanded role as the options for devices and peripherals expand geometrically. Controllers plug into the motherboard and translate instructions from the BIOS/driver into signals that the device can use. Controllers relay a signal to the interrupt controller, which manages the I/O bus. Two types of interrupts exist: INTR (used for normal interrupts, which can be deprioritized) and NMI (a non-maskable interrupt that, like CTRL-ALT-DEL, takes priority). When the CPU receives an interrupt, it records a memory address to a stack so it can resume what it was doing once the interrupt is complete. The computer has 256 types of interrupts and stores the appropriate interrupt service routine (ISR) for each in the interrupt descriptor table (IDT). When the ISR completes its job, it sends a return-from-interrupt (RET) instruction to the CPU, which takes that cue to pop the stack and find the memory address of its last function.
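That interrupt round-trip can be modeled in miniature (my own illustration; the vectors and handler routines are invented):

```python
# Toy model of interrupt handling: on an interrupt the CPU pushes its
# current address onto a stack, runs the service routine found in a toy
# IDT, then pops the saved address when the routine issues RET.

idt = {0x21: lambda: "handled keyboard",   # toy interrupt descriptor table
       0x08: lambda: "handled timer"}

def run_with_interrupt(current_address, vector, stack):
    stack.append(current_address)   # save where we were
    result = idt[vector]()          # look up and run the ISR
    resume_at = stack.pop()         # RET: restore the saved address
    return result, resume_at

stack = []
print(run_with_interrupt(0x4000, 0x21, stack))  # ('handled keyboard', 16384)
```

The stack is what lets interrupts nest: a second interrupt arriving mid-ISR just pushes another return address on top.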
Next there is a discussion of plug-and-play and how the configuration manager adds enumerators to itself for the devices plugged into the system. These enumerators are stored in a hardware tree, which the OS examines during resource arbitration to ensure that each device has a unique interrupt request line (IRQ) and to confirm with the device which IRQ it should "listen" for. Finally, the OS searches for the appropriate device driver and loads it into memory to complete the boot.
Keeping track of all this information and more is the registry, which uses five root keys with numerous subkeys, and numerous values beneath those, to create a road map for the OS. The five (fairly self-explanatory) root keys are:
(1) HKEY_CLASSES_ROOT (HKCR)
(2) HKEY_CURRENT_USER (HKCU)
(3) HKEY_LOCAL_MACHINE (HKLM)
(4) HKEY_USERS (HKU)
(5) HKEY_CURRENT_CONFIG (HKCC)

The registry uses five primary data types for values:
(1) String value - plain text and numbers (the most common)
(2) String array value - multiple strings of plain text and numbers
(3) Expandable string value - text containing variables (such as %SystemRoot%) that the system expands into actual locations
(4) DWORD value - numbers (shown in hex in the registry)
(5) Binary value - raw binary data

Thursday, February 10, 2011

Bought my book and started reading...

Yesterday, I began reading the first of three books for this course, How Computers Work (9th Edition), written by Ron White and illustrated by Tim Downs. I'm about 40 pages in so far and I'm really enjoying the straightforward approach to the workings of the computer, with simple but engaging illustrations. The book actually reminds me of one of my favorites growing up, The Way Things Work by David Macaulay, which uses fun illustrations to teach readers about the principles of physics at work in a variety of everyday objects.
Thus far, How Computers Work is a thorough and humorous review of computer components. I'm looking forward to delving deeper into these topics over the coming weeks.