Complete CSC 326 - Lecture notes
Computer architecture (CSC 326)
Joseph Ayo Babalola University
COURSE MATERIAL
ON
CSC 326
COMPUTER ARCHITECTURE II
PRODUCED BY FADARE OLUWASEUN GBENGA
This book is a continuation of the series of learning materials that every computer science student must work through when studying Computer Architecture as a course.
1 Module 1: Pipelining
1. Learning Outcomes: After completing this module, the students should be able to: (1) Describe the primitive way in which a processor fetches and executes instructions (2) Describe and understand pipelining and how it works (3) Compute the cycle time of a processor with different degrees of pipelining (4) Understand the concept of instruction hazards.

One of the two techniques that are widely used for improving processor performance is pipelining. Pipelining is a mechanism for improving processor performance by allowing a processor to overlap the execution of several instructions, so that more instructions can be executed in the same period of time. Early computers executed instructions in a primitive way: the processor fetched an instruction from memory, decoded it to determine what the instruction was (that is, the nature of the instruction), read the instruction's inputs from the register file, performed the operation required by the instruction, and wrote the result back into the register file. Instructions that accessed memory differed slightly from this, but each instruction was completely finished before execution of the next one began. The general problem with this approach is that the hardware needed to perform each of these stages (instruction fetch, instruction decode, register read, instruction execution, and register write-back) is different, so the majority of the hardware is idle at any given moment, waiting for the other parts of the processor to complete their part of executing an instruction. This approach wastes time and makes poor use of the hardware.
With pipelining, the stages of execution overlap across instructions, so that each stage of a pipelined processor completes its work on an instruction in a single cycle. A pipelined processor can therefore have a reduced cycle time (more cycles per second) than an unpipelined processor. Since the pipelined processor has a throughput of one instruction per cycle, the total number of instructions the processor executes per unit time is higher in the pipelined processor, giving better performance. The cycle time of a pipelined processor depends on four factors: the cycle time of the unpipelined version of the processor, the number of pipeline stages, how evenly the datapath logic is divided among the stages, and the latency of the pipeline latches. While pipelining can reduce a processor's cycle time and thereby increase instruction throughput, it increases the latency of the processor by at least the sum of all the pipeline latch latencies. The latency of a pipeline is the amount of time that a single instruction takes to pass through the pipeline, which is the product of the number of pipeline stages and the clock cycle time.

1 Instruction Hazards
There are a number of factors that limit a pipeline's ability to execute instructions at its peak rate, including dependencies between instructions, branches, and the time required to access memory. Instruction hazards (dependencies) occur when instructions read or write registers that are used by other instructions. They are divided into four categories, depending on whether the two instructions involved read or write each other's registers. A read-after-read (RAR) hazard occurs when two instructions both read from the same register. RAW hazards occur when an instruction reads a register that was written by a previous instruction.
The actual performance of a pipelined system is generally limited by data dependencies within a program. These data dependencies are read-after-write, write-after-read, and write-after-write. WAR and WAW dependencies are also known as name dependencies, as they occur only because the processor has a limited number of registers in which to store results, which must be reused over the course of a program's execution. Branch instructions also limit a pipeline's performance and can cause delays in pipelined processors, because the processor cannot determine which instruction to fetch next until the branch has completed execution. In particular, branch instructions, especially conditional branches, create a dependency between the branch instruction and the instruction fetch stage of the pipeline, since the branch instruction computes the address of the next instruction that the instruction fetch stage should fetch.
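The relationships above can be made concrete with a short sketch. It is only an illustration: the function names and the 10 ns / 0.5 ns / 4-stage figures are assumed example values, not taken from the text; the formulas simply restate the points made above (pipelined cycle time = unpipelined cycle time divided by the number of stages, plus the latch latency, assuming the datapath logic is divided evenly; latency = number of stages x cycle time).

```python
# Illustrative sketch only; all figures and names are assumed example values.

def pipelined_cycle_time(unpipelined_cycle_time_ns, stages, latch_latency_ns):
    """Assumes the datapath logic is divided evenly among the stages."""
    return unpipelined_cycle_time_ns / stages + latch_latency_ns

def pipeline_latency(stages, cycle_time_ns):
    """Latency = number of pipeline stages x clock cycle time, as stated above."""
    return stages * cycle_time_ns

# Example: a 10 ns unpipelined datapath, 0.5 ns latches, 4 stages.
cycle = pipelined_cycle_time(10.0, 4, 0.5)   # 3.0 ns per cycle
print(cycle, pipeline_latency(4, cycle))     # 3.0 ns cycle, 12.0 ns latency
```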
1 Self-Assessment Questions
i. Why does pipelining improve performance?
ii. What are the limits on how much a processor's performance can be improved using pipelining?
iii. Given an unpipelined processor with a 10 ns cycle time and pipeline latches with 0 ns latency, what are the cycle times of pipelined versions of the processor with 2, 4, and 16 stages if the datapath logic is evenly divided among the pipeline stages? Also, what is the latency of each of the pipelined versions of the processor?
iv. Identify all the RAW hazards in this instruction sequence: DIV r2, r5, r
2 Module 2: Instruction-Level Parallelism
2. Learning Outcomes: After completing this module, the students should be able to: (1) Understand the concept behind instruction-level parallelism and how it is used (2) Understand the concept of register renaming (3) Understand the two common architectures for instruction-level parallelism.

2 Instruction-Level Parallelism
Instruction-level parallel processors exploit the fact that many of the instructions in a sequential program do not depend on the instructions that immediately precede them in the program. Modern processors exploit this property (Instruction-Level Parallelism) by executing multiple instructions simultaneously, further improving performance. Modern processors typically employ both pipelining and techniques that exploit Instruction-Level Parallelism to improve processor performance. Processors that exploit Instruction-Level Parallelism have been much more successful than multiprocessors in the general-purpose workstation and PC market.

2 Limitations of Instruction-Level Parallelism
The performance of any ILP processor is limited by the amount of Instruction-Level Parallelism that the compiler and the hardware can locate in the program. Instruction-Level Parallelism is limited by several factors: data dependencies, name dependencies (WAR and WAW hazards), and branches. RAW dependencies limit performance by requiring that instructions be executed in sequence to generate the correct results, and they represent a major limitation on the amount of Instruction-Level Parallelism available in a program.
Branches limit Instruction-Level Parallelism because the processor does not know which instructions will be executed after a branch until the branch has completed. This requires the processor to wait for the branch to complete before any instructions after the branch can be executed.

2 Register Renaming
WAR and WAW dependencies are sometimes referred to as "name dependencies," because they are a result of the fact that programs are forced to reuse registers because of the limited size of the register file. These dependencies can limit Instruction-Level Parallelism on scalar processors, because it is necessary to ensure that all instructions that read a register complete the register-read stage of the pipeline before any instruction overwrites that register. Register renaming is a technique that reduces the impact of WAR and WAW dependencies, as the sketch at the end of this module illustrates.

2 Architectures for Instruction-Level Parallelism
There are two common architectures for Instruction-Level Parallelism: superscalar processors and very long instruction word (VLIW) processors. VLIW processors rely on the compiler to schedule instructions for parallel execution by placing multiple operations in a single long instruction word. All of the operations in a VLIW instruction execute in the same cycle, allowing the compiler to control which instructions execute in any given cycle. VLIW processors can be relatively simple, allowing them to be implemented at high clock speeds, but they are generally unable to maintain compatibility between generations, because any change to the processor implementation requires that programs be recompiled if they are to execute correctly. Superscalar processors, on the other hand, contain hardware that examines a sequential program to locate instructions that can be executed in parallel.
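The sketch below illustrates the register-renaming idea described above. It is only an illustration: the instruction format, register names, and the rename helper are assumptions for this example, not part of the text. Each write to an architectural register is given a fresh physical register, so WAW and WAR (name) dependencies disappear, while true RAW dependencies are preserved because readers are directed to whichever physical register currently holds the value.

```python
# Illustrative sketch only; the instruction format and register names are
# assumptions for this example, not taken from the text.

def rename(instructions, num_physical_regs=64):
    """Map each architectural destination register to a fresh physical register."""
    mapping = {}      # architectural register -> physical register holding its value
    next_free = 0
    renamed = []
    for op, dest, src1, src2 in instructions:
        # Sources read whichever physical register currently holds their value,
        # so true RAW dependencies are preserved.
        p_src1 = mapping.get(src1, src1)
        p_src2 = mapping.get(src2, src2)
        # Each write gets a brand-new physical register, so later writes to the
        # same architectural register (WAW) or earlier reads of it (WAR) no
        # longer conflict.
        p_dest = "p{}".format(next_free)
        next_free += 1
        if next_free > num_physical_regs:
            raise RuntimeError("out of physical registers in this simple sketch")
        mapping[dest] = p_dest
        renamed.append((op, p_dest, p_src1, p_src2))
    return renamed

# r1 is written twice (a WAW name dependency); after renaming, the two writes
# target different physical registers.
program = [("ADD", "r1", "r2", "r3"),
           ("MUL", "r4", "r1", "r5"),
           ("SUB", "r1", "r6", "r7")]
for ins in rename(program):
    print(ins)
```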
3 Module 3: Computer Main Memory and its associated Attributes
3. Learning Outcomes: After completing this module, the students should be able to: (1) Understand the concept of computer main memory and its characteristics (2) Understand the various characteristics associated with computer main memory (3) Understand the general characteristics of memory system operations
3 Computer Main Memory and its Characteristics
The main memory, also known as the primary memory, works directly with the central processing unit and is a combination of both RAM (random access memory) and ROM (read only memory).

RAM: The random access memory is a read/write memory, i.e., information can be read from as well as written into this type of memory. It is volatile in nature, i.e., the information it contains is lost as soon as the system is shut down unless 'saved' for further usage by users. It is basically used to store programs and data during the computer's operation.

ROM: The read only memory, as the name suggests, contains information that can only be read, i.e., you cannot write to this type of memory. It is non-volatile or permanent in nature. It is basically used to store permanent programs, such as the program for the functioning of the monitor.

The main memory is a fast memory, i.e., it has a small access time. It is because of its limited capacity that it is fast. The main memory contains the programs that are currently being worked on. It passes on this information to the control unit as and when required. In case the CPU wants to access some data that is present in a secondary storage
device, this data is first transferred to the main memory and then processed. The main memory is much more costly than the secondary storage devices. Although the ROM ICs of various computers do not vary much in their capacities, RAM chips are available in a wide range of storage capacities. In fact, the capacity of the random access memory is an important specification of a computer. A larger RAM means larger programs (in terms of memory) can be loaded and executed. Suppose you want to run a 68-KB program on a machine with 64-KB of memory. This means that the whole program cannot be loaded into the main memory at once, resulting in either the non-execution of the program or a very slow execution. A 64-K memory means that there are approximately 64,000 (65,536 to be precise) storage locations, each of which can store 1 byte of data. Different memories can be classified on the basis of these concepts:
- Access mode: the manner in which a memory's storage locations are reached, for example random or associative access (both discussed below).
- Access time: the average time required to reach a storage location and obtain its contents.
- Transfer Rate: the transfer rate is the number of characters or words that a device can transfer per second after it has been positioned at the beginning of the record.
- Capacity and cost: the capacity and cost may depend upon the requirement and the budget.

The main memory has a very low access time and a very high transfer rate. It is limited in capacity and costlier than secondary storage devices. Double Data Rate: DDR, DDR2 and DDR3 stand for Double Data Rate; they measure the rate at which your memory transfers data chunks in a clock cycle. All DDR memories transfer data twice per clock cycle, on both the rising and falling edges of the clock.
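A short sketch, purely illustrative, ties the characteristics above together. All of the figures (access time, record size, transfer rate, clock frequency, bus width) are assumed example values, not figures from the text.

```python
# Illustrative sketch; every figure below is an assumed example value.

def record_read_time(access_time_s, record_bytes, transfer_rate_bytes_per_s):
    """Time to position at a record (access time) plus time to stream it out."""
    return access_time_s + record_bytes / transfer_rate_bytes_per_s

# e.g. a 4 KB record, 10 ms access time, 100 MB/s transfer rate
print(record_read_time(10e-3, 4096, 100e6))   # about 0.01004 s

def ddr_peak_rate(clock_hz, bus_width_bytes):
    """Double data rate: two transfers per clock cycle across the bus width."""
    return 2 * clock_hz * bus_width_bytes

# e.g. a 400 MHz DDR clock with a 64-bit (8-byte) bus
print(ddr_peak_rate(400e6, 8))   # 6.4e9 bytes/s peak
```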
The Cache Memory
Another important concept is that of the cache memory, which is also a part of the CPU. The cache memory lies in the path between the processor and the main memory. It has a smaller access time than the main memory and is therefore faster. A cache memory may have an access time of 100 ns, while the main memory may have an access time of 700 ns. The cache memory is very expensive and hence is limited in capacity. Earlier cache memories were available separately, but the latest microprocessors contain the cache memory on the chip itself. The need for the cache memory is due to the mismatch between the speeds of the main memory and the CPU. The CPU clock, as discussed earlier, is very fast, whereas the main memory access time is comparatively slower. Hence, no matter how fast the processor is, the processing speed depends more on the speed of the main memory (the strength of a chain is the strength of its weakest link). For this reason, a cache memory whose access time is closer to the processor speed is introduced. The cache memory stores the program (or its part) currently being executed or which may be executed within a short period of time. The cache memory also stores temporary data that the CPU may frequently require for manipulation.
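As a rough illustration of why the cache helps, the sketch below computes an effective (average) access time from the 100 ns cache and 700 ns main-memory figures mentioned above. The 95% hit rate is an assumed value, and the simple weighted-average model is only a sketch, not a complete cache model.

```python
# Illustrative sketch; the 95% hit rate is an assumption, not from the text.

def effective_access_time(hit_rate, cache_time_ns, main_time_ns):
    """Weighted average: hits are served by the cache, misses by main memory."""
    return hit_rate * cache_time_ns + (1 - hit_rate) * main_time_ns

print(effective_access_time(0.95, 100, 700))   # 130.0 ns on average
```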
3 General Characteristics of Memory System Operations
Computer memory exhibits perhaps the widest range of type, technology, organization, performance, and cost of any feature of a computer system. No one technology is optimal in satisfying the memory requirements of a computer system. As a consequence, the typical computer system is equipped with a hierarchy of memory subsystems, some internal to the system (directly accessible by the processor) and some external (accessible by the processor via an I/O module). The complexity of computer memory is made more manageable if we classify memory systems according to their key characteristics. The most important of these are described below.

The term location refers to whether memory is internal or external to the computer system. Internal memory is often equated with main memory, but there are other forms of internal memory: the processor requires its own local memory in the form of registers, and the control unit portion of the processor may also require its own internal memory. Cache is another form of internal memory. External memory consists of peripheral storage devices, such as disk and tape, which are accessible to the processor via I/O controllers.

Another characteristic of memory is its capacity. For internal memory, this is typically expressed in terms of bytes (1 byte = 8 bits) or words. Common word lengths are 8, 16, and 32 bits. External memory capacity is typically expressed in terms of bytes. A related issue is the unit of transfer. For internal memory, the unit of transfer is equal to the number of electrical lines into and out of the memory module. This may be equal to the word length, but is often larger, such as 64, 128, or 256 bits. The word is the "natural" unit of organization of memory; the size of the word is typically equal to the number of bits used to represent an integer and to the instruction length. As for addressing units, many systems allow addressing at the byte level. In any case, the relationship between the length in bits A of an address and the number N of addressable units is 2^A = N.
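The addressing relationship 2^A = N stated above can be illustrated in a couple of lines; the helper names are just for this example.

```python
import math

def addressable_units(address_bits):
    """N = 2^A addressable units for an address that is A bits long."""
    return 2 ** address_bits

def address_bits_needed(units):
    """A = log2(N) bits are needed to address N units (rounded up)."""
    return math.ceil(math.log2(units))

print(addressable_units(16))        # 65536, the "64-K" figure mentioned earlier
print(address_bits_needed(65536))   # 16
```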
Associative: This is a random-access type of memory that enables one to make a comparison of desired bit locations within a word for a specified match, and to do this for all words simultaneously. Thus, a word is retrieved based on a portion of its contents rather than its address. As with ordinary random-access memory, each location has its own addressing mechanism, and retrieval time is constant, independent of location or prior access patterns. Cache memories may employ associative access.

From a user's point of view, the two most important characteristics of memory are capacity and performance. Three performance parameters are used:

Access time (latency): For random-access memory, this is the time it takes to perform a read or write operation, that is, the time from the instant that an address is presented to the memory to the instant that data have been stored or made available for use.

Memory cycle time: This concept is primarily applied to random-access memory and consists of the access time plus any additional time required before a second access can commence. This additional time may be required for transients to die out on signal lines or to regenerate data if they are read destructively. Note that memory cycle time is concerned with the system bus, not the processor.

Transfer rate: This is the rate at which data can be transferred into or out of a memory unit.
A variety of physical types of memory have been employed. The most common today are semiconductor memory, magnetic surface memory (used for disk and tape), and optical and magneto-optical memory.
Several physical characteristics of data storage are important. In a volatile memory, information decays naturally or is lost when electrical power is switched off. In a nonvolatile memory, information once recorded remains without deterioration until deliberately changed; no electrical power is needed to retain the information. Magnetic-surface memories are nonvolatile. Semiconductor memory may be volatile or nonvolatile. Non-erasable memory cannot be altered, except by destroying the storage unit. Semiconductor memory of this type is known as read-only memory (ROM). Of necessity, a practical non-erasable memory must also be nonvolatile. The three forms of memory to be discussed are, typically, volatile and employ semiconductor technology. The use of three levels exploits the fact that semiconductor memory comes in a variety of types, which differ in speed and cost. Data are stored more permanently on external mass storage devices, of which the most common are hard disk and removable media, such as removable magnetic disk, tape, and optical storage. External, nonvolatile memory is also referred to as secondary memory or auxiliary memory. These devices are used to store program and data files and are usually visible to the programmer only in terms of files and records, as opposed to individual bytes or words. Disk is also used to provide an extension to main memory known as virtual memory.
Other forms of memory may be included in the hierarchy, using semiconductor technology that is slower and less expensive than that of main memory. Other forms of secondary memory include optical and magneto-optical disks.

3 Error Correction
4 Module 4: Memory System
4. Learning Outcomes: After completing this module, the students should be able to: (1) Understand the concepts of Memory System (2) Understand the concepts of latency and bandwidth and how they relate to memory system (3) Understand the concept of memory hierarchy and be able to compute average memory access times for memory hierarchies (4) Understand the difference between DRAM and SRAM memory technologies. (5) Understand how caches are organized and implemented, and be able to define and describe cache terminology. (6) Understand virtual memory, virtual addresses, and physical addresses
4 Memory System
This aspect of computer architecture is centered on how memory systems are implemented in modern computer systems. The computer memory system discussed here is random-access memory. There are terms that are used to measure and describe the time taken to complete an individual operation and the rate at which operations are completed; in the course of this material, these have been discussed as latency and throughput. An additional term that is used in discussing memory systems is bandwidth, which describes the total rate at which data can be moved between the processor and the memory system. Bandwidth is the product of the throughput and the amount of data referenced by each memory operation.
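The bandwidth relationship stated above can be illustrated with a minimal sketch; the throughput and transfer-size figures are assumed example values.

```python
# Illustrative sketch; figures are assumed example values.

def bandwidth(ops_per_second, bytes_per_operation):
    """Bandwidth = throughput (operations per second) x data referenced per operation."""
    return ops_per_second * bytes_per_operation

# e.g. 100 million memory operations per second, 8 bytes each
print(bandwidth(100e6, 8))   # 8e8 bytes per second
```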