Advanced Computer Architecture 3e
Published by Vijay Nicole Imprints Private Limited
ISBN: 9789393665850
Available in all formats: Paperback, eBook (EPUB), eBook (PDF)
ISBN: 9788182093126   Price: INR 495.00

This book, Advanced Computer Architecture, is designed for a one-semester advanced course targeted at readers with requisite knowledge of computer architecture and organisation. Now in its third edition, the book is a best seller in its category and retains all the winning features of its previous editions.


Table of contents
  • Cover
  • Title Page
  • Copyright Page
  • Contents
  • Preface
  • Acknowledgement
  • Chapter 1 Instruction Level Parallelism
    • 1.1 Pipelining Recalled
      • Characteristics of a Pipeline
      • Pipeline Computers
    • 1.2 Instruction Level Parallelism
    • 1.3 Hardware And Software Approaches
    • 1.4 Dependences
      • Data Dependence
      • Name Dependence
      • Data Hazards
      • Control Dependence
    • 1.5 Dynamic Scheduling
      • Basic Out-of-order Execution
      • Scoreboarding
      • Tomasulo’s Approach
    • Review questions
  • Chapter 2 Software and Hardware Solutions to ILP
    • 2.1 Introduction
    • 2.2 Extracting Parallelism with Hardware Support
      • Conditional or Predicated Instructions
      • Exceptions
      • Limitations of Conditional Instructions
      • Compiler-Directed Speculative Execution with Hardware Support
    • 2.3 Performance Issues
      • Speculation in Multiple-issue CPUs
      • Speculation Mechanism: Hardware vs Software
      • Things to Remember About CPU Design
    • 2.4 Compiler Support for Exploiting ILP Further
      • Loop-Level Parallelism (LLP) Analysis
      • Loop-carried Dependence Detection
      • Eliminating Dependences
      • Software Pipelining (Symbolic Loop Unrolling)
      • Global Code Scheduling
      • Trace Scheduling: Focusing on Critical Path
      • Super Blocks
    • 2.5 Branch Prediction
      • Static Branch Prediction
      • Dynamic Branch Prediction
      • Hardware Dynamic Branch Prediction
    • 2.6 Branch-Target Buffer
      • Structure of a Branch-Target Buffer
      • Basic Operation
      • Adding Prediction to Branch-Target Buffer
      • Branch Folding
    • 2.7 Branch-Target Instruction Cache
    • 2.8 Effects of Prediction on Performance
    • 2.9 General Idea
    • 2.10 Location of Prediction Bits
    • 2.11 Accuracy of Branch Prediction
    • 2.12 Limitation to the Benefits of Branch Prediction
    • 2.13 The First Intel Pentium
    • Review questions
  • Chapter 3 Multiple Instruction Issue Processors
    • 3.1 Introduction
    • 3.2 Superscalar
      • Hardware Requirements
      • Static Scheduling on a Superscalar Processor
      • Loop Unrolling and Scheduling on a Dual-issue Processor
      • Dynamic Scheduling on a Superscalar Processor
      • Multiple Instruction Issue with Dynamic Scheduling Example
    • 3.3 Very Long Instruction Word (VLIW)
      • Superscalar vs. VLIW
      • Limits in Multiple-issue Processors
      • Limitations Specific to Superscalar or VLIW
    • 3.4 Explicitly Parallel Instruction Computing (EPIC)
      • The Epic Philosophy
      • Permitting the Compiler to Play the Statistics
      • Communicating the POE to the Hardware
      • Architectural Features Supporting EPIC
      • Static Scheduling
      • Addressing the Branch Problem
      • Unbundled Branches
      • Predicated Execution
      • Control Speculation
      • Predicated Code Motion
      • Addressing the Memory Problem
      • Cache Specifiers
      • Data Speculation
      • The History of EPIC
    • 3.5 Advanced Compiler Support for Exposing and Exploiting ILP
    • Review questions
  • Chapter 4 More of Multiple Instruction Issue Processors
    • 4.1 Hardware Support for Exposing Parallelism
      • Conditional or Predicated Instructions
      • Exceptions
    • 4.2 Hardware vs Software Speculation Mechanism
    • 4.3 The Intel IA-64 Itanium Processor
      • The Itanium Processor
      • The Intel IA-64 Instruction Set Architecture
      • IA-64 Register Model
      • Instruction Formats and Support for Explicit Parallelism
      • Functional Units and Instruction Issue
      • Itanium Performance
    • 4.4 Itanium
      • The Intel Itanium Architecture
      • Instruction Execution
      • Memory Architecture
      • Architectural Changes
      • Software Support
    • 4.5 Limits of Instruction-Level Parallelism
      • Register Renaming
      • Alias Analysis
      • Branch Prediction
    • Review questions
  • Chapter 5 Multiprocessors with Shared Memory Architectures
    • 5.1 Introduction
    • 5.2 Parallel Processing
      • Array Computers
      • Multiprocessor Systems
    • 5.3 Parallel Architecture Taxonomy
      • Flynn’s Classification
    • 5.4 Centralized (Symmetric) Shared Memory Architectures
    • 5.5 Distributed Shared Memory Architectures
    • 5.6 Communication Models and Memory Architecture
      • Shared Memory with Non-uniform Memory Access (NUMA)
      • Message Passing Multi-computers
    • 5.7 Performance Metrics for Communication Mechanisms
      • Communication Bandwidth
      • Communication Latency
      • Communication Latency Hiding
    • 5.8 Advantages of Communication Mechanism
      • Shared-memory Advantages
      • Advantages of Message-passing
      • Challenges of Parallel Processing
    • 5.9 Cache Coherence
      • Consistency
      • Enforcing Coherence
      • Cache Coherence Protocols
      • Bus Snooping Protocols
      • Performance Differences between Write Invalidate and Write Update Protocols
      • Implementation of Write Invalidate Protocols
    • 5.10 Distributed Shared Memory Architectures - Revisited
    • 5.11 Directory-based Cache Coherence Protocols
      • Directory-based Coherence
      • Directory Operation
      • Uncached State
      • Shared State
      • Exclusive State
      • Performance Issues
    • 5.12 Synchronization
      • Load-linked and store-conditional
      • Implementing Locks using Coherence
      • Barrier Synchronization
      • Sense-reversing Barrier
    • 5.13 Memory Consistency Models
    • 5.14 Introduction to Multithreading
      • Concepts and Examples
      • Process Model with Consideration of Threads
      • Benefits of Multithreading
    • 5.15 Application of Threads
      • Blocked Model
      • Multiplexing Model
      • Forking Model
      • Process-pool Model
      • Process-pool with Multithreading
    • 5.16 Thread functionality
      • Thread States
  • Chapter 6 Cache Memory and its Performance
    • 6.1 Memory Hierarchy Overview
      • Characterizing Memory Hierarchy
    • 6.2 Cache Write Strategies
    • 6.3 Cache Performance
      • Split vs. Unified Caches
      • Improving Cache Performance
    • Review questions
  • Chapter 7 Main Memory and its Performance
    • 7.1 Main Memory
      • Latency Measures
      • Memory bandwidth
      • Wider main memory
      • Interleaved memory
      • Independent memory banks
    • 7.2 Virtual Memory
      • Differences between Cache and Virtual Memory
      • Basic Virtual Memory Caching Questions
      • Translation Look-aside Buffer
      • Selecting Page Sizes
      • Uses of Virtual Memory
    • 7.3 Effects of CPU Design on Memory Hierarchy
    • 7.4 Memory Technology
    • 7.5 Semiconductor RAM
      • Organization of Memory Chip
      • Static RAMs
      • CMOS Cell
      • Asynchronous DRAMs
      • Synchronous DRAM
      • Double-Data-Rate SDRAM (DDR SDRAM)
      • Static Memory System
      • Dynamic Memory System
      • Memory Selection Considerations
      • Memory Controller
    • 7.6 Read-Only Memory (ROM)
      • PROM
      • EPROM
      • EEPROM
      • Flash Memory
      • Flash Cards
      • Flash Drives
    • 7.7 Interleaved Memory
    • Summary
    • Review questions
  • Chapter 8 I/O Devices and their Performance
    • 8.1 Introduction
    • 8.2 Why is I/O so Important?
    • 8.3 Types of Storage Devices
      • Magnetic Disks
      • Performance
      • The Future of Disks
    • 8.4 Other Storage Devices
      • Optical Disks
      • Magnetic Tapes
      • Flash Memory
    • 8.5 Buses
      • Types of Buses in the System
      • Basic Bus Transactions
      • Bus Design Decisions
      • Bus Options
      • Synchronous Bus
      • Asynchronous Bus
    • 8.6 Performance Metrics
      • Trading off Throughput and Response Time
      • Human-Computer Interaction
    • 8.7 Queuing theory
      • Treating the I/O System as a Black Box
      • Elements of a Queuing System
      • Useful statistics
    • 8.8 Bus Standards
      • Examples of Buses
      • Advantages and Disadvantages of Buses
    • 8.9 I/O Interface between Storage Devices and CPU
      • Connecting the Bus to the Main Memory
    • 8.10 I/O Data Transfer Methods by using Memory Bus
      • Programmed I/O (PIO): Polling
      • Interrupt-Driven I/O
      • Direct Memory Access (DMA)
    • 8.11 Connecting the Bus to Cache
      • Cache and I/O: The Stale Data Problem
    • 8.12 Reliability, Availability and Dependability
    • 8.13 Disk arrays
      • Problems with disk arrays
      • RAID Levels
      • RAID Issues
    • 8.14 Disk Performance Benchmarks
      • Transaction Processing
      • TPC-A and TPC-B
      • SPEC System-level File Server (SFS)
      • Self-scaling I/O
    • Review Questions
  • Chapter 9 Multi-threading Architectures
    • 9.1 Software and Hardware Multithreading
    • 9.2 Types of Multithreading
      • Block Multithreading Concept
      • Interleaved Multithreading
      • Simultaneous Multithreading
    • 9.3 Transparent Software and Hardware Multithreading
      • Extending a Multithreaded Programming Paradigm
    • 9.4 Support for Extended Multithreading
      • WMU Extensions
    • 9.5 Multithreading Case Study
      • Multiple Memory Ports
    • 9.6 SMT and CMP Architectures
    • Review questions
  • Chapter 10 Case Study of Multicore Architectures
    • 10.1 Multi-core Processors
    • 10.2 Heterogeneous Multi-Core Systems
    • 10.3 Hardware Trends and Architecture
    • 10.4 Software Impact
    • 10.5 The Coming Wave of Multithreaded Chip Multiprocessors
      • A Flexible Heterogeneous Multi-Core Architecture
      • Intel Multi-Core Architectures
      • Redefining Performance
    • 10.6 A Fundamental Theorem of Multi-Core Processors
    • 10.7 Introducing Intel® Quad-Core Technology
      • Intel Core Micro-architecture
      • Intel Advanced Smart Cache
      • Intel Smart Memory Access
      • The Quad-Core Line Up
      • Beyond Quad-Core: Tera-Scale Computing
    • 10.8 Making the Move to Quad-Core and Beyond
      • Transitioning the Industry to Multi-Core Processing
    • 10.9 The SUN CMP Architecture
    • 10.10 The Evolution of Chip Multithreading
      • Business Challenges for Information Technology Services Deployments
      • Securing the Enterprise at Speed
      • Driving Data Center Virtualization and Eco-efficiency
      • Building Out for Application Scale
      • Rule-Changing Chip Multithreading Technology
    • 10.11 Chip Multiprocessing with Multi-Core Processors
      • Chip Multithreading with Cool Threads Technology
    • 10.12 The IBM Cell Processor Architecture
    • 10.13 PowerXCell 8i
    • 10.14 Power Processor Element (PPE)
      • Xenon in Xbox 360
    • 10.15 Synergistic Processing Elements (SPE)
    • 10.16 Element Interconnect Bus (EIB)
      • Memory and I/O Controllers
      • Supercomputing
      • Cluster Computing
    • Review Questions
  • Solved Problems
  • Important Solved Questions
  • Index
Biographical note

K A Parthasarathy, M.Tech., M.I.S.T.E. is Professor and Head of the Department of CSE and IT at Asan Memorial College of Engineering and Technology, Chengalpattu, Tamil Nadu. He has over 24 years of industrial experience in leading hardware companies such as IBM International and CMC Ltd, and in software organizations such as Fidelity Computers.
