Sunday, September 26, 2010

SEMINAR REPORT Intel® Core™ i7 Processor





Contents

INTRODUCTION
Nehalem Architecture
SOCKET
Features and Benefits
1. Quad-Core Processing
2. Intel Hyper-Threading Technology
3. Intel Turbo Boost Technology
4. 8 MB Intel Smart Cache
5. Intel QuickPath Interconnect
6. Integrated Memory Controller
7. Intel HD Boost
8. Digital Thermal Sensor (DTS)
9. Intel Wide Dynamic Execution Access
INSTRUCTION SET
ADVANTAGES
APPLICATIONS
CONCLUSION
REFERENCE


ABSTRACT
With faster, intelligent multi-core technology that applies processing power dynamically when needed most, the new Intel® Core™ i7 processors deliver an incredible breakthrough in PC performance. They’re the best desktop processors on the planet.
Multitask applications faster and unleash incredible digital media creation. Experience maximum performance for everything you do, thanks to the combination of Intel® Turbo Boost Technology and Intel® Hyper-Threading Technology, which maximizes performance to match your workload.
Whether you’re casually checking e-mail and surfing the Web or multitasking compute-intensive applications such as HD video encoding, you want a processor that enables maximum PC performance. With the Intel Core i7 processor, you’ll get just that. An unprecedented four-core, eight-thread design with Intel Hyper-Threading Technology ensures incredible performance, no matter what your computing needs. And with more than double the memory bandwidth for faster memory access, you’ll achieve more while waiting less.
It’s time for digital content creation that’s limited only by your imagination. Experience total creative freedom with the power to encode video up to 40% faster. And enjoy incredible performance on other multimedia tasks like image rendering, photo retouching, and editing.
The Intel® Core™ i7 processor provides new levels of brilliant performance for highly threaded immersive games. By distributing AI, physics, and rendering across eight software threads, the Intel Core i7 processor lets you concentrate on taking down the bad guys while your PC handles all the visual details such as texturing and shading that keep you feeling totally immersed. It’s a gaming experience so perfect, you just might lose yourself in the action.


INTRODUCTION
Intel Unveils All New 2010 Intel Core Processor Family
• Mainstream processors now offer Intel(R) Turbo Boost Technology, automatically adapting to an individual's performance needs
• First 32 nanometer processors, and the first time Intel is mass-producing a variety of chips at mainstream prices at the start of a new manufacturing process, reflecting last year's $7 billion investment during an economic recession
• Intel(R) Core(TM) i5 processors are about twice as fast as comparable existing PCs for a visibly faster video, photo and music downloading experience
• Historic milestone: select processors integrate graphics directly on the processor; they also include Intel's second-generation high-k metal gate transistors
• Beyond laptops and PCs, processors also target ATMs, travel kiosks and digital displays
Intel Corporation introduced its all new 2010 Intel(R) Core(TM) family of processors today, delivering unprecedented integration and smart performance, including Intel(R) Turbo Boost Technology for laptops, desktops and embedded devices. The introduction of new Intel(R) Core(TM) i7, i5 and i3 chips coincides with the arrival of Intel's groundbreaking new 32 nanometer (nm) manufacturing process, which for the first time in the company's history will be used to immediately produce and deliver processors and features at a variety of price points, and to integrate high-definition graphics inside the processor. This unprecedented ramp and innovation reflects Intel's $7 billion investment, announced early last year in the midst of a major global economic recession. Intel is unveiling several platform products, including more than 25 processors, wireless adapters and chipsets: new Intel Core i7, i5 and i3 processors, Intel(R) 5 Series Chipsets, and Intel(R) Centrino(R) Wi-Fi and WiMAX adapters that include new Intel(R) My WiFi features. More than 400 laptop and desktop PC platform designs are expected from computer makers based on these products, with another 200 expected for embedded devices.
New 2010 Intel Core processors are manufactured on the company's 32nm process, which includes Intel's second-generation high-k metal gate transistors. This technique, along with other advances, helps increase a computer's speed while decreasing energy consumption.
"For the first time, there's a new family of Intel processors with the industry's most advanced technology available immediately at virtually every PC price point," said Sean Maloney, executive vice president and general manager of the Intel Architecture Group. "These smart processors adapt to an individual's needs, automatically providing a 'boost' of performance for everyday applications. They become energy efficient to the point of shutting down processing cores or reducing power consumption to provide performance when people need it, and energy efficient when they don't."
Speed Meets Intelligence
Based on Intel's award-winning "Nehalem" microarchitecture, these new desktop, mobile and embedded processors deliver smart performance for music, gaming, videos, movies, photos, social networking and other demanding mainstream applications. In addition, ultra-thin laptops with all new 2010 Intel Core processors inside provide a balance of performance, style and long battery life for sleek systems less than an inch thick.
New Intel Core i7 and Core i5 processors also feature exclusive Intel Turbo Boost Technology for adaptive performance, and thus smarter computing. Intel Turbo Boost Technology automatically accelerates performance, adjusting to the workload to give users an immediate performance boost when needed. Intel(R) Hyper-Threading Technology, available in Intel Core i7, Core i5 and Core i3 processors, enables smart multi-tasking by allowing each processing core to run multiple "threads," providing amazing responsiveness and great performance, balanced with industry-leading energy efficiency when processing several tasks simultaneously.
Supporting the all new 2010 Intel Core(TM) processors, the Intel 5 Series Chipset is the company's first single-chip chipset solution, evolving from simply connecting components to providing a range of platform innovation and capabilities. The Intel Core family also has power-saving techniques such as one Intel calls "hurry up and get idle," or "HUGI," which enables processors to finish tasks quickly while preserving battery life.
The all new 2010 Intel(R) Core(TM) processor family is the first to integrate graphics into mainstream PC processors. With Intel(R) HD Graphics, the processors deliver stunning visuals and smooth high-definition (HD) video playback. It's also the industry's first integrated solution to deliver multi-channel Dolby* TrueHD and DTS* Premium Suite home theater audio. In addition, Intel HD Graphics supports mainstream and casual 3-D gaming without the need for an add-in video card, and offers full support for the new Microsoft Windows* 7 operating system.
Another intuitive feature available to mainstream notebook buyers is Intel Switchable Graphics, which lets users who play very graphics-intensive games switch automatically between Intel's integrated graphics and a discrete graphics chip on the fly, without having to reboot, for optimal battery life and performance.


Nehalem Architecture
Intel is now shipping microprocessors using its new architecture, codenamed “Nehalem,” as a successor to the Core architecture. This design uses multiple cores like its predecessor, but claims to improve the utilization of, and communication between, the individual cores. This is primarily accomplished through better memory management and cache organization. Some benchmarking and research has been performed on the Nehalem architecture to analyze the cache and memory improvements. In this paper I take a closer look at these studies to determine if the performance gains are significant.
The predecessor to Nehalem, Intel’s Core architecture, made use of multiple cores on a single die to improve performance over traditional single-core architectures. But as more cores and processors were added to a high-performance system, some serious weaknesses and bandwidth bottlenecks began to appear. After the initial generation of dual-core Core processors, Intel introduced the Core 2 series, which amounted to little more than packaging two or more dual-core dies together. The cores communicated via system memory, which caused large delays due to limited bandwidth on the processor bus. Adding more cores increased the burden on the processor and memory buses, which diminished the performance gains that could be possible with more cores. The new Nehalem architecture sought to improve core-to-core communication by establishing a point-to-point topology in which microprocessor cores can communicate directly with one another and have more direct access to system memory.

The approach to the Nehalem architecture is more modular than that of the Core architecture, which makes it much more flexible and customizable to the application. The architecture consists of only a few basic building blocks. The main blocks are a microprocessor core (with its own L2 cache), a shared L3 cache, a QuickPath Interconnect (QPI) bus controller, an integrated memory controller (IMC), and a graphics core. With this flexible architecture, the blocks can be configured to meet what the market demands. For example, the Bloomfield model, which is intended for performance desktop applications, has four cores, an L3 cache, one memory controller and one QPI bus controller.


Instruction Set
Intel also added seven new instructions to the instruction set. These are single-instruction, multiple-data (SIMD) instructions that take advantage of data-level parallelism for today’s data-intensive applications (like multimedia). Intel refers to the new instructions as Applications Targeted Accelerators (ATA) due to their specialized nature. For example, a few instructions are used explicitly for efficient text processing such as XML parsing. Another instruction is used just for calculating checksums.
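To make the checksum case concrete: the SSE4.2 extension that arrived with Nehalem includes a CRC32 instruction that computes the CRC-32C (Castagnoli) checksum in hardware. The sketch below is a plain software version of that checksum, not the hardware instruction itself, and the function name crc32c is only illustrative.

```python
def crc32c(data: bytes) -> int:
    """Bitwise software CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF  # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # standard final XOR


# "123456789" is the conventional test input for CRC algorithms.
print(hex(crc32c(b"123456789")))  # 0xe3069283
```

A hardware instruction replaces the inner eight-step loop with a single operation per byte (or per wider operand), which is where the speedup would come from.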
Power Management
For past architectures Intel has used a single power management circuit to adjust voltage and clock frequencies, even on a die with multiple cores. With many cores, this strategy becomes wasteful because the load across cores is rarely uniform. Looking forward to a more scalable power management strategy, Intel engineers decided to put yet another processing unit on the die, called the Power Control Unit (PCU).
Out-of-order execution
Out-of-order execution also greatly increases the performance of the Nehalem architecture. This feature allows the processor to fill pipeline stalls with useful instructions so that pipeline efficiency is maximized. Out-of-order execution was present in the Core architecture, but in the Nehalem architecture the reorder buffer has been greatly enlarged to allow more instructions to be ready for immediate execution.


SOCKET
Known as LGA 1366 or Socket B, with 1,366 contact points
Successor to LGA 775, and completely incompatible with it
The Core i7 is the first processor to use LGA 1366


Features and Benefits


1. Quad-Core Processing

Introducing Intel® Quad-Core Technology
The next milestone in multi-core processor design and performance will be Intel’s unveiling of the industry’s first quad-core processors for desktops, workstations and volume servers. Intel is the only company with the manufacturing resources to take this next step so quickly. Intel’s implementation of quad-core takes advantage of our rich history of engineering expertise, along with our industry-leading manufacturing technologies and capabilities. This translates into excellent volume pricing and consistent supply. The industry will be able to make a fast transition as well, because these quad-core processors are designed to plug into current motherboards that meet the proper thermal and electrical specifications.

Spurred by increasing globalization, growing device intelligence, and the explosion of digital data, Intel believes the next decade’s applications will be much more computationally intensive than anything we’ve seen to date. This will be the “tera era,” an age when people need teraflops (a trillion floating point operations per second) of computing power, terabits (a trillion bits) per second of communications bandwidth, and terabytes (1,024 gigabytes) of data storage to handle the information all around them. With the tera era in mind, Intel researchers are today working to shape future Intel microprocessors through the Intel® Tera-scale Computing Research Program. Intel has over 100 R&D projects worldwide in the tera-scale area. Our researchers are addressing the hardware and software challenges of building and programming systems with dozens (even hundreds) of energy-efficient cores with sophisticated memory hierarchies to deliver the performance and capabilities needed by these systems.
Features and Benefits
Provides four complete execution cores in a single processor with up to 8 MB of L2 cache and up to a 1333 MHz Front Side Bus. Four dedicated, physical threads help operating systems and applications deliver additional performance, so end users can experience better multi-tasking and multi-threaded performance across many types of applications and workloads.



2. Intel Hyper-Threading Technology

Hyper-Threading Technology, a feature of Intel Xeon and Intel Pentium 4 processors and now of the Core i7, makes a single physical processor appear as two logical processors to the operating system. Hyper-Threading duplicates the architectural state on each processor, while sharing one set of execution resources. This duplication allows a single physical processor to execute instructions from different threads in parallel rather than in series, potentially leading to better processor utilization and overall performance. However, sharing system resources, such as the cache or memory bus, may degrade system performance. Previous studies have shown that Hyper-Threading can improve the performance of some applications, but not all. Performance gains may vary depending on the cluster configuration, such as communication fabric or cache size, and on the applications running on the cluster.
In high-performance computing (HPC) clusters, software developers often use standard message-passing systems such as Message Passing Interface (MPI) or Parallel Virtual Machine (PVM) to achieve parallelism in applications. For optimal performance, in most cases the number of processes spawned is equal to the number of processors in the cluster. Therefore, parallelized applications can benefit from Hyper-Threading, because doubling the number of logical processors means the number of processes spawned is doubled, allowing parallel tasks to execute faster. Applying Hyper-Threading, and thus doubling the processes that simultaneously run on the cluster, also increases the utilization rate of the processors’ execution resources. Although performance may improve, doubling simultaneous processes may introduce overhead in the following ways:

• Cache access: Logical processors of the same physical CPU may compete for access to the caches, which potentially generates more cache misses.

• Memory contention: More processes running on the same compute node may increase memory contention if the processes access the memory bus or communicate through shared memory simultaneously.

• Communication traffic: More processes on each node increase the message passing within and between nodes, which can oversubscribe the communication capacity of the shared memory, the I/O bus, or the interconnect network, and thus create performance bottlenecks.
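The one-process-per-processor rule described above can be sketched in a few lines of Python. The function name process_count and its parameters are illustrative, and the default of two hardware threads per core reflects Hyper-Threading as described in this section.

```python
import os


def process_count(physical_cores: int, hyper_threading: bool,
                  threads_per_core: int = 2) -> int:
    """Processes to spawn when running one process per logical processor."""
    if physical_cores < 1 or threads_per_core < 1:
        raise ValueError("core and thread counts must be positive")
    return physical_cores * (threads_per_core if hyper_threading else 1)


# A quad-core Core i7 exposes eight logical processors with Hyper-Threading on.
print(process_count(4, hyper_threading=True))   # 8
print(process_count(4, hyper_threading=False))  # 4

# os.cpu_count() reports the logical processors visible to the OS (may be None).
print(os.cpu_count())
```

Whether the doubled process count actually helps depends on the overheads listed above, so the larger number is a starting point for measurement rather than a guaranteed win.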

Delivers two processing threads per physical core for a total of eight threads for massive computational throughput. With Intel® Hyper-Threading Technology, highly threaded applications can get more work done in parallel, completing tasks sooner. With more threads available to the operating system, multitasking becomes even easier. This amazing processor can handle multiple applications working simultaneously, allowing you to do more with less wait time.


3. Intel Turbo Boost Technology

Dynamically increases the processor’s frequency as needed by taking advantage of thermal and power headroom when operating below specified limits. Get more performance automatically, when you need it the most.
Intel® microarchitecture (Nehalem) based processors incorporate a new feature: Intel® Turbo Boost technology. Under some configurations and workloads, Intel® Turbo Boost technology enables higher performance through the availability of increased core frequency. Intel® Turbo Boost technology automatically allows processor cores to run faster than the base operating frequency if the processor is operating below rated power, temperature, and current specification limits. Intel® Turbo Boost technology can be engaged with any number of cores or logical processors enabled and active. This results in increased performance of both multi-threaded and single-threaded workloads. The BIOS may contain a set-up option to enable or disable Intel® Turbo Boost technology. The feature operates under operating system (OS) control, engaging when the OS requests the highest performance state (P0). For ACPI-aware operating systems, no changes are required to support Intel® Turbo Boost technology. The maximum frequency depends on the number of active cores and varies with the specific configuration on a per-processor-number basis. The amount of time the processor spends in the Intel® Turbo Boost technology state will depend on workload and operating environment.

Intel® Turbo Boost technology is available only on supported processor versions. With Intel® Turbo Boost technology, the processor is capable of maximizing core frequency while ensuring that it does not exceed its electrical and thermal limits for the number of active cores. When temperature, power or current exceeds factory-configured limits and the processor is above the base operating frequency, the processor automatically steps down core frequency (by 133.33 MHz) in order to reduce temperature, power and current. The processor then monitors temperature, power, and current and continuously re-evaluates.
Note: When Intel® Turbo Boost technology is requested by the OS, the processor will commonly operate between the maximum Intel® Turbo Boost technology frequency and the base operating frequency. All active cores in the processor run at the same frequency and voltage, even at frequencies above the base operating frequency. Because of the way the BIOS and OS communicate Intel® Turbo Boost technology states, software may never detect core clock frequencies above the base operating frequency; this does not reflect the actual core frequency. Workloads that are naturally lower in power or lightly threaded may take advantage of headroom in the form of increased core frequency. Continual measurements of temperature, current draw, and power consumption are used to dynamically assess headroom.

The frequency increase available from Intel® Turbo Boost technology is ultimately constrained by power delivery limits, but within those constraints it is limited by the following factors:
• The estimated current consumption of the processor
• The estimated power consumption of the processor
• The temperature of the processor
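The monitor-and-re-evaluate behaviour described above can be sketched as a small control loop. This is a simplified model, not Intel's actual algorithm: it assumes the processor moves one 133.33 MHz step per evaluation, and the multiplier values in the example are hypothetical.

```python
from typing import List, Sequence

STEP_MHZ = 133.33  # frequency step size given in the text


def next_multiplier(current: int, base: int, max_turbo: int,
                    within_limits: bool) -> int:
    """One re-evaluation, with frequency expressed in 133.33 MHz steps.

    Assumed policy: step up one bin while power, current and temperature are
    all within limits; otherwise step down one bin, but never below the base
    operating frequency.
    """
    if within_limits:
        return min(current + 1, max_turbo)
    return max(current - 1, base)


def simulate(base: int, max_turbo: int,
             limit_readings: Sequence[bool]) -> List[float]:
    """Return the core frequency in MHz after each sensor reading."""
    multiplier = base
    trace = []
    for ok in limit_readings:
        multiplier = next_multiplier(multiplier, base, max_turbo, ok)
        trace.append(round(multiplier * STEP_MHZ, 2))
    return trace


# Hypothetical part: base multiplier 20 (about 2.67 GHz) with two turbo bins.
print(simulate(20, 22, [True, True, True, False, True]))
```

In this run the frequency climbs to the turbo ceiling, drops one step when a limit is exceeded, and recovers on the next clean reading.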

4. 8 MB Intel Smart Cache

This large last-level cache enables dynamic and efficient allocation of shared cache to all four cores to match the needs of various applications for ultra-efficient data storage and manipulation. In the Core architecture, each pair of cores shared an L2 cache. This allowed the two cores to communicate efficiently with each other, but as more cores were added it proved difficult to implement efficient communication among more pairs. Nehalem keeps its caches coherent with the MESIF protocol, in which each cache line is in one of the following states:
• Modified - The cache line is only present in the current cache and differs from main memory (dirty).
• Exclusive - The cache line is only present in the current cache and matches main memory (clean).
• Shared - The cache line is clean, as in the Exclusive state, but the data has been read and may exist in another cache. That other cache must be updated if the line changes.
• Invalid - The cache line is invalid.
• Forward - The cache line is designated as the responder for all caches that share this line. With the extra “Forward” state, redundant responses from multiple sharing caches are eliminated.
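The role of the Forward state can be illustrated with a small Python sketch that decides which agent answers a read request for one cache line. It is a deliberately simplified model: the cache names are hypothetical, and the Modified state is included because it is part of the standard MESIF protocol that these state names belong to.

```python
from enum import Enum


class LineState(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"
    FORWARD = "F"


def responder(states: dict) -> str:
    """Pick which agent answers a read request for one cache line.

    `states` maps a cache name to that cache's state for the line.
    Simplified rule: the cache holding the line in Modified, Exclusive or
    Forward state responds; caches in Shared state stay silent; if no cache
    qualifies, main memory supplies the data.
    """
    for cache, state in states.items():
        if state in (LineState.MODIFIED, LineState.EXCLUSIVE, LineState.FORWARD):
            return cache
    return "memory"


# Three caches share the line, but only the Forward holder replies.
print(responder({
    "core0": LineState.SHARED,
    "core1": LineState.FORWARD,
    "core2": LineState.SHARED,
}))  # core1
```

Without the Forward state, every cache holding the line in Shared state could answer the same request, which is the redundant traffic the extra state avoids.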

Intel Advanced Smart Cache. The shared L2 cache is dynamically allocated to each processor core based on workload. This efficient, dual-core optimized implementation increases the probability that each core can access data from fast L2 cache, significantly reducing latency to frequently used data and improving performance.


5. Intel QuickPath Interconnect
Intel QuickPath Architecture is a platform architecture that provides high-speed connections between microprocessors and external memory, and between microprocessors and the I/O hub. One of its biggest changes will be the implementation of scalable shared memory. Instead of using a single shared pool of memory connected to all the processors through FSBs and memory controller hubs, each processor will have its own dedicated memory that it accesses directly through an Integrated Memory Controller. In cases where a processor needs to access the dedicated memory of another processor, it can do so through a high-speed Intel QuickPath Interconnect that links all the processors. A big advantage of the Intel QuickPath Interconnect is that it is point-to-point. There is no single bus that all the processors must use and contend with each other to reach memory and I/O. It also improves scalability, eliminating the competition between processors for bus bandwidth. Coupled with Intel’s great cache memory, this technological achievement will enable the performance of servers and workstations to take another leap forward.
Intel QuickPath Architecture is not Intel’s first implementation of scalable shared memory, a computer memory design with a single physical address space in which some parts of memory are faster to access than others. Intel has used it in other platforms, such as Intel® 8870 chipset-based servers. Nor is it the first time Intel has used an integrated memory controller. Next-generation microarchitecture-based platforms will simply be the first to bring both scalable shared memory and integrated memory controllers together. With each processor having its own memory controller and dedicated memory, the local memory will always be the fastest to access. When an instruction or data is located in another microprocessor’s dedicated memory, the access will take longer, but not by much, because Intel QuickPath Interconnect is extremely fast. Thanks to the pioneering work done by companies such as Sequent* (IBM*) and others, scalable shared memory has been used in the high-end server space for years and most modern operating systems (OSes) are optimized for it. This means they schedule processes and allocate memory to take advantage of local physical memory and improve execution performance. Most virtualization software is also written to take advantage of scalable shared memory, pinning a virtual machine to a specific execution microprocessor and its dedicated memory.
Intel® QuickPath Interconnect Advantages
Intel QuickPath Interconnect will deliver leading microprocessor interconnect bandwidth and RAS for Intel’s next generation of server and workstation platforms. Key advantages include:
• Best interconnect performance in the mainstream server/workstation segment. Intel QuickPath Interconnect uses links running at up to 6.4 gigatransfers per second (GT/s), delivering up to 25 gigabytes per second (GB/s) of total bandwidth, up to 300 percent greater than other interconnect solutions used today. (A gigatransfer is one billion data transfers.)
• Efficient architecture improves interconnect performance. Intel QuickPath Interconnect reduces the amount of communication required in the interface of multi-processor systems to deliver faster payloads. The dense packet and lane structure allow more data transfers in less time, improving overall system performance.
• Tightly integrated RAS features for high reliability. Implicit Cyclic Redundancy Check (CRC) with link-level retry ensures data quality and performance by providing CRC without the performance penalty of additional cycles. The link level retry retransmits data to make certain the transmission is completed without loss of data integrity. For advanced servers which require the highest level of RAS features, some processors include additional features including the following: self-healing links that avoid persistent errors by re-configuring themselves to use the good parts of the link; clock fail-over to automatically re-route clock function to a data lane in the event of clock-pin failure; and hot-plug capability to enable hot-plugging of nodes, such as processor cards.
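The bandwidth figure quoted in the first advantage follows from the transfer rate by simple arithmetic. The sketch below assumes, beyond what the text states, that each link carries 2 bytes of payload per transfer in each direction and that the total counts both directions; the function name is illustrative.

```python
def qpi_bandwidth_gb_s(transfer_rate_gt_s: float,
                       bytes_per_transfer: int = 2,
                       directions: int = 2) -> float:
    """Total link bandwidth in GB/s (payload bytes, both directions combined)."""
    return transfer_rate_gt_s * bytes_per_transfer * directions


# 6.4 GT/s x 2 bytes x 2 directions
print(qpi_bandwidth_gb_s(6.4))  # 25.6
```

Under these assumptions the result is 25.6 GB/s, consistent with the figure of about 25 GB/s given in the list.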

6. Integrated Memory Controller
An integrated memory controller with three channels of DDR3 1066 MHz offers memory performance up to 25.6 GB/s. Combined with the processor’s efficient prefetching algorithms, this memory controller’s lower latency and higher memory bandwidth deliver amazing performance for data-intensive applications.
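The 25.6 GB/s figure can be reproduced from the channel count and transfer rate. The sketch below assumes the standard 64-bit (8-byte) DDR3 channel width, which the text does not state, and treats "1066 MHz" as 1066.67 million transfers per second.

```python
def dram_bandwidth_gb_s(channels: int, mega_transfers_per_s: float,
                        bus_width_bits: int = 64) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 1000 MB)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


# Three DDR3-1066 channels: 3 x 1066.67 MT/s x 8 bytes
print(round(dram_bandwidth_gb_s(3, 1066.67), 1))  # 25.6
```

This is a theoretical peak; sustained bandwidth in real workloads is lower.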

Integrated Memory Controller Advantages
The Integrated Memory Controller is specially designed for servers and high-end clients to take full advantage of the Intel QuickPath Architecture with its scalable shared memory architecture. The independent high-bandwidth, low-latency memory controllers are paired with the high-bandwidth, low-latency Intel QuickPath Interconnects, enabling fast, efficient access to remote memory controllers. The Integrated Memory Controller has the significant advantage of being coupled with large high-performance caches. This relieves pressure on the memory subsystem and lowers overall latency. The Integrated Memory Controller also continues Intel’s legacy of best-in-class scalability and RAS features, and takes advantage of next-generation Intel® microarchitecture (Nehalem and Tukwila) and Hi-k 45nm process technology.
Key advantages include:
• Integrating the memory controller into the silicon die improves memory access latency (reduces communication delays in transferring data to and from memory) compared to traditional memory access through dedicated bus interface.
• Available memory bandwidth scales with the number of processors added.
• Full support for leading memory technologies allowing solutions optimized for different market segments.

7. Intel HD Boost
Includes the full SSE4 instruction set, significantly improving a broad range of multimedia and compute-intensive applications. The 128-bit SSE instructions are issued at a throughput rate of one per clock cycle allowing a new level of processing efficiency with SSE4-optimized applications.
Intel Advanced Digital Media Boost is a feature that significantly improves performance when executing Streaming SIMD Extensions (SSE) instructions. 128-bit SIMD integer arithmetic and 128-bit SIMD double-precision floating-point operations reduce the overall number of instructions required to execute a particular program task, and as a result can contribute to an overall performance increase. They accelerate a broad range of applications, including video, speech and image, photo processing, encryption, financial, engineering and scientific applications.
Intel Advanced Digital Media Boost helps achieve similarly dramatic gains in throughput for programs utilizing SSE instructions with 128-bit operands. (SSE instructions enhance Intel architecture by enabling programmers to develop algorithms that can mix packed single-precision and double-precision floating-point values and integers.) These throughput gains come from combining a 128-bit-wide internal data path with Intel Wide Dynamic Execution and matching widths and throughputs in the relevant caches. Intel Advanced Digital Media Boost enables most 128-bit instructions to be dispatched at a throughput rate of one per clock cycle, effectively doubling the speed of execution and resulting in peak floating-point performance of 24 GFlops (on each core, single precision, at 3 GHz frequency). Intel Advanced Digital Media Boost is particularly useful when running many important multimedia operations involving graphics, video, and audio, and when processing other rich data sets that use SSE, SSE2, and SSE3 instructions.
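The 24 GFlops peak figure can be reconstructed with a short calculation. The sketch below assumes that a core can start two 128-bit floating-point SIMD operations per cycle (for example one add and one multiply); the text gives only the final figure, so that factor of two is an assumption chosen to reproduce it.

```python
def peak_gflops(clock_ghz: float, register_bits: int = 128,
                element_bits: int = 32, simd_ops_per_cycle: int = 2) -> float:
    """Peak floating-point throughput of one core in GFlops."""
    lanes = register_bits // element_bits  # values processed per instruction
    return clock_ghz * lanes * simd_ops_per_cycle


print(peak_gflops(3.0))                   # 24.0 (single precision, 4 lanes)
print(peak_gflops(3.0, element_bits=64))  # 12.0 (double precision, 2 lanes)
```

The same formula shows why double-precision peak throughput is half the single-precision figure: a 128-bit register holds only two 64-bit values.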
8. Digital Thermal Sensor (DTS)
Provides for more efficient processor and platform thermal control improving system acoustics. The DTS continuously measures the temperature at each processing core. The ability to continuously measure and detect variations in processor temperature enables system fans to spin only as fast as needed to cool the system. The combination of these technologies can result in significantly lower noise emissions from the PC.
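How per-core temperature readings might translate into fan speed can be sketched as follows. This is an illustrative linear fan curve, not Intel's or any motherboard vendor's actual policy; all thresholds and the function name are hypothetical.

```python
def fan_duty_percent(core_temps_c, idle_temp_c=40.0, max_temp_c=80.0,
                     min_duty=20.0):
    """Map per-core temperatures to a fan duty cycle between min_duty and 100."""
    hottest = max(core_temps_c)  # the hottest core drives the cooling demand
    if hottest <= idle_temp_c:
        return min_duty
    if hottest >= max_temp_c:
        return 100.0
    # Linear interpolation between the idle and maximum temperatures.
    fraction = (hottest - idle_temp_c) / (max_temp_c - idle_temp_c)
    return min_duty + fraction * (100.0 - min_duty)


# Four per-core readings; the hottest core (60 C) sets the fan speed.
print(fan_duty_percent([45, 60, 52, 48]))  # 60.0
```

Because the fan only speeds up as far as the hottest core requires, a lightly loaded system stays near the minimum duty cycle and therefore quieter.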
9. Intel Wide Dynamic Execution Access
Improves execution speed and efficiency, delivering more instructions per clock cycle. Each core can complete up to four full instructions simultaneously.
Dynamic execution is a combination of techniques (data flow analysis, speculative execution, out-of-order execution, and superscalar execution) that Intel first implemented in the P6 microarchitecture used in the Pentium Pro, Pentium II, and Pentium III processors. For the Intel NetBurst microarchitecture, Intel introduced its Advanced Dynamic Execution engine, a very deep, out-of-order speculative execution engine designed to keep the processor’s execution units executing instructions. It also featured an enhanced branch-prediction algorithm to reduce the number of branch mispredictions.
Now with the Intel Core microarchitecture, Intel significantly enhances this capability with Intel Wide Dynamic Execution. It enables delivery of more instructions per clock cycle to improve execution time and energy efficiency. Every execution core is wider, allowing each core to fetch, dispatch, execute, and return up to four full instructions simultaneously. (Intel’s Mobile and Intel NetBurst microarchitectures could handle three instructions at a time.) Further efficiencies include more accurate branch prediction, deeper instruction buffers for greater execution flexibility, and additional features to reduce execution time. One such feature for reducing execution time is macro fusion. In previous-generation processors, each incoming instruction was individually decoded and executed. Macro fusion enables common instruction pairs (such as a compare followed by a conditional jump) to be combined into a single internal instruction (micro-op) during decoding. Two program instructions can then be executed as one micro-op, reducing the overall amount of work the processor has to do. This increases the overall number of instructions that can be run within any given period of time, or reduces the amount of time needed to run a set number of instructions. By doing more in less time, macro fusion improves overall performance and energy efficiency.
The Intel Core microarchitecture also includes an enhanced Arithmetic Logic Unit (ALU) to further facilitate macro fusion. Its single-cycle execution of combined instruction pairs results in increased performance for less power. The Intel Core microarchitecture also enhances micro-op fusion, an energy-saving technique Intel first used in the Pentium M processor. In modern mainstream processors, x86 program instructions (macro-ops) are broken down into small pieces, called micro-ops, before being sent down the processor pipeline to be processed. Micro-op fusion “fuses” micro-ops derived from the same macro-op to reduce the number of micro-ops that need to be executed. Reduction in the number of micro-ops results in more efficient scheduling and better performance at lower power. Studies have shown that micro-op fusion can reduce the number of micro-ops handled by the out-of-order logic by more than ten percent. With the Intel Core microarchitecture, the number of micro-ops that can be fused internally within the processor is extended.
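Macro fusion as described above can be modelled with a toy decoder that merges a compare followed by a conditional jump into a single entry. This is a sketch on bare mnemonics, not a model of the real decoder; treating test as a fusible compare alongside cmp is an assumption, as is the rule used to recognise conditional jumps.

```python
COMPARES = {"cmp", "test"}  # assumed set of fusible compare instructions


def is_conditional_jump(mnemonic: str) -> bool:
    """Treat every j* mnemonic except the unconditional jmp as conditional."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def fuse(instructions):
    """Merge each compare + conditional-jump pair into one fused entry."""
    fused = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if current in COMPARES and following is not None \
                and is_conditional_jump(following):
            fused.append(current + "+" + following)
            i += 2  # both instructions are consumed by one micro-op
        else:
            fused.append(current)
            i += 1
    return fused


program = ["mov", "add", "cmp", "jne", "mov", "test", "jz", "jmp"]
print(fuse(program))
# ['mov', 'add', 'cmp+jne', 'mov', 'test+jz', 'jmp']
```

In this example eight program instructions become six entries, which is the reduction in work that the text attributes to macro fusion.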





Figure: With the Intel Wide Dynamic Execution of the Intel Core microarchitecture, every execution core in a multi-core processor is wider. This allows each core to fetch, dispatch, execute, and return up to four full instructions simultaneously. A single multi-core processor with four cores could fetch, dispatch, execute, and return up to 16 instructions simultaneously.


INSTRUCTION SET

Developers know that by increasing the number of instructions processed concurrently, they can reduce the amount of time that an application spends on code requiring many processor cycles to process data. Intel has long encouraged such coding practices to help increase overall processor throughput. Early on, Intel began a proactive program to improve application performance on Intel processors by developing special instruction sets. Early examples include the floating point (FP) instruction set extensions defined for the 8086 chip and executed by its 8087 math coprocessor. More recent examples include Single Instruction, Multiple Data (SIMD) and Intel MMX™ technology. SIMD was a technique employed by Intel to achieve increased parallelism in the P5 microarchitecture through the use of special instructions that operated on multiple pieces of data simultaneously. Using the Intel MMX technology instruction set, programmers could execute instructions on multiple data elements loaded into MMX technology registers, delivering increased performance in media applications such as graphics, gaming, streaming video, and more.
In the P6 microarchitecture, Intel introduced Streaming SIMD Extensions (SSE). Designed for the Intel Pentium III processor, SSE extended MMX technology and allowed SIMD computations to be performed on four packed single-precision FP data elements simultaneously using 128-bit registers (named XMM0-XMM7). With the Intel NetBurst® microarchitecture (Intel® Pentium® 4 processor), Intel introduced SSE2 to extend SSE (and MMX). SSE2 provided the ability to perform more computations in parallel by extending the instructions introduced in MMX technology and SSE, and by enabling support of 128-bit integer and packed double-precision FP data types. In all, SSE2 added 144 instructions that delivered performance increases across a broad range of applications. For instance, SSE2 instructions gave software developers maximum flexibility in implementing algorithms and providing performance enhancements to software such as MPEG-2 video, MP3, 3D graphics, and more.
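To make the idea of four packed single-precision values in one 128-bit register concrete, the data layout can be modelled in a few lines. This is a sketch only: Python has no direct access to the XMM registers, so the helper names pack_ps and add_ps are hypothetical and the code imitates the lane-wise behaviour of a packed SSE add rather than executing one.

```python
import struct


def pack_ps(values):
    """Pack four single-precision floats into 16 bytes (128 bits), like one XMM register."""
    if len(values) != 4:
        raise ValueError("a 128-bit register holds exactly four 32-bit floats")
    return struct.pack("<4f", *values)


def add_ps(a, b):
    """Lane-wise add of two packed values, modelling what a packed SSE add does."""
    lanes_a = struct.unpack("<4f", a)
    lanes_b = struct.unpack("<4f", b)
    return struct.pack("<4f", *(x + y for x, y in zip(lanes_a, lanes_b)))


x = pack_ps([1.0, 2.0, 3.0, 4.0])
y = pack_ps([10.0, 20.0, 30.0, 40.0])
result = add_ps(x, y)

print(len(result) * 8)               # 128 (bits in the packed value)
print(struct.unpack("<4f", result))  # (11.0, 22.0, 33.0, 44.0)
```

One call to add_ps stands in for a single SIMD instruction operating on four data elements; a scalar loop would need four separate additions to produce the same result.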
The launch of the 90 nm process-based Pentium 4 processor saw the introduction of SSE3. SSE3 includes 13 additional SIMD instructions over SSE2 that are primarily designed to improve thread synchronization and x87-FP math capabilities. A further advancement, Supplemental SSE3, is now available in the Intel Core microarchitecture. Included in Intel® Xeon® 5100 processors (server and workstation) and Intel Core 2 Duo processors (notebook and desktop), Supplemental SSE3 adds 32 new opcodes, including align and multiply-add, for yet greater performance.
Overview of SSE4 for Intel Architecture
SSE4 is Intel’s largest ISA extension in terms of scope and impact since SSE2. SSE4 has several compiler vectorization primitives for even greater and more efficient media performance, as well as new and innovative string processing instructions. Beginning with the 45 nm Intel microarchitecture-based processors (codenamed Penryn), slated for production in 2007, these new instructions will start to appear in the volume market segments, including desktop, mobile and server. Intel has worked closely with industry partners, including independent software vendors (ISVs) and operating system vendors (OSVs), to develop SSE4 as a new instruction standard. Intel has translated a wide range of ISV needs into the best set of instructions for optimizing the unique capabilities, performance, and power-efficiency benefits of Intel microarchitecture for their software.
SSE4 will offer dozens of new innovative instructions in two major categories:
• SSE4 Vectorizing Compiler and Media Accelerators
• SSE4 Efficient Accelerated String and Text Processing
Supported Instruction Sets:
MMX
SSE
SSE2
SSE3
Supplemental SSE3 (SSSE3)
SSE4 (SSE4.1 and SSE4.2)
x86-64
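One way to check which of these instruction sets a given machine reports is to look at the CPU feature flags published by the operating system. The sketch below assumes Linux, where the "flags" line of /proc/cpuinfo lists features by short names; the function supported_sets and the sample line are hypothetical, and other operating systems expose this information differently.

```python
# Assumption: Linux-style feature flags as found on the "flags" line of
# /proc/cpuinfo. Only a few of the instruction sets discussed in this
# report are mapped here.
FLAG_NAMES = {
    "MMX": "mmx",
    "SSE": "sse",
    "SSE2": "sse2",
    "SSE3": "pni",        # Linux reports SSE3 as "pni"
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "x86-64": "lm",       # "long mode", i.e. 64-bit support
}


def supported_sets(flags_line):
    """Return the instruction sets from FLAG_NAMES present in a flags line."""
    flags = set(flags_line.split(":", 1)[-1].split())
    return [name for name, flag in FLAG_NAMES.items() if flag in flags]


# Hypothetical sample line, not captured from a real machine.
sample = "flags : fpu mmx sse sse2 pni ssse3 sse4_1 sse4_2 lm"
print(supported_sets(sample))
```

On a real Linux system the same function could be fed the first line of /proc/cpuinfo that starts with "flags".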
ADVANTAGES
• Mainstream processors now offer Intel® Turbo Boost Technology, automatically adapting to an individual's performance needs
• First 32 nanometer processors, and the first time Intel is mass-producing a variety of chips at mainstream prices at the start of a new manufacturing process, reflecting last year's $7 billion investment during an economic recession
• Intel® Core™ i5 processors are about twice as fast as comparable existing PCs for a visibly faster video, photo and music downloading experience
• Historic milestone: select processors integrate graphics directly on the processor and also include Intel's second-generation high-k metal gate transistors
• Beyond laptops and PCs, processors also target ATMs, travel kiosks and digital displays
• More than 10 new chipsets and new 802.11n WiFi and WiMAX products with new Intel® My WiFi features
Intel Corporation introduced its all-new 2010 Intel® Core™ family of processors, delivering unprecedented integration and smart performance, including Intel® Turbo Boost Technology for laptops, desktops and embedded devices. The introduction of new Intel® Core™ i7, i5 and i3 chips coincides with the arrival of Intel's groundbreaking new 32 nanometer (nm) manufacturing process, which for the first time in the company's history will be used to immediately produce and deliver processors and features at a variety of price points, and to integrate high-definition graphics inside the processor. This unprecedented ramp and innovation reflects Intel's $7 billion investment, announced early last year in the midst of a major global economic recession.
Intel is unveiling several platform products, including more than 25 processors, wireless adapters and chipsets. These include new Intel Core i7, i5 and i3 processors, Intel® 5 Series Chipsets, and Intel® Centrino® Wi-Fi and WiMAX adapters that include new Intel® My WiFi features. More than 400 laptop and desktop PC platform designs are expected from computer makers based on these products, with another 200 expected for embedded devices.
New 2010 Intel Core processors are manufactured on the company's 32 nm process, which includes Intel's second-generation high-k metal gate transistors. This technique, along with other advances, helps increase a computer's speed while decreasing energy consumption.
There is a new family of Intel processors with the industry's most advanced technology, available immediately at virtually every PC price point. Based on Intel's award-winning "Nehalem" microarchitecture, these new desktop, mobile and embedded processors deliver smart performance for music, gaming, videos, movies, photos, social networking and other demanding mainstream applications. In addition, ultra-thin laptops with all-new 2010 Intel Core processors inside provide a balance of performance, style and long battery life for sleek systems less than an inch thick.

APPLICATIONS


One immediate benefit of Core i7 processors is how they improve an operating system’s ability to multitask applications. For instance, say you have a virus scan running in the background while you’re working in your word-processing application. This often degrades responsiveness so much that when you strike a key, there can be a delay before the letter actually appears on the screen. On multi-core processors, the operating system can schedule the tasks on different cores so that each task runs at full performance. Another major multi-core benefit comes from individual applications optimized for multi-core processors. These applications, when properly programmed, can split a task into multiple smaller tasks and run them in separate threads. For instance, a word processor can have “find and replace” run as a separate thread, so doing a “find and replace” on a big document doesn’t have to keep you from continuing to write or edit. In a game, a graphics algorithm needing extensive processing power could be one thread, rendering the next scene on the fly, while another thread responds to your commands for your character’s movements.
The critical element in multi-core computing is the software. The throughput, energy efficiency, and multitasking performance of multi-core processors will all be more fully realized when application code is threaded and multi-core ready. Intel provides extensive partner programs with software developers, operating system vendors, ISVs, and academia to accelerate the delivery of dual-core and quad-core products. Intel has recently updated the Intel® Threading Building Blocks, Intel® Thread Profiler, and Intel® Thread Checker tools to support quad-core products.
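The idea of splitting one task into smaller tasks that run in separate threads can be sketched with Python's standard thread pool. The function names and the sample document below are hypothetical, and the example counts matches rather than replacing text to stay short. Note one assumption that matters in practice: in standard CPython, threads running pure Python code take turns rather than executing simultaneously on several cores, so CPU-heavy work is usually moved to ProcessPoolExecutor, which offers the same map interface.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(chunk, word):
    """Smaller task: count occurrences of `word` in one chunk of the document."""
    return chunk.count(word)


def parallel_count(paragraphs, word, workers=4):
    """Split the search across worker threads and combine the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(count_matches, paragraphs, [word] * len(paragraphs))
        return sum(partial)


# Hypothetical three-paragraph document.
document = ["the core is wide", "each core runs threads", "no match here"]
print(parallel_count(document, "core"))  # 2
```

Each paragraph is searched as an independent task, so the caller is free to continue with other work while the pool is busy, which mirrors the “find and replace” example in the text.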

• High end gaming

• Ultimate video experience

• Best multimedia editing and rendering

• High end multitasking

• High speed server applications

• Digital media creation






CONCLUSION

A Smarter Way to Work and Play
Whether you’re casually checking e-mail and surfing the Web or multitasking compute-intensive applications such as HD video encoding, you want a processor that enables maximum PC performance. With the Intel Core i7 processor, you’ll get just that. An unprecedented four-core, eight-thread design with Intel Hyper-Threading Technology ensures incredible performance, no matter what your computing needs. And with more than double the memory bandwidth for faster memory access, you’ll achieve more while waiting less.
Shatter Your Limits
It’s time for digital content creation that’s limited only by your imagination. Experience total creative freedom with the power to encode video up to 40% faster. And enjoy incredible performance on other multimedia tasks like image rendering, photo retouching, and editing.






By Arunlal. For more, mail me at arunalc@gmail.com or visit luttusworld.blogspot.com