Execution unit - CompWisdom

 

Topic: Execution unit



  
 BYTE.com
The other execution units in the 604 are an IEEE 754 compatible FPU, a load/store unit that moves data between registers and memory, and a branch unit that handles changes in the flow of instructions into the processor.
The execution units are fed instructions and data by separate 16-KB, four-way set-associative instruction and data caches, which in turn communicate off-chip through the bus interface unit.
For example, if an instruction executing out of order causes an exception, the exception is noted in the reorder buffer and isn't handled until the instruction is retired from the buffer.
http://www.byte.com/art/9406/sec11/art1.htm   (1731 words)

  
 Technical Details of Pentium 4
With the high frequency of these execution units in the NetBurst micro-architecture and the implementation of the Rapid Execution Engine, where the Arithmetic Logic Units are running at two times the core frequency, Intel has implemented a number of features that ensure that these execution units have a continuous stream of instructions to execute.
In addition, the Execution Trace Cache stores these micro-ops in the path of program execution flow, where the results of branches in the code are integrated into the same cache line.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a Level 1 instruction cache.
http://www.wiu.edu/users/mfdmd/P4/technical.html   (1656 words)

  
 History of the PowerPC Architecture
The instruction fetch unit retrieves two instructions from the instruction cache per cycle and places them in a six-entry instruction queue.
On a cache miss, cache blocks are filled in the burst fill, which is performed as a “critical-double-word-first” operation; the critical double word is written to the cache and forwarded to the required unit simultaneously, thus minimizing stalls due to the cache fill latency.
If a CR bit is unavailable, the branch unit predicts either branch taken or branch not taken paths depending on the bits in the branch opcode.
http://www.cs.pitt.edu/~alanjawi/p603.html   (1448 words)

  
 POWER2 Fixed-Point, Data Cache, and Storage Control Units
Other responsibilities for the execution units include performing the data transformations required by fixed-point RR operations, computing the effective address for all storage references, and providing data flow controls during the execution of "move to" and "move from" special purpose register instructions.
Execution Unit 0 also performs special operations such as cache operations and all privileged operations.
Whenever a fixed-point load/store/cache operation is executed, it is compared to all entries in the FPSQ to check for a match.
http://www-ee.eng.hawaii.edu/~mhpcc/ibmhwsw/fxu.html   (7621 words)

  
 [No title]
The more execution units the processor's execution core has, the larger the instruction window needs to be in order to extract even more parallelism from the code stream.
Executing more instructions at once means that programs run faster and performance is increased.
These idle execution units are still taking up die space and drawing power, so widening a processor is therefore a relatively inefficient way to increase its performance.
http://www.arstechnica.com/articles/paedia/cpu/xbox360-2.ars/1   (718 words)

  
 POWER3: The next generation of PowerPC processors
In the case of data forwarding between execution units, or when, on the same execution unit, the first instruction is feeding the FRA operand of the dependent instruction, the latency is four cycles.
The single-cycle units execute all single-cycle instructions (arithmetic, shift, logical, compare, trap, and count leading zero) with a single-cycle latency (this means that instructions dependent upon the result can execute in the next cycle).
The execution units can pull instructions from the queue in an out-of-order fashion, allowing logically later instructions whose operands are available to bypass other instructions which are waiting for operands.
http://www.research.ibm.com/journal/rd/446/oconnell.html   (7426 words)
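
The out-of-order pull from the issue queue described in the POWER3 excerpt above can be sketched as follows. This is a minimal illustration, not the POWER3 logic itself: the queue layout, the `select_ready` name, and the oldest-first selection policy are all assumptions made for the example.

```python
def select_ready(queue, ready_regs):
    """Return the index of the oldest queued instruction whose source
    registers are all available, or None if nothing can issue.

    queue      -- list of (name, [source registers]) in program order (hypothetical layout)
    ready_regs -- set of register names whose values are available
    """
    for index, (_name, sources) in enumerate(queue):
        if all(src in ready_regs for src in sources):
            return index  # a later instruction may bypass earlier, stalled ones
    return None


# "mul" is still waiting for r1, so the logically later "add" issues first.
queue = [("mul", ["r1", "r2"]), ("add", ["r3", "r4"])]
chosen = select_ready(queue, {"r2", "r3", "r4"})
```

Scanning in program order keeps the selection simple; real hardware performs the equivalent check for all queue entries in parallel.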

  
 History of the PowerPC Architecture
Even with the precise interrupts support, the out-of-order execution in the 620 is still able to achieve a reasonable degree of instruction-level parallelism, with average IPC of 1.23 for integer benchmarks and 1.26 for floating-point benchmarks.
After the instruction is executed, the result is sent to the destination rename buffer and forwarded to any waiting instruction, and the instruction is marked as finished.
The overall execution time typically spent by an instruction is summarized below.
http://www.cs.pitt.edu/~alanjawi/p620.html   (1130 words)

  
 BDTI - Buyer's Guide to DSP Processors: Chap. 7
Generally, the execution units of data path one operate on registers in register file A and the units of data path two operate on registers in register file B. However, the register files are interconnected to the opposite data path's functional units via cross paths.
As illustrated in Figure 7.17-2, each data path has a set of four execution units, a general-purpose register file, and paths for moving data between memory and the data path.
This means that in an ideal situation all execution units in both data paths operate independently and eight simultaneous operations can be performed.
http://www.bdti.com/products/chap7-17.html   (1947 words)

  
 Multithread Execution on One Physical Processor - Intel® Software Network
When the slow instruction on the first unit completes, its results are folded into the results of the instructions from the second unit in such a way that it appears from the outside that these instructions were processed sequentially on a single execution unit.
For example, if one execution unit is waiting for an instruction to complete, other instructions can be executed on another unit.
This algorithm, which is heavily used in out-of-order execution, assumes that backwards jumps are always taken (as they would be in a loop) and that forward jumps are never taken.
http://www.intel.com/cd/ids/developer/asmo-na/eng/19934.htm   (2189 words)
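
The static prediction rule in the Intel excerpt above (backward jumps taken, forward jumps not taken) can be sketched in a few lines. The function name and the use of plain integer addresses are assumptions for illustration; treating a branch to its own address as backward is also a choice made here, not something the excerpt specifies.

```python
def predict_taken(branch_pc: int, target_pc: int) -> bool:
    """Static backward-taken / forward-not-taken prediction.

    A target at or below the branch address is treated as a loop-closing
    (backward) jump and predicted taken; anything else is predicted not taken.
    """
    return target_pc <= branch_pc


# A loop branch at 0x120 jumping back to 0x100 is predicted taken;
# a forward skip from 0x100 to 0x140 is predicted not taken.
loop_prediction = predict_taken(0x120, 0x100)
skip_prediction = predict_taken(0x100, 0x140)
```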

  
 CPU with multiple execution units (EP0106670B1)
As instructions are issued to the execution units, the operation code identifying each instruction is also issued in program order to an instruction execution queue 18 of the collector.
Collector control 46 causes the results of the execution of instructions to program visible registers to be stored in a master safe store register 48 in program order which is determined by the order of instructions stored in the instruction execution stack on a first-in, first-out basis.
The results of the execution of each instruction by an execution unit are stored in a result stack 38, 40, 42, 44 associated with each execution unit.
http://www.delphion.com/details?pn=EP00106670B1   (264 words)

  
 X-bit labs - Print version
The instruction buffer is 16 instructions long and they are all waiting for the appropriate execution units to become free.
A modification is possible when several CPUs on a daughter card are united with one switch and are then attached to a higher-level switch and so on.
There are many algorithms (in some areas they form a majority of algorithms) that are not easily parallelized, being sequential in their nature.
http://www.xbitlabs.com/articles/cpu/print/server-systems.html   (11461 words)

  
 [No title]
Static execution is simple to implement and takes up much less die space than dynamic execution, since the processor doesn't need to spend a lot of transistors on the instruction window and related hardware.
This static execution scheme is pretty much the same one used in older, less complex designs, like the original Intel Pentium.
Instead, instructions pass through the processor in the order in which they're fetched, with the twist that two adjacent, non-dependent instructions are executed in parallel where possible.
http://arstechnica.com/articles/paedia/cpu/xbox360-2.ars/2   (643 words)

  
 Pentium® III Processor Implementation Tradeoffs
We considered the effective hardware support of these instructions to be an important method to improve the utilization of FP units, since it allows for less time to be spent in data reorganization.
The multiplier resides on Port 0 and is a modification of the existing FP multiplier.
IA-32 instructions overwrite one of the operands of the instruction.
http://www.intel.com/technology/itj/q21999/articles/art_2d.htm   (984 words)

  
 VLIW Processors and Trace Scheduling
If a naive instruction encoding is used, binaries containing instruction words with a fixed number of opcodes cannot be executed on another processor of the same family with a different number of execution units.
Since code bloat is avoided and sequential execution is possible, the compiler will have to be more choosy about speculative execution rather than take the naive approach 'The space will be wasted, so fill in a speculative instruction anyway'.
Thus speculative execution is discouraged since it could potentially affect system throughput by executing useless operations.
http://www.cs.utah.edu/%7Embinu/coursework/686_vliw/old   (4469 words)

  
 AltiVec Performance Issues
While the G4 is considered an out of order execution machine, the instructions dispatched to a particular execution unit all execute in the order that they appear in the instruction stream.
In front of each execution unit on a 970, there is a queue where dispatched instructions reside until the data that they need becomes available so that they can execute.
In this phase the instruction is actually executed by the execution unit to which it was dispatched.
http://developer.apple.com/hardware/ve/performance.html   (2768 words)

  
 Execution unit - Wikipedia, the free encyclopedia
In computer engineering, an execution unit is a part of a CPU that performs the operations and calculations called for by the program.
It is commonplace for modern CPUs to have multiple parallel execution units, referred to as a superscalar design.
The simplest arrangement is to use one, the bus manager, to manage the memory interface, and the others to perform calculations.
http://en.wikipedia.org/wiki/Execution_unit   (125 words)

  
 Ace's Hardware
With in-order execution, each time we encounter an instruction that depends on the result of another one, the CPU cannot issue it until the result of the latter instruction is known.
The higher the latencies on the execution units, the longer an instruction must wait on the result of the previous instruction.
So the K7 has 3 integer units which match the decoders when the code is pure integer code, and 3 FPU pipes which match the decoders when executing pure MMX, 3DNow!, or x87 code.
http://www.aceshardware.com/Spades/read.php?article_id=53   (2394 words)

  
 [No title]
The execution path of an instruction is the sequence of operations which each instruction must go through in the processor.
Another important question is if the output of execution units is to be directly connected to the input of other execution units.
With the help of this register the execution unit does not have to count the number of iterations in a FOR loop, and only serial code is passed over from the branching to the execution units.
http://www.inf.fu-berlin.de/lehre/WS94/RA/RISC-9.html   (12041 words)

  
 TechOnLine - Embedded Processors 2000
Each tile has 4 memory buffers, 7 datapath units, and 2 multipliers for a total of 24 multipliers and 84 datapath units or computing cells.
User generated PowerPlug units can be attached to new instructions that are added to the software tool chain, including the compilers.
Each PE has 5 execution units (ALU, MAU, DSU, LOAD, STORE), a data memory, and a VLIW instruction memory to drive the execution units.
http://www.techonline.com/community/ed_resource/feature_article/8103   (3749 words)

  
 IBM Research - Computer Architecture - Paper Abstracts
Basic to these techniques is a simple common data busing and register tagging scheme which permits simultaneous execution of the independent instructions while preserving the essential dependences inherent in the instruction stream.
The common data bus improves performance by efficiently utilizing the execution units without requiring specially optimized code.
The Yorktown Simulation Engine (YSE) is a high speed special purpose parallel processor designed and built at the IBM Thomas J. Watson Research Center to simulate the logical operation of large digital networks.
http://www.research.ibm.com/compsci/arch/abs.html   (648 words)

  
 week 9 notes [multiple execution units, vliw vs. superscalar, caches]
when we have a pipeline with multiple execution units, a number of problems appear that we didn't have to deal with before: structural hazards, and write-after-write dependencies.
we can have structural hazards if two instructions finish execution at the same time.
draw a pipeline timing diagram that shows how the following code executes on our multiple-execution-unit cpu.
http://www-cse.ucsd.edu/~j2lau/cs141/week9.html   (1492 words)
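
The structural hazard mentioned in the week 9 notes above (two instructions finishing execution at the same time) can be checked with a small sketch. It assumes a single shared write-back slot per cycle and describes each instruction by a hypothetical `(name, issue_cycle, latency)` tuple; neither detail comes from the notes.

```python
from collections import defaultdict


def writeback_conflicts(instructions):
    """Return {cycle: [names]} for every cycle in which more than one
    instruction finishes, assuming one shared write-back slot per cycle.

    instructions -- iterable of (name, issue_cycle, latency) tuples (hypothetical format)
    """
    finishing = defaultdict(list)
    for name, issue_cycle, latency in instructions:
        finishing[issue_cycle + latency].append(name)
    return {cycle: names for cycle, names in finishing.items() if len(names) > 1}


# A 4-cycle multiply issued at cycle 0 and a 1-cycle add issued at cycle 3
# both finish at cycle 4; the subtract finishes alone at cycle 5.
conflicts = writeback_conflicts([("mul", 0, 4), ("add", 3, 1), ("sub", 4, 1)])
```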

  
 Multiple execution units
As we have observed, the problem with eight execution units is still within the realm that can be solved with BDD's.
Note finally, that we have only verified our implementation of Tomasulo's algorithm for one execution.
However, if we want to verify the design for an arbitrary number of execution units, we'll need to deal with the problem of
http://www.cs.indiana.edu/classes/p415-sjoh/readings/smv/CadenceSMV-docs/smv/tutorial/node43.html   (354 words)

  
 Intel ready to ship dual-core processors - CNET.com
This idea differs from Intel's Hyper-Threading technology, which uses a single (physical) execution unit but allows the processor to run two separate (logical) execution threads.
Windows is a multitasking environment, and as such, there are usually applications running in both the foreground (such as the browser you are using to read this) and the background (such as real-time virus scanning).
It will feature two 3.2GHz execution units--each of which includes 1MB of L2 cache and supports Hyper-Threading--for a total of 2MB of L2 cache and support for four execution threads.
http://www.cnet.com/4520-6022_1-5756419-1.html   (1243 words)

  
 AnandTech: Intel's Hyper-Threading Technology: Free Performance?
Unfortunately the reality of most x86 code is that there is not as much ILP as we would like there to be so we must find other ways to improve performance.
To help better illustrate this let's create a hypothetical CPU with three execution units: an ALU, FPU, and a Load/Store unit for reading from/writing to memory.
The most commonly used desktop software will perform a handful of integer calculations as well as loads and stores but leave the FP units untouched.
http://www.anandtech.com/showdoc.aspx?i=1576&p=2   (696 words)

  
 [No title]
In order to more consistently execute more instructions, a processing paradigm called out-of-order processing (OOP) can be used, and it has in fact become mainstream. This paradigm arose because many instructions are dependent upon the outcome of other instructions, which have already been sent into the processing pipeline.
In this way, a CMP chip in a multithreaded (or multiprogrammed) environment is able to execute faster due to more efficient use of available resources over the various threads, and because of the potential to increase the clock rate over that of a monolithic processor.
Although the instruction fetch, dispatch, and execution are out of order, instructions are reordered after they complete execution and all mispredictions, including branch and data, are corrected.
http://oregonstate.edu/~anands/HPCA_Project_Mid-Term_Report.doc   (3739 words)

  
 [No title]
Distribution of hazard detection and control logic (Because of distributed reservation stations and CDB: Multiple instructions waiting on a single operand can all start execution as soon as the operand is broadcast on the common data bus) 2.
Inherent limitations of ILP · May have to unroll loops many times to get enough instructions to fill issue slots · Especially for VLIW · If units have long latency, may have to schedule many operations · Need about as many operations as (pipeline depth * number functional units) 2.
Write result: when result available, write on CDB and from there into any registers or functional units waiting for the result · Differences from Scoreboarding: 1.
http://www.cc.gatech.edu/classes/cs4760_98_fall/lectures/lecture3.ascii   (840 words)

  
 Tom's Hardware Guide Processors: Intel's New Pentium 4 Processor - The Rapid Execution Engine
While Intel is only talking about the four fast execution units, the other four are the actual units that are responsible for Pentium 4's peculiar behavior in the benchmarks.
The story looks a lot different for the instructions that cannot be processed by the rapid execution units.
Instructions, making Intel's 'Rapid Execution Engine'-design sensible though not particularly amazing.
http://www.tomshardware.com/cpu/20001120/p4-10.html   (1225 words)

  
 Keeping Execution Units Busy
Each clock cycle, need to read 2N operands, write N results for N Execution Units
http://www.erc.msstate.edu/~reese/EE8063/html/VLIW/tsld002.htm   (15 words)
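
The port arithmetic in the slide above (2N operand reads and N result writes per cycle for N execution units) can be written out as a short sketch. The function name and the default of two source operands and one result per operation are assumptions matching the slide's figures.

```python
def register_port_demand(n_units, reads_per_op=2, writes_per_op=1):
    """Return (read_ports, write_ports) needed per clock cycle when every
    execution unit starts one operation each cycle."""
    return n_units * reads_per_op, n_units * writes_per_op


# Eight execution units need 16 operand reads and 8 result writes per cycle.
reads, writes = register_port_demand(8)
```

The linear growth in port count is one reason wide machines often split the register file between groups of units.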

  
 Reconfigurable Pipelines in VLIW Execution Units
Compilers for VLIW machines may be unable to find instructions to fill every field in every word, and the empty fields waste memory bandwidth and reduce the average number of instructions completed per cycle.
Reconfigurable logic can perform a much wider range of tasks than optimized static functional units, but performance speed for any particular task rarely approaches that of custom logic.
The basic question addressed by this work was whether the greater hardware utilization offered by reconfigurable functional units could compensate for a reduction in clock rate.
http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/fccm/1999/0375/00/0375toc.xml&DOI=10.1109/FPGA.1999.803705   (208 words)

  
 Energy Citations Database (ECD) - Energy and Energy-Related Bibliographic Citations
Availability information may be found in the Availability, Publisher, Research Organization, Resource Relation and/or Author (affiliation information) fields and/or via the "Full-text Availability" link.
Energy Citations Database (ECD) Document #5364840 - Storage access-exception detection for pipelined execution units
A technique is described that signals a storage access-exception condition for a data word after an execution unit pipeline has completed processing all preceding elements.
http://www.osti.gov/energycitations/product.biblio.jsp?osti_id=5364840   (102 words)

  
 IP.com's Prior Art Database
In order to reduce the number of write ports to the Register File, the three execution units can share a single result bus.
However, due to differences in execution latency between the types of instructions in the execution units, two or more instructions can produce results in the same cycle.
This will allow the execution unit that is in the process of producing the result to gain total control of the result bus for 1 cycle.
http://www.priorartdatabase.com/IPCOM/000011910   (368 words)

  
 [No title]
The fact that both the K7 and the MPC7400 execute instructions out-of-order (OOO) constitutes a major architectural similarity between the two. OOO execution requires lots of extra hardware, and it adds serious complexity to a CPU's design. For those of you who aren't familiar with the concept of OOO execution, it's pretty easy to summarize.
Oh, and one more thing.  I'm using the term "Execution Unit" to designate a "Functional Unit" that lives in the back end, i.e.
The G4 and the K7: an architectural look at two post-RISC processors
http://arstechnica.com/articles/paedia/cpu/g4vsk7.ars/3   (141 words)

  
 Exclusive Athlon 600 Preview
This is where L1 cache and internal buffering come into play.
This is what is known as a superscalar instruction pipeline.
Once x86 instructions have been "decoded" to RISC, they have to be buffered before they can be executed by the CPU.
http://www.firingsquad.com/hardware/athlon600preview/page4.asp   (621 words)

  
 Chinese courts purchasing mobile execution units : AZ IMC
Unless otherwise stated by the author, all content is free for non-commercial reuse, reprint, and rebroadcast, on the net and elsewhere.
London-based rights group Amnesty International counted 1,060 publicly reported executions in China last year, but stated that the actual number is far more.
"Many large cities have permanent execution grounds, but in smaller cities it is difficult to carry out death sentences, so this is why we have these mobile execution units."
http://arizona.indymedia.org/mail.php?id=14290   (360 words)

  
 [No title]
With UltraSPARC, Sun architects add special graphics/math processing capabilities in the FPU, now known as the Floating-Point/Graphics Execution Unit.
The FPU/GEU can execute up to two floating-point/graphics operations (FGops) and one Floating-Point load/store operation in each pipeline clock cycle.
It also has 4 sets of FP condition code registers for more parallelism.
http://www.redhat.com/support/wpapers/cygnus/cygnus_evaluate/instruction.html   (556 words)

  
 Execution Units
This section will introduce you to execution units.
Register Sections button at right to begin the tutorial or another button to start with the section of your choice.
http://www.unf.edu/~swarde/Execution_Units/execution_units.html   (28 words)

  
 Controlling Stalls - the Execution Units
The execution units will never be stalled; they will either get a valid instruction or a NOP
The CRS and DRR should ignore any result packets whose valid flag is false.
If an execution unit receives a NOP, the valid flag in the result packet for the NOP will be set to FALSE
http://www.erc.msstate.edu/~reese/EE8063/html/rename/tsld042.htm   (50 words)
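
The valid-flag scheme in the slide above can be sketched as follows: the execution unit always produces a result packet, and downstream logic drops any packet whose valid flag is false. The `Instr` and `ResultPacket` types, the single `ADD` operation, and the dictionary standing in for the consumers of results are all hypothetical; the slide's CRS and DRR are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    op: str                     # "NOP" or "ADD" in this sketch
    a: int = 0
    b: int = 0
    dest: Optional[int] = None  # destination register number


@dataclass
class ResultPacket:
    valid: bool
    dest: Optional[int] = None
    value: int = 0


def execute(instr: Instr) -> ResultPacket:
    """The unit never stalls: a NOP still yields a packet, marked invalid."""
    if instr.op == "NOP":
        return ResultPacket(valid=False)
    if instr.op == "ADD":
        return ResultPacket(valid=True, dest=instr.dest, value=instr.a + instr.b)
    raise ValueError(f"unsupported op: {instr.op}")


def write_back(registers: dict, packet: ResultPacket) -> None:
    """Consumers ignore any result packet whose valid flag is false."""
    if packet.valid:
        registers[packet.dest] = packet.value


registers = {}
write_back(registers, execute(Instr("NOP")))
write_back(registers, execute(Instr("ADD", 2, 3, dest=1)))
```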


 Copyright © 2006 CompWisdom.com Usage implies agreement with terms.