|
| |
| | CPU with multiple execution units (EP0106670B1) |
 | | Collector control 46 causes the results of the execution of instructions to program visible registers to be stored in a master safe store register 48 in program order which is determined by the order of instructions stored in the instruction execution stack on a first-in, first-out basis. |  | | As instructions are issued to the execution units, the operation code identifying each instruction is also issued in program order to an instruction execution queue 18 of the collector. |  | | The results of the execution of each instruction by an execution unit are stored in a result stack 38, 40, 42, 44 associated with each execution unit. |
|
http://www.delphion.com/details?pn=EP00106670B1
(264 words)
|
|
| |
| | Data and Control Speculative Execution |
 | | Once the input dependencies have been resolved the instruction is executed, possibly out-of-order with respect to the programmed order of other instructions in the buffer. |  | | More of the speculative execution ends up waiting in retirement buffers for prior (in virtual order) execution to complete. |  | | The speculative execution of the BCD addition completed its computation in nine steps. |
|
http://www.cs.waikato.ac.nz/timewarp/wengine/papers/gc99_1/
(264 words)
|
|
| |
| | AltiVec Performance Issues |
 | | While the G4 is considered an out of order execution machine, the instructions dispatched to a particular execution unit all execute in the order that they appear in the instruction stream. |  | | In front of each execution unit on a 970, there is a queue where dispatched instructions reside until the data that they need becomes available so that they can execute. |  | | In this phase an instruction is moved from one of the dispatch buffers to the portal of the appropriate execution unit. |
|
http://developer.apple.com/hardware/ve/performance.html
(2768 words)
|
|
| |
| | BYTE.com |
 | | It supports out-of-order execution, branch prediction up to four levels deep, speculative execution, and dynamic-register renaming. |  | | This is important because loads account for about 20 percent of all instructions, and the speculative execution of other instructions will screech to a halt if they depend on data that isn't available yet. |  | | Load operations require two cycles if they hit the data cache, and they can be executed speculatively and out of order. |
|
http://www.byte.com/art/9411/sec8/art6.htm
(2768 words)
|
|
| |
| | HPCA_Project_Mid-Term_Report.doc |
 | | Although instruction fetch, dispatch, and execution are out of order, instructions are reordered after they complete execution and all mispredictions, including branch and data, are corrected. |  | | In order to more consistently execute more instructions, a processing paradigm called out-of-order processing (OOP) can be used, and it has in fact become mainstream. This paradigm arose because many instructions are dependent upon the outcome of other instructions, which have already been sent into the processing pipeline. |  | | In this way, a CMP chip in a multithreaded (or multiprogrammed) environment is able to execute faster due to more efficient use of available resources over the various threads, and because of the potential to increase the clock rate over that of a monolithic processor. |
|
http://oregonstate.edu/~anands/HPCA_Project_Mid-Term_Report.doc
(3739 words)
|
|
| |
| | BYTE.com |
 | | For example, if an instruction executing out of order causes an exception, the exception is noted in the reorder buffer and isn't handled until the instruction is retired from the buffer. |  | | The other execution units in the 604 are an IEEE 754 compatible FPU, a load/store unit that moves data between registers and memory, and a branch unit that handles changes in the flow of instructions into the processor. |  | | The execution units are fed instructions and data by separate 16-KB, four-way set-associative instruction and data caches, which in turn communicate off-chip through the bus interface unit. |
|
http://www.byte.com/art/9406/sec11/art1.htm
(1731 words)
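
The reorder-buffer behaviour described in the snippet above can be illustrated with a small sketch. This is only a toy model written for this listing, not the 604's actual logic: the function name `retire_in_order` and the use of integer tags for instructions are assumptions made for illustration. It shows instructions completing out of order while retirement still happens strictly in program order.

```python
def retire_in_order(program_order, completion_order):
    """Toy reorder buffer (hypothetical model, not a real CPU's design).

    program_order:    instruction tags in the order they were fetched.
    completion_order: the same tags in the order execution finished.
    Returns one list per completion event holding the tags retired then.
    """
    done = set()   # tags that have finished executing
    head = 0       # index of the oldest instruction not yet retired
    trace = []
    for tag in completion_order:
        done.add(tag)
        retired_now = []
        # Retire from the head only while the oldest instruction is complete.
        while head < len(program_order) and program_order[head] in done:
            retired_now.append(program_order[head])
            head += 1
        trace.append(retired_now)
    return trace


if __name__ == "__main__":
    # Instruction 2 finishes first but has to wait for 0 and 1 to retire.
    print(retire_in_order([0, 1, 2, 3], [2, 0, 1, 3]))
```

In this model an exception would simply be a flag stored with the tag and examined only when that tag reaches the head, which is why it is not handled until retirement.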
|
|
| |
| | POWER3: The next generation of PowerPC processors |
 | | The execution units can pull instructions from the queue in an out-of-order fashion, allowing logically later instructions whose operands are available to bypass other instructions which are waiting for operands. |  | | In the case of data forwarding between execution units, or when, on the same execution unit, the first instruction is feeding the FRA operand of the dependent instruction, the latency is four cycles. |  | | The independence of the fixed-point execution units and the load/store execution units is obviously a large performance benefit for calculations that are predominately integer in nature, such as Monte Carlo simulations. |
|
http://www.research.ibm.com/journal/rd/446/oconnell.html
(7426 words)
|
|
| |
| | 1 |
 | | The more execution units the processor's execution core has, the larger the instruction window needs to be in order to extract even more parallelism from the code stream. |  | | Increasing the width of the execution core definitely has its advantages, but there's a limit to how wide the core can be before you reach a point of diminishing returns. |  | | These idle execution units are still taking up die space and drawing power, so widening a processor is therefore a relatively inefficient way to increase its performance. |
|
http://www.arstechnica.com/articles/paedia/cpu/xbox360-2.ars/1
(718 words)
|
|
| |
| | History of the PowerPC Architecture |
 | | Even with precise interrupt support, the out-of-order execution in the 620 is still able to achieve a reasonable degree of instruction-level parallelism, with an average IPC of 1.23 for integer benchmarks and 1.26 for floating-point benchmarks. |  | | For each execution unit, the overall execution latency is the actual execution time of an instruction plus the waiting time in the reservation stations. |  | | The overall execution time that is typically spent by an instruction is summarized below. |
|
http://www.cs.pitt.edu/~alanjawi/p620.html
(1130 words)
|
|
| |
| | Multithread Execution on One Physical Processor - Intel® Software Network |
 | | This algorithm, which is heavily used in out-of-order execution, assumes that backwards jumps are always taken (as they would be in a loop) and that forward jumps are never taken. |  | | When the slow instruction on the first unit completes, its results are folded into the results of the instructions from the second unit in such a way that it appears from the outside that these instructions were processed sequentially on a single execution unit. |  | | For example, if one execution unit is waiting for an instruction to complete, other instructions can be executed on another unit. |
|
http://www.intel.com/cd/ids/developer/asmo-na/eng/19934.htm
(2189 words)
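
The static prediction rule quoted in the snippet above (backward jumps assumed taken, forward jumps assumed not taken) can be written down in a few lines. This is a minimal sketch under stated assumptions: the function name `predict_taken` and the representation of branches as plain integer addresses are hypothetical, and a branch to its own address is treated as backward.

```python
def predict_taken(branch_pc: int, target_pc: int) -> bool:
    """Static 'backward taken, forward not taken' heuristic.

    A target at or below the branch address is treated as a loop
    back-edge and predicted taken; a higher target is predicted not taken.
    """
    return target_pc <= branch_pc


if __name__ == "__main__":
    # Hypothetical loop: the branch at 0x40 jumps back to 0x10.
    print(predict_taken(0x40, 0x10))  # backward branch
    # Hypothetical forward skip: the branch at 0x40 jumps ahead to 0x80.
    print(predict_taken(0x40, 0x80))  # forward branch
```

For a loop that runs many iterations, this rule mispredicts only the final iteration, when the back-edge falls through.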
|
|
| |
| | 2 |
 | | Static execution is simple to implement and takes up much less die space than dynamic execution, since the processor doesn't need to spend a lot of transistors on the instruction window and related hardware. |  | | Each individual thread has a limited amount of ILP and each PPE has a very narrow execution core, but the combined effect of a large number of simultaneously running threads means that more total execution units are put to work at once across all three PPEs. |  | | The end result is that when running a multithreaded workload, the Xbox 360's three-core processor will contain a higher total number of execution units that are busy at any given moment than a comparable single-core processor. |
|
http://arstechnica.com/articles/paedia/cpu/xbox360-2.ars/2
(643 words)
|
|
| |
| | Disjoint Eager Execution: An Optimal Form of Speculative Execution - Uht, Sindagi, Hall (ResearchIndex) |
 | | Traditional speculative code execution is the execution of code down one path of a branch (branch prediction) or both paths of a branch (eager execution), before the condition of the branch has been evaluated, thereby executing code ahead of time, and improving performance. |  | | executes the path that is the most likely to be correct... |  | | Abstract: Instruction Level Parallelism (ILP) speedups of an order-of-magnitude or greater may be possible using the techniques described herein. |
|
http://citeseer.nj.nec.com/uht95disjoint.html
(643 words)
|
|
| |
| | VLIW Processors and Trace Scheduling |
 | | If a naive instruction encoding is used, binaries containing instruction words with a fixed number of opcodes cannot be executed on another processor of the same family with a different number of execution units. |  | | Since code bloat is avoided and sequential execution is possible, the compiler will have to be more choosy about speculative execution rather than take the naive approach 'The space will be wasted, so fill in a speculative instruction anyway'. |  | | The model describes the number of register banks, ports, functional units, nature of the interconnect, etc. Initially the compiler generates an expression DAG and the nodes are traversed in topological order for functional unit assignment. |
|
http://www.cs.utah.edu/%7Embinu/coursework/686_vliw/old
(4469 words)
|
|
| |
| | Functional programming in the Java language |
 | | From a functional programming standpoint, the expression is not yet a general piece of logic; that is, it cannot be passed around and asked to execute whenever you want, without regard to the current position of execution control. |  | | A higher order function is able to take another function (indirectly, an expression) as its input argument, and in some cases it may even return a function as its output argument. |  | | That is, you need to compose a function that internally calls the first functor and streams the output of that evaluation as input into the evaluation of the second functor. |
|
http://www-128.ibm.com/developerworks/library/j-fp.html?ca=dnt-528
(4910 words)
|
|
| |
| | Ace's Hardware |
 | | With in-order execution, each time we encounter an instruction that depends on the result of another one, the CPU cannot issue it until the result of the latter instruction is known. |  | | The higher the latencies on the execution units, the longer an instruction must wait on the result of the previous instruction. |  | | Each execution unit is specialized in executing certain types of instructions, so CPU designers have to make sure that the execution units are able to gobble up what the decoders feed them. |
|
http://www.aceshardware.com/Spades/read.php?article_id=53
(2394 words)
|
|
| |
| | execution - OneLook Dictionary Search |
 | | Phrases that include execution: writ of execution, stay of execution, small order execution system, instrument of execution, concurrent execution, more... |  | | noun: (computer science) the process of carrying out an instruction by a computer |  | | execution : Free On-line Dictionary of Computing [home, info] |
|
http://www.onelook.com/cgi-bin/cgiwrap/bware/dofind.cgi?word=execution
(346 words)
|
|
| |
| | Citations: parallel instruction execution) computers - Thorlin, for (ResearchIndex) |
 | | ....allow concurrent execution of instructions, the same instruction ordered in different ways can produce different execution times. |  | | Thus, a goal of most compilers is to reduce the time for executing the instructions by selecting an order to maximize the concurrent execution of the instructions [ |  | | Code generation for PIE (parallel instruction execution) computers. |
|
http://sherry.ifi.unizh.ch/context/527762/0
(296 words)
|
|
| |
| | Irisa : thèse proposée pour la rentrée 2001 |
 | | These mechanisms expose to the compiler a part of the speculative execution that is managed by hardware on current out-of-order execution superscalar processors. |  | | On the other hand, support for part of this speculative execution can be provided by the ISA as in the new EPIC instruction set IA64 from Intel/HP. |  | | The path for higher performance using current ISAs seems to be more and more speculative execution (branches, memory dependencies, data, ...). |
|
http://www.irisa.fr/theses2001/caps1.htm
(296 words)
|
|
| |
| | Patent 5454117: Configurable branch prediction for a processor performing speculative execution |
 | | It is possible to perform speculative execution (also known as conditional, or out-of-order execution) past predicted branches, if additional state is provided for backing up the machine state upon mispredicted branches. |  | | While opcode information is used to address different sets of history information, the prediction hardware and algorithm themselves are invariant with instruction execution. |  | | Pipeline processors decompose the execution of instructions into multiple successive stages, such as fetch, decode, and execute. |
|
http://www.freepatentsonline.com/5454117.html
(296 words)
|
|
| |
| | Gulf Stream Finder Execution |
 | | It reads in the analyzed Gulf Stream if there is one, then evolves the best function for finding the Gulf Stream in ROFS output and prints out that result. |  | | The nowcast and forecast Gulf Stream output (analyzed.* is the native output from gsfinder, and reformed.* is the reformatted output for NAWIPS use) are copied to polar:/home/ftp/pub/gsf/, and to sgi100:~wd21rg/gsf/ |  | | It invoked executables parse.pl, gsfinder, and form (in that order), as well as managing data collection and distribution. |
|
http://polar.ncep.noaa.gov/gsf/gsfexecution.html
(527 words)
|
|
| |
| | Es: A shell with higher-order functions |
 | | The ability to replace primitive functions in es is key to its extensibility; for example, a user can override the definition of pipes to cause remote execution, or the path-searching machinery to implement a path look-up cache. |  | | On the other hand, if the functions are run in a subshell, the connection between their lexical scopes is lost as a consequence of them being exported in separate environment strings. |  | | Additionally, functions in the environment are an optimization for file I/O and parsing time. |
|
http://www.webcom.com/~haahr/es/es-usenix-winter93.html
(5809 words)
|
|
| |
| | Into the Itanium, Part 2 |
 | | Instead of having the hardware dynamically assign the order of execution, and try to figure out what can all be done at the same time in parallel by looking at dependencies and available resources, this is all done ahead of time when the program is first compiled. |  | | They also can be executed "out of order," which is one reason why x86 processors are capable of being so blindingly quick. |  | | At this point, the instructions are decoded further into smaller pieces called "micro-ops" or "uops," which the hardware tries to align as much as possible in parallel, and pass off to the various execution units. |
|
http://www.devhardware.com/c/a/Computer-Processors/Into-the-Itanium-Part-2
(5809 words)
|
|
| |
| | aspects.html |
 | | With the gulf of execution being examined in prototype one, the next logical question to ask is, "How quickly can one perform the required actions?" More specifically, this prototype is concerned with the speed at which the user can navigate through the different screens to group together an order. |  | | Having objects in their most expected places potentially impacts both the user's gulf of execution and gulf of evaluation, and having objects in the most convenient places can affect how well actions can be performed. |  | | One prototype will examine features of the interface which relate to the gulf of execution. |
|
http://www.cc.gatech.edu/computing/classes/cs6751_94_fall/groupj/phase3/aspects.html
(283 words)
|
|
| |
| | Part 3 of HCI Project |
 | | The gulf of execution is increased and hence short-term memory strain is increased as users are forced to remember the order for a longer amount of time. |  | | The increased gulf of execution for each item will strain the short-term memory load of users trying to remember and process orders with many items. |  | | Of course the null hypothesis is that there is no significant difference in the gulf of execution and that participants will have a very similar reaction to both interfaces in the questionnaire. |
|
http://filebox.vt.edu/users/dturnbul/cs3724/part3.html
(5268 words)
|
|
| |
| | WCET-With-Unknowns.doc |
 | | Upon evaluating this response time, it would then be up to the system to determine the feasibility and improvement in execution time to perform cache pre-loading (provided a consistent footprint could be generated in order to pre-load this footprint before preemption). |  | | This engine would attempt to gradually lower the WCET long before the program is scheduled to run, thereby allowing later events to be scheduled in time slots that the original WCET would have forbidden (by lengthening the expected execution time of the event containing the program in question). |  | | However, in the case where the vendor of the chip is not even known, it may be wise to collect the worst case timing data for all possible vendors. |
|
http://www.cs.umd.edu/Honors/reports/wjhun/WCET-With-Unknowns.doc
(2807 words)
|
|
| |
| | Theses from Uppsala University : 1832 - Processor Pipelines and Static Worst-Case Execution Time Analysis |
 | | Static worst-case execution time analysis is a family of techniques that promise to quickly provide safe execution time estimates for real-time programs, simultaneously increasing system quality and decreasing the development cost. |  | | The representation and analysis is more powerful than previous approaches in that pipeline timing effects across more than pairs of instructions can be handled, and in that no assumptions are made about the program structure. |  | | We prove several interesting properties of processors with in-order issue, such as the freedom from timing anomalies and the fundamental safety of WCET analysis for certain classes of pipelines. |
|
http://publications.uu.se/theses/abstract.xsql?isbn=91-554-5228-0
(417 words)
|
|
| |
| | AbsInt: Worst-Case Execution Time Analysis |
 | | It is essential that the worst-case execution time (WCET) of each task is known in order to ensure that the system works correctly. |  | | The computed time bounds are valid for all inputs and for each execution of the task. |  | | Simply measuring the execution time of a task for a given input is typically not safe. |
|
http://www.absint.com/wcet.htm
(485 words)
|
|
| |
| | Frequently asked questions |
 | | There is also a branch of WCET research that aims to compute a WCET bound with a very low risk of being exceeded, such as 10^-8 (G. Bernat, University of York), which may be sufficient for critical systems if the risk of other failures such as hardware errors is of the same order of magnitude. |  | | Schedulability analysers are often based on the Rate Monotonic Analysis (RMA) and are available from other vendors, but all need worst-case execution times as input. |  | | This is a program that is given the set of threads and the execution frequency and worst-case execution time of each thread, and tells you whether the threads can be scheduled so that all deadlines are met. |
|
http://www.tidorum.fi/bound-t/faq.html
(1393 words)
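
The schedulability check mentioned in the snippet above takes each thread's period and worst-case execution time and reports whether all deadlines can be met. A minimal sketch of one standard way to do this is shown here, using response-time analysis for rate-monotonic priorities. It is not the method of any particular vendor tool: the function name `rm_schedulable` and the `(wcet, period)` tuple layout are hypothetical, and it assumes independent, preemptible periodic tasks on one processor, deadlines equal to periods, and positive integer time units.

```python
def rm_schedulable(tasks):
    """Response-time test for rate-monotonic fixed-priority scheduling.

    tasks: list of (wcet, period) pairs in integer time units.
    Assumes deadline == period, independent preemptible tasks, and a
    single processor. Returns True if every task meets its deadline.
    """
    # Rate-monotonic priority order: shorter period means higher priority.
    ordered = sorted(tasks, key=lambda t: t[1])
    for i, (c_i, t_i) in enumerate(ordered):
        r = c_i
        while True:
            # Own WCET plus interference from all higher-priority tasks;
            # -(-r // t_j) is integer ceiling division.
            nxt = c_i + sum(-(-r // t_j) * c_j for c_j, t_j in ordered[:i])
            if nxt > t_i:
                return False  # worst-case response exceeds the deadline
            if nxt == r:
                break         # fixed point: worst-case response time found
            r = nxt
    return True


if __name__ == "__main__":
    print(rm_schedulable([(1, 4), (2, 6), (3, 12)]))
    print(rm_schedulable([(2, 4), (3, 6)]))
```

The result is only as trustworthy as the WCET values fed in, which is why such analysers need safe worst-case execution times as input.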
|
|
| |
| | Guillem Bernat Research topics |
 | | The low-level analysis includes the investigation of statistical methods for the determination of accurate execution profiles of programs by means of measurement, and new statistical methods to reason about such information so that it can be used effectively in a flexible scheduling context. |  | | Current timing analysis techniques of real-time systems assume absolute knowledge of the arrival time of tasks and their execution time, as well as assuming that execution times of programs do not vary. |  | | The high-level analysis includes the investigation of scheduling algorithms that provide, in addition to a minimum guaranteed level of service, an effective use of available resources in a context in which the scheduling information is not precisely known (exact arrival times and execution time of tasks). |
|
http://www-users.cs.york.ac.uk/~bernat/research.html
(811 words)
|
|
|