CMP EMBEDDED.COM


Needed: clear thinking about multithreading and multi-core
For Kevin Kissell of MIPS, decisions about multi-core or multithreading are more than just either/or



Embedded.com
Is multithreading better than multi-core? Is multi-core better than multithreading? One might as well ask whether a diesel engine is better than four-wheel drive! The best vehicle for a given application might have one, the other, or both. Or neither. They are independent, but complementary, design decisions.

With multithreaded processors and multi-core chips becoming the norm, architects and designers of digital systems need to understand their respective attributes, advantages, and disadvantages.

Tapping Concurrency as a Resource
What multithreading and multi-core have in common is that they both exploit the concurrency in a computational workload. The cost, in silicon, energy, and complexity, of making a CPU run a single instruction stream ever faster goes up non-linearly, and eventually hits a wall imposed by the physical limitations of circuit technology.

That wall keeps moving out a little further every year, but cost- and power-sensitive designs are constrained to follow the bleeding edge at a safe distance. Fortunately, virtually all computer applications have some degree of concurrency: at least some of the time, there are two or more independent tasks that need to be performed. Taking advantage of concurrency to improve computing performance and efficiency isn't always trivial, but it's certainly easier than violating the laws of physics.

Multi-processor, or multi-core, systems exploit concurrency to spread work around a system. As many software tasks can run at the same time as there are processors in the system. This can be used to improve absolute performance, cost, or power/performance. Clearly, once one has built the fastest single processor possible in a given technology, the only way to get even more compute power is to use more than one of them.

More subtly, if a load that would saturate a 1GHz processor could be evenly spread across 4 processors, those processors could be run at roughly 250MHz each. If each 250MHz processor is less than one-fourth the size of the 1GHz processor, or consumes less than one-fourth the power (either of which may be the case because of the non-linear cost of higher operating frequencies), the multi-core system might be more economical.
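
The break-even argument can be sketched in a few lines of Python. The power-law cost model and the exponent of 1.5 are assumptions chosen purely for illustration; they are not measurements of any real process or core, and the function names are hypothetical.

```python
def relative_cost(freq_ghz, exponent=1.5):
    """Hypothetical cost (area or power) of one core, normalized so that
    a 1 GHz core costs 1.0.

    The power-law form and the default exponent are illustrative
    assumptions. Any exponent above 1.0 models a cost that grows
    faster than frequency.
    """
    return freq_ghz ** exponent


def multicore_cost(total_ghz, cores, exponent=1.5):
    """Total cost of spreading an evenly divisible load across `cores` cores."""
    per_core_freq = total_ghz / cores
    return cores * relative_cost(per_core_freq, exponent)


single = multicore_cost(1.0, 1)   # one 1 GHz core
quad = multicore_cost(1.0, 4)     # four 250 MHz cores
print(f"single core: {single:.2f}, four cores: {quad:.2f}")
```

Under this assumed exponent, the four slower cores together cost half as much as the single fast one. With an exponent of exactly 1.0 (a linear cost) the two designs break even, which is why the non-linearity is the crux of the argument.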

Many designers of embedded SoCs are already exploiting concurrency with multiple cores. Unlike general-purpose workstations and servers, whose workload is variable and unknowable to system designers, it's often possible to analyze and decompose a fixed set of embedded device functions into specialized tasks, and assign tasks across multiple processors, each of which has a specific responsibility, and each of which can be specified and configured optimally for that specific job.

Multithreaded processors also exploit the concurrency of multiple tasks, but in a different way, and for a different reason. Instead of a system-level technique to spread CPU load, multithreading is a processor-level optimization to improve area and energy efficiency.

Multithreaded architecture is driven to a large degree by the observation that single-threaded high-performance processors spend a surprising amount of time doing nothing. When the results of a memory access are required for a program to advance, and that access must reference RAM whose cycle time is tens of times slower than that of the processor, a single-threaded processor is condemned to stall until the data is returned.

The multithreading hypothesis can be stated as: If latencies prevent a single task from keeping a processor pipeline busy, then a single pipeline should be able to complete more than one concurrent task in less time than it would take to run the tasks serially. This means running more than one task's instruction stream, or thread, at a time, which in turn means that the processor has to have more than one program counter, and more than one set of programmable registers.
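
The hypothesis can be illustrated with a toy cycle-count model. This is a minimal sketch, not a model of any real pipeline: it assumes a single-issue pipeline, threads described as bursts of (compute cycles, stall cycles), a fixed-priority choice among ready threads, and zero-cost switching between them. All names are hypothetical.

```python
def run_serial(threads):
    """Cycles needed to run each thread to completion, one after another."""
    return sum(compute + stall for thread in threads for compute, stall in thread)


def run_interleaved(threads):
    """Cycles needed when one pipeline issues from any thread that is not stalled.

    Each thread is a list of (compute_cycles, stall_cycles) bursts, with
    compute_cycles >= 1. At most one thread issues per cycle.
    """
    pending = [list(t) for t in threads]            # remaining bursts per thread
    left = [t[0][0] if t else 0 for t in pending]   # compute cycles left in current burst
    ready_at = [0] * len(threads)                   # cycle at which a thread may issue again
    cycle = 0
    done_at = 0
    while any(pending):
        for i, bursts in enumerate(pending):
            if bursts and ready_at[i] <= cycle:
                left[i] -= 1                        # issue one cycle of work
                if left[i] == 0:
                    _, stall = bursts.pop(0)        # burst done: wait on memory
                    ready_at[i] = cycle + 1 + stall
                    done_at = max(done_at, ready_at[i])
                    if bursts:
                        left[i] = bursts[0][0]
                break                               # only one thread issues per cycle
        cycle += 1
    return done_at


# Two identical tasks: 2 cycles of work, then a 6-cycle memory stall, four times over.
task = [(2, 6)] * 4
print("serial:", run_serial([task, task]))
print("interleaved:", run_interleaved([task, task]))
```

In this toy workload the second thread's compute bursts fit into cycles the first thread would have spent stalled, so the interleaved run finishes well ahead of the serial one. The pipeline is still idle part of the time, which suggests that more threads could be absorbed before the execution units saturate.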

Replicating those resources is far less costly than replicating an entire processor. In the MIPS32 34K processor, for example, which implements the MIPS MT multithreading architecture, an additional 14% of area can buy an additional 60% of throughput, relative to a comparable single-threaded core. (Measured using the EEMBC PKFLOW and OSPF benchmarks, run sequentially on a MIPS32 24KE core versus concurrently on a dual-threaded MIPS32 34K core.)
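
To put the quoted figures in terms of throughput per unit of area, a short calculation helps. The 14% and 60% values are the ones cited above; the dual-core comparison point is a hypothetical best case (perfect scaling, no interconnect or shared-memory overhead) added only for contrast.

```python
def throughput_per_area(relative_throughput, relative_area):
    """Throughput delivered per unit of die area, relative to a baseline core."""
    return relative_throughput / relative_area


baseline = throughput_per_area(1.00, 1.00)        # single-threaded core
multithreaded = throughput_per_area(1.60, 1.14)   # +60% throughput for +14% area

# Hypothetical comparison, not a measurement: a second identical core that
# scales perfectly and needs no extra interconnect or memory.
ideal_dual_core = throughput_per_area(2.00, 2.00)

print(f"baseline:        {baseline:.2f}")
print(f"multithreaded:   {multithreaded:.2f}")
print(f"ideal dual core: {ideal_dual_core:.2f}")
```

Even in this idealized case the second core only matches the baseline's efficiency, while the multithreaded core improves on it by roughly 40%.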

Multi-processor architectures are infinitely scalable, in theory. However many processors one has, one can always imagine adding another, though only a limited class of problems can make practical use of thousands of CPUs. Each additional processor core on an SoC adds to the area of the chip at least as much as it adds to the performance.

Multithreading a single processor can only improve performance up to the level where the execution units are saturated. However, up to that limit, it can provide a "superlinear" payback for the investment in die size.

While the means and the motives are different, multi-core systems and multithreaded cores have a common requirement that concurrency in the workload be expressed explicitly by software. If the system has already been coded in terms of multiple tasks running on a multi-tasking OS, there may be no more work to be done.

Monolithic, single-threaded applications need to be reworked and decomposed either into sub-programs or explicit software threads. This work must be done for both multithreaded and multi-core systems, and once completed, either can exploit the exposed concurrency - another reason why the two techniques are often confused, and something that makes them highly complementary.
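
As a rough illustration of what "expressing concurrency explicitly" means, the sketch below rewrites a monolithic loop as two software threads joined by a queue. The function names and the doubling "processing" step are hypothetical stand-ins. Python is used only for readability; in standard CPython, threads express concurrency but do not necessarily execute in parallel.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the sample stream


def monolithic(samples):
    """Single-threaded version: acquire and process in one loop."""
    return [s * 2 for s in samples]


def threaded(samples):
    """The same work, expressed as two explicit threads joined by a queue."""
    work = queue.Queue()
    results = []

    def producer():
        # Stands in for an acquisition task (for example, reading a sensor).
        for s in samples:
            work.put(s)
        work.put(_DONE)

    def consumer():
        # Stands in for a processing task that runs independently of acquisition.
        while True:
            s = work.get()
            if s is _DONE:
                break
            results.append(s * 2)

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


print(threaded(range(5)) == monolithic(range(5)))
```

Once the work is split this way, the scheduler is free to place the two threads on two cores, on two hardware thread contexts of one core, or to time-slice them on a single conventional processor. The source code does not change.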

When is Multi-core a Good Idea?
For embedded SoC designs, a multi-core design makes the most sense when the functions of the SoC decompose cleanly into subsystems with a limited need for communication and coordination between them.

Instead of running all code on a single, large, high-frequency core connected to a single, large, high-bandwidth memory, assigning tasks to several simpler, slower cores allows code and data to be stored in per-processor memories, each of which has lower requirements for both capacity and bandwidth. That normally translates into power savings, and potentially into area savings as well, if the lower bandwidth requirement allows physically smaller RAM cells to be used.

If the concurrent functions of an SoC cannot be statically decomposed at system design time, an alternative approach is to emulate general-purpose computers and build a coherent SMP cluster of processor cores. Within such a cluster, multiple processors are available as a pool to run the available tasks, which are assigned to processors "on the fly".

The price to be paid for this flexibility is that it requires a sophisticated interconnect between the cores and a shared main memory, and the shared main memory needs to be relatively large and high-bandwidth. This negates the area and power advantages alluded to above for functionally partitioned multi-core systems, but can still be a good trade-off.
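
The flexibility being paid for can be sketched by comparing a fixed assignment of tasks to cores against a shared pool that hands each task to whichever core frees up first. The task lengths and the fixed assignment below are made-up numbers for illustration, and the model ignores interconnect and memory contention, which are exactly the costs described above.

```python
import heapq


def static_makespan(task_cycles, core_of_task, cores):
    """Finish time when every task is bound to a core at design time."""
    load = [0] * cores
    for cycles, core in zip(task_cycles, core_of_task):
        load[core] += cycles
    return max(load)


def pooled_makespan(task_cycles, cores):
    """Finish time when each task goes to the core that becomes free first."""
    free_at = [0] * cores            # min-heap of the times each core becomes free
    heapq.heapify(free_at)
    for cycles in task_cycles:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cycles)
    return max(free_at)


# Hypothetical workload whose mix was not anticipated by the fixed partition.
tasks = [8, 1, 1, 1, 1, 4]
fixed_assignment = [0, 0, 0, 1, 1, 1]

print("static:", static_makespan(tasks, fixed_assignment, cores=2))
print("pooled:", pooled_makespan(tasks, cores=2))
```

When the workload matches the partition the designer planned for, the static assignment loses nothing, and it keeps the simpler per-core memories. The pool pays off only when the task mix is unpredictable.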

Every core represents additional die area, and even in a "powered down" standby state, each core in a multi-core configuration consumes some amount of leakage current, so the number of cores in an SoC design should in general be kept to the minimum necessary to run the target application. There is no point in building a multi-core design if the problem can be handled by a single core within the system's design constraints.

When is Multithreading a Good Idea?
Multithreading makes sense whenever an application with some degree of concurrency is to be run on a processor that would otherwise find itself stalled a significant portion of the time waiting for instructions and operands. This is a function of core frequency, memory technology, and program memory reference behavior.

Well-behaved real-world programs in a typical single-threaded SoC processor/memory environment might be stalled as little as 30% of the time at 500MHz, but less cache-friendly codes may be stalled a whopping 75% of the time in the same environment. Systems where the speeds of processor and memory are so well matched that there is no loss of efficiency due to latency will not get any significant throughput improvement from multithreading.
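
Those stall figures translate into a simple upper bound on what multithreading could recover. The sketch below is an idealized model, not a prediction: it assumes every thread has the same stall fraction, that another thread always has work ready, and that threads neither contend for cache nor cost anything to switch between. The function names are hypothetical.

```python
import math


def threads_to_saturate(stall_fraction):
    """Fewest identical threads whose combined busy time could fill the pipeline."""
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    busy = 1.0 - stall_fraction
    # round() guards against floating-point noise pushing ceil() up by one.
    return math.ceil(round(1.0 / busy, 9))


def ideal_speedup(stall_fraction, threads):
    """Best-case throughput gain over one thread; capped once the pipeline is full."""
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    busy = 1.0 - stall_fraction
    return min(threads * busy, 1.0) / busy


for stall in (0.30, 0.75, 0.0):
    n = threads_to_saturate(stall)
    print(f"stalled {stall:.0%}: {n} thread(s) to saturate, "
          f"ideal speedup {ideal_speedup(stall, n):.2f}x")
```

In this model a processor that never stalls gains nothing from extra threads, which matches the point made above. The more time a single thread spends waiting, the more threads the pipeline can absorb before its execution units saturate.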

Going Beyond Multi-Core
The additional resources of a multithreaded processor can be used for things other than simply recovering lost bandwidth, if the multithreading architecture provides for it. A multithreaded processor can thus have capabilities that have no equivalent in a multi-core system based on conventional processors.

For example, in a conventional processor, when an external interrupt event needs to be serviced, the processor takes an interrupt exception, and instruction fetch and execution suddenly restart at an exception vector. Interrupt vector code must save the current program state before invoking the interrupt service code, and must restore the program context before returning from the exception.

A multithreaded processor, by definition, can switch between two program contexts in hardware, without the need for decoding an exception or saving/restoring state in software. A multithreaded architecture targeted for real-time applications can potentially exploit this and allow for threads of execution to be suspended, then unblocked directly by external signals to the core, providing for zero-latency handling of interrupt events.
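
A loose software analogy may make the idea concrete. In the sketch below, a handler thread keeps its own context and simply waits until it is signalled, so the code being "interrupted" never has to save or restore anything. This illustrates only the programming model: the names are hypothetical, and an operating-system thread woken through `threading.Event` is nowhere near zero latency, unlike the hardware mechanism described above.

```python
import threading


def serve_one_event():
    """Park a handler thread on a signal, raise the signal, and report what ran."""
    signal = threading.Event()
    handled = []

    def handler_thread():
        # The handler owns its context and waits, suspended, for the signal.
        signal.wait()
        handled.append("serviced")

    t = threading.Thread(target=handler_thread)
    t.start()
    signal.set()      # stands in for the external signal to the core
    t.join()
    return handled


print(serve_one_event())
```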

Multithreaded, Multi-core: The Best of Both Worlds
Arguably, from the standpoint of area and energy efficiency, the optimal SoC processor solution would be to use multithreaded cores as basic processing elements, and to replicate them in a multi-core configuration if the application demands more performance than a single core can provide.

Kevin D. Kissell is Principal Architect, MIPS Technologies Inc.

To learn more about this topic, go to More about multicores and multithreading.
