
Processors, cores and threads. System topology. Determining topology programmatically

As clock frequencies reached ever new heights, raising them further became increasingly difficult, since doing so drove up processor TDP. Developers therefore began to grow processors "in width" instead, adding cores, and this is how the concept of multi-core emerged.

Just 6-7 years ago, multi-core processors were hardly mentioned at all. Multi-core processors from the likes of IBM had existed earlier, but the first dual-core processor for desktop computers appeared only in 2005, and it was called the Pentium D. Also in 2005, AMD released a dual-core Opteron, but for server systems.

In this article we will not dwell on historical facts in detail; instead we will discuss modern multi-core design as one of the characteristics of a CPU. Most importantly, we need to figure out what multi-core actually delivers in terms of performance, both for the processor and for you and me.

Increased performance due to multi-core

The principle of increasing processor performance with multiple cores is to split the execution of threads (different tasks) across several cores. In short, almost every process running on your system has multiple threads.

Let me note right away that the operating system can create a multitude of threads and execute them all as if simultaneously, even if the processor is physically single-core. This is the principle behind Windows multitasking itself (for example, listening to music while typing).


Take antivirus software as an example. One thread scans the computer while another updates the virus database (we have simplified things greatly to convey the general idea).

Now consider what happens in two different cases:

a) The processor is single-core. Since two threads must run at the same time, we need to create for the user the (visual) illusion of simultaneous execution. The operating system does this cleverly: it switches between the execution of the two threads (these switches are near-instantaneous, on the order of milliseconds). That is, the system "performs" a bit of the update, then abruptly switches to scanning, then back to the update. To you and me, it appears that the two tasks are being performed simultaneously. But what is lost? Performance, of course. So let's look at the second option.

b) The processor is multi-core. In this case, the switching does not occur. The system sends each thread to a separate core, which lets us get rid of the switching from thread to thread that is so destructive to performance (we idealize the situation here). Two threads run truly at the same time; this is the principle of multi-core and multithreading. In the end, the scan and the update will run much faster on a multi-core processor than on a single-core one. But there is a catch: not every program supports multi-core, and not every program can be optimized this way. Things are far from as perfect as we have described. Still, every day developers create more and more programs whose code is well optimized for execution on multi-core processors.
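The two cases above can be sketched in Python. The `scan` and `update` functions below are invented placeholders for the antivirus threads, not real antivirus code. Note also that in CPython, CPU-bound threads time-slice on one core much like case (a) regardless of the core count; true case (b) parallelism across cores requires separate processes.

```python
import threading

def scan(results):
    # Invented placeholder for the "scan the computer" thread.
    results["scan"] = sum(i * i for i in range(100_000))

def update(results):
    # Invented placeholder for the "update the database" thread.
    results["update"] = sum(range(100_000))

results = {}
t1 = threading.Thread(target=scan, args=(results,))
t2 = threading.Thread(target=update, args=(results,))
t1.start(); t2.start()   # both threads are now runnable
t1.join(); t2.join()     # wait for both to finish
print(sorted(results))   # ['scan', 'update'] -- both tasks completed
```

The operating system decides when each thread runs; on a single core it interleaves them, on multiple cores it may run them side by side, and the program above cannot tell the difference.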

Do you need a multi-core processor? Everyday reasoning

When selecting a processor for a computer (namely, when thinking about the number of cores), you should determine the main types of tasks it will perform.

To broaden your knowledge of computer hardware, you can read our material about processor sockets.

Dual-core processors can be called the starting point, since there is no sense in going back to single-core solutions. But dual-core processors differ among themselves. It may be a not-so-"fresh" Celeron, or it may be a Core i3 on Ivy Bridge; likewise for AMD, a Sempron or a Phenom II. Naturally, due to their other parameters their performance will differ greatly, so you need to look at everything comprehensively and weigh core count against the other characteristics of a processor.

For example, the Core i3 on Ivy Bridge has Hyper-Threading technology, which allows it to process 4 threads simultaneously (the operating system sees 4 logical cores instead of 2 physical ones). The same Celeron cannot boast of any such thing.

But let's return to thinking about the required tasks. If a computer is needed for office work and surfing the Internet, a dual-core processor is quite enough.

When it comes to gaming performance, 4 or more cores are needed to feel comfortable in most games. But here the same snag comes up: not all games have code optimized for 4-core processors, and those that do are often not optimized as efficiently as we would like. Still, in principle, a 4-core processor is currently the optimal choice for games.


Today, 8-core AMD processors are redundant for games: the core count is excessive while per-core performance falls short. But they have other advantages. Those 8 cores help greatly in tasks that demand heavy, high-quality multithreaded work, such as video rendering or server-side computing. Such tasks call for 6, 8 or more cores. And in the near future games will learn to load 8 or more cores properly, so the outlook there is quite rosy.

Do not forget that plenty of tasks create a purely single-threaded load. So it is worth asking yourself: do I actually need an 8-core processor or not?

Summing up these interim results, I would like to note once more that the advantages of multi-core show up in "weighty" multithreaded computational work. If you do not play games with exorbitant requirements and do not do specific kinds of work that demand serious computing power, then there is simply no point in spending money on expensive multi-core processors.

* The ever-topical question: what should you pay attention to when choosing a processor, so as not to be mistaken?

Our goal in this article is to describe all the factors that affect processor performance and other operational characteristics.

Surely it is no secret to anyone that the processor is the main computing unit of a computer; you could even call it the most important part of the computer.

It handles almost every process and task occurring in the computer, be it watching video, listening to music, surfing the Internet, writing to and reading from memory, processing 3D and video, games, and much more.

Therefore, the choice of a central processor should be approached very carefully. It may turn out that you decide to install a powerful video card alongside a processor that does not match its level. In that case the processor will not unlock the video card's potential and will hold it back. The processor will be fully loaded and literally boiling, while the video card waits its turn, working at 60-70% of its capability.

That is why, when configuring a balanced computer, you should not neglect the processor in favor of a powerful video card. The processor must be powerful enough to unleash the video card's potential; otherwise it is simply wasted money.

Intel vs. AMD

* forever catching up

The Intel corporation has huge human resources and almost inexhaustible finances. Many innovations in the semiconductor industry and many new technologies come from this company. Intel's processors and developments are, on average, 1-1.5 years ahead of the work of AMD's engineers. But, as you know, you have to pay for the privilege of owning the most modern technology.

Intel's processor pricing is based not only on the number of cores and the amount of cache, but also on the "freshness" of the architecture, performance per clock and per watt, and the chip's process technology. The meaning of cache memory, the subtleties of the process technology, and other important processor characteristics will be discussed below. For technologies such as an unlocked frequency multiplier, you will also have to pay extra.

AMD, unlike Intel, strives to make its processors affordable for the end user and to maintain a sensible pricing policy.

You could even say that AMD is a "people's brand": in its price lists you will find what you need at a very attractive price. Usually, about a year after a new technology appears at Intel, an analogous technology appears from AMD. If you are not chasing the highest performance and pay more attention to the price tag than to the availability of cutting-edge technologies, then AMD's products are just for you.

AMD's pricing is based mostly on the number of cores, and only a little on the amount of cache memory and the presence of architectural improvements. In some cases you will have to pay a little extra for a third-level cache (the Phenom has L3 cache; the Athlon makes do with only L2). But sometimes AMD pampers its fans with the ability to unlock cheaper processors into more expensive ones: you can unlock cores or cache memory, upgrading an Athlon into a Phenom. This is possible thanks to the modular architecture; lacking some cheaper models, AMD simply disables (programmatically) some on-chip blocks of the more expensive ones.

The cores themselves remain practically unchanged; only their number differs (true for processors of 2006-2011). Thanks to the modularity of its processors, the company does an excellent job of selling rejected chips which, with some blocks disabled, become processors of a less productive line.

For many years the company worked on a completely new architecture codenamed Bulldozer, but at its 2011 launch the new processors showed far from the best performance. AMD blamed operating systems for not understanding the architectural features of its dual modules and "other multithreading."

According to company representatives, special fixes and patches were to be expected in order to experience the full performance of these processors. However, at the beginning of 2012, company representatives postponed the release of the update supporting the Bulldozer architecture to the second half of the year.

Processor frequency, number of cores, multithreading.

In the days of the Pentium 4 and earlier, clock frequency was the main factor in processor performance when choosing a processor.

This is not surprising: processor architectures were specifically designed to reach high frequencies, and this was especially evident in the Pentium 4 with its NetBurst architecture. The high frequency was not effective with the long pipeline used in that architecture. Even an Athlon XP at 2 GHz outperformed a Pentium 4 at 2.4 GHz, so it was pure marketing. After this mistake, Intel learned its lesson and returned to the right path: it began working not on the frequency component but on performance per clock. The NetBurst architecture had to be abandoned.

What does multi-core give us?

A quad-core processor at 2.4 GHz is, in multithreaded applications, theoretically the approximate equivalent of a single-core processor at 9.6 GHz, or of a dual-core processor at 4.8 GHz. But that is only in theory. In practice, two dual-core processors on a two-socket motherboard will be faster than one quad-core processor at the same operating frequency: bus speed and memory latency limitations make themselves felt.

* assuming the same architecture and amount of cache memory

Multi-core makes it possible to execute instructions and calculations in parts. For example, suppose you need to perform three arithmetic operations. The first two are executed on separate processor cores and the results are written to the cache, where any free core can perform the next operation on them. The system is very flexible, but without proper optimization it may not work. That is why optimizing for multi-core, for the processor architecture, within the OS environment is so important.
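As a minimal sketch of splitting work across cores, here is an example using Python's standard `multiprocessing` module. The `square` function is a made-up placeholder for one of the arithmetic operations above; the OS is free to schedule the two worker processes on separate cores.

```python
from multiprocessing import Pool

def square(x):
    # Made-up placeholder for one of the "arithmetic operations".
    return x * x

if __name__ == "__main__":
    # Two worker processes: the OS may place each on its own core,
    # so the four operations are computed two at a time.
    with Pool(processes=2) as pool:
        results = pool.map(square, [1, 2, 3, 4])
    print(results)  # [1, 4, 9, 16]
```

This is exactly the "without proper optimization it may not work" point: the program must be written to hand out independent chunks of work, or the extra cores simply sit idle.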

Applications that "love" and use multithreading: archivers, video players and encoders, antiviruses, defragmenters, graphics editors, browsers, Flash.

Among the "lovers" of multithreading you can also count operating systems such as Windows 7 and Windows Vista, as well as many Linux-kernel-based OSes, which run noticeably faster on a multi-core processor.

For most games, a dual-core processor at a high frequency is quite enough. Now, however, more and more games are released that are "tuned" for multithreading. Take sandbox games such as GTA 4 or Prototype, in which you will not feel comfortable on a dual-core processor clocked below 2.6 GHz: the frame rate drops below 30 frames per second. Although in these cases the most likely culprit is "weak" game optimization, lack of time, or the clumsy hands of those who ported the games from consoles to PC.

When buying a new processor for games, you should now pay attention to processors with 4 or more cores. Still, you shouldn't dismiss dual-core processors from the "upper category": in some games they sometimes perform better than some multi-core ones.

Processor cache memory.

- This is a dedicated area of the processor die in which intermediate data passing between the processor cores, RAM, and the various buses is processed and stored.

It operates at a very high clock speed (usually at the frequency of the processor itself), has very high bandwidth, and the processor cores work with it directly (L1).

When the cache runs short, the processor can sit idle in time-consuming tasks, waiting for new data to arrive in the cache for processing. Cache memory also serves to store frequently repeated data which, when needed, can be retrieved quickly without redundant calculations, sparing the processor from wasting time on them again.

Performance is also added by the fact that the cache memory is unified, and all cores can equally use the data from it. This provides additional opportunities for multi-threaded optimization.

This technique is now used for the L3 cache. Intel previously had processors with a shared L2 cache (C2D E7***, E8***), which is where this method of increasing multithreaded performance first appeared.

When overclocking the processor, the cache can become a weak point, preventing the processor from being overclocked beyond its maximum error-free operating frequency. The upside, however, is that it runs at the same frequency as the overclocked processor.

In general, the more cache memory, the faster the processor. In which applications, exactly?

In all applications that use a lot of floating-point data, instructions, and threads, the cache is used actively. Archivers, video encoders, antiviruses, graphics editors, and the like are very fond of cache memory.

Games also benefit from a large amount of cache memory - especially strategies, driving simulators, RPGs, sandbox games, and any game with many small details, particles, geometry elements, information flows, and physics effects.

Cache memory plays a very significant role in unlocking the potential of systems with two or more video cards. After all, some share of the load falls on the interaction of the processor cores with each other and on feeding the streams of several video chips. This is where cache organization matters, and a large L3 cache is very useful.

Cache memory is always equipped with protection against possible errors (ECC); when errors are detected, they are corrected. This is very important, because a tiny error in the cache during processing can snowball into a huge, continuous error that brings down the entire system.

Proprietary technologies.

Hyper-threading (HT)

The technology was first applied in Pentium 4 processors, but it did not always work correctly and often slowed the processor down more than it sped it up. The reason was a too-long pipeline and an incomplete branch-prediction system. The technology is used by Intel and so far has no analogues, unless you count what AMD's engineers implemented in the Bulldozer architecture.

The principle of the system is that each physical core carries two computational threads instead of one. That is, if you have a 4-core processor with HT (Core i7), you have 8 virtual threads.

The performance gain is achieved because data can enter the pipeline in the middle, not necessarily at the very start. If any processor units capable of performing the operation are idle, they receive a task to execute. The gain is not the same as from real physical cores, but comparable (~50-75%, depending on the kind of application). In rare cases, HT affects performance negatively in some applications. This is due to poor optimization of those applications for the technology: they cannot recognize that the threads are "virtual," and there are no limiters to balance the load across threads evenly.

TurboBoost is a very useful technology that raises the operating frequency of the most heavily used processor cores, depending on their level of load. It is especially useful when an application cannot use all 4 cores and loads only one or two; their operating frequency then rises, partially compensating for the performance. AMD's analogue of this technology is Turbo Core.

SSE, 3DNow! instructions. Designed to speed up the processor in multimedia calculations (video, music, 2D/3D graphics, etc.), as well as to speed up programs such as archivers and image and video editors (when those programs support the instructions).

3DNow! is a rather old AMD technology containing additional instructions for processing multimedia content, beyond the first version of SSE.

* Namely, the ability to stream processing of single precision real numbers.

Having the newest version is a big plus: with proper software optimization, the processor performs certain tasks more efficiently. AMD processors have similarly named, but slightly different, instruction sets.

* Example - SSE 4.1 (Intel) - SSE 4A (AMD).

In addition, these instruction sets are not identical. These are analogs in which there are slight differences.

Cool'n'Quiet, SpeedStep, CoolCore, Enhanced Halt State (C1E), etc.

These technologies reduce the processor frequency at low load by lowering the multiplier and core voltage, disabling part of the cache, and so on. This lets the processor heat up much less, consume less power, and make less noise. If power is needed, the processor returns to its normal state in a split second. In default BIOS settings they are almost always enabled; if desired, they can be disabled to reduce possible "stutters" when switching states in 3D games.

Some of these technologies also control the fan speeds in the system. For example, if the processor does not need increased heat dissipation and is not loaded, the processor fan speed is reduced (AMD Cool'n'Quiet, Intel SpeedStep).

Intel Virtualization Technology and AMD Virtualization.

These hardware technologies allow several operating systems to run at once, via special software, without significant performance loss. They are also used for the proper operation of servers, since more than one OS is often installed on them.

Execute Disable Bit (Intel) and No eXecute Bit (AMD) are technologies designed to protect the computer from virus attacks and software errors that can crash the system via buffer overflow.

Intel 64, AMD64, EM64T - this technology allows the processor to work both in an OS with a 32-bit architecture and in one with a 64-bit architecture. For the ordinary user, the practical benefit of a 64-bit system is that it can use more than 3.25 GB of RAM. On 32-bit systems a larger amount of RAM cannot be used because of the limited amount of addressable memory*.

Most applications with a 32-bit architecture can be run on a system with a 64-bit OS.

* And what can you do - back in 1985, no one could even imagine such gigantic, by the standards of the time, amounts of RAM.
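The 32-bit limit mentioned above is simple arithmetic, which the snippet below (plain Python, no outside assumptions) makes explicit: a 32-bit address space tops out at 4 GiB, of which roughly 3.25 GB typically remains usable once device address ranges are carved out.

```python
# A 32-bit pointer distinguishes 2**32 addresses, one byte each.
addressable_bytes = 2 ** 32
addressable_gib = addressable_bytes / 1024 ** 3
print(addressable_gib)  # 4.0

# Part of that 4 GiB range is reserved for devices (video memory,
# PCI address space, BIOS), which is why a 32-bit OS typically
# reports only about 3.25 GB of usable RAM.
```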

Additionally.

A few words about the process technology.

This point is worth paying close attention to. The finer the process technology, the less energy the processor consumes and, as a result, the less it heats up. Among other things, it has a higher safety margin for overclocking.

The finer the process, the more you can "pack" into a chip (and not only into the chip), increasing the processor's capabilities. Heat dissipation and power consumption also drop proportionally, thanks to lower current leakage and a smaller core area. You may notice that with each new generation of the same architecture on a new process, power consumption still grows, but this does not contradict the above: manufacturers simply push toward even greater performance and step over the heat-dissipation line of the previous generation by increasing the transistor count out of proportion to the process shrink.

The video core built into the processor.

If you don't need an integrated video core, then you shouldn't buy a processor with it. You will only get worse heat dissipation, excess heat (not always), worse overclocking potential (not always), and overpaid money.

Moreover, the video cores built into processors are suitable only for loading the OS, surfing the Internet, and watching video (and even then not at every quality level).

Market trends are changing, and the chance to buy a productive Intel processor without a video core comes up less and less. The policy of forcibly imposing an integrated video core appeared with Intel's processors codenamed Sandy Bridge, whose main innovation was a video core built on the same process technology. The video core sits together with the processor on one die, not separately as in previous Intel generations. For those who do not use it, there are drawbacks: a certain overpayment for the processor, and the heat source shifted relative to the center of the heat-spreader. But there are also pluses: the otherwise idle video core can be used for very fast video encoding via Quick Sync technology, together with software that supports it. In the future, Intel promises to expand the uses of the integrated video core for parallel computing.

Processor sockets. Platform lifespan.


Intel has a harsh policy for its platforms. The lifespan of each (the period between the start and end of processor sales for it) usually does not exceed 1.5-2 years. In addition, the company has several platforms developing in parallel.

AMD has the opposite, compatibility-oriented policy. Its AM3 platform accepts all future generations of processors that support DDR3. Even when the platform moves on to AM3+ and beyond, either new processors will be released separately for AM3, or the new processors will be compatible with old motherboards, making a wallet-friendly upgrade possible by changing only the processor (without changing the motherboard, RAM, etc.) and reflashing the motherboard. The only incompatibility nuances may arise when the memory type changes, since a different memory controller built into the processor is then required; compatibility is therefore limited and not supported by all motherboards. But in general, for a thrifty user, or one not used to replacing the entire platform every two years, the choice of processor manufacturer is clear: AMD.

Cooling the processor.

As standard, the processor comes with a BOX cooler that merely copes with its task: a piece of aluminum with a not very large dissipation area. Efficient coolers based on heat pipes with fins attached to them are designed for highly effective heat removal. If you do not want to hear extra fan noise, you should consider an alternative, more efficient cooler with heat pipes, or a closed- or open-loop liquid cooling system. Such cooling systems will also give the processor overclocking headroom.

Conclusion.

All the important aspects affecting processor performance and operation have now been considered. Let's repeat what you should pay attention to:

  • Manufacturer
  • Processor architecture
  • Process technology
  • Processor frequency
  • Number of processor cores
  • Processor cache size and type
  • Technology and instruction-set support
  • Cooling quality

We hope this material will help you understand and decide on the choice of a processor that meets your expectations.

Multi-core processors surprise no one nowadays. On the contrary, everyone tries to get a computer that supports as many cores as possible, and therefore works faster, and rightly so.
As for processors, for a long time there have been only two manufacturers on the market: Intel and AMD. And while the latter advertises its 8- and 10-core processors (implying that more cores means more power), the former offers 2 and 4 cores but emphasizes its threads (no need to write angry comments that Intel has higher core counts too; here and below we discuss processors for home use).

And if you look at comparative processor performance charts, you can see that a 4-core processor from Intel (not every one) will outperform an 8-core from AMD. Why is that? After all, 4 is less than 8, so it should be weaker... But if you dig a little deeper (not straight into caches, frequency, bus, etc.), you will notice one interesting phrase that often describes Intel processors: Hyper-threading support.

Hyper-threading technology (colloquially "hyper-threading") was invented by Intel and is used only in its processors (though not in all of them). I will not go very deep into its details here. This technology allows each core to be divided, as it were, in two, so that instead of one physical core we get two logical (or virtual) ones, and the Windows operating system thinks two are installed instead of one.

How to find out how many threads are in the processor?

If you want to know about a specific processor, store descriptions most often indicate Hyper-threading support, either spelled out or simply as the abbreviation HT. If there is no such description, you can always consult the most reliable source, Intel's official page: http://ark.intel.com/ru/search/advanced/?s=t&HyperThreading=true
I recommend using only this information, because it is the most accurate.

If you want to find out from within the system whether these threads are actually present and used, nothing could be easier.

Launch the Task Manager in any convenient way (the easiest is the shortcut Ctrl + Shift + Esc) from anywhere (even while reading this article) and, if you have Windows 7, go to the Performance tab.


Pay attention to the top row with the processor load, specifically the number of "squares." However many there are, that is how many cores there are in total, counting all threads. More precisely, what is displayed here are the logical/virtual cores, and threads are exactly what they are.

If you have Windows 8, 8.1 or 10, there is no such view by default, but there is still a Performance tab.


Here I have highlighted where to look. Incidentally, it was no accident that I right-clicked on the graph: if you select the Logical processors item, the graph changes to look like the one in Windows 7, i.e. there will be 8 "squares" with a load graph for each core.
If you see the opposite picture, i.e. several charts instead of one, it means that option is already selected in the chart's properties.

Of course, there are several more ways to see the cores and, in our case, the threads.

For example, you can call up the system information (press the keyboard shortcut Win + R and enter systeminfo) and see it there.
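Since the title of this article promises determining topology programmatically, here is a small sketch in Python: `os.cpu_count()` reports the number of logical processors (cores including HT threads) on any OS, and on Linux the file /proc/cpuinfo can additionally be parsed for core ids. The parsing shown is a simplification; on multi-socket systems you would need to pair "physical id" with "core id".

```python
import os

# Logical processors: physical cores times threads per core --
# exactly the number of "squares" the Task Manager shows.
logical = os.cpu_count()
print("logical processors:", logical)

# Linux only: /proc/cpuinfo lists a "core id" per logical CPU.
# (Simplified: multi-socket systems also need "physical id".)
try:
    core_ids = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("core id"):
                core_ids.add(line.split(":")[1].strip())
    if core_ids:
        print("distinct core ids:", len(core_ids))
except OSError:
    pass  # not a Linux system
```

On a processor with Hyper-threading, the "logical processors" figure will be twice the number of distinct core ids; without HT, the two numbers match.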

Having dealt with the theory of multithreading, let us consider a practical example: the Pentium 4. Already at the development stage of this processor, Intel engineers continued working to increase its performance without changing the programming interface. Five simple approaches were considered:
1. Increasing the clock frequency.
2. Placing two processors on one die.
3. Introducing new functional blocks.
4. Extending the pipeline.
5. Using multithreading.
The most obvious way to improve performance is to increase the clock speed without changing anything else. As a rule, each subsequent processor model has a somewhat higher clock speed than the previous one. Unfortunately, by raising the frequency head-on, developers run into two problems: increased power consumption (important for laptops and other battery-powered devices) and overheating (which requires more efficient heat sinks).
The second method, placing two processors on one chip, is relatively simple, but it roughly doubles the area occupied by the chip. If each processor is given its own cache, the number of chips per wafer is halved, which also means doubling the production cost. Providing a shared cache for both processors avoids a significant increase in footprint, but another problem arises: the amount of cache per processor is halved, and this inevitably affects performance. Moreover, while professional server applications can fully utilize the resources of multiple processors, internal parallelism in ordinary desktop programs is much less developed.
Introducing new functional blocks is also not difficult, but it is important to strike a balance. What is the point of a dozen ALUs if the chip cannot issue commands to the pipeline fast enough to keep them all busy?
A pipeline with more stages, capable of dividing tasks into smaller segments and processing them in shorter clock periods, on the one hand raises performance; on the other, it amplifies the negative consequences of mispredicted branches, cache misses, interrupts, and other events that disrupt the normal flow of instruction processing. Moreover, to fully realize the capabilities of an extended pipeline, the clock frequency must be increased, and this, as we know, leads to higher power consumption and heat dissipation.
Finally, multithreading can be implemented. The advantage of this technology is that it introduces an additional program thread to put to work hardware resources that would otherwise sit idle. Experimental studies by Intel developers showed that a 5% increase in chip area for implementing multithreading yields a 25% performance gain in many applications. The first Intel processor to support multithreading was the Xeon of 2002. Subsequently, starting at 3.06 GHz, multithreading was introduced into the Pentium 4 line. Intel calls its implementation of multithreading in the Pentium 4 hyperthreading.
The basic principle of hyper-threading is the simultaneous execution of two software threads (or processes - the processor does not distinguish between processes and software threads). The operating system views the Pentium 4 hyper-threaded processor as a dual-processor complex with shared caches and main memory. The operating system performs scheduling for each program thread separately. Thus, two applications can run at the same time. For example, a mailer can send or receive messages in the background while the user interacts with an interactive application — that is, the daemon and the user program run concurrently, as if the system had two processors available.
Application programs that are written as multiple program threads can use both "virtual processors". For example, video-editing programs usually allow users to apply filters to all frames; these filters adjust brightness, contrast, color balance and other properties of the frames. In such a situation, the program can assign one virtual processor to process the even frames and the other to process the odd frames, and the two processors will work completely independently of each other.
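The even/odd frame split described above can be sketched in a few lines. This is a minimal illustration, not real video-editing code: the filter, the frame representation, and all names are hypothetical.

```python
from threading import Thread

def apply_filter(frame, delta=10):
    # Hypothetical "brightness" filter: raise every pixel value by delta.
    return [min(255, px + delta) for px in frame]

def worker(frames, indices, out):
    # Each worker handles only its own frame indices, so the two
    # threads never touch the same frame and need no locking.
    for i in indices:
        out[i] = apply_filter(frames[i])

frames = [[i % 256] * 4 for i in range(8)]   # 8 tiny fake frames
out = [None] * len(frames)

even = Thread(target=worker, args=(frames, range(0, len(frames), 2), out))
odd = Thread(target=worker, args=(frames, range(1, len(frames), 2), out))
even.start(); odd.start()
even.join(); odd.join()
```

On a hyper-threaded processor the operating system is free to schedule the two workers on the two logical processors at once.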
Since the software threads use the same hardware resources, these resources must be coordinated. In the context of hyper-threading, Intel identified four useful strategies for managing resource sharing: resource duplication, plus hard, threshold, and full resource sharing. Let us take a look at each of them.
Let's start with resource duplication. Some resources are duplicated precisely in order to support two program threads. Since each thread requires individual control, a second program counter is needed, as is a second table mapping the architectural registers (EAX, EBX, etc.) to physical registers. The interrupt controller is likewise duplicated, since interrupts are handled individually for each thread.
Next comes hard (partitioned) sharing of resources between program threads. For example, if the processor provides a queue between two functional stages of the pipeline, half of the slots can be given to thread 1 and the other half to thread 2. Partitioned sharing is easy to implement, introduces no imbalance, and keeps the program threads completely independent of each other; if all resources are partitioned, one processor effectively turns into two. On the other hand, a situation may arise in which one thread is not using resources that the other thread needs but is not permitted to touch. As a result, resources that could otherwise be put to work sit idle.
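Hard partitioning can be modeled with a toy queue in which each thread owns a fixed half of the slots. This is a sketch of the idea only; the class and its sizes are invented for illustration.

```python
from collections import deque

class PartitionedQueue:
    """Toy model of hard (partitioned) sharing: each of two threads
    owns exactly `half` slots of the queue, regardless of load."""
    def __init__(self, half=4):
        self.half = half
        self.slots = {0: deque(), 1: deque()}  # one partition per thread

    def push(self, thread_id, item):
        part = self.slots[thread_id]
        if len(part) >= self.half:
            return False   # this thread's partition is full: it stalls
        part.append(item)
        return True

q = PartitionedQueue(half=2)
# Thread 0 fills its own half...
assert q.push(0, "a") and q.push(0, "b")
# ...and then stalls, even though thread 1's half is completely empty.
assert not q.push(0, "c")
# Thread 1 is unaffected: full isolation, but possible waste.
assert q.push(1, "x")
```

The final two assertions show both properties the text mentions: the threads cannot interfere with each other, but slots that thread 0 could use sit idle in thread 1's partition.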
The opposite of hard partitioning is full resource sharing. Under this scheme, any program thread can claim any resource, and requests are serviced in the order they arrive. Consider a situation in which a fast thread, consisting primarily of addition and subtraction operations, coexists with a slow thread performing multiplication and division. If instructions are fetched from memory faster than the multiplications and divisions complete, the slow thread's instructions gradually accumulate in the pipeline queue. Eventually they fill the queue, and the fast thread stalls for lack of space. Full resource sharing solves the problem of underused common resources, but creates an imbalance in their consumption: one thread can slow down or stop another.
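The same toy-model style can illustrate full sharing: one first-come, first-served queue that a slow thread can monopolize. Again, the class and sizes are invented for illustration.

```python
from collections import deque

class SharedQueue:
    """Toy model of full sharing: slots go to whoever asks first."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.q = deque()

    def push(self, thread_id, item):
        if len(self.q) >= self.capacity:
            return False   # queue full: whoever arrives now stalls
        self.q.append((thread_id, item))
        return True

q = SharedQueue(capacity=3)
# The slow thread's multiply/divide instructions pile up...
for i in range(3):
    assert q.push("slow", i)
# ...and now the fast thread stalls even though it is ready to run.
assert not q.push("fast", 0)
```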
The intermediate scheme is threshold resource sharing, under which any program thread can dynamically acquire a certain (limited) share of a resource. Applied to replicated resources, this approach provides flexibility without the threat of one thread stalling because it cannot obtain resources at all. If, for example, no thread is allowed to occupy more than 3/4 of the instruction queue, the increased resource consumption of a slow thread will not interfere with the execution of a fast one.
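Combining the two previous sketches gives a toy model of threshold sharing: slots are handed out dynamically, but no thread may hold more than a fixed fraction of them (3/4 in the text's example). All names and sizes are hypothetical.

```python
from collections import deque

class ThresholdQueue:
    """Toy model of threshold sharing: slots are allocated dynamically,
    but no single thread may hold more than `limit` of them."""
    def __init__(self, capacity=8, limit=6):  # limit = 3/4 of capacity
        self.capacity, self.limit = capacity, limit
        self.q = deque()
        self.held = {}   # slots currently held, per thread

    def push(self, thread_id, item):
        if len(self.q) >= self.capacity:
            return False                           # queue itself is full
        if self.held.get(thread_id, 0) >= self.limit:
            return False                           # per-thread cap reached
        self.q.append((thread_id, item))
        self.held[thread_id] = self.held.get(thread_id, 0) + 1
        return True

q = ThresholdQueue(capacity=8, limit=6)
for i in range(6):
    assert q.push("slow", i)      # the slow thread may take up to 3/4...
assert not q.push("slow", 6)      # ...but no more than that
assert q.push("fast", 0)          # the fast thread still gets its slots
```

Compared with the full-sharing model, the slow thread is stopped at the threshold rather than at the queue's capacity, so the fast thread can always make progress.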
The Pentium 4's hyper-threading combines these sharing strategies, attempting to avoid the problems associated with each one. Duplication is applied to resources that both program threads constantly require (in particular, the program counter, the register-mapping table and the interrupt controller); duplicating these resources increases the chip area by only 5%, a reasonable price for multithreading. Resources available in such volume that a single thread can hardly monopolize them (for example, cache lines) are allocated dynamically. Access to the resources that control the operation of the pipeline (in particular, its numerous queues) is partitioned, with half of the slots assigned to each program thread. The main pipeline of the Pentium 4's NetBurst architecture is shown in Fig. 8.7; the white and gray areas in this illustration represent the allocation of resources between the white and gray program threads.
As the figure shows, all the queues are partitioned, with each program thread allotted half of the slots, so neither thread can crowd out the other. The allocation and register-renaming unit is also partitioned. The scheduler's resources are shared dynamically but subject to a threshold, so no thread can occupy all the slots in a queue. The remaining pipeline stages are fully shared.
However, multithreading is not that simple; even this progressive technique has drawbacks. Hard partitioning carries no significant overhead, but dynamic sharing, especially subject to thresholds, requires tracking resource consumption at run time. In addition, some programs perform noticeably worse with multithreading than without it. Suppose, for example, that two threads each require 3/4 of the cache to function properly. Executed in turn, each would run efficiently, with few cache misses (which, as we know, carry extra cost); executed in parallel, each would suffer far more cache misses, and the net result would be worse than without multithreading.
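The cache-thrashing effect described above can be demonstrated with a small LRU-cache simulation. The sizes are chosen only to make the effect visible: two threads with working sets of 6 lines each share a cache of 8 lines, so each fits alone but the pair does not.

```python
from collections import OrderedDict

def misses(accesses, cache_size):
    """Count misses for an access stream under an LRU cache."""
    cache, miss = OrderedDict(), 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)       # refresh LRU position
        else:
            miss += 1
            cache[addr] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return miss

# Two threads, each looping 3 times over its own 6 cache lines.
a = [("A", i) for i in range(6)] * 3
b = [("B", i) for i in range(6)] * 3

serial = misses(a, 8) + misses(b, 8)                         # run in turn
interleaved = misses([x for p in zip(a, b) for x in p], 8)   # run together
```

Run in turn, each thread incurs only its 6 compulsory misses (12 in total); interleaved, the combined working set of 12 lines thrashes the 8-line LRU cache and every one of the 36 accesses misses.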
More information about the Pentium 4 multithreading mechanism can be found in the literature.
