By Matt Gillespie
Building multi-threading into .NET* applications is a powerful tool for the developer. It empowers applications that demand high scalability, such as enterprise applications, as well as desktop applications that need to process more than one task at the same time. To support multi-threading in hardware, Intel developed Hyper-Threading Technology (HT Technology) for the Pentium® 4 processor on the desktop and the Intel® Xeon™ processor for workstations and servers. These platforms make a particularly good choice for the deployment of threaded .NET solutions.
The coming trend of parallelism in hardware is that of multi-core processors, which will make multi-threading even more important. Intel has announced plans to introduce the first versions of both the Itanium® processor and the Pentium 4 processor that have multiple processor cores on a single die in 2005. These introductions will be followed in 2006 by the first multi-core Intel Xeon processors and Intel Xeon processors MP. As these and other advances make hardware platforms ever more parallel in nature, the stage is set for software threading to deliver ever-larger performance gains. Software makers that gear up for this advance now stand to gain a competitive advantage as this trend continues.
This article provides decision makers and developers with information about the ways in which threading improves the performance of .NET applications, as well as how threaded .NET applications take advantage of the hardware features of Intel® architecture. It also gives examples of threading pitfalls in .NET and ways of resolving them. The article closes with a brief introduction to threading-related Intel® software-development tools for the .NET developer.
Using Threads to Increase Performance
Operating systems can allocate processor resources to individual threads, each of which executes application code. Those threads are each assigned slices of processor time, and they take turns with the other threads, in round-robin fashion, to carry out useful work for the user. This capability enables a number of performance advantages, including the ability to prioritize work and to allow background tasks to take place, for instance, while a user interface is idle.
Each thread must maintain a certain overhead associated with its context relative to the larger application, such as its priority and pointers to associate it with its host process and its set of CPU registers. Processor resources are also required to suspend and resume thread execution, so there is an inherent tradeoff between the advantages and the overhead associated with creating additional threads. It is also vital to avoid conflicts between threads for the same piece of data.
Since the slices of processor time allocated to each thread are very small, the operating system's rapid switching between threads creates the effect of simultaneous execution on a single processor, although in reality, instructions are being retired on only one thread at any given time. Once the executing thread's allocated time slice expires, the operating system stores the context information of that thread and suspends execution. It then reloads the context information for the next thread, and resumes execution on that next thread.
HT Technology, which is supported by Intel Xeon processors and many Pentium 4 processors, enables a single physical processor to expose two logical processors to the operating system. Thus, when a thread must wait for data to be fetched from memory, the second logical processor is able to continue doing useful work. This capability builds further upon the performance increases that are available through threading, by allowing two threads to run concurrently on each physical processor.
A detailed inquiry into threading best practices is beyond the scope of this article, although ample resources are available from the Intel® Software Network Threading Developer Center; the Threading Knowledge Base is particularly useful for obtaining solutions to specific threading challenges quickly.
Threading should be used whenever you can make your program more responsive, efficient or usable by assigning specific functionality to a distinct thread. Of course, the corollary is to employ multi-threading only when it is needed, as overuse can lead to development and performance issues. A few examples of places where multi-threading would be advantageous are explored below:
- Multiple threads can help improve user-interface responsiveness and foster efficient communication with other systems. By allowing a user interface to run in a separate thread simultaneously with background processing tasks, users are able to continue input tasks and other interactions, even while the application is performing heavy computations in the background.
- Multi-threading can reduce the performance impacts associated with long-running processes. Since operations that take a long time to complete might otherwise block other tasks from executing, generating multiple threads can provide improved resource sharing. For example, if an inventory application takes a long time to generate a weekly report based on a complex database query, you would not want incoming orders in an e-commerce environment slowed down by that operation.
- By aiding in the prioritization of tasks, threaded code can ensure that time-critical tasks are handled appropriately. Assigning different threads appropriate priority levels can help to ensure that tasks that require low latencies are not slowed down by lower-priority tasks. For instance, in a Voice over IP application, voice-decoding and encoding operations must run at a high priority to ensure real-time fidelity, while other tasks such as repainting the screen should run at a lower priority.
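The scenarios in the list above can be sketched in C# with the `System.Threading.Thread` class. The following is a minimal illustration rather than a recommended design: the class name `BackgroundReportDemo` and the `BuildReport` method are hypothetical stand-ins for a long-running job such as the weekly report, and the lambda syntax assumes a more recent C# compiler than the one available when HT Technology was introduced.

```csharp
using System;
using System.Threading;

public static class BackgroundReportDemo
{
    // Hypothetical stand-in for a long-running job (for example, a weekly report).
    public static long BuildReport(int rows)
    {
        long total = 0;
        for (int i = 1; i <= rows; i++)
        {
            total += i;
        }
        return total;
    }

    // Runs BuildReport on a lower-priority background thread and returns its result.
    public static long RunReportInBackground(int rows)
    {
        long result = 0;
        Thread worker = new Thread(() => { result = BuildReport(rows); });
        worker.IsBackground = true;                    // does not keep the process alive
        worker.Priority = ThreadPriority.BelowNormal;  // hint: yield to time-critical threads
        worker.Start();

        // The calling thread remains free here to handle input or other work.

        worker.Join();  // wait for the worker before reading the result
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(RunReportInBackground(1000)); // sum of 1..1000
    }
}
```

In a real application the calling thread would typically not block in `Join` right away; the call appears here only so that the sketch returns a value. How strongly `Priority` influences scheduling depends on the operating system.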
The .NET Common Language Runtime (CLR) provides rich threading support, including excellent support for HT Technology.
How Intel® Architecture Benefits Multi-Threaded .NET Applications
Both Intel Xeon processors and Pentium 4 processors provide a solid foundation upon which to deploy .NET solutions. The .NET Framework is optimized for both platforms, enabled by a strong tradition of collaboration between Microsoft* and Intel, in which both parties work to make sure that the products of each company take best advantage of the other's. Developers can therefore have a very high degree of confidence in the correctness and optimization of the threading implementation of the .NET Framework for Intel architecture.
One strong example of the benefits of that collaboration is the .NET platform's rich optimization for HT Technology, which allows the developer to take advantage of the technology with no specific changes to the code beyond solid general threading fundamentals. This optimization at the Framework level clearly simplifies the developer's task in tuning an application's threading behavior.
Intel has announced that future plans for both the Pentium 4 processor and the Intel Xeon processor include the introduction of multi-core versions. By placing multiple processor cores on a single processor die, Intel expects to generate extremely high performance by decreasing the latency in communication between the cores, relative to completely separate processors. This increase in performance will add further benefit to threaded .NET applications, beyond that which is available today.
Avoiding Deadlocks in Threaded .NET Applications
One of the most common pitfalls associated with threaded .NET code is the deadlock. This situation occurs when each of two threads is waiting for the other to complete some process in order to proceed. At the left side of the following figure, Thread X places an exclusive lock on Resource A, while Thread Y places an exclusive lock on Resource B. At the right side of the figure, they have acquired those locks. Meanwhile, Thread X has attempted to place an exclusive lock on Resource B, and Thread Y has attempted to place an exclusive lock on Resource A. Since neither of those resources is available, and neither thread can proceed without the resource it is waiting for, both threads are deadlocked.
Figure 1. Threads X and Y become deadlocked as each waits for the other to release a locked resource.
This representation is a highly simplified instance of the situation where each thread is waiting for the other to release a resource before proceeding. Neither thread can perform any work, and therefore, the processor resources that each of the threads has assigned to it remain idle for the duration of the deadlock. The .NET Framework provides a number of useful methods that can be used to prevent deadlocks, including the Monitor.TryEnter method, which is documented in the method's MSDN .NET Framework Class Library entry*.
Briefly, Monitor.TryEnter allows the developer to have the Framework try to acquire an exclusive lock on a specific resource for a specified period of time. The method returns a logical True or False value that indicates whether or not the exclusive lock was obtained. Using this technique, one can create a timeout that determines whether acquiring an exclusive lock on a resource is possible, and if not, logic can be built into the code to work around that unavailability, rather than simply waiting idle until the resource becomes available.
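As a rough sketch of this timeout technique, the following C# example wraps `Monitor.TryEnter(object, int)` in a helper that reports whether the lock was obtained. The names `TryEnterDemo`, `ResourceA`, `TryDoWork`, and `Demonstrate` are hypothetical and exist only for illustration; the 50-millisecond timeout is an arbitrary choice.

```csharp
using System;
using System.Threading;

public static class TryEnterDemo
{
    // Hypothetical shared resource guarded by an exclusive lock.
    private static readonly object ResourceA = new object();

    // Tries to acquire the lock for up to timeoutMs milliseconds.
    // Returns true if the work ran, false if the lock could not be obtained.
    public static bool TryDoWork(int timeoutMs, Action work)
    {
        if (!Monitor.TryEnter(ResourceA, timeoutMs))
        {
            return false; // caller can retry, back off, or do something else
        }
        try
        {
            work();
            return true;
        }
        finally
        {
            Monitor.Exit(ResourceA); // always release a lock that was obtained
        }
    }

    // Holds the lock on another thread, then calls TryDoWork while the lock
    // is held and again after it has been released.
    public static bool[] Demonstrate()
    {
        bool[] outcomes = new bool[2];
        using (ManualResetEvent lockHeld = new ManualResetEvent(false))
        using (ManualResetEvent release = new ManualResetEvent(false))
        {
            Thread holder = new Thread(() =>
            {
                lock (ResourceA)
                {
                    lockHeld.Set();     // signal that the lock is now held
                    release.WaitOne();  // keep holding until told to release
                }
            });
            holder.Start();
            lockHeld.WaitOne();

            outcomes[0] = TryDoWork(50, () => { }); // lock is held by the other thread
            release.Set();
            holder.Join();
            outcomes[1] = TryDoWork(50, () => { }); // lock is free again
        }
        return outcomes;
    }

    public static void Main()
    {
        bool[] outcomes = Demonstrate();
        Console.WriteLine(outcomes[0]); // False: the attempt timed out
        Console.WriteLine(outcomes[1]); // True: the lock was obtained
    }
}
```

The first attempt gives up after the timeout instead of blocking indefinitely, which is the property that lets a thread step back from a potential deadlock and release any locks it already holds.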
Avoiding Race Conditions
A race condition is another common threading pitfall. In this circumstance, multiple threads will each operate on a particular block of code, but the overall outcome of the execution is dependent upon which thread reaches the block of code first. For example, consider the case of two threads that each increment the value X by 1, which consists of three distinct operations:
- Load the value of the variable X into a register.
- Add 1 to the value in the register.
- Write the contents of the register back into the variable X.
In this highly simplified example, it is clear that if the two threads operate one after the other, the final value of X after both complete will be equal to the initial value plus 2. Thus, if the initial value of X is 5, the first thread will increment it to 6, and then the second thread will increment it to 7. Another possible outcome, however, is that the first thread could complete the first two steps outlined above, and then be preempted by the second thread. The second thread would read the value of X as 5, increment it to 6, and then write it back to the variable X. Once the first thread resumes, it would continue where it left off, writing the 6 back as the value of X. Thus, in the first case, the code generates a final value of 7 for X, and in the second case, the same code generates a final value of 6 for X. Which result any particular execution of the code generates is unpredictable.
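The two outcomes described above can be replayed deterministically. The following C# sketch does not start any real threads; it simply performs the load, add, and write steps in the two orders discussed, using local variables as stand-ins for each thread's register. The class and method names are hypothetical.

```csharp
using System;

public static class LostUpdateDemo
{
    // Both threads complete all three steps, one after the other.
    public static int Sequential(int x)
    {
        int register1 = x;   // thread 1: load X
        register1 += 1;      // thread 1: add 1
        x = register1;       // thread 1: write back

        int register2 = x;   // thread 2: load X
        register2 += 1;      // thread 2: add 1
        x = register2;       // thread 2: write back
        return x;
    }

    // Thread 1 is preempted after its first two steps; thread 2 runs all
    // three steps; thread 1 then resumes and writes its stale value.
    public static int Interleaved(int x)
    {
        int register1 = x;   // thread 1: load X
        register1 += 1;      // thread 1: add 1, then preempted

        int register2 = x;   // thread 2: load X (still the original value)
        register2 += 1;      // thread 2: add 1
        x = register2;       // thread 2: write back

        x = register1;       // thread 1 resumes: overwrites with its own result
        return x;
    }

    public static void Main()
    {
        Console.WriteLine(Sequential(5));  // 7
        Console.WriteLine(Interleaved(5)); // 6
    }
}
```

With real threads the order is chosen by the scheduler, so the same program can produce either value from one run to the next.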
The .NET Framework provides the Increment method (along with its companion Decrement method) of the Interlocked class to prevent this particular type of race condition. This class is documented in its MSDN .NET Framework Class Library entry*. These methods perform the three steps required to increment or decrement a variable as a single atomic operation, so that a thread performing it cannot be preempted before it writes the value back to memory. Related issues and other types of race conditions are delineated in the Synchronizing Data for Multithreading* section of the .NET Framework Developer's Guide.
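A minimal sketch of `Interlocked.Increment` in use follows. Several threads increment one shared counter, and because each increment is atomic, no updates are lost. The class name `InterlockedDemo`, the method `CountWithInterlocked`, and the thread and iteration counts are assumptions chosen for illustration.

```csharp
using System;
using System.Threading;

public static class InterlockedDemo
{
    // Shared variable that every thread increments.
    private static int counter;

    // Starts threadCount threads that each increment the counter
    // incrementsPerThread times, then returns the final value.
    public static int CountWithInterlocked(int threadCount, int incrementsPerThread)
    {
        counter = 0;
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < incrementsPerThread; i++)
                {
                    // Load, add, and write back happen as one atomic step.
                    Interlocked.Increment(ref counter);
                }
            });
            threads[t].Start();
        }
        foreach (Thread thread in threads)
        {
            thread.Join(); // wait for every thread before reading the total
        }
        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithInterlocked(4, 100000)); // 400000
    }
}
```

Replacing the `Interlocked.Increment` call with a plain `counter++` would reintroduce the race, and the total could then fall short of the expected value on some runs.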
The general best practice for developers with regard to potential race conditions in threaded code is that one must always consider what would happen if the thread executing a passage were preempted before completing that execution. As the foregoing example illustrates, a thread can be preempted between the machine-code operations that make up even a single line of higher-level programming code.
Plug-Ins to Simplify Threading
Intel provides the Intel® Thread Checker and Thread Profiler as plug-ins that work with the Intel® VTune™ Performance Analyzer environment to improve the rate of success that developers enjoy in creating threaded code. Under .NET, Intel Thread Checker integrates with the Microsoft Visual Studio* .NET environment (including older versions of the Visual Studio compilers) to identify threading errors such as deadlocks and race conditions in applications, supporting source instrumentation and allowing Intel Thread Checker to drill down to the specific variable that caused the error. In fact, even if the source code is not available, the tool supports binary instrumentation that can identify the line of source code associated with the error, allowing the developer to identify the specific variable(s) through external debugging methods.
Once the application is instrumented, the developer runs it in conjunction with Intel Thread Checker to monitor the application and generate diagnostic reports. Those reports give rich detail about each error, including the interactions with other threads that may underlie them. Analysis of these reports helps developers to efficiently resolve the errors by means of on-board diagnostics, error detection and classification, and graphical representations of errors that allow developers to categorize and manipulate errors, sorting them against each other and against specific portions of source code for dynamic analysis.
It is important to remember that, even in the absence of actual errors, incorrect threading practices can dramatically reduce the performance of an application, relative to an unthreaded version. In addition to resolving threading runtime errors, therefore, it is also vital to find those places in code where threading has created inefficiencies. Moreover, detecting and resolving those issues can be a very complex undertaking, and so it is very important to have the correct tools to help.
Intel Thread Profiler integrates with the Visual Studio .NET environment and Intel VTune Performance Analyzer to analyze the threading performance of applications in real time during execution. Key aspects of threading behavior that the developer will want to look at in order to improve performance include synchronization/threading overhead and load balancing among threads. Thread Profiler is extremely helpful in identifying and resolving both types of issues.
Thread Profiler performs critical-path analysis to generate runtime statistics that the developer can view in a number of ways, organizing the data by thread or by region of code. By drilling down to specific bodies of performance data for a given application, the developer is able to analyze the behavior of a specific piece of code. In this manner, one can identify hotspots in serial regions, parallel regions, and critical sections, as well as finding synchronization tasks that have a detrimental impact on performance.
The Intel VTune analyzer environment equipped with Thread Profiler provides specific, targeted tuning advice for improving the performance associated with each threading issue, such as synchronization delays, stalled threads, and excessive blocking time. It also provides analysis of the utilization of processor resources at specific parts of program execution, since either under-utilization or over-utilization indicates an issue that should be addressed. Most important, Thread Profiler provides intelligent analysis that enables the developer to determine which tuning tasks to prioritize, in order to meet performance goals with the least amount of tuning effort.
Conclusion
Multi-threading is a powerful means of getting the highest performance out of enterprise applications, and its role will continue to grow as hardware platforms continue to become more parallel in nature. The introduction of multi-core processors on all of Intel's desktop and server processor lines by 2006 shows an important aspect of where the industry is heading, and threading your applications now places you in a good position to take advantage of this trend as the industry moves forward. The next major phase of this advance begins in 2005 with the first multi-core Itanium processor and multi-core Pentium 4 processor, which will run the next release of many software products.
Both Intel architecture and the .NET Framework provide a number of technologies that enable users to put threading to work efficiently and safely. Nevertheless, the benefits of threading are accompanied by dangers if the technologies are not implemented correctly, and so Intel has provided software-development tools that help to ensure optimal utilization of threading in your applications. Those tools can help you to manage the complexity of threading well, making it easier to achieve success.
Developers should take the time to become familiar with the threading support that is built into the .NET Framework, in order to build the most robust code possible. Familiarity with the hardware features such as HT Technology in the latest Intel processors also helps to support high threading performance, and there is no substitute for completing your development environment with low-cost Intel threading tools that will allow you to detect threading errors and simplify the tuning process for threaded applications.