May 14, 2008

Understanding Critical Section Deadlocks in Windows CE 6

Have you ever wondered why a Windows CE device can sometimes hang and stop responding to user events such as touching the touch screen or typing on the keyboard? One of the more common causes is that almost all of the components in the OS use synchronization objects called critical sections. Critical sections are thread synchronization objects that can be used within a single process to ensure that only one thread at a time can enter a region of code. When these critical sections are not used correctly, the system can hang in a critical section deadlock.

Critical sections are used in many areas of the operating system. In drivers, you would use critical sections to protect access to the hardware registers so that those accesses are atomic. In GWES, the operating system uses critical sections to make sure that graphics operations occur in the correct order. These are just a few examples. Critical sections are great when they are used properly, but in many cases they add inter-thread complexity, and the problems they cause can be difficult to diagnose. Let’s first talk about how to use critical sections correctly, and then I will show you how to track down some of the cases where critical sections cause what are called “deadlocks”.

When a thread wants to prevent other threads from entering a region of code, it calls EnterCriticalSection(). Any other thread that reaches this point in the code will be blocked until the first thread calls LeaveCriticalSection().
There are two important rules to remember: always call LeaveCriticalSection() on every path out of the protected code, and if a thread calls EnterCriticalSection() on the same critical section more than once, it must call LeaveCriticalSection() the same number of times. If you don’t follow these rules, you will create a simple deadlock: the critical section stays owned, and every other thread that tries to enter it blocks forever. This is easy enough to manage with a single critical section, but a more complex and more common problem appears when you have multiple threads and multiple critical sections to manage in the operating system. Take, for example, the Graphics, Windowing, and Events Subsystem, or GWES. GWES is connected to a number of lower-layer components such as the touch screen driver, the display driver, and the keyboard code translation layer and keyboard driver. Many of these components have their own critical sections. If a GWES thread enters its own critical section and then passes control to the keyboard driver, and the keyboard driver is waiting on a critical section that cannot be released until the original GWES critical section is released, you have created a complex critical section deadlock. This can happen more often than you might expect, because many of these critical sections are interrelated in a complex operating system such as Windows CE. Now that I have described an example of a complex critical section deadlock, let’s talk about how to locate the problem thread and resolve the issue. Say you have a KITL-based debugger connected to your device and you are able to get the device into the hang. How can you find out whether this is truly a critical section deadlock? There are tools that can search for this automatically, but I think it is important to understand how to do it manually so that you understand the information the tool is giving you.
The first step is to look at the call stacks of all the threads in the operating system. Almost all of the threads should be either waiting on multiple objects or sleeping. As you look through the call stacks, you will begin to see a pattern. The kind of call stack you are looking for has CRITEnter() as its topmost entry. CRITEnter() sits at the lowest layer, in the kernel, and seeing it at the top of a stack tells you that the thread is waiting on a critical section. Once you find such a thread, you need to find out which thread owns that critical section. You can do this by looking at the hOwner element in the critical section structure. This value is the handle of the thread that needs to call LeaveCriticalSection() and has not yet done so. You can then use the debugger to find the thread with that handle and look at its call stack. If it is waiting on another critical section, repeat this process for that owner thread and find out who owns the other critical section. If it is waiting on multiple objects, with DoWaitForObjects() as the topmost entry of its call stack, verify whether it is really necessary to call WaitForMultipleObjects() or WaitForSingleObject() inside a critical section; this is normally a very bad idea. If the thread is just spinning, you need to find out why it is spinning and why LeaveCriticalSection() has not been called. So the bottom line is: find a thread that is waiting on a critical section, find the owner of that critical section, and then find out why the owner has not called LeaveCriticalSection(). Once you have that answer, you can fix the deadlock.