9.8 Stacks

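For nearly 20 years, developers and system administrators who want to understand how the core components of Windows work, or to dig into its technical details, have turned to this undisputed authority. Windows Internals, Fifth Edition (English edition) covers the lower layers of Windows in depth, including system architecture; system and management mechanisms; processes, threads, and jobs; security; the I/O system; storage, memory, and cache management; file systems; networking; startup and shutdown; and crash dump analysis, making the inner workings of Windows clear. This section is excerpted from Chapter 9.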

Authors: Mark Russinovich, David A. Solomon, Alex Ionescu | Source: Posts & Telecom Press | 2009-11-03 09:15

Whenever a thread runs, it must have access to a temporary storage location in which to store function parameters, local variables, and the return address after a function call. This part of memory is called a stack. On Windows, the memory manager provides two stacks for each thread, the user stack and the kernel stack, as well as per-processor stacks called DPC stacks. We have already described how the stack can be used to generate stack traces and how exceptions and interrupts store structures on the stack, and we have also talked about how system calls, traps, and interrupts cause the thread to switch from a user stack to its kernel stack. Now, we’ll look at some extra services the memory manager provides to efficiently use stack space.

User Stacks

When a thread is created, the memory manager automatically reserves a predetermined amount of memory, which by default is 1 MB. This amount can be configured in the call to the CreateThread or CreateRemoteThread function or when building the application, by using the /STACK:reserve switch in the Microsoft C/C++ toolchain, which will store the information in the image header. Although 1 MB is reserved, only the first 64 KB of the stack (unless the PE header of the image specifies otherwise) will be committed, along with a guard page. When a thread’s stack grows large enough to touch the guard page, an exception will occur, causing an attempt to allocate another guard page. Through this mechanism, a user stack doesn’t immediately consume all 1 MB of committed memory but instead grows with demand. (However, it will never shrink back.)
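The demand-driven growth described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how the memory manager is implemented: the class name `UserStackModel`, its methods, the 4 KB page size, and the rule that the stack is touched one page at a time are all hypothetical choices made for the illustration.

```python
PAGE = 4 * 1024  # assumed page size for this sketch


class UserStackModel:
    """Toy model of a user stack that grows by moving a guard page."""

    def __init__(self, reserve=1024 * 1024, initial_commit=64 * 1024):
        self.reserve = reserve            # reserved address space
        self.committed = initial_commit   # committed, usable bytes
        self.guard_faults = 0             # times the guard page was hit

    def touch(self, depth):
        """Simulate the thread using `depth` bytes of stack."""
        # Assumption: the stack is probed page by page, so every page
        # beyond the committed region is reached through the guard page.
        while depth > self.committed:
            # The old guard page becomes committed and a new guard page
            # must fit below it; otherwise the reservation is exhausted.
            if self.committed + 2 * PAGE > self.reserve:
                raise MemoryError("stack overflow: reservation exhausted")
            self.committed += PAGE
            self.guard_faults += 1
        return self.committed


stack = UserStackModel()
stack.touch(10 * 1024)    # fits in the initial 64 KB, no guard fault
stack.touch(100 * 1024)   # grows the committed region page by page
stack.touch(1 * 1024)     # shallower use later does not shrink the stack
print(stack.committed // 1024, stack.guard_faults)  # prints "100 9"
```

In this model the committed size only ever increases, which mirrors the remark that a user stack never shrinks back.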

EXPERIMENT: Creating the Maximum Number of Threads

With only 2 GB of user address space available to each 32-bit process, the relatively large memory that is reserved for each thread’s stack allows for an easy calculation of the maximum number of threads that a process can support: a little less than 2,048, for a total of nearly 2 GB of memory (unless the increaseuserva BCD option is used and the image is large address space aware). By forcing each new thread to use the smallest possible stack reservation size, 64 KB, the limit can grow to about 30,400 threads, which you can test for yourself by using the TestLimit utility from Sysinternals. Here is some sample output:

  C:\>testlimit -t
  Testlimit - tests Windows limits
  By Mark Russinovich
  Creating threads...
  Created 30399 threads. Lasterror: 8
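The limits quoted in this experiment follow from simple division, which the short sketch below works through. The helper name `max_threads` is hypothetical, and the calculation assumes thread stacks are the only consumers of user address space, which is why it yields upper bounds rather than the observed figure.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def max_threads(address_space, stack_reserve):
    """Upper bound on threads if stacks were the only users of address space."""
    return address_space // stack_reserve


default_bound = max_threads(2 * GB, 1 * MB)   # default 1 MB reservation
small_bound = max_threads(2 * GB, 64 * KB)    # smallest reservation, 64 KB

observed = 30399                               # from the sample output above
stack_mb = observed * 64 * KB / MB             # address space used by stacks

print(default_bound, small_bound, round(stack_mb, 1))  # prints "2048 32768 1899.9"
```

The observed count sits below the 32,768 bound. A plausible reading, offered here as an assumption rather than something the sample output shows, is that the remaining address space (about 148 MB) is taken up by the image, its DLLs, heaps, and other per-thread structures.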

If you attempt this experiment on a 64-bit Windows installation (with 8 TB of user address space available), you would expect to see potentially hundreds of thousands of threads created (as long as sufficient memory were available). Interestingly, however, TestLimit will actually create fewer threads than on a 32-bit machine, which has to do with the fact that Testlimit.exe is a 32-bit application and thus runs under the Wow64 environment. (See Chapter 3 for more information on Wow64.) Each thread will therefore have not only its 32-bit Wow64 stack but also its 64-bit stack, thus consuming more than twice the memory, while still keeping only 2 GB of address space. To properly test the thread-creation limit on 64-bit Windows, use the Testlimit64.exe binary instead.

Note that you will need to terminate TestLimit with Process Explorer or Task Manager—using Ctrl+C to break the application will not function because this operation itself creates a new thread, which will not be possible once memory is exhausted.

Kernel Stacks

Although user stack sizes are typically 1 MB, the amount of memory dedicated to the kernel stack is significantly smaller: 12 KB, followed by another guard PTE (for a total of 16 KB of virtual address space). Code running in the kernel is expected to have less recursion than user code, as well as contain more efficient variable use and keep stack buffer sizes low. Additionally, because kernel stacks live in system address space, their memory usage has a bigger impact on the system: the 2,048 threads really consumed only 1 GB of pageable virtual memory due to their user stacks. On the other hand, they consumed 360 MB of actual physical memory with their kernel stacks.

Although kernel code is usually not recursive, interactions between graphics system calls handled by Win32k.sys and its subsequent callbacks into user mode can cause recursive re-entries in the kernel on the same kernel stack. As such, Windows provides a mechanism for dynamically expanding and shrinking the kernel stack from its initial size of 16 KB. As each additional graphics call is performed from the same thread, another 16-KB kernel stack is allocated (anywhere in system address space; the memory manager provides the ability to jump stacks when nearing the guard page). Whenever each call returns to the caller (unwinding), the memory manager frees the additional kernel stack that had been allocated, as shown in Figure 9-31.

This mechanism allows reliable support for recursive system calls, as well as efficient use of system address space, and is also provided for use by driver developers when performing recursive callouts through the KeExpandKernelStackAndCallout API, as necessary.

FIGURE 9-31 Kernel stack jumping
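The allocate-on-entry, free-on-unwind behavior described above can be pictured with a small model. This is a sketch only: `KernelStackChain` and `nested_call` are hypothetical names, and the model tracks nothing more than the number of 16 KB segments a single thread holds at each nesting level.

```python
from contextlib import contextmanager

SEGMENT_BYTES = 16 * 1024  # size of each kernel stack segment, per the text


class KernelStackChain:
    """Toy model of one thread's chain of kernel stack segments."""

    def __init__(self):
        self.segments = 1        # the initial kernel stack
        self.peak_segments = 1

    @contextmanager
    def nested_call(self):
        """Model one additional graphics call made from the same thread."""
        self.segments += 1       # another segment is allocated on entry
        self.peak_segments = max(self.peak_segments, self.segments)
        try:
            yield self
        finally:
            self.segments -= 1   # and freed when the call unwinds

    @property
    def virtual_bytes(self):
        return self.segments * SEGMENT_BYTES


chain = KernelStackChain()
with chain.nested_call():
    with chain.nested_call():
        deepest = chain.virtual_bytes   # three segments live at this point
print(deepest // 1024, chain.virtual_bytes // 1024)  # prints "48 16"
```

After both calls unwind, the model is back to the single initial segment, which is the property that keeps system address space usage proportional to the current nesting depth.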

EXPERIMENT: Viewing Kernel Stack Usage

You can use the MemInfo tool from Winsider Seminars & Solutions to display the physical memory currently being occupied by kernel stacks. The -u flag displays physical memory usage for each component, as shown here:

  C:\>MemInfo.exe -u | findstr /i "Kernel Stack"
  Kernel Stack: 980 ( 3920 kb)

Note the kernel stack after repeating the previous TestLimit experiment:

  C:\>MemInfo.exe -u | findstr /i "Kernel Stack"
  Kernel Stack: 92169 ( 368676 kb)

Running TestLimit a couple more times would easily exhaust physical memory on a 32-bit system, and this limitation results in one of the primary limits on systemwide 32-bit thread count.
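As a cross-check, the two MemInfo readings above can be related to the per-thread kernel stack size. The sketch below assumes a 4 KB page (consistent with 980 pages being reported as 3920 kb) and assumes the second reading was taken with the 30,399 threads from the earlier TestLimit sample output; the function name `kernel_stack_kb_per_thread` is hypothetical.

```python
PAGE_KB = 4  # assumed page size; 980 pages * 4 KB matches the reported 3920 kb


def kernel_stack_kb_per_thread(pages_before, pages_after, threads):
    """Average physical memory (KB) added per thread between two readings."""
    added_pages = pages_after - pages_before
    return added_pages * PAGE_KB / threads


before_kb = 980 * PAGE_KB       # first MemInfo reading
after_kb = 92169 * PAGE_KB      # reading after the TestLimit run
per_thread = kernel_stack_kb_per_thread(980, 92169, 30399)

print(before_kb, after_kb, round(per_thread, 2))  # prints "3920 368676 12.0"
```

Under those assumptions the increase works out to about 12 KB, or three pages, of physical memory per thread, which matches the committed kernel stack size given earlier in this section.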

DPC Stack

Finally, Windows keeps a per-processor DPC stack available for use by the system whenever DPCs are executing, an approach that isolates the DPC code from the current thread’s kernel stack (which is unrelated to the DPC’s actual operation because DPCs run in arbitrary thread context). The DPC stack is also configured as the initial stack for handling the SYSENTER or SYSCALL instruction during a system call. Because the CPU is responsible for switching the stack, it doesn’t know how to access the current thread’s kernel stack, as this is an internal Windows implementation detail, so Windows configures the per-processor DPC stack as the stack pointer.

