2014-03-26

[CaseStudy][WinDbg] Hang

[Scenario]
A user reported a system hang.

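[Investigation]
We received a dump. A hang usually means that some resource is being held and not released, or that there is a deadlock of some kind.
So the first step is: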

!locks
0: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks...

Resource @ Ntfs!NtfsData (0x88b01474)    Exclusively owned
     Threads: 84acd020-01<*> 
KD: Scanning for held locks..

Resource @ 0x851d756c    Exclusively owned
    Contention Count = 41
    NumberOfSharedWaiters = 29
     Threads: 84e79d48-02<*> 85c98790-01    863ec940-01    86557030-01    
              84bc9d48-01    8603aa60-01    84e7b8f0-01    861073b8-01    
              8639f5b0-01    84e42648-01    84acd020-01    84c1da58-01    
              84cf3d48-01    84c48030-01    84c54a88-01    84e3f078-01    
              8602ec58-01    86203d48-01    85ecd660-01    86202640-01    
              8620f310-01    862bc030-01    862a9d48-01    84e79410-01    
              86434c78-01    86a5fa60-01    863f82b8-01    8645d998-01    
              8605f030-01    86a46a08-01    
KD: Scanning for held locks..............................................................

Resource @ 0x85d8ae68    Exclusively owned
    Contention Count = 2
     Threads: 8605f030-01<*> 
KD: Scanning for held locks.

Resource @ 0x85d88840    Exclusively owned
    Contention Count = 1
    NumberOfExclusiveWaiters = 1
     Threads: 8605f030-01<*> 
     Threads Waiting On Exclusive Access:
              860d88d8       

KD: Scanning for held locks...................................................................................................................................................................................................................................

Resource @ 0x84d8ff34    Shared 1 owning threads
    Contention Count = 2
    NumberOfExclusiveWaiters = 1
     Threads: 84acd420-01<*> 
     Threads Waiting On Exclusive Access:
              84e79d48       

KD: Scanning for held locks..............................
10293 total locks, 5 locks currently held

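In "Threads: 84e79d48-02<*>", the <*> marks a thread that owns the resource. The other 29 threads listed for that resource match NumberOfSharedWaiters = 29, so they are waiting for it.
From the output above,
the last resource,
Resource @ 0x84d8ff34    Shared 1 owning threads
is blocking thread 84e79d48, which is waiting for exclusive access.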

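Thread 84e79d48 in turn owns
Resource @ 0x851d756c    Exclusively owned
which blocks thread 8605f030 (one of its shared waiters), and that thread owns
Resource @ 0x85d88840    Exclusively owned
Resource @ 0x85d8ae68    Exclusively owned

This chain of blocked threads is what hangs the system.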
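
The chain read off the !locks output can also be followed mechanically. The following is a minimal Python sketch, not part of the original investigation: the two dictionaries are transcribed by hand from the output above, and the function names blocked_by and wait_chain are made up for illustration.

```python
# Hand-transcribed from the !locks output: who owns each resource,
# and which threads of interest are waiting on it.
owners = {
    "0x84d8ff34": ["84acd420"],  # Shared 1 owning threads
    "0x851d756c": ["84e79d48"],  # Exclusively owned
    "0x85d88840": ["8605f030"],  # Exclusively owned
    "0x85d8ae68": ["8605f030"],  # Exclusively owned
}
waiters = {
    "0x84d8ff34": ["84e79d48"],  # waiting on exclusive access
    "0x851d756c": ["8605f030"],  # one of the 29 shared waiters
    "0x85d88840": ["860d88d8"],  # waiting on exclusive access
}


def blocked_by(thread):
    """Return (resource, owner) pairs for every resource `thread` waits on."""
    return [
        (resource, owner)
        for resource, waiting in waiters.items()
        if thread in waiting
        for owner in owners[resource]
    ]


def wait_chain(thread):
    """Follow 'waits on a resource owned by' links until a thread that waits on nothing."""
    chain = [thread]
    seen = {thread}
    while True:
        edges = blocked_by(chain[-1])
        if not edges:
            return chain  # the last thread is not waiting on any listed resource
        _, owner = edges[0]
        chain.append(owner)
        if owner in seen:
            return chain  # a repeated thread would indicate a real deadlock cycle
        seen.add(owner)


if __name__ == "__main__":
    print(" -> ".join(wait_chain("860d88d8")))
```

Starting from any waiter, the walk ends at the thread that holds a resource without waiting on one itself, which is the thread worth inspecting with !thread.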

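Let's see what the real culprit, thread 84acd420 (the shared owner of Resource @ 0x84d8ff34 that thread 84e79d48 is waiting on), is doing.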


0: kd> !thread 84acd420
THREAD 84acd420  Cid 0004.001c  Teb: 00000000 Win32Thread: 00000000 WAIT: (WrQueue) UserMode Non-Alertable
    82948600  QueueObject
Not impersonating
DeviceMap                 89a088d8
Owning Process            84a41a20       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      23593          Ticks: 1024 (0:00:00:15.974)
Context Switch Count      1966           IdealProcessor: 0             
UserTime                  00:00:00.000
KernelTime                00:00:00.483
Win32 Start Address nt!ExpWorkerThread (0x8288999e)
Stack Init 8ad0bfd0 Current 8ad0bc10 Base 8ad0c000 Limit 8ad09000 Call 0
Priority 14 BasePriority 13 UnusualBoost 1 ForegroundBoost 0 IoPriority 2 PagePriority 5
ChildEBP RetAddr  Args to Child              
8ad0bc28 8288a69d 84acd420 8293af08 82937d20 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
8ad0bc60 828894f7 84acd4e0 84acd420 82948600 nt!KiSwapThread+0x266
8ad0bc88 8288a1ed 84acd420 84acd4e0 00000000 nt!KiCommitThreadWait+0x1df
8ad0bcec 82889a83 82948600 00000001 00000000 nt!KeRemoveQueueEx+0x4f8
8ad0bd50 82a15f5e 00000000 a9364519 00000000 nt!ExpWorkerThread+0xe5
8ad0bd90 828bd219 8288999e 00000000 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19
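The result looks perfectly normal: it is a system worker thread (nt!ExpWorkerThread) waiting on its work queue, and the stack tells us nothing more.
So let's look at the resource itself.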

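Run dt on the resource at 0x84d8ff34:
dt 0x84d8ff34 _eresource
Why _eresource? The locks that !locks lists are kernel executive resources, whose structure is ERESOURCE; the debugger happens to resolve the type from ntdll's symbols here (hence ntdll!_ERESOURCE in the output).
In any case, look at the seventh field:
 +0x018 OwnerEntry       : _OWNER_ENTRY
and run dt on it as well:
dt 0x84d8ff34 _eresource OwnerEntry.
Remember the trailing "." so that dt expands the fields of the sub-structure.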

//
// Checking the resource at 0x84d8ff34.
//
0: kd> dt 0x84d8ff34 _eresource
ntdll!_ERESOURCE
   +0x000 SystemResourcesList : _LIST_ENTRY [ 0x84d8f884 - 0x84d8fed4 ]
   +0x008 OwnerTable       : 0x84d8e158 _OWNER_ENTRY
   +0x00c ActiveCount      : 0n1
   +0x00e Flag             : 4
   +0x010 SharedWaiters    : (null) 
   +0x014 ExclusiveWaiters : 0x84d8f838 _KEVENT
   +0x018 OwnerEntry       : _OWNER_ENTRY
   +0x020 ActiveEntries    : 1
   +0x024 ContentionCount  : 2
   +0x028 NumberOfSharedWaiters : 0
   +0x02c NumberOfExclusiveWaiters : 1
   +0x030 Address          : (null) 
   +0x030 CreatorBackTraceIndex : 0
   +0x034 SpinLock         : 0
//
// No thread owns this resource exclusively.
//
0: kd> dt 0x84d8ff34 _eresource OwnerEntry.
ntdll!_ERESOURCE
   +0x018 OwnerEntry  : 
      +0x000 OwnerThread : 0
      +0x004 IoPriorityBoosted : 0y0
      +0x004 OwnerReferenced : 0y0
      +0x004 OwnerCount  : 0y000000000000000000000000000000 (0)
      +0x004 TableSize   : 0

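OwnerEntry is empty.
So let's look at OwnerTable:

dt 0x84d8e158 _OWNER_ENTRY
The first entry shows TableSize = 7, so the table may hold 7 entries.
We can first use ?? sizeof(_OWNER_ENTRY) to get the size of one _OWNER_ENTRY:

unsigned int 8

So the next entry is at +8: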

dt 0x84d8e160 _OWNER_ENTRY
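
The address arithmetic for the remaining entries can be sketched in Python. This is only an illustration: it assumes the entries are laid out back to back, and that TableSize = 7 counts all entries including the first one; the helper names are hypothetical.

```python
def owner_entry_addresses(table_base, entry_size, count):
    """Addresses of `count` consecutive entries starting at `table_base`."""
    return [table_base + i * entry_size for i in range(count)]


def dt_commands(table_base, entry_size, count):
    """One dt command line per entry, ready to paste into the debugger."""
    return [
        f"dt 0x{address:08x} _OWNER_ENTRY"
        for address in owner_entry_addresses(table_base, entry_size, count)
    ]


if __name__ == "__main__":
    # Values from this dump: OwnerTable = 0x84d8e158, sizeof(_OWNER_ENTRY) = 8.
    # The count of 7 is an assumption about what TableSize means.
    for line in dt_commands(0x84D8E158, 8, 7):
        print(line)
```

The second command it produces is the dt 0x84d8e160 _OWNER_ENTRY used above.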



0: kd> dt 0x84d8e158 _OWNER_ENTRY
ntdll!_OWNER_ENTRY
   +0x000 OwnerThread      : 0
   +0x004 IoPriorityBoosted : 0y1
   +0x004 OwnerReferenced  : 0y1
   +0x004 OwnerCount       : 0y000000000000000000000000000001 (0x1)
   +0x004 TableSize        : 7
0: kd> dt 0x84d8e160 _OWNER_ENTRY
ntdll!_OWNER_ENTRY
   +0x000 OwnerThread      : 0x84acd420  ---> thread 0x84acd420 holds this resource shared
   +0x004 IoPriorityBoosted : 0y0
   +0x004 OwnerReferenced  : 0y0
   +0x004 OwnerCount       : 0y000000000000000000000000000001 (0x1)
   +0x004 TableSize        : 4
The second entry shows OwnerThread 0x84acd420.
So it is the same system thread after all.
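
One detail worth noting when reading these entries: dt lists IoPriorityBoosted, OwnerReferenced, OwnerCount and TableSize all at +0x004, so they share the same 32-bit value. The Python sketch below assumes the bit layout that the dt output suggests (bit 0, bit 1, then a 30-bit OwnerCount); that layout is an assumption checked only against the two entries shown here, and the function name is hypothetical.

```python
def decode_owner_entry_dword(value):
    """Split the 32-bit value at +0x004 into the overlapping views dt prints.

    Assumed layout: bit 0 = IoPriorityBoosted, bit 1 = OwnerReferenced,
    bits 2..31 = OwnerCount. TableSize is the same value read as a whole.
    """
    return {
        "IoPriorityBoosted": value & 1,
        "OwnerReferenced": (value >> 1) & 1,
        "OwnerCount": value >> 2,
        "TableSize": value,
    }


if __name__ == "__main__":
    print(decode_owner_entry_dword(7))  # first entry of the table
    print(decode_owner_entry_dword(4))  # second entry, owned by 0x84acd420
```

Under that assumption, the TableSize of 4 shown for the second entry is just OwnerCount = 1 viewed through the other name, so only the first entry's TableSize should be read as a table size.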

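So this is as far as the investigation can go.
All we know is that a system thread acquired this resource shared and never released it.
That blocks the thread waiting for exclusive access, which in turn holds other exclusively owned resources, and the result is the hang.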
