Archive for November, 2010

Debug dmp files

November 15th, 2010

No doubt you have all experienced a BSOD. Usually a dump file is written with information about the what, how and why of the crash. On Windows 7 and Windows Server 2008 R2 these files can be found in C:\Windows\Minidump. If they are not there, search for *.dmp in C:\Windows :-D .
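The search for *.dmp files can also be scripted. The following is only a minimal sketch in Python; the function name find_dump_files is made up for this example, and the script may need to run elevated to see everything under C:\Windows.

```python
from pathlib import Path


def find_dump_files(root):
    """Return all *.dmp files below root, newest first.

    Hypothetical helper for illustration. On Windows the pattern match is
    case-insensitive; on other systems only a lowercase ".dmp" matches.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    dumps = [p for p in root.rglob("*.dmp") if p.is_file()]
    # Newest first: the most recent crash is usually the interesting one.
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for dump in find_dump_files(r"C:\Windows"):
        print(dump)
```

Sorting by modification time is a design choice for convenience; it is not something windbg requires.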

To read these files you need the “Debugging Tools for Windows”: http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx#b (32-bit) or http://www.microsoft.com/whdc/devtools/debugging/install64bit.mspx (64-bit).

Besides the “Debugging Tools for Windows” you also need the symbol files. These can be found here: http://www.microsoft.com/whdc/devtools/debugging/symbolpkg.mspx#d

Install the symbols in a location you can easily find, e.g. C:\Symbols. You will need this path later to point windbg at the symbols.

Rather than figuring out exactly which packages I need, I always just install everything :-D. Very easy! After installing the tools you have access to a tool called windbg.exe.

First add the symbols: open the File menu > “Symbol File Path” and specify: SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

NO SPACES!

If you don’t specify the symbol path, windbg will show all kinds of errors saying that symbols cannot be found. So specify the symbol path first!
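The symbol path has the form SRV*<local cache>*<symbol server>. As a small sketch of how to build it without sneaking in a stray space, here is a hypothetical Python helper (build_symbol_path is not part of any Microsoft tool). It follows the NO SPACES advice literally and rejects any whitespace, which may be stricter than windbg itself.

```python
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir, server=MS_SYMBOL_SERVER):
    """Build a symbol path of the form SRV*<cache_dir>*<server>.

    Hypothetical helper: rejects whitespace anywhere in either element
    and rejects '*', which would break the SRV*...*... syntax.
    """
    for part in (cache_dir, server):
        if not part:
            raise ValueError("symbol path elements must not be empty")
        if any(ch.isspace() for ch in part):
            raise ValueError(f"whitespace in symbol path element: {part!r}")
        if "*" in part:
            raise ValueError(f"'*' not allowed inside an element: {part!r}")
    return f"SRV*{cache_dir}*{server}"


if __name__ == "__main__":
    print(build_symbol_path(r"C:\Symbols"))
```

A trailing space in the cache directory is exactly the kind of mistake this check is meant to catch.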

Once the symbol path is set, you can load a DMP file and examine its contents to find out what caused the BSOD.
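If you have a pile of dump files, clicking through windbg gets old quickly. The Debugging Tools package also ships a command-line kernel debugger, kd.exe, which accepts -z (dump file), -y (symbol path) and -c (commands to run). The sketch below only shows the idea: the Python function names are made up, and it assumes kd.exe can be found on the PATH.

```python
import subprocess


def build_analyze_command(dump_file, symbol_path, debugger="kd.exe"):
    """Return the argument list that opens a dump, runs !analyze -v and quits.

    The debugger location is an assumption; pass a full path if kd.exe
    is not on the PATH.
    """
    return [debugger, "-z", str(dump_file), "-y", symbol_path,
            "-c", "!analyze -v; q"]


def analyze_dump(dump_file, symbol_path, debugger="kd.exe"):
    """Run the debugger and return everything it printed (Windows only)."""
    cmd = build_analyze_command(dump_file, symbol_path, debugger)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling analyze_dump(r"C:\Windows\Minidump\some.dmp", "SRV*C:\\Symbols*http://msdl.microsoft.com/download/symbols") would return the debugger output as one string; the dump file name here is just a placeholder.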

Below is an example of what windbg shows ;-)

*********************************************************************************************************
Loading Dump File [C:\minidumps\051810-13057-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available
WARNING: Whitespace at end of path element
Symbol search path is: SRV*D:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 7 Kernel Version 7600 MP (16 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Built by: 7600.16539.amd64fre.win7_gdr.100226-1909
Machine Name:
Kernel base = 0xfffff800`01c55000 PsLoadedModuleList = 0xfffff800`01e92e50
Debug session time: Tue May 18 16:48:49.648 2010 (UTC + 1:00)
System Uptime: 0 days 0:22:41.011
Loading Kernel Symbols
………………………………………………………
……………………………………………………….
……..
Loading User Symbols
Loading unloaded module list
……….
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck A, {4, 2, 1, fffff80001cd71c1}

Probably caused by : msiscsi.sys ( msiscsi!iSpProcessWMIRequestTimeout+71e )  <<<<<<< D I S C O !!!!!

Followup: MachineOwner
---------

8: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 0000000000000004, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000001, bitfield :
 bit 0 : value 0 = read operation, 1 = write operation
 bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff80001cd71c1, address which referenced memory

Debugging Details:
------------------
WRITE_ADDRESS: GetPointerFromAddress: unable to read from fffff80001efd0e0
 0000000000000004

CURRENT_IRQL:  2

FAULTING_IP:
nt!IoReleaseRemoveLockEx+21
fffff800`01cd71c1 f0834304ff      lock add dword ptr [rbx+4],0FFFFFFFFh

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  DRIVER_FAULT_SERVER_MINIDUMP

BUGCHECK_STR:  0xA

PROCESS_NAME:  System

TRAP_FRAME:  fffff8800233d180 -- (.trap 0xfffff8800233d180)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=fffffa80323dc820 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80001cd71c1 rsp=fffff8800233d310 rbp=0000000000000000
 r8=0000000000000020  r9=fffff80001fc2804 r10=fffff80001c55000
r11=fffff8800233d2e0 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz ac pe cy
nt!IoReleaseRemoveLockEx+0x21:
fffff800`01cd71c1 f0834304ff      lock add dword ptr [rbx+4],0FFFFFFFFh ds:00000000`00000004=????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff80001cc4b69 to fffff80001cc5600

STACK_TEXT: 
fffff880`0233d038 fffff800`01cc4b69 : 00000000`0000000a 00000000`00000004 00000000`00000002 00000000`00000001 : nt!KeBugCheckEx
fffff880`0233d040 fffff800`01cc37e0 : fffffa80`3a8ed270 00000000`00000000 fffffa80`6f952b80 00000000`00000000 : nt!KiBugCheckDispatch+0x69
fffff880`0233d180 fffff800`01cd71c1 : 00000000`00010009 fffff800`01e6a5f8 fffffa80`31c1cb60 fffffa80`31c1cc68 : nt!KiPageFault+0x260
fffff880`0233d310 fffff880`055bf2b6 : fffffa80`703df010 fffffa80`703df010 00000000`00000002 fffffa80`323dc820 : nt!IoReleaseRemoveLockEx+0x21
fffff880`0233d380 fffff880`055be68b : fffffa80`3a400018 fffff880`0233d400 00000000`00000000 00000000`00000000 : msiscsi!iSpProcessWMIRequestTimeout+0x71e
fffff880`0233d420 fffff800`01cb0493 : 00000000`00000000 00000000`00000003 00000000`00000001 00000000`ffffffff : msiscsi!iSpTickHandler+0x11f
fffff880`0233d460 fffff800`01cd16a6 : 00000000`00000002 fffff880`0233d618 00000000`00000000 00000000`00000100 : nt!IopTimerDispatch+0x132
fffff880`0233d570 fffff800`01cd0a26 : fffffa80`31ecc6e8 fffffa80`31ecc6e8 00000000`00000000 00000000`00000000 : nt!KiProcessTimerDpcTable+0x66
fffff880`0233d5e0 fffff800`01cd157e : 00000003`2b39f2be fffff880`0233dc58 00000000`000154cb fffff880`02318ee8 : nt!KiProcessExpiredTimerList+0xc6
fffff880`0233dc30 fffff800`01cd0d97 : 00000003`2dafbfc5 fffff880`000154cb 00000000`00000001 00000000`000000cb : nt!KiTimerExpiration+0x1be
fffff880`0233dcd0 fffff800`01ccddfa : fffff880`02315180 fffff880`023202c0 00000000`00000002 fffff800`00000000 : nt!KiRetireDpcList+0x277
fffff880`0233dd80 00000000`00000000 : fffff880`0233e000 fffff880`02338000 fffff880`0233dd40 00000000`00000000 : nt!KiIdleLoop+0x5a
STACK_COMMAND:  kb

FOLLOWUP_IP:
msiscsi!iSpProcessWMIRequestTimeout+71e
fffff880`055bf2b6 33d2            xor     edx,edx

SYMBOL_STACK_INDEX:  4

SYMBOL_NAME:  msiscsi!iSpProcessWMIRequestTimeout+71e

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: msiscsi

IMAGE_NAME:  msiscsi.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4a5bcad7

FAILURE_BUCKET_ID:  X64_0xA_msiscsi!iSpProcessWMIRequestTimeout+71e

BUCKET_ID:  X64_0xA_msiscsi!iSpProcessWMIRequestTimeout+71e

Followup: MachineOwner
---------

 *********************************************************************************************************
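In output like the above, the two lines that matter most are “BugCheck ...” and “Probably caused by ...”. As a rough sketch, they can be pulled out of saved debugger output with a few lines of Python. The function name and the dictionary keys are invented for this example, and the regular expressions are only based on the output shown in this post.

```python
import re

BUGCHECK_RE = re.compile(r"^BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", re.MULTILINE)
CAUSE_RE = re.compile(r"^Probably caused by\s*:\s*(\S+)\s*\(\s*([^)]+?)\s*\)",
                      re.MULTILINE)


def summarize_analysis(text):
    """Extract the bugcheck code, its arguments and the suspected driver."""
    summary = {}
    match = BUGCHECK_RE.search(text)
    if match:
        summary["bugcheck_code"] = int(match.group(1), 16)
        args = [int(arg, 16) for arg in match.group(2).split(",")]
        summary["bugcheck_args"] = args
        # For IRQL_NOT_LESS_OR_EQUAL (0xA), bit 0 of Arg3 is 1 for a write.
        if summary["bugcheck_code"] == 0xA and len(args) >= 3:
            summary["write_access"] = bool(args[2] & 1)
    match = CAUSE_RE.search(text)
    if match:
        summary["image"] = match.group(1)
        summary["symbol"] = match.group(2)
    return summary
```

Fed the example above, this reports bugcheck 0xA, a write access, and msiscsi.sys as the suspected image, which is what you would then search for.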

A quick search on the Interwebs turns up this result >>> “Stop error message on a computer that is running Windows 7 or Windows Server 2008 R2 and that has iSCSI storage: “0x0000000A”” with a hotfix! Yay! Of course your situation has to match the “Symptoms” and “Applies to” sections.

Hopefully this post makes it easier to debug the DMP files created by a BSOD.

WinRM, SCVMM and Token Size

November 9th, 2010

Some time ago I ran into a WinRM problem when adding two Hyper-V R2 nodes to an existing Hyper-V R2 failover cluster.

The errors I got in System Center Virtual Machine Manager 2008 R2 (SCVMM) were:

Error (2916)
VMM is unable to complete the request. The connection to the agent hvn-srv001.domain.local was lost.
(Unknown error (0x80338126))
Recommended Action
Ensure that the WS-Management service and the agent are installed and running and that a firewall is not blocking HTTP traffic. If the error persists, reboot hvn-srv001.domain.local and then try the operation again.

Error (2927)
A Hardware Management error has occurred trying to contact server hvn-srv001.domain.local.
(Unknown error (0x80338171))
Recommended Action
Check that WinRM is installed and running on server hvn-srv001.domain.local. For more information use the command “winrm helpmsg hresult”.

This was strange because the failover cluster already consisted of five nodes, so why these errors all of a sudden!?!

What did I check? Well, actually not that much, because the messages only refer to the firewall and to a hardware management error when contacting the server. You can look up what “0x80338171” means with the “winrm helpmsg 0x80338171” command. The result is:

The WinRM client received an HTTP bad request status (400), but the remote service did not include any other information about the cause of the failure.
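Codes like 0x80338171 are HRESULT values: the top bit flags a failure, bits 16 to 26 hold the facility (the component that raised the error) and the low 16 bits hold the error code itself. The Python sketch below just splits a value into those parts; decode_hresult is a made-up name, and it does not replace “winrm helpmsg”, which gives the actual message text.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into failure flag, facility and code."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),        # bit 31: severity
        "facility": (value >> 16) & 0x7FF,   # bits 16-26: facility
        "code": value & 0xFFFF,              # bits 0-15: error code
    }


if __name__ == "__main__":
    for hresult in (0x80338126, 0x80338171):
        parts = decode_hresult(hresult)
        print(f"{hresult:#010x}: facility={parts['facility']} "
              f"code={parts['code']:#06x} failure={parts['failure']}")
```

Both SCVMM errors decode to the same facility number, 51, so they come from the same component. As far as I know winerror.h calls that facility FACILITY_WSMAN, but treat the name as an assumption.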

Okay…something is not going as it should! I tried disabling the firewall completely…with no satisfying results, unfortunately! So the firewall is ruled out as the cause!
When searching the interwebs for the “0x80338171” error code I ended up at this article: http://support.microsoft.com/kb/970875. It explains that in some domain environments a user may be a member of so many security groups that the Kerberos security token used to authenticate the user to the server becomes larger than 16 KB. This causes either http.sys or the WinRM server to reject the request.

D-I-S-C-O!

Why? Due to a weird nested group construction my account was a member of 400+ Domain Local groups, reached through nesting in several Global groups! Don’t ask why! Don’t!

A script to find out the group membership count can be found here http://forums.techarena.in/active-directory/1074988.htm#post4089330
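Once you have the group counts you can estimate the token size. Microsoft’s guidance on MaxTokenSize gives the rule of thumb TokenSize = 1200 + 40d + 8s, where d is the number of domain local groups (plus universal groups from other domains and SID history entries) and s is the number of global groups (plus universal groups from the own domain). The Python sketch below just applies that formula; the formula comes from that guidance rather than from my own measurements, the function name is made up, and the 10 global groups in the example are a guess standing in for “several”.

```python
HTTP_DEFAULT_LIMIT = 16384  # the default 16 KB request/header limit


def estimate_token_size(domain_local_groups, global_groups, overhead=1200):
    """Estimate the Kerberos token size in bytes: 1200 + 40d + 8s."""
    return overhead + 40 * domain_local_groups + 8 * global_groups


if __name__ == "__main__":
    # 400 domain local groups as in this post; 10 global groups is a guess.
    size = estimate_token_size(400, 10)
    print(size, "bytes, over the 16 KB limit:", size > HTTP_DEFAULT_LIMIT)
```

With 400 domain local groups the 40d term alone is 16000 bytes, so the estimate ends up above 16 KB no matter how few global groups there are.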

I remembered that, because of this insane number of group memberships, a hotfix had been applied earlier to fix a problem with connecting to a server via RDC from a Windows 7 laptop. The hotfix can be found here: http://support.microsoft.com/kb/978918. It only solved the logon issue, not the issue I ran into next when, for example, opening dsa.msc and changing the domain controller. That one was solved by setting MaxTokenSize to the Microsoft-recommended maximum value of 65535 decimal, or FFFF hexadecimal.

Name: MaxTokenSize
Type: REG_DWORD
Value: 65535
Location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters

When this was added all issues seemed to be gone….except the WinRM issues.

The quick fix is to remove enough group memberships that your token size drops below 16 KB. Another fix is to change some WinRM-related settings of http.sys. All of this is described in http://support.microsoft.com/kb/971244, although its “Applies to” section does not include any flavor of Windows Server 2008 R2. The entries that should be added to resolve the WinRM error are:

Name: MaxFieldLength
Type: REG_DWORD
Value: default (16384). Range (64 to 65534)
Location: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\HTTP\Parameters

Name: MaxRequestBytes
Type: REG_DWORD
Value: default (16384). Range (64 to 65534)
Location: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\HTTP\Parameters

For both entries I entered a value of 65534. Reboot the machine to make sure both http.sys and WinRM pick up the changes, or restart the HTTP and WinRM services.
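If you need these values on more than one node, a .reg file saves some typing. The Python sketch below only generates the text of such a file for the two HTTP entries; reg_dword_entries is a made-up helper, and saving the text and importing it with regedit is left to you. Review the file before importing anything into the registry.

```python
HTTP_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\HTTP\Parameters"
)


def reg_dword_entries(key, values):
    """Return .reg file text that sets REG_DWORD values under one key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD: {value}")
        # .reg files store DWORDs as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(reg_dword_entries(HTTP_PARAMETERS_KEY,
                            {"MaxFieldLength": 65534, "MaxRequestBytes": 65534}))
```

The same helper would work for MaxTokenSize by passing the Kerberos\Parameters key and the value 65535.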

After this all was fine!

All done….well no!

Bottom line…it’s all about token size. Mine was larger than the standard 16 KB, so Kerberos, RDP and WinRM were not functioning as they should. The reason was this insanely high number of group memberships. That was the root of all evil and had to be addressed. I resolved it by applying different folder permissions, which made it possible to replace some 230 Domain Local groups with only 1 without compromising Delegation of Control or any other security principle.

AGDLP is good…but be aware of the token size when applying!