Jul 26 2010

GPMC hangs connected to one domain controller

In this post I describe an incident we had in our production environment and the troubleshooting steps that resolved it. When we launched the GPMC, the console froze as soon as we clicked on an OU to display the Group Policy Objects linked to it. The problem occurred only when the GPMC was connected to one particular Domain Controller (the PDC emulator in our case); if we switched to another DC, the GPMC worked normally.

There was nothing wrong with the GPOs themselves in our domain: replication was OK and GPOs were applied correctly to our computer and user objects. But we could no longer edit GPOs while connected to this DC. While the GPMC was hanging, lsass.exe overloaded the CPU on the DC until the console was killed. We therefore edited GPOs while connected to any other DC, so the production environment kept working almost normally while we resolved the incident.

The first thing we did was run an analysis with Server Performance Advisor on the DC while the GPMC was hanging. Here is the result:

Apparently the DC has a problem performing an LDAP request initiated by the GPMC, and that is why the lsass.exe process on the DC consumes so much CPU time. To confirm this we took two network captures with Wireshark on our admin workstation, performing the same action in the GPMC each time (clicking on an OU to display the GPOs linked to it): first with the GPMC connected to the problematic DC, where the console froze after we clicked on the OU; second with the GPMC connected to another DC, where displaying and editing GPOs works fine:

In the first capture you can see that the LDAP request had still returned no result after two minutes, at which point we killed the console. In the second one the LDAP request returns a result (a GPO distinguishedName) after a few seconds.

We now needed to identify which LDAP request the DC could not complete, because it handled all other LDAP requests normally: for example, searching for computers in the DSA console connected to this DC worked fine. To do that we turned on Active Directory diagnostic event logging, specifically the Field Engineering category. Event ID 1644 is then written to the event log when the LDAP request initiated by the GPMC is performed. This event tells us that an inefficient LDAP query was performed and gives us the following information:

The query returns the attributes gPLink and gPOptions, which is exactly what we expect: clicking on an OU in the GPMC displays the GPOs linked to it (this is also what the second network capture shows).
The attribute used as the filter for the LDAP query is objectCategory, which is an indexed attribute in the AD database. The LDAP request should therefore complete quickly, which was not the case at all here. We tried running the same request with the ldp.exe tool; here is the result:

ldap_search_ext_s(ld, "DC=ldap389,DC=info", 2, "( | (objectCategory=CN=Domain-DNS,CN=Schema,CN=Configuration,DC=ldap389,DC=info) (objectCategory=CN=Organizational-Unit,CN=Schema,CN=Configuration,DC=ldap389,DC=info) )", attrList, 0, svrCtrls, ClntCtrls, 10, 0 ,&msg)
Error: Search: Timeout. <85>
Server error:
Error<94>: ldap_parse_result failed: No result present in message
Getting 0 entries:
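
For readers who want to reproduce this search against their own domain, here is a minimal Python sketch that rebuilds the same search parameters from a domain DN. It only builds strings and does not talk to a DC. The function names and the dictionary layout are hypothetical, the attribute list is assumed to be gPLink and gPOptions (taken from the event 1644 details, since ldp only shows attrList), and the schema DN is assumed to sit under the same DN as the domain, which holds for a forest root domain like the one in this post.

```python
from typing import Dict


def schema_category_dn(class_cn: str, forest_root_dn: str) -> str:
    """Return the DN of a schema class object, as used in objectCategory filters."""
    return f"CN={class_cn},CN=Schema,CN=Configuration,{forest_root_dn}"


def gpmc_link_search(domain_dn: str) -> Dict[str, object]:
    """Rebuild the parameters of the search shown in the ldp.exe output above.

    Assumption: domain_dn is also the forest root DN, so the schema
    partition lives under it.
    """
    categories = [
        schema_category_dn(cn, domain_dn)
        for cn in ("Domain-DNS", "Organizational-Unit")
    ]
    clauses = "".join(f"(objectCategory={dn})" for dn in categories)
    return {
        "base": domain_dn,
        "scope": 2,  # 2 is the subtree scope, as in the ldp call
        "filter": f"(|{clauses})",
        "attributes": ["gPLink", "gPOptions"],  # assumed content of attrList
        "time_limit": 10,  # the 10 in the ldp call, read here as seconds
    }


if __name__ == "__main__":
    search = gpmc_link_search("DC=ldap389,DC=info")
    print(search["filter"])
```

The resulting dictionary can be passed to whatever LDAP client you prefer. Running the same search against the problematic DC and against a healthy one is a quick way to compare response times outside the GPMC.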

The search timed out. We ran the same request against other DCs and got no timeout at all. As a first step we increased the MaxQueryDuration value in the DC's LDAP policy, and also increased the timeout parameter in ldp.exe, but with no significant improvement.

After some research on the web I came across this very interesting article written by Tim Springston. It says:

“if the attribute(s) which are showing as taking an extended amount of time to search for are already indexed then the lack of an index is clearly not your problem. Sometimes indices, perhaps through frequent changes or other reasons, need to be re-indexed to remove “whitespace” or other problems. So checking the integrity of the database or doing an offline defrag may be the way to go.”

So let’s do a semantic analysis of the DC’s ntds.dit file:

Inconsistent refcounts were detected, so we ran the same analysis again, this time in "fix mode" (step 11 of the KB). Finally we performed an offline defragmentation of the database and exited DSRM.
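
The three ntdsutil runs described above can be sketched as command lines. The Python below only builds the argument lists and does not execute anything; the helper names are hypothetical. It assumes the DC is booted in Directory Services Restore Mode, that the target folder for the compacted database contains no spaces, and that the "activate instance ntds" step is needed (it is on Windows Server 2008 and later; pass activate_instance=False for Windows Server 2003).

```python
from typing import List


def ntdsutil_command(steps: List[str], activate_instance: bool = True) -> List[str]:
    """Wrap ntdsutil menu steps into one non-interactive argument list."""
    prefix = ["activate instance ntds"] if activate_instance else []
    # Two "quit" entries: one leaves the submenu, one leaves ntdsutil.
    return ["ntdsutil", *prefix, *steps, "quit", "quit"]


def semantic_analysis(fix: bool = False, activate_instance: bool = True) -> List[str]:
    """Semantic database analysis: report only ("go") or repair ("go fixup")."""
    go = "go fixup" if fix else "go"
    return ntdsutil_command(
        ["semantic database analysis", "verbose on", go], activate_instance
    )


def offline_defrag(target_dir: str, activate_instance: bool = True) -> List[str]:
    """Offline defragmentation: write a compacted copy of ntds.dit to target_dir."""
    return ntdsutil_command(["files", f"compact to {target_dir}"], activate_instance)


if __name__ == "__main__":
    # C:\temp is only an example target folder.
    for cmd in (
        semantic_analysis(),
        semantic_analysis(fix=True),
        offline_defrag(r"C:\temp"),
    ):
        print(cmd)
```

Each list can be handed to subprocess.run on the DC, or typed step by step at the ntdsutil prompt. After the compaction, ntdsutil itself prints the instructions for putting the compacted file in place of the original database.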

Once the DC had restarted, we could use the GPMC connected to it again: no more hanging when displaying GPOs and no more excessive CPU consumption by the lsass.exe process.

