SharePoint – Slow People Picker – Troubleshooting Performance
Posted On February 3, 2020
Poor people picker performance is usually caused by one of two things:
1. People Picker (hereafter abbreviated as PP) is connecting to a Domain Controller across a slow network link.
2. People Picker is trying to query domains that are not available on the network (usually due to firewall settings).
Note: If PP performance is bad enough, a query can take over 30 seconds, in which case the call will time out on the client side, and you’ll get the generic “Sorry, we’re having trouble reaching the server” error.
Start by explicitly setting the domains for PP to search:
The number one thing you should do in every SharePoint environment (even if PP is currently performing just fine) is explicitly set your People Picker settings for each web application.
The default configuration for People Picker is that nothing is set within the “SearchActiveDirectoryDomains” property. That means that SharePoint will try to search every trusted domain during a People Picker query. That sounds ok, but there are a few problems with it:
If your Active Directory team adds a trust with a new forest or domain, but does not configure DNS or firewalls to allow SharePoint to access those domain controllers (DCs), it will cause the PP query to time out.
If the SharePoint servers can communicate with the DCs for the new domain, but those calls go over slow network links, PP results may return, but will be delayed.
The best practice is to always explicitly specify the domains you want People Picker to search. That way you don’t waste time querying domains that don’t matter.
So check your current People Picker settings with PowerShell:
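A minimal sketch of that check, assuming a SharePoint Management Shell session and a web application at the hypothetical URL http://teams.contoso.com (substitute your own):

```powershell
# Hypothetical URL; replace with one of your web applications.
$wa = Get-SPWebApplication "http://teams.contoso.com"

# Dump all People Picker settings for this web application.
$wa.PeoplePickerSettings

# List the domains and forests PP has been explicitly told to search.
# No output here means nothing is set, so PP falls back to every trusted domain.
$wa.PeoplePickerSettings.SearchActiveDirectoryDomains
```

Repeat the check for each web application, since the settings are stored per web application.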
The “IsForest” property is important. You must specify either $true or $false.
$True = You want it to search the entire AD forest, including the domain specified, and all child domains. (For example: contoso.com, corp.contoso.com, hr.contoso.com, etc.)
$False = You only want it to search the specified domain.
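As a sketch of how the search list might be set, assuming the same hypothetical web application URL and two example entries (the whole contoso.com forest plus the single fabrikam.com domain):

```powershell
$wa = Get-SPWebApplication "http://teams.contoso.com"   # hypothetical URL

# Example entry 1: search the entire contoso.com forest.
$forest = New-Object Microsoft.SharePoint.Administration.SPPeoplePickerSearchActiveDirectoryDomain
$forest.DomainName      = "contoso.com"
$forest.ShortDomainName = "contoso"
$forest.IsForest        = $true

# Example entry 2: search only the fabrikam.com domain, not its forest.
$domain = New-Object Microsoft.SharePoint.Administration.SPPeoplePickerSearchActiveDirectoryDomain
$domain.DomainName      = "fabrikam.com"
$domain.ShortDomainName = "fabrikam"
$domain.IsForest        = $false

# Append both entries to the web application's list and persist the change.
$wa.PeoplePickerSettings.SearchActiveDirectoryDomains.Add($forest)
$wa.PeoplePickerSettings.SearchActiveDirectoryDomains.Add($domain)
$wa.Update()
```

Add() appends to whatever is already in the list, so check the current entries first to avoid duplicates.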
If you have a one-way trust with one of the domains, there is a little more to it. By default, we use the Application Pool account to connect to Active Directory. In a one-way trust scenario, that won’t work. You’ll have to use an account from the trusted domain. You have to set an encryption key and then specify those trusted domain credentials within the People Picker settings.
Set the Encryption Key:
Unlike most SharePoint configuration, this one needs to be run on every server in the farm because it sets a value in the Registry:
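A sketch of the command, assuming stsadm.exe is on the path of your shell (otherwise call it by its full path under the SharePoint BIN folder):

```powershell
# Run on every server in the farm, from an elevated prompt.
# "Password1" is only an example key; the same value must be used on every server.
stsadm -o setapppassword -password "Password1"
```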
In my example, the encryption key I chose is “Password1”. It needs to be set as the same key across all servers.
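With the key in place, the trusted-domain credentials can be attached to the domain entry. This is a sketch only: the web application URL and the account name fabrikam\svc-peoplepicker are hypothetical, and fabrikam.com stands in for the one-way trusted domain.

```powershell
$wa = Get-SPWebApplication "http://teams.contoso.com"   # hypothetical URL

# Prompt for the password of a (hypothetical) account that lives in the trusted domain.
$cred = Get-Credential -UserName "fabrikam\svc-peoplepicker" -Message "Account PP should use for fabrikam.com"

$domain = New-Object Microsoft.SharePoint.Administration.SPPeoplePickerSearchActiveDirectoryDomain
$domain.DomainName      = "fabrikam.com"
$domain.ShortDomainName = "fabrikam"
$domain.IsForest        = $false
$domain.LoginName       = $cred.UserName
$domain.SetPassword($cred.Password)   # takes a SecureString

$wa.PeoplePickerSettings.SearchActiveDirectoryDomains.Add($domain)
$wa.Update()
```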
If you find that People Picker is fast when searching by account name, for example: “contoso\user1”, but is very slow when searching by display name (“Doe, John”), it’s likely that the performance is hampered by trying to resolve “isolated” names. In that case, we can change the way that SharePoint resolves those names by setting the ActiveDirectoryRestrictIsolatedNameLevel property to true.
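A minimal sketch of that change, again assuming the hypothetical web application URL:

```powershell
$wa = Get-SPWebApplication "http://teams.contoso.com"   # hypothetical URL

# Change how isolated names (names typed without a domain prefix) are resolved.
$wa.PeoplePickerSettings.ActiveDirectoryRestrictIsolatedNameLevel = $true
$wa.Update()
```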
OK, I did that, and People Picker is still slow. Now what?
The next step is to review the SharePoint ULS logs to try to identify which domain is causing the delay.
Turn your SharePoint ULS logging up to Verbose: Set-SPLogLevel -TraceSeverity Verbose
Then reproduce the behavior by searching for a few users in People Picker.
Then review the logs from the Web-Front-End (WFE) you were hitting while reproducing the issue. If you have multiple load-balanced WFEs, you could either use a HOSTS file entry to target one of them, or simply review the logs from all of them.
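If you go the "review all of them" route, Merge-SPLogFile can pull the relevant entries from every server into one file. A sketch, where the output paths and the 15-minute window are assumptions:

```powershell
# Merge the last 15 minutes of ULS entries from every server in the farm.
Merge-SPLogFile -Path "C:\Temp\PeoplePicker.log" -StartTime (Get-Date).AddMinutes(-15) -Overwrite

# Once you know the correlation ID of the slow request, narrow the merge to it.
Merge-SPLogFile -Path "C:\Temp\PeoplePicker-OneRequest.log" -Correlation "c0adb6e3-c824-4b11-8d0e-aee7470e8107" -Overwrite
```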
You’re looking for requests that include calls to the “SearchFromGC” method. You could optionally search the logs for the user account name that you were trying to find with PP. Once you find the proper request, it helps to then filter the log by that correlation ID. Here’s a log example (with commentary) where I searched for a user, and I did get results, but it took 20 seconds. You want to pay attention to the time stamps in the log entries:
-We get 2 results from contoso.com in less than a second:
01/13/2020 11:42:03.83 w3wp.exe (0x25B4) 0x44E8 SharePoint Foundation Performance ftq1 Verbose SearchFromGC name = contoso.com. start c0adb6e3-c824-4b11-8d0e-aee7470e8107
01/13/2020 11:42:03.86 w3wp.exe (0x25B4) 0x44E8 SharePoint Foundation Performance ftq2 Verbose SearchFromGC name = contoso.com. returned. Result count = 2 c0adb6e3-c824-4b11-8d0e-aee7470e8107
-We get 0 results from fabrikam.com, but that’s also fast:
01/13/2020 11:42:03.86 w3wp.exe (0x25B4) 0x44E8 SharePoint Foundation Performance ftq1 Verbose SearchFromGC name = fabrikam.com. start c0adb6e3-c824-4b11-8d0e-aee7470e8107
01/13/2020 11:42:03.86 w3wp.exe (0x25B4) 0x44E8 SharePoint Foundation Performance ftq2 Verbose SearchFromGC name = fabrikam.com. returned. Result count = 0 c0adb6e3-c824-4b11-8d0e-aee7470e8107
-Then we spend 20 seconds trying to query mysteryDomain.local, and fail.
01/13/2020 11:42:03.86 w3wp.exe (0x25B4) 0x44E8 SharePoint Foundation Performance ftq1 Verbose SearchFromGC name = mysteryDomain.local. start c0adb6e3-c824-4b11-8d0e-aee7470e8107
01/13/2020 11:42:23.94 w3wp.exe (0x25B4) 0x44E8 SharePoint Foundation Performance ftq3 Verbose SearchFromGC name = mysteryDomain.local. Error Message: The server is not operational. c0adb6e3-c824-4b11-8d0e-aee7470e8107
01/13/2020 11:42:23.94 w3wp.exe (0x25B4) 0x44E8 SharePoint Foundation General 7fbh Verbose Exception when search “Doe, J” from domain “mysteryDomain.local”. Exception: “The server is not operational. “, StackTrace: ” at System.DirectoryServices.DirectoryEntry.Bind(Boolean throwIfFail) at System.DirectoryServices.DirectoryEntry.Bind() at System.DirectoryServices.DirectoryEntry.get_AdsObject() at System.DirectoryServices.DirectorySearcher.FindAll(Boolean findMoreThanOne) at Microsoft.SharePoint.WebControls.PeopleEditor.SearchFromGC(SPActiveDirectoryDomain domain, String strFilter, String rgstrProp, Int32 nTimeout, Int32 nSizeLimit, SPUserCollection spUsers, ArrayList& rgResults) at Microsoft.SharePoint.Utilities.SPUserUtility.SearchAgainstAD(String input, SPActiveDirectoryDomain domainController, SPPrincipalType scopes, SPUserCollection usersContainer, Int32 maxCount, String customQuery, String customFilter, TimeSpan searchTimeout, Boolean& reachMaxCount)”. c0adb6e3-c824-4b11-8d0e-aee7470e8107
In summary, contoso.com and fabrikam.com were queried quickly, but we spent 20 seconds trying “mysteryDomain.local” and ultimately failed to connect to that domain.
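Reading the time stamps by eye gets tedious in a large log. The following sketch pairs each SearchFromGC "start" entry with the matching "returned" or "Error" entry and reports the elapsed seconds per domain. The function name Get-SearchFromGCDuration is hypothetical, and the pattern assumes the message format shown in the entries above.

```powershell
function Get-SearchFromGCDuration {
    param([string[]]$Lines)

    # Time stamp at the start of the line, then the domain and the event type.
    $pattern = '^(?<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{2}).*SearchFromGC name = (?<domain>.+?)\. (?<event>start|returned|Error)'
    $started = @{}

    foreach ($line in $Lines) {
        $m = [regex]::Match($line, $pattern)
        if (-not $m.Success) { continue }

        $ts = [datetime]::ParseExact($m.Groups['ts'].Value, 'MM/dd/yyyy HH:mm:ss.ff',
                                     [cultureinfo]::InvariantCulture)
        $domain = $m.Groups['domain'].Value

        if ($m.Groups['event'].Value -eq 'start') {
            # Remember when the query against this domain began.
            $started[$domain] = $ts
        }
        elseif ($started.ContainsKey($domain)) {
            # Emit one result per completed (or failed) query.
            [pscustomobject]@{
                Domain  = $domain
                Outcome = $m.Groups['event'].Value
                Seconds = ($ts - $started[$domain]).TotalSeconds
            }
            $started.Remove($domain)
        }
    }
}
```

For example, Get-SearchFromGCDuration -Lines (Get-Content "C:\Temp\PeoplePicker.log") would process a merged log at that hypothetical path. Run against the sample entries above, it reports about 0.03 seconds for contoso.com, 0 for fabrikam.com, and about 20.08 seconds with an Error outcome for mysteryDomain.local.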
Remember to turn your logging back down: Clear-SPLogLevel
We’ve identified the slow domain. Now what?
If the slow domain (in my case “mysteryDomain.local”) is not one that you need to get PP results from, refer to the top of this post and configure the web app to only search (for example) contoso.com and fabrikam.com.
On the other hand, if the slow domain is one you need to get PP results from, then you’ll need to take some network traces to figure out what’s going wrong. In my experience, it’s one of three things:
1. DNS is not configured properly and gives no results when PP asks for a domain controller for that domain.
2. DNS configuration sends PP to a domain controller that is valid, but firewall settings do not allow the SharePoint servers to connect on LDAP ports 389 and 3268.
3. DNS configuration sends PP to a domain controller that is valid, and is reachable, but is on the other side of the world, or is accessed across a slow network link.
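Before reaching for a full trace, a quick port check from the WFE can hint at cause number 2. This is a rough sketch, not a replacement for the trace: the DC names are hypothetical, and it assumes a Windows version that includes the Test-NetConnection cmdlet.

```powershell
# Hypothetical DC names; use the ones DNS returns for the slow domain.
$domainControllers = "dc01.mysteryDomain.local", "dc02.mysteryDomain.local"

foreach ($dc in $domainControllers) {
    foreach ($port in 389, 3268) {   # LDAP and Global Catalog
        $result = Test-NetConnection -ComputerName $dc -Port $port -WarningAction SilentlyContinue
        [pscustomobject]@{
            DomainController = $dc
            Port             = $port
            Reachable        = $result.TcpTestSucceeded
        }
    }
}
```

A Reachable value of False on either port points toward a firewall or routing problem. The check runs as the logged-on user, which does not matter for a plain TCP connection test.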
You’ll need to take a network trace using Netmon 3.4 or Wireshark. Again, this will need to be taken from the Web-Front-End that is servicing the PP requests, so using a HOSTS file entry to target a specific WFE is highly encouraged.
You’ll also want a new set of verbose SharePoint logs to go with the network trace. By lining up timestamps between the log and the trace, it’s much easier to find the proper packets in the network trace.
And you’ll want to flush the DNS cache on the WFE before starting the trace so you can see the DNS calls in the trace.
So proper data capture goes something like this:
Edit the HOSTS file on your client to point the host name portion of the site URL at the IP address of one of your WFEs.
Install Netmon on that WFE.
Turn SharePoint logging up to Verbose.
Clear the DNS cache: ipconfig /flushdns
Run Netmon as Administrator and start a new capture.
Reproduce the behavior by searching for a user in People Picker. Try to do this quickly to keep the network trace small.
Stop the network trace.
Turn the SharePoint logging back down.
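The server-side part of those steps can be sketched as a small script run on the targeted WFE. It assumes an elevated SharePoint Management Shell and that you start and stop the Netmon capture by hand. The output path is hypothetical.

```powershell
$start = Get-Date

Set-SPLogLevel -TraceSeverity Verbose   # turn ULS logging up
ipconfig /flushdns                      # so the DNS lookups show up in the trace

# Pause while the capture and the repro happen.
Read-Host "Start the Netmon capture, reproduce the slow PP search, stop the capture, then press Enter"

Clear-SPLogLevel                        # turn logging back down to the defaults

# Collect only the ULS entries written during the repro window.
Merge-SPLogFile -Path "C:\Temp\PeoplePickerRepro.log" -StartTime $start -EndTime (Get-Date) -Overwrite
```

Keeping the merged log limited to the repro window makes it easier to line up time stamps with the trace.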
Analyze the network trace.
This is a bit beyond the scope of this blog post, but I’ll give you a few high-level things to look for.
First, use this display filter: Property.TCPSynReTransmit
If you see a number of those in a row for a domain controller from your “slow domain”, you can be pretty sure PP found a domain controller, but a firewall is blocking port 389 (LDAP) and / or 3268 (Global Catalog).
Here’s an example of a SharePoint server trying to connect to several different domain controllers on port 389. Notice there’s a bunch of retransmits for SYN packets. That’s a sign that the SharePoint server got no response when sending the first SYN packet while trying to establish a TCP connection on port 389.
If I then filter the trace to the IP addresses of those domain controllers, I see the whole story. SharePoint sent a SYN packet to 5 different DCs. It got no response, so it waited 3 seconds and retransmitted the packet, and it still got no response, so it backed off another few seconds and sent another retransmit.
This is a classic example of a firewall eating those packets.
If you’re not seeing any retransmits, then try this filter on your trace: DNS
That will show all DNS traffic. You’re looking for the DNS calls the SharePoint server uses to find a proper domain controller for a given domain.
When attempting to find a DC for a particular domain, SharePoint will first try to find one in the same “site”. This is done with a DNS query like this:
Dns: QueryId = 0x30C8, QUERY (Standard query), Query for _ldap._tcp.Fargo._sites.mysteryDomain.local of type SRV on class Internet
… where “Fargo” is the name of the local “site”. If that DNS query fails (because there are no DCs for the specified domain within that “site”), you would see the DNS server respond with an error (typically a “Name Error” response) instead of a list of records.
If the DNS queries are successful, then you should see the DNS server respond with a list of IP addresses for the domain controllers in that domain.
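The same lookups can be reproduced by hand from the WFE, which is a quick way to confirm what the trace shows. A sketch, assuming the Resolve-DnsName cmdlet is available and using the example domain and site names from this post:

```powershell
$domain = "mysteryDomain.local"
$site   = "Fargo"   # AD site of the SharePoint server (example name from this post)

# Site-specific SRV name first, then the domain-wide one.
$names = @(
    ("_ldap._tcp.{0}._sites.{1}" -f $site, $domain),
    ("_ldap._tcp.{0}" -f $domain)
)

foreach ($name in $names) {
    $records = Resolve-DnsName -Name $name -Type SRV -DnsOnly -ErrorAction SilentlyContinue |
        Where-Object { $_.Type -eq 'SRV' }

    if ($records) {
        # Each SRV record names a domain controller and the port it listens on.
        $records | Select-Object Name, NameTarget, Port
    }
    else {
        Write-Warning "No SRV records returned for $name"
    }
}
```

If neither name returns records, the problem is most likely on the DNS side rather than the firewall.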
Then you should see a TCP connection established with one of the domain controllers from your “slow domain”, and you should see some LDAP queries being sent to that DC (use Netmon filter: LDAP).
If you’ve made it that far, it would seem that you have a valid network connection to the DC. In that case, look at the response times for the LDAP queries. It might be that the DC is on the other side of a slow network link and network latency is making an otherwise successful connection drag out.
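To get a rough feel for that latency outside of SharePoint, you can time a small Global Catalog query from the WFE. This is only an approximation of what People Picker does: it runs as the logged-on user rather than the application pool account, and the filter and size limit are assumptions chosen for illustration.

```powershell
$domain = "mysteryDomain.local"

$elapsed = Measure-Command {
    # Bind to the Global Catalog for the domain and run a small search.
    $root     = New-Object System.DirectoryServices.DirectoryEntry("GC://$domain")
    $searcher = New-Object System.DirectoryServices.DirectorySearcher($root)
    $searcher.Filter    = "(&(objectCategory=person)(displayName=Doe, J*))"
    $searcher.SizeLimit = 10

    $results = $searcher.FindAll()
    $null = $results.Count      # force the query to run
    $results.Dispose()
    $root.Dispose()
}

"GC query against $domain took {0:N2} seconds" -f $elapsed.TotalSeconds
```

Running the same query against a domain that is known to be fast gives a baseline to compare against.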