SharePoint: Delete user from site collection creates site outage

Consider the following scenario:

You try to delete a user from a site collection, either in the UI (Site Settings | People and Groups | delete users from site collection), or by using the PowerShell command Remove-SPUser.

The operation runs for a long time, during which the sites within that content database may suffer severe performance problems like:

  • Users receive 404 – not found errors.
  • Users receive 503 – service unavailable errors.
  • The pages load very slowly, or completely hangs.
  • Pages timeout or fail to load.
  • On the SQL Server side, you may see thousands (or millions) of calls to SQL stored procedure proc_SecUpdateAclForScope and significant blocking, or even deadlocks.

Note: This problem can also be triggered in a less common scenario. Lets say you had a user: contoso\user1 who left the company. Their account was deleted in Active Directory. Then the same user comes back to the company, and a new account with the same account name is created, or maybe your AD team decides to reuse contoso\user1 for a completely new employee. In any case, you try to add the “new” User1 to site permissions, and that creates a site outage. Or if they get their permission from an Active Directory group, the user tries to browse a site and that action creates a site outage. This occurs because the “old” contoso\user1 still has an active entry in the UserInfo table of the content database. Since the “new” contoso\user1 has the same account name, but different SID, SharePoint treats it as a different user. The record for the “old” User1 must be deleted before adding a new record for the “new” User1. So it’s a roundabout way of getting there, but in the end, we have the same cause: we’re trying to delete a user from a site collection.

Why might this happen?

In most cases, this occurs due to a large number of broken permission inheritance across the site. This is more prevalent in Publishing site collections.

My colleague Joe wrote an excellent post about this issue a while back: Item level permissions and performance, which was immensely helpful to me.

The purpose of this post is to add some additional information about the problem, and to provide some additional tools and ideas for fixing it.

The Problem:

As detailed in Joes post, this can occur due to a large number of security scopes, a large number of role assignments, or a combination of the two.

A new security scope is created any time you break permission inheritance on any object (sub-site, list, library, folder, item) in the site. The more you do it, the more scopes must be evaluated when deleting a user from the site collection. Likewise, each time you give a user or group a certain permission to a certain scope, you create a new role assignment. The more you do that, the more role assignments must be evaluated when deleting a user from the site collection. When the number of scopes and role assignments grow past a particular threshold (I can’t give you a number because it will be different in every environment), you’ll start seeing performance problems.

Then there’s the “limited access problem”:

In a Publishing site collection, many elements (master page, page layouts, images, etc) that are used to render the site pages are stored at the root of the site. If you break permission inheritance at the item level and give a user “read” permission there, SharePoint automatically gives that same user “limited access” permission at the site level. The purpose of the “limited access” permission is to give the user just enough permission to be able to render the page when trying to access the item(s) they have direct permission to.

The problem occurs when you keep breaking permission inheritance all over the place and giving individual users permission. Each of those individual users are then given “limited access” at the site level. Each time you break permission inheritance in the UI, all of the permissions of the parent object are automatically copied down to the child, including all limited access permissions. For example, if you have 2,000 users with limited access permission to the site, when you break permission inheritance on an item, all 2,000 of those role assignments are copied down to the item. What’s worse is that these limited access permissions don’t actually have a function at the item level. They are only used at the list and site level. So you can see that over time, a practice of breaking permission inheritance and giving permission to individual users results in a ton of additional role assignments within the site, including a large number of completely useless item-level limited access role assignments.

Note: This appears to only be a problem in SharePoint 2013 and 2016. I couldn’t get it to happen in SharePoint 2019. In 2019, when breaking permission inheritance on an item, none of the limited access permissions from the parent were copied down to it. However, if you have upgraded to 2019 from earlier versions, the “damage” may have already been done.

A word on Role Assignments:

If you break permission inheritance and give 20 individual users Read permission to that item, you just created 20 new role assignments. If you do the same thing, but use a SharePoint group or Active Directory group that contains all 20 of those same users, you just created 1 new role assignment. A practice of breaking permission inheritance and giving individual permission will eventually lead to problems and should be avoided when possible.

Diagnosis:

If you experience a site outage and SQL blocking on calls to proc_SecUpdateAclForScope when trying to delete a user from a site collection, you can be pretty sure it’s this issue. However, you’d probably like to know to what extent the problem exists, and to also create a baseline so you can gauge the progress of your cleanup efforts. Below are some SQL queries that can help you evaluate the extent of the problem.

Note: While many site collections can be stored in a single content database, deleting a user is done at the site collection level. You may choose to add a “where” clause to the queries below to target one site collection at a time.

This first SQL query I completely stole from Joe. It should give you an idea of how many unique security scopes you’re dealing with. Run this against the problem content database:

-- Show security scope count for lists with over 1000 items
SELECT al.tp_WebId as WebId, 
  ad.ListId, 
  N'/'+aw.FullUrl as WebUrl, 
  al.tp_title as ListTitle, 
  Count(ad.Id) as ItemCount, 
  Count(Distinct(ad.ScopeId)) as SecurityScopeCount
FROM AllDocs ad with(nolock)
  Join AllLists al with(nolock) on ad.Listid = al.tp_ID
  Join AllWebs aw with(nolock) on al.tp_WebId = aw.Id
Group by ad.Listid, al.tp_title, al.tp_WebId, aw.FullUrl
Having Count(ad.Id) > 1000
  or Count(Distinct ad.ScopeId) > 1000
Order by SecurityScopeCount desc 

This second SQL query I also stole from Joe, but enhanced it slightly by adding the ScopeUrl column so it’s easier to see which subsites / lists / folders / items have the largest number of role assignments:

-- Role Assignments Per Scope
select ra.SiteId, ra.ScopeId, perms.ScopeUrl, count(ra.RoleId) as AssignmentCount
from RoleAssignment ra with(nolock)
join perms (nolock) on ra.SiteId = perms.siteid and ra.ScopeId = perms.scopeid
group by ra.SiteId, ra.ScopeId, perms.scopeurl
order by count(ra.RoleId) desc 

I have found that the symptoms noted above are not equal for every user. It seems to depend on the number of role assignments for that individual. For example, you may be able to delete a user with 20 role assignments without issue, but deleting a user with 1000 role assignments locks up the site. The following SQL query will show all the role assignments for your users:

-- Show number of Role Assignments Per-User
select RA.SiteId,  RA.PrincipalId, UI.Tp_ID, UI.tp_Deleted, ui.tp_Login, UI.tp_Title, count(ra.RoleId) as AssignmentCount
From RoleAssignment RA (nolock)
join userinfo UI (nolock) on RA.siteid = UI.tp_siteid and RA.principalid = UI.tp_id
where UI.tp_Deleted = 0
group by RA.PrincipalId, RA.siteid, UI.tp_id, UI.tp_deleted, ui.tp_login, UI.tp_title
order by AssignmentCount desc 

You can take your top offenders from the previous query and use this one to get more information about where their permissions exist within the site. You just have to specify their account name.

-- Show Security Scopes for a specified user
Declare  @userAccount nvarchar(50) = 'i:0#.w|contoso\user1'
select ra.SiteId, ra.ScopeId, perms.webid, perms.ScopeUrl, ui.tp_login
from RoleAssignment ra with(nolock)
join perms (nolock) on ra.SiteId = perms.siteid and ra.ScopeId = perms.scopeid
join userinfo UI (nolock) on RA.siteid = UI.tp_siteid and RA.principalid = UI.tp_id
where UI.tp_login = @userAccount 
group by ra.SiteId, ui.tp_login, ra.ScopeId, perms.webid, perms.scopeurl 
order by perms.ScopeUrl

Fixing the problem:

Hopefully the results from the above SQL queries gave you some insight into exactly where your largest problems exist. To fix this, you’ll have to reduce the number of security scopes and role assignments. By how much? It’s hard to say, as it depends on your environment and hardware. Below I have provided three options for reducing permission scopes and role assignments. They are listed in order of most impactful to least. I recommend only moving on to the next option after you’ve exhausted the previous one.

Important: These scripts make changes within your site. While I have tested them in SharePoint 2013, 2016, and 2019 environments, I strongly recommend testing them in your own non-production environment first, and as always, have good and current database backups.

Option 1: Re-inherit permissions everywhere you can.

Doing this will reduce both security scopes and role assignments. This may take some time, as you may have to consult your users to see if certain sub-sites, lists, and items can re-inherit from their parent. For example, let’s say you have a list, and 10 items within it have unique permissions. You have 100 role assignments at the list level. Due to the “limited access problem” I described above, you will most likely have at least 100 role assignments at each item level. If you re-inherit permission at those 10 items, you just cleaned up 10 security scopes and 1,000 role assignments. To help automate this process, you can use the following PowerShell script. It goes to a specified list / library and re-inherits permission at every folder and item within it.

#Variables for Web URL, List Name
$WebURL = "https://teams.contoso.com"
$ListName ="List1"

#Get the list
Add-PSSnapin microsoft.sharepoint.powershell -ErrorAction SilentlyContinue
$web = Get-SPweb $WebURL
$list = $web.lists[$ListName]
#Get folders with unique permissions
Foreach ($folder in $list.Folders)
  { 
        if ($folder.HasUniqueRoleAssignments)
        {
            Write-Host "Resetting Folder inheritance at:" $folder.Url  
            $folder.ResetRoleInheritance() 
            $folder.Update()
        }
   }
#Get list items with unique permissions
$ListItems = $list.Items | Where {$_.HasUniqueRoleAssignments}
Foreach($ListItem in $ListItems)
{
      #Reset broken list item inheritance
      $ListItem.ResetRoleInheritance()
      Write-host "Resetting Inheritance on List Item:" $ListItem.URL
}

Option 2: Try manually removing permissions for the users you’re trying to delete.

Through your analysis of this problem, you may find that the user you’re trying to delete — and is casing the site outage — has a large number of role assignments. You may also find there are a number of other “old” / defunct users that you could remove. Deleting these users from the site collection would be the quickest way to remove all role assignments for them, but ironically, that’s the very thing that causes the site outage. In that case, we can take a more targeted / batched approach to removing all the permissions for a problem user. Doing this should reduce the users role assignment count to zero without creating the SQL blocking that causes the site outage. Once their role assignment count has been reduced, you should be able to delete them from the site collection without issue.

The following script reads in a list of user accounts and systematically removes their permissions from each web, after which, it deletes them from the site collection. The input file should contain the users account names, one on each line. In the example below, the site is using Windows-claims authentication, so the user account names are listed that way. This can also be run for a single user by entering a single account name in the input file. You’ll want to run this while logged in as an account that has full control on the entire site collection and read access to the content database. The application pool account is a good bet.

#Title: Remove-ListofUsersFromAllPermissions.ps1
#Author: Joroar
#Date: 12/23/21
#This script is provided as-is with no warranties expressed or implied. Use at your own risk.
#Synopsis: Reads in user accounts from a text file, removes every permission they have within the specified site collection, and then deletes them
#Inputs: Just set the top 3 variables:  $URL, $inputFile, and $log

$URL = "https://teams.contoso.com"
$inputFile = Get-Content "C:\temp\UsersToDelete.txt"
$log = "c:\temp\Remove-ListofUsersFromAllPermissions.txt"

### Change nothing below this line ###
Add-PSSnapin *sharepoint*
$ErrorActionPreference = "Stop"
$stopwatchAll =  [system.diagnostics.stopwatch]::StartNew()
#Loop through the input file, processing one user at a time.
foreach($line in $inputFile)
{$userAccount = $line
Write-Host -ForegroundColor Yellow "`n Processing user " $userAccount
"----- Processing user " + $userAccount + " -----" | out-file $log -Append
$site = get-spsite $url
$siteURL = $site.url
try{$user = get-spuser -Identity $userAccount -Web $site.RootWeb}
catch{Write-host -ForegroundColor Red "Could not find user $userAccount in site $siteURL"
"ERROR: Could not find user " + $userAccount + " in site " + $site.url | out-file $log -Append}
$web = $null
$stopwatch =  [system.diagnostics.stopwatch]::StartNew()
#SQL Query function
function Run-SQLQuery ($ConnectionString, $SqlQuery)
{$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlConnection.ConnectionString = $ConnectionString
$SqlCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlCmd.CommandText = $SqlQuery
$SqlCmd.Connection = $SqlConnection
$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
$SqlAdapter.SelectCommand = $SqlCmd
$DataSet = New-Object System.Data.DataSet
$SqlAdapter.Fill($DataSet)
$SqlConnection.Close()
$DataSet.Tables[0]
}
$connStr = $site.ContentDatabase.DatabaseConnectionString
#Get the specified users role assingments and scopes per web from SQL
$queryResults = Run-SQLQuery -ConnectionString $connStr -SqlQuery "Declare  @userAccount nvarchar(50) = '$userAccount' select distinct perms.WebId as ID from RoleAssignment ra (nolock) join perms (nolock) on ra.SiteId = perms.siteid and ra.ScopeId = perms.scopeid join userinfo UI (nolock) on RA.siteid = UI.tp_siteid and RA.principalid = UI.tp_id where UI.tp_login = @userAccount" 
If($queryResults)
{
#Loop through each web ID we got from SQL
$queryResults  | Foreach-Object($_){
$webid = $_.id
#Get the web
Try{$web = $site.AllWebs | ?{$_.id -eq $webid}
}
Catch{write-host -f Red "Failed to get the web"}
    if($web)
    {write-host "Removing permission from Web: " $web.Url
        "Removing permission from Web: " + $web.Url | out-file $log -Append
            try{$web.RoleAssignments.Remove($user)}
            catch{}
    }
}
$stopwatch.stop()
Write-host -ForegroundColor Green "Removed all permission for user " $userAccount " in " $stopwatch.Elapsed.TotalMinutes.ToString("#.##") " Total Minutes"
"----- Removed all permissions for user: " + $userAccount + ". Total Minutes: " + $stopwatch.Elapsed.TotalMinutes.ToString("#.##") | out-file $log -Append
Write-Host "Deleting user" $userAccount "from the site collection for good measure."
"Deleting user " + $userAccount + " from the site collection" | out-file $log -Append
try{Remove-SPUser -Identity $user -Web $web -Confirm:$false}
catch{Write-host -ForegroundColor Red "Could not find user $userAccount in site $siteURL so cannot delete them"
"ERROR: Could not find user " + $userAccount + " in site " + $site.url + " so cannot delete them" | out-file $log -Append}
}
else{#We got no query results.  Could be that the user doesn't have any permission within the site
write-host -f Red "We got no query results.  Either the user has no role assignments, or the process account does not have permission to query the content db." 
"We got no query results.  Either the user " + $userAccount +  " has no role assignments, or the process account does not have permission to query the content db." | out-file $log -Append
#Try to delete the user from the site collection anyway.  Use the sites root web.
$web = Get-SPWeb $URL
Write-Host "Trying to delete user" $userAccount "from the site collection anyway..."
"Trying to delete user " + $userAccount + " from the site collection anyway..." | out-file $log -Append
try{Remove-SPUser -Identity $user -Web $web -Confirm:$false}
catch{Write-host -ForegroundColor Red "Could not find user $userAccount in site $siteURL so cannot delete them."
"ERROR: Could not find user " + $userAccount + " in site " + $site.url + " so cannot delete them." | out-file $log -Append}
}
}
$stopwatchAll.stop()
Write-host -f Green "`n `n ========== Done.  Processed "  ($inputfile.Count) " users in " $stopwatchAll.Elapsed.TotalMinutes.ToString("#.##") " Minutes =========="
"========== Done.  Processed " + ($inputfile.Count) + " users in " + $stopwatchAll.Elapsed.TotalMinutes.ToString("#.##") + " Minutes ==========" | out-file $log -Append 

Option 3: Remove Limited Access permissions from items.

Note: This is probably only applicable to SharePoint 2013 and 2016. SharePoint 2019 and above has a timer job to automatically clean up extraneous limited access permissions. Also, this cleanup option is significantly less impactful than the two above. It should probably only be used after exhausting the above two options.

As we discussed above, the “Limited Access” permission has no real function at that item level. The only reason it exists there is because (by default) it gets copied down from the parent list or site when permission inheritance is broken on the item. If, as part of your diagnosis work above, you find that you have a large number of limited access role assignments at the item level, your first choice should be checking to see if those items could simply re-inherit permission from their parent. Doing so would remove all the item level role assignments. If that is not possible, you could choose to manually clean up the item level limited access role assignments using the following PowerShell script. Simply point it at a list / library, and it will loop through each item within and remove only the “limited access” permissions from it.

#Title: Remove-LimitedAccessFromItems
#Author: Joroar
#Date: 12/16/21
#This script is provided as-is with no warranties expressed or implied. Use at your own risk.
#Synopsis: Use this to remove all Limited Access permissions from the item and folder level on a specified list
#          It will log to the console and also to the log file you specify as $Log
#Inputs: Just set the top 3 variables: $WebURL, $ListName and $Log

#Variables for Web URL, List Name, and Log file
$WebURL = "https://teams.contoso.com"
$ListName = "Lib1"
$Log = "c:\temp\Remove-LimitedAccessFromItems.txt"

### Change nothing below this line ###
Add-PSSnapin microsoft.sharepoint.powershell -ErrorAction SilentlyContinue 
#Get the list
$web = Get-SPweb $WebURL
$list = $web.lists[$ListName]
$role = $web.RoleDefinitions["Limited Access"]
#Get list items with unique permissions
$listItemsAndFolders = $list.Items | Where {$_.HasUniqueRoleAssignments} 
#Folders are a bit dangerous. If they have any subfolders or items with unique permissions below them, they should be left alone.
#Leaving this commented out for now.
#$listItemsAndFolders += $list.Folders | Where {$_.HasUniqueRoleAssignments}
#Loop through each item in the list
Foreach($listItemAndFolder in $listItemsAndFolders)
{#Start a stopwatch so we can tell how long it takes to process each item
     $stopwatch =  [system.diagnostics.stopwatch]::StartNew() 
     Write-host "`n Checking for Limited Access permission on list item or folder:" $listItemAndFolder.URL  | out-file $Log -Append
     $listItemAndFolder.URL + "  ...Checking for Limited Access permission" | out-file $Log -Append
     [array]$memberArray = $null
     $needtoremove = $false
     $ras = $listItemAndFolder.roleassignments
     #Loop through each role assingment on the item
         foreach($ra in $ras)
         {#Get only Limited Access role assignments
             if($ra.roledefinitionbindings.name -eq $role.name  -and  $ra.roledefinitionbindings.Count -eq 1)
             {write-host -foregroundcolor red "Removing " $ra.roledefinitionbindings.name " for user or group " $ra.member
             "Removing " + $ra.roledefinitionbindings.name + " for user or group " + $ra.member| out-file $Log -Append
             #Add principals that only have Limited Access to an array so we can remove them all later
             $memberArray += $ra.member.ToString()
             $needtoremove = $true
             }
         }
     if($needtoremove)
     {#Loop through the array and remove users from role assignments
         foreach($member in $memberarray)
         {#check if member is a user object or AD group
             if(($member -match "i:0") -or ($member -match "c:0"))
             {
             $user = get-spuser $member -web $web
             $listItemAndFolder.RoleAssignments.Remove($user)
             } 
             Else{
             #member is a SharePoint group
             $SPGroup = $web.sitegroups[$member]
             $listItemAndFolder.RoleAssignments.Remove($SPGroup)
             }
         }
     }
     Else{Write-Host -ForegroundColor green "No Limited Access Permisisons found on" $listItemAndFolder.URL
         "No Limited Access Permisisons found on" + $listItemAndFolder.URL | out-file $Log -Append}
#commit the change to the item or folder
$listItemAndFolder.SystemUpdate()
#Stop stopwatch and report to log file
$stopwatch.stop()
"-- Done -- Removed " + $memberArray.Count + " Limited Access permissions in " + $stopwatch.Elapsed.TotalSeconds.ToString("#.##") + " Seconds" | out-file $Log -Append
}

Some parting tips to avoid problems:

  • Only break permission inheritance where you really need to.
  • Use SharePoint or Active Directory groups instead of individual users wherever possible.
  • Regularly reassess where you have broken permission inheritance to see if it can be re-inherited.
  • Have a process in place for cleaning up users that have left the company. — While SharePoint does have functionality to automatically clean up user profiles within the User Profile Service Application, it does not have any process for removing any user permission at the site level, or for deleting the user from the site collection. That is a manual step that you should have a process around. For example: x number of months after the user is terminated, run Remove-SPUser for them across all of your site collections.
  • See this for more information: Best practices for using fine-grained permissions in SharePoint Server

Add a Comment