PowerShell : Diagnosing TCP Session Exhaustion


If you have a server that runs out of TCP sessions, it cannot create new ones, and the resulting bottlenecks can produce a myriad of symptoms. These include, but are not limited to, the following (bear in mind that not all of these issues are always caused by session exhaustion):

  1. Trust operations might fail between domain controllers
  2. Replication might fail between domain controllers
  3. Inability to connect to file shares on a remote server
  4. DNS name registration might fail
  5. Kerberos/NTLM Authentication might fail
  6. MMC consoles won’t work or won’t be able to connect to remote servers.
  7. Network connection errors with "connection refused" or "network unreachable"
  8. Web browsing issues with errors like "unable to connect" or "server not found"
  9. Slow application performance and/or freezing
  10. High CPU usage (when processes wait for sessions)
  11. High Memory usage (when processes wait for sessions)

Netstat : Friend in the Chaos

To check this you can use the command:

netstat -ano

This will show you all the sessions, both TCP and UDP. The example here shows TCP, where you can see the source, destination, state and PID (the process ID):
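For comparison, a similar view can be sketched in PowerShell with Get-NetTCPConnection, which covers TCP only (UDP endpoints come from Get-NetUDPEndpoint). The helper name Select-TcpSessionView is made up for this example and is not part of any module:

```powershell
# Hypothetical helper: reshape connection objects into netstat-style columns.
function Select-TcpSessionView {
    param (
        [Parameter(ValueFromPipeline = $true)]
        [object]$Connection
    )
    process {
        [pscustomobject]@{
            Local  = "{0}:{1}" -f $Connection.LocalAddress, $Connection.LocalPort
            Remote = "{0}:{1}" -f $Connection.RemoteAddress, $Connection.RemotePort
            State  = $Connection.State
            PID    = $Connection.OwningProcess
        }
    }
}

# Usage on a Windows server:
# Get-NetTCPConnection | Select-TcpSessionView | Format-Table -AutoSize
```

Because the result is a set of objects rather than text, it can be sorted or filtered with the usual cmdlets.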



Netstat : Friend in the Chaos with Process ID

If you update that command to:

netstat -anob

This adds the process name to the list. The output is the same as before, but with the process name shown as well (in this case vnetd.exe). Note that the -b switch needs an elevated prompt:
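In PowerShell the process name is not part of the connection object, so it has to be looked up from the PID. A minimal sketch, assuming a made-up helper called Add-ProcessName that builds a PID-to-name table from Get-Process:

```powershell
# Hypothetical helper: attach a ProcessName property to each connection object.
function Add-ProcessName {
    param (
        [Parameter(ValueFromPipeline = $true)]
        [object]$Connection,
        # Optional PID -> name table; built from the running processes if omitted.
        [hashtable]$NameByPid
    )
    begin {
        if ($null -eq $NameByPid) {
            $NameByPid = @{}
            Get-Process | ForEach-Object { $NameByPid[[int]$_.Id] = $_.ProcessName }
        }
    }
    process {
        # Cast to [int] so the key type matches the table built above.
        $name = $NameByPid[[int]$Connection.OwningProcess]
        if (-not $name) { $name = 'unknown' }
        $Connection | Select-Object -Property *, @{ Name = 'ProcessName'; Expression = { $name } }
    }
}

# Usage on a Windows server:
# Get-NetTCPConnection | Add-ProcessName | Select-Object LocalPort, RemoteAddress, State, OwningProcess, ProcessName
```

Building the table once keeps the lookup cheap when there are thousands of connections.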


Finding Process IDs with lots of connections

On the whole you will see that these connections belong to different process IDs (PIDs), with perhaps a couple of PIDs appearing twice. However, you may instead see a large number of connections that all share the same PID, as below:


This can be a telltale sign of port exhaustion. If you have a single process talking to a single remote address, or several remote addresses, with the state of TIME_WAIT, that usually points to the fact that the process is not releasing those session ports correctly.
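Rather than reading the list by eye, the connections can be counted per PID and state. This is a rough sketch; the function name Get-ConnectionCountByProcess and its parameters are made up for this example:

```powershell
# Hypothetical helper: count connections per PID and state, busiest first.
function Get-ConnectionCountByProcess {
    param (
        [object[]]$Connection,
        [int]$Top = 10
    )
    $Connection |
        Group-Object -Property OwningProcess, State |
        Sort-Object -Property Count -Descending |
        Select-Object -First $Top -Property Count,
            @{ Name = 'PID';   Expression = { $_.Group[0].OwningProcess } },
            @{ Name = 'State'; Expression = { $_.Group[0].State } }
}

# Usage on a Windows server:
# Get-ConnectionCountByProcess -Connection (Get-NetTCPConnection) -Top 5
```

A single PID at the top of this list with a count far above the rest is the one to investigate first.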

Netstat : Friend in the Chaos with TIME_WAIT

If you wish to look for only these connections then you can use this command:

netstat -anob | findstr "TIME"
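If you would rather keep using netstat but want objects to count and sort, the text output can be parsed. This is only a sketch: it assumes the usual five-column TCP line that netstat -ano prints (without -b, which adds extra lines), and the function name ConvertFrom-NetstatLine is made up for this example:

```powershell
# Hypothetical helper: turn one line of "netstat -ano" TCP output into an object.
function ConvertFrom-NetstatLine {
    param (
        [Parameter(ValueFromPipeline = $true)]
        [string]$Line
    )
    process {
        # Assumed shape: "  TCP    10.0.0.5:49152    10.0.0.9:443    TIME_WAIT    0"
        if ($Line -match '^\s*TCP\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)\s*$') {
            [pscustomobject]@{
                Local  = $Matches[1]
                Remote = $Matches[2]
                State  = $Matches[3]
                PID    = [int]$Matches[4]
            }
        }
        # Lines that do not match (headers, UDP rows) are silently skipped.
    }
}

# Usage on a Windows server:
# netstat -ano | ConvertFrom-NetstatLine | Where-Object State -eq 'TIME_WAIT' | Measure-Object
```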

Check Session Status

We now need to check the status of the device that is "malfunctioning". For that we can use a script that counts the sessions in each state, so let's get cracking.

Script : TCPSessionChecker.ps1

This will count the number of connections in each state, using a colour coding system of:

Count of 150 or under : Green
Count between 151-200 : Amber
Count over 200 : Red

This is the script:

function Write-ColorizedOutput {
    param (
        [string]$Label,
        [int]$Count
    )
    
    if ($Count -gt 200) {
        $color = 'Red'
    } elseif ($Count -gt 150) {
        $color = 'DarkYellow'  
    } else {
        $color = 'Green'
    }
    
    Write-Host "$Label Count: $Count" -ForegroundColor $color
}

# @() guarantees a usable count for 0 or 1 results, and SilentlyContinue
# hides the error raised when no connection is in the requested state.

# Get count of TimeWait connections
$timeWaitCount = @(Get-NetTCPConnection -State TimeWait -ErrorAction SilentlyContinue).Count
Write-ColorizedOutput -Label "TimeWait" -Count $timeWaitCount

# Get count of CloseWait connections
$closeWaitCount = @(Get-NetTCPConnection -State CloseWait -ErrorAction SilentlyContinue).Count
Write-ColorizedOutput -Label "CloseWait" -Count $closeWaitCount

# Get count of Established connections
$establishedCount = @(Get-NetTCPConnection -State Established -ErrorAction SilentlyContinue).Count
Write-ColorizedOutput -Label "Established" -Count $establishedCount

# Get count of Listen connections
$listenCount = @(Get-NetTCPConnection -State Listen -ErrorAction SilentlyContinue).Count
Write-ColorizedOutput -Label "Listen" -Count $listenCount

That will look like this when you run it, notice this server is all green:


High CloseWait or TimeWait is not good

If you are getting a count of over 200 in CloseWait or TimeWait then this could point to port exhaustion. This is where you need to either restart the process causing the issue or optimise your server to see if that helps with the issue you are experiencing.
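If you want to log or alert on these thresholds rather than read coloured text, the same logic can return objects from a single snapshot. A minimal sketch, assuming the 150 and 200 thresholds used above; the function name Get-TcpStateLevel is made up for this example:

```powershell
# Hypothetical helper: count connections per state and grade each count.
function Get-TcpStateLevel {
    param (
        [object[]]$Connection,
        [int]$Amber = 150,   # counts above this are Amber
        [int]$Red   = 200    # counts above this are Red
    )
    $Connection |
        Group-Object -Property State -NoElement |
        ForEach-Object {
            $level = if ($_.Count -gt $Red) { 'Red' }
                     elseif ($_.Count -gt $Amber) { 'Amber' }
                     else { 'Green' }
            [pscustomobject]@{ State = $_.Name; Count = $_.Count; Level = $level }
        }
}

# Usage on a Windows server:
# Get-TcpStateLevel -Connection (Get-NetTCPConnection) | Where-Object Level -ne 'Green'
```

Taking one snapshot and grouping it means all the counts describe the same moment in time.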

Check the EventLog for Event ID 4227/4231

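The Windows Event Viewer might log errors related to this in the System log. Event ID 4227 indicates that TCP/IP failed to establish an outgoing connection because the selected local endpoint was recently used to connect to the same remote endpoint, and Event ID 4231 indicates that a request to allocate an ephemeral port from the global TCP port space failed because all such ports were in use.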
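These events can also be pulled with Get-WinEvent instead of browsing Event Viewer. The sketch below assumes the events are in the System log under the Tcpip provider; the function name New-PortExhaustionFilter and the seven-day default are made up for this example:

```powershell
# Hypothetical helper: build a Get-WinEvent filter for the two event IDs.
function New-PortExhaustionFilter {
    param (
        [int]$Days = 7   # how far back to look (assumed default)
    )
    @{
        LogName      = 'System'
        ProviderName = 'Tcpip'          # assumption: the source these events are logged under
        Id           = 4227, 4231
        StartTime    = (Get-Date).AddDays(-$Days)
    }
}

# Usage on a Windows server. Get-WinEvent raises an error when nothing
# matches, so that case is silenced here:
# Get-WinEvent -FilterHashtable (New-PortExhaustionFilter -Days 3) -ErrorAction SilentlyContinue |
#     Select-Object TimeCreated, Id, Message
```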


Check assigned TCP ports to Windows (and optimise Timeout)

This can be done with a script that checks the current values and then offers to set the recommended settings to "allow more" connections with a custom, smaller timeout value.

Note : If you make the timeout value too short you will cause more issues than you solve, but this can be customised in the script.
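Before changing anything, it helps to know how much of the dynamic (ephemeral) port range is actually in use. The sketch below assumes the modern Windows default range of 49152 with 16384 ports, which you can confirm with netsh int ipv4 show dynamicport tcp. The function name Get-EphemeralPortUsage is made up for this example, and counting unique local ports is only an approximation of real usage:

```powershell
# Hypothetical helper: estimate how much of the dynamic port range is in use.
function Get-EphemeralPortUsage {
    param (
        [int[]]$LocalPort,             # local ports currently in use
        [int]$StartPort     = 49152,   # assumed default start of the range
        [int]$NumberOfPorts = 16384    # assumed default size of the range
    )
    $endPort = $StartPort + $NumberOfPorts - 1
    $used = @(
        $LocalPort |
            Where-Object { $_ -ge $StartPort -and $_ -le $endPort } |
            Sort-Object -Unique
    ).Count
    [pscustomobject]@{
        RangeStart  = $StartPort
        RangeEnd    = $endPort
        Used        = $used
        PercentUsed = [math]::Round(100 * $used / $NumberOfPorts, 1)
    }
}

# Usage on a Windows server:
# Get-EphemeralPortUsage -LocalPort (Get-NetTCPConnection).LocalPort
```

If the percentage is low, the registry changes are unlikely to be the fix and the problem probably lies elsewhere.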

Script : TCPRemediate.ps1

When you run this script it will check the current values and, based on the values defined in the script, offer to set the recommended value for each setting. The recommended values are the ones listed in the $settings table:

# Function to check and set registry value
function Set-RegistryValue {
    param (
        [string]$Path,
        [string]$Name,
        [string]$Type,
        [object]$Value
    )

    # Check the current value
    $currentValue = Get-ItemProperty -Path $Path -Name $Name -ErrorAction SilentlyContinue

    if ($null -eq $currentValue) {
        Write-Output "$Name is not currently set."
        $currentValue = "Not Set"
    } else {
        $currentValue = $currentValue.$Name
        Write-Output "Current value of $Name is $currentValue."
    }

    # Prompt the user for confirmation to update
    $response = Read-Host "Would you like to set $Name to $Value? (Y/N)"
    if ($response -eq 'Y' -or $response -eq 'y') {
        Write-Output "Setting $Name to $Value at $Path"
        Set-ItemProperty -Path $Path -Name $Name -Value $Value -Type $Type
    } else {
        Write-Output "$Name will not be changed."
    }
}

# Paths and recommended values
$tcpipParametersPath = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

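# [ordered] keeps the prompts in the order listed here
$settings = [ordered]@{
    "TcpTimedWaitDelay"  = 30        # seconds a closed connection stays in TIME_WAIT (valid range 30-300)
    "MaxUserPort"        = 65534     # highest port number used for outbound connections
    "KeepAliveTime"      = 300000    # milliseconds of idle time before the first keep-alive (5 minutes)
    "KeepAliveInterval"  = 1000      # milliseconds between keep-alive retries
}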

# Check and prompt for each setting
foreach ($setting in $settings.GetEnumerator()) {
    Set-RegistryValue -Path $tcpipParametersPath -Name $setting.Key -Type "DWORD" -Value $setting.Value
}

# Flush DNS cache
$response = Read-Host "Would you like to flush the DNS cache? (Y/N)"
if ($response -eq 'Y' -or $response -eq 'y') {
    Write-Output "Flushing DNS cache"
    ipconfig /flushdns
} else {
    Write-Output "DNS cache will not be flushed."
}

# Prompt to restart the server. The TCP/IP stack is a kernel driver that
# cannot be restarted like a normal service, so a reboot is needed to apply changes.
$response = Read-Host "Would you like to restart the server now to apply changes? (Y/N)"
if ($response -eq 'Y') {
    Write-Output "Restarting server"
    Restart-Computer -Force
} else {
    Write-Output "Server will not be restarted; changes apply after the next reboot."
}

Write-Output "Script execution completed."

This is an example of the script in action. You need to run it from an elevated PowerShell console, otherwise it will fail.
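Since the script fails without elevation, one option is to have it check up front. This is a minimal sketch using the .NET WindowsPrincipal class; the function name Test-IsElevated is made up for this example:

```powershell
# Hypothetical helper: returns $true when the current session is elevated.
function Test-IsElevated {
    $identity  = [Security.Principal.WindowsIdentity]::GetCurrent()
    $principal = New-Object Security.Principal.WindowsPrincipal($identity)
    $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
}

# Usage at the top of a script:
# if (-not (Test-IsElevated)) {
#     Write-Warning "Please run this script from an elevated PowerShell console."
#     return
# }
```

A built-in alternative is to put #Requires -RunAsAdministrator on the first line of the script, which stops it from starting at all when the console is not elevated.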


Detailed Routing Capture?

If you run this PowerShell it will output the destination server and port of each TimeWait and CloseWait connection. This is the script:

# List all TimeWait connections, then give the count of TimeWait connections.
Get-NetTCPConnection -State TimeWait
(Get-NetTCPConnection -State TimeWait).Count
# List all CloseWait connections, then give the count of CloseWait connections.
Get-NetTCPConnection -State CloseWait
(Get-NetTCPConnection -State CloseWait).Count

This will output more detail on the status of the TCP sessions.
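With a long list it is easier to see which destination is involved if the connections are grouped by remote address and port. Windows typically reports TimeWait sockets against PID 0, so the remote endpoint is often more telling than the process for that state. A rough sketch, with the function name Get-TopRemoteEndpoint made up for this example:

```powershell
# Hypothetical helper: count connections per remote address and port, busiest first.
function Get-TopRemoteEndpoint {
    param (
        [object[]]$Connection,
        [int]$Top = 5
    )
    $Connection |
        Group-Object -Property RemoteAddress, RemotePort -NoElement |
        Sort-Object -Property Count -Descending |
        Select-Object -First $Top -Property Count, Name
}

# Usage on a Windows server:
# Get-TopRemoteEndpoint -Connection (Get-NetTCPConnection -State TimeWait -ErrorAction SilentlyContinue)
```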

Summary of Port Exhaustion

The issues you get from port exhaustion are specific to your environment, so no blog post can diagnose every case or offer a fix for all of them. It takes a lot of investigation and troubleshooting.

While rebooting the server will fix the issue temporarily, this is not a long term solution or fix.

Microsoft also does a good job of covering all the basic troubleshooting in its own article on the subject.
