Though the title gives away the issue, it's more complex than that when you are in the midst of the problem you are trying to fix. This is something I call the "ohhh nooo" second: you know something is wrong, but at the moment you get that feeling you are not sure how wrong things are.
Hybrid Exchange?
If you have local Exchange and Exchange Online (EXO) and you require both of these environments, you have what is called a Hybrid setup. Local mailboxes on Exchange use the Hybrid connection to send mail to EXO, and mail in EXO destined for local Exchange uses the same Hybrid connection to be delivered locally.
Hybrid DNS and TCP Ports?
This is usually routed via a DNS name that is secured with a certificate identifying that Hybrid connection, which in most setups requires TCP:25 and TCP:587. TCP:587 is more the native secure port, and you can also secure TCP:25 with a certificate using STARTTLS. So when that certificate has expired it can be a bit of a shock.
Exchange Queue Checking
This is the check I do for queuing mail in Exchange. The command queries all the Mailbox servers (the ones with queues) and then retrieves every queue with a message count above 10 from those servers.
Get-MailboxServer | Get-Queue -Filter "MessageCount -gt 10"
This usually returns nothing but an empty prompt like this, which is good, nothing in the queue:
[PS] C:\Windows\system32>Get-MailboxServer | Get-Queue -Filter "MessageCount -gt 10"
[PS] C:\Windows\system32>
The ohhh-no Feeling
When you run the command above and what you see is not normal at all, you get that sinking feeling that something has gone very wrong somewhere.....
BearA\6 Retry 1443 Normal 0 <exo connector name>
BearB\6 Retry 1122 Normal 0 <exo connector name>
BearC\5 Retry 1442 Normal 0 <exo connector name>
BearD\6 Retry 800 Normal 0 <exo connector name>
BearE\6 Retry 2887 Normal 0 <exo connector name>
The <exo connector name> will be your actual connector name, but it has been changed for security here. Rather than a clean queue, you can see we have 7,694 messages queuing up across five queues. Well, that's not good at all.
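Adding those numbers up by hand gets old quickly. A minimal sketch, assuming the same Get-MailboxServer pipeline as the check above and that the stuck queues report a Status of Retry, that lets PowerShell do the totalling:

```powershell
# Collect every queue currently in Retry across all Mailbox servers.
$retryQueues = Get-MailboxServer | Get-Queue -Filter "Status -eq 'Retry'"

# Count = number of affected queues, Sum = total messages waiting in them.
$retryQueues |
    Measure-Object -Property MessageCount -Sum |
    Select-Object Count, Sum
```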
Get with the Power of PowerShell
The first thing I think of is can I resume the queue with this command:
Resume-Queue BearA\6
This should then make the queue say "Active" and processing of the backlog will commence. However, this command did not fix the issue and the queue stayed in "Retry", therefore something more fundamental is wrong here. All the queues behaved like that: they would not enter "Active" but stayed in "Retry".
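Worth knowing here: Resume-Queue is aimed at queues that have been suspended, whereas Retry-Queue asks a queue sitting in Retry to attempt its connection straight away. A sketch, assuming you want to nudge every Retry queue on every Mailbox server in one go. It will not help while the underlying cause is still present, which is exactly what happened in this case:

```powershell
# Force an immediate connection attempt for all queues in Retry, server by server.
Get-MailboxServer | ForEach-Object {
    Retry-Queue -Filter "Status -eq 'Retry'" -Server $_.Name
}
```

If the queues drop straight back into Retry, the LastError value on the queue is the next place to look.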
Locate the real error, with LastError
So we need to look a little deeper and find out why the queue is retrying. Let's get that from the queue with this command:
Get-Queue -Identity BearA\6 | fl
This will return a list of properties, and "LastError" is the one you are interested in.
If you only want to see the error you can run this, which will show you just the error field:
Get-Queue -Identity BearA\6 | fl LastError
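When several servers are affected it can be quicker to pull the error from every stuck queue at once rather than one identity at a time. A small sketch, assuming the queues of interest are the ones with a Status of Retry:

```powershell
# Show the last error and retry timings for every queue in Retry on all Mailbox servers.
Get-MailboxServer |
    Get-Queue -Filter "Status -eq 'Retry'" |
    Format-List Identity, MessageCount, LastRetryTime, NextRetryTime, LastError
```

If every queue reports the same LastError, as it did here, the fault is probably shared configuration rather than a single server.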
Now we can see where the problem is. The actual error is this: The certificate specified in TlsCertificateName of the SendConnector could not be found.
Right, this helps a lot. It is telling me that the send connector cannot find a usable certificate matching the name in its TlsCertificateName setting. Right, something to work with.
Confirm with the Send Connector
The send connector is how e-mails are sent from your local Exchange to the outside world and, if you are in a hybrid environment, to the EXO mailboxes as well, so we need to check that with this:
Get-SendConnector
That will return all your Send Connectors, like this:
Internet Mail (Primary) {SMTP:*;10}
Outbound to Office 365 {smtp:growllikeabear.mail.onmicrosoft.com;1}
This shows we have two send connectors, so the question is which one is giving us the issues. Well, let's get some more PowerShell out to find out:
Get-SendConnector | fl Name,TLS*
This will tell us that both send connectors use a TLS certificate, one to send to the internet and one to send to Office 365 (EXO), like this:
Name : Internet Mail (Primary)
TlsDomain :
TlsAuthLevel :
TlsCertificateName : <I>CN=Some Random CA, C=US<S>CN=mailrelay.pokebearswithsticks.com, O=BearWorld, L=Bearville, C=US
Name : Outbound to Office 365
TlsDomain : mail.protection.outlook.com
TlsAuthLevel : DomainValidation
TlsCertificateName : <I>CN=Some Random CA, C=US<S>CN=mailrelay.pokebearswithsticks.com, O=BearWorld, L=Bearville, C=US
Right, so this is affecting both connectors, as both point at the same TlsCertificateName. Exchange does not like something about the certificate for mailrelay.pokebearswithsticks.com.
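The TlsCertificateName value is simply the certificate's issuer and subject joined as <I>issuer<S>subject, so the installed certificates can be matched against it. A rough sketch, assuming the connector name "Outbound to Office 365" from the output above and that the string built from each certificate's Issuer and Subject lines up exactly with the stored value:

```powershell
# Read the certificate identifier the send connector is configured to use.
$connector = Get-SendConnector "Outbound to Office 365"
$wanted    = [string]$connector.TlsCertificateName

# List every installed certificate whose issuer and subject match that identifier.
Get-ExchangeCertificate |
    Where-Object { "<I>$($_.Issuer)<S>$($_.Subject)" -eq $wanted } |
    Format-List Subject, Thumbprint, NotAfter, Status, Services
```

If nothing comes back, or the only match is invalid, the connector has no certificate it can use.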
Query Certificates in Exchange
Right, that narrows down the issue. Next we need to see what is wrong with the certificate and why it is not working, so let's look at the certificates with this:
Get-ExchangeCertificate | fl Subject,NotAfter,Status
That will tell us this....
Subject : CN=mailrelay.pokebearswithsticks.com, O=BearWorld, L=BearVille, C=US
NotAfter : 06/06/2023 17:04:42
Status : DateInvalid
Subject : CN=Bear1.bear.local, C=US
NotAfter : 05/03/2029 17:48:30
Status : Valid
Well, what do you know, the certificate has expired. This explains the issue we are having with mail queuing: the certificate being used is invalid and therefore cannot be used by the send connector. That also explains why the resume command fails immediately, as you cannot resume with an expired certificate.
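To catch this before the queues do, the same cmdlet can be filtered on NotAfter. A minimal sketch, assuming a 30-day warning window (a hypothetical threshold, pick your own) and that you want to check every Mailbox server:

```powershell
$warningDays = 30                                # hypothetical threshold
$cutoff      = (Get-Date).AddDays($warningDays)

# Report certificates that have expired or will expire inside the warning window.
foreach ($server in Get-MailboxServer) {
    Get-ExchangeCertificate -Server $server.Name |
        Where-Object { $_.NotAfter -lt $cutoff } |
        Select-Object @{ Name = 'Server'; Expression = { $server.Name } },
                      Subject, NotAfter, Status, Services
}
```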
SSL Monitoring: Don't let it trip you up!
With hindsight, when monitoring certificates I would use an external monitor. Inside your company the DNS record may be served by an internal load balancer, which hands out an internal certificate that does not expire at the same time as the external one, depending on your setup. That was the case here, and the monitor was happily reporting this:
mailrelay.pokebearswithsticks.com | Mar 5 17:48:30 2024 GMT | 264 | Healthy
When the actual truth was this.....
Monitoring was querying the internally assigned certificate because, from an internal point of view, that record was being answered by a delegated DNS zone that led to the wrong certificate. Well, that's an oversight, and something we need to fix later on.
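For what it is worth, the certificate actually presented on the SMTP port can be checked from any machine outside the network with plain .NET classes. This is a rough sketch rather than a finished monitor: the function name Get-SmtpStartTlsCertificate and the EHLO name are made up for illustration, it assumes the host offers STARTTLS on TCP:25, and it deliberately accepts any certificate so that an expired one can still be read:

```powershell
function Get-SmtpStartTlsCertificate {
    param(
        [Parameter(Mandatory)] [string] $HostName,
        [int] $Port = 25
    )

    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $client.Connect($HostName, $Port)
        $stream = $client.GetStream()
        $reader = New-Object System.IO.StreamReader($stream, [System.Text.Encoding]::ASCII)
        $writer = New-Object System.IO.StreamWriter($stream, [System.Text.Encoding]::ASCII)
        $writer.NewLine   = "`r`n"
        $writer.AutoFlush = $true

        # An SMTP reply can span several lines; continuation lines have '-' after the code.
        $readReply = {
            do { $line = $reader.ReadLine() }
            while ($line -and $line.Length -ge 4 -and $line[3] -eq '-')
            $line
        }

        $null = & $readReply                        # 220 greeting
        $writer.WriteLine('EHLO certcheck.example') # hypothetical client name
        $null = & $readReply                        # 250 capability list
        $writer.WriteLine('STARTTLS')
        $reply = & $readReply
        if ($reply -notlike '220*') { throw "STARTTLS refused: $reply" }

        # Accept any certificate so an expired one can still be inspected.
        $callback = [System.Net.Security.RemoteCertificateValidationCallback] { $true }
        $ssl = New-Object System.Net.Security.SslStream($stream, $false, $callback)
        $ssl.AuthenticateAsClient($HostName)

        $cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2($ssl.RemoteCertificate)
        [pscustomobject]@{
            Subject  = $cert.Subject
            NotAfter = $cert.NotAfter
            DaysLeft = [int]($cert.NotAfter - (Get-Date)).TotalDays
        }
    }
    finally {
        $client.Close()
    }
}

# Example: run from outside the network against the public name.
# Get-SmtpStartTlsCertificate -HostName 'mailrelay.pokebearswithsticks.com'
```

Running the same check from inside and outside the network and comparing the NotAfter values would have exposed the mismatch described above.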
Generate the new certificate
Now we have found the source of the problem, we need to get it resolved. This will involve generating a new certificate with your certificate authority, which needs to be a public-facing certificate authority and cannot be an internal one.
The certificate required will handle communications with the Internet, so it’s not acceptable for you to use an internal certificate authority that’s only trusted locally in your company.
The process is the same for all certificate authorities: you need to generate a CSR (certificate signing request), which creates a public and private key pair. The private key, obviously, stays on the server you generated the request on, and the request containing the public key is what you send to your certificate authority. The authority then sends back the signed certificate, typically as a CER file.
Once you have the response from your certificate authority, you then need to complete the signing request by marrying the response back to the original pending request, which should give you a certificate with a private key. If you do not have the private key attached to the certificate, you cannot bind the certificate to the connector to get mail flowing again.
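If the request is generated on Exchange itself, the round trip might look something like the sketch below. The subject and domain are taken from the certificate shown earlier; the file paths are hypothetical, and your certificate authority may ask for different subject fields:

```powershell
# Generate the request on one Exchange server; the private key stays on that server.
$requestParams = @{
    GenerateRequest      = $true
    SubjectName          = 'C=US, L=Bearville, O=BearWorld, CN=mailrelay.pokebearswithsticks.com'
    DomainName           = 'mailrelay.pokebearswithsticks.com'
    PrivateKeyExportable = $true
}
$request = New-ExchangeCertificate @requestParams

# Save the request text so it can be submitted to the public certificate authority.
$requestBytes = [System.Text.Encoding]::Unicode.GetBytes($request)
[System.IO.File]::WriteAllBytes('\\pfxshare\pfx\mailcert.req', $requestBytes)   # hypothetical path

# Later, on the SAME server: complete the pending request with the CA's response.
$responseBytes = [System.IO.File]::ReadAllBytes('\\pfxshare\pfx\mailcert.cer')  # hypothetical path
Import-ExchangeCertificate -FileData $responseBytes
```

Completing the request on a different server than the one that generated it leaves you with a certificate that has no private key.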
Visual certificate check….
Before proceeding, it's definitely worth a visual double check to confirm you have a certificate that's valid, so without further ado:
This is an example of a certificate that will not work. The red box highlights where the certificate should show that it has a private key. Notice this is blank, meaning you have the public key for the certificate but not the private key.
However, this one looks healthy, here you can see you have the public and private key with the tell-tale icon here
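The same check can be done without opening the certificate dialog. A small sketch, assuming the certificate is already in the server's store, using the HasPrivateKey property:

```powershell
# HasPrivateKey should be True for the certificate you intend to bind to SMTP.
Get-ExchangeCertificate |
    Format-List Subject, Thumbprint, NotAfter, HasPrivateKey
```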
Right, now you have a valid certificate you can move on, let's get cracking.
Install the Certificate on all Exchange servers
If you have one Exchange server, create a share on another server to house the PFX file, and then you can run this:
Import-ExchangeCertificate -Server "<server_name>" -FileData ([System.IO.File]::ReadAllBytes('<share_path>')) -PrivateKeyExportable:$true -Password (ConvertTo-SecureString -String '<pfx_password>' -AsPlainText -Force)
<server_name> is the name of the Exchange Server like BearA.bear.local
<share_path> is the path to the share like this : \\pfxshare\pfx\mailcert.pfx.pfx
<pfx_password> is the password for the PFX file like this : S3c34tPa55word
If you have more than one server, repeat the command with <server_name> set to each server in your environment. Once installed, the certificates will appear when you query the servers with the command below.
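With more than a couple of servers, repeating the import by hand is error-prone. A sketch of the same import wrapped in a loop, assuming every server returned by Get-MailboxServer should receive the certificate. The <share_path> placeholder is the same one described above, and the password is prompted for rather than typed into the command line:

```powershell
$pfxPath     = '<share_path>'                                   # same placeholder as above
$pfxPassword = Read-Host -Prompt 'PFX password' -AsSecureString
$pfxBytes    = [System.IO.File]::ReadAllBytes($pfxPath)

# Import the same PFX onto every Mailbox server in turn.
foreach ($server in Get-MailboxServer) {
    Import-ExchangeCertificate -Server $server.Name -FileData $pfxBytes -PrivateKeyExportable:$true -Password $pfxPassword
}
```

Prompting with Read-Host keeps the PFX password out of your command history.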
Activate the Certificate for SMTP
Now you need to activate the certificate for SMTP services, for this you will require the thumbprint, so to get this run this command:
Get-ExchangeCertificate | fl CertificateDomains,NotBefore,NotAfter,Services,Thumbprint
This will return something like this, redacted for privacy. You will notice that the new certificate (green, valid until 07/06/2024) has no services linked to it, while the expired certificate (red, valid until 06/06/2023) does have SMTP linked to it. This is wrong and needs to be fixed.
CertificateDomains : {mailrelay.pokebearswithsticks.com}
NotBefore : 07/05/2022 17:04:42
NotAfter : 06/06/2023 17:04:42
Services : SMTP
Thumbprint : 061B4B3812C91BF9727242AC4C33C3F384E45617
CertificateDomains : {mailrelay.pokebearswithsticks.com}
NotBefore : 07/06/2023 07:23:29
NotAfter : 07/06/2024 07:23:29
Services :
Thumbprint : 117958926ACC3A5571B7FF683C3C6693007ADFE9
Note: You need to issue this command on every Exchange server.
Enable-ExchangeCertificate -Thumbprint 117958926ACC3A5571B7FF683C3C6693007ADFE9 -Services SMTP
Then when you run the certificate check command again it will say this:
CertificateDomains : {mailrelay.pokebearswithsticks.com}
NotBefore : 07/06/2023 07:23:29
NotAfter : 07/06/2024 07:23:29
Services : SMTP
Thumbprint : 117958926ACC3A5571B7FF683C3C6693007ADFE9
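Since the command has to be issued on every Exchange server, the enable and re-check steps can be wrapped in a loop. A sketch, assuming the same PFX was imported everywhere (so the thumbprint is identical on each server) and that Get-MailboxServer returns the servers you care about:

```powershell
$thumbprint = '117958926ACC3A5571B7FF683C3C6693007ADFE9'

foreach ($server in Get-MailboxServer) {
    # Bind the new certificate to SMTP on this server.
    Enable-ExchangeCertificate -Server $server.Name -Thumbprint $thumbprint -Services SMTP

    # Confirm that Services now lists SMTP for the new certificate.
    Get-ExchangeCertificate -Server $server.Name -Thumbprint $thumbprint |
        Format-List CertificateDomains, NotAfter, Services, Thumbprint
}
```

Exchange may ask whether to overwrite the existing default SMTP certificate, so expect a prompt on each server.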
Run the Hybrid Wizard
Well, if after you have done all the above you notice that only one queue resumes processing, or a couple of queues resume processing and the others do not, you have missed something out. If this is your situation it will look like this:
BearA\6 Active 443 Normal -0.72 <exo connector name>