Andy___H1 Super Contributor.

DP 10.30 Exchange 2016 Random Connection Errors

Hi,

Earlier in the week we finally got all our Exchange 2016 clients upgraded to 10.30 to allow backups to run. We have had 3 successful nights of backups with no failures. Then last night we got Critical errors on a number of individual mailbox databases, suggesting secure communication errors, but between individual nodes of the Exchange cluster:

  • Cannot connect to Media Agent on system x.y.z, port 100276 (Secure communication protocol negotiation error when trying to establish a connection. Check the validity of certificates and their configuration) => aborting

Our backup is set up to use each of the 16 Exchange nodes, and one DAG node, as their own gateways; we don't use separate "media" servers as gateways. We've been running on 10.30 for 3 nights for Exchange 2016 with no issues, across 16 nodes and lots of mailboxes

Micro Focus Expert

Re: DP 10.30 Exchange 2016 Random Connection Errors

Hello @Andy___H1

Even when the MA is another node in the DAG, it's always recommended to configure secure communication between all the nodes of the cluster (this applies to any kind of cluster).

And if for some reason the active node changes to another server, run this: omnicc -secure_comm -reconfigure_peer VirtualName
(This is just a suggestion in case you do that in the future)

Regards, 

Andres Fallas Salazar
Customer Support Engineer

If you find that this or any other post resolves your issue, please be sure to mark it as an accepted solution.
If you are satisfied with anyone’s response please remember to give them a LIKE by clicking on the bottom at the left of the post and show your appreciation.
michael_d1 Trusted Contributor.

Re: DP 10.30 Exchange 2016 Random Connection Errors

Thanks Andres. I'm Andy's colleague so thought I'd respond. As per your suggestion we ran the 'omnicc -secure_comm' command against every host in the DAG, referencing every other host, and hey presto it worked: no more errors. However, this only lasted for two backups. Since then we've been getting errors each night, and different objects fail each night rather than the same ones night after night:

System A. Cannot connect to media agent on System B (secure communication protocol negotiation error when trying to establish a connection. Check the validity of certificates and their configuration) .

Elsewhere in the job, though, we can see objects that did back up fine with System A as the client and System B as the media device (suggesting the certificates are OK?).

You mentioned in your reply that if the active node changes to another server we should run the -reconfigure_peer. Checking with the Messaging team, they have been moving databases between servers for patching purposes but then returning them to their original location. Could this be causing the errors? Are you saying that in this instance we'd need to re-configure the certificates?

 

Thanks,

 

Mike.

 

Micro Focus Expert

Re: DP 10.30 Exchange 2016 Random Connection Errors

Hello @michael_d1,

My guess is that something in the cluster has changed. I have something for you to try: we will make all nodes use the same fingerprint/certificate.

  1. Go to the first node and copy localhost_key.pem and localhost_cert.pem stored in C:\ProgramData\OmniBack\Config\client\sscertificates to all the other nodes in the DAG setup
  2. Run omnicc -secure_comm -configure_peer <clientFQDN> -overwrite from the Cell Manager to all physical and virtual nodes in the DAG setup
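The two steps above could be scripted roughly as follows. This is only a sketch: the node names are hypothetical placeholders, the copy step is listed rather than performed, and the only omnicc options used are the ones quoted in this post. It prints the commands instead of running them, so they can be reviewed first.

```python
# Sketch only: builds the copy list and commands from the two steps above.
# Nothing is executed. Node names are hypothetical, not taken from this thread.

CERT_DIR = r"C:\ProgramData\OmniBack\Config\client\sscertificates"
CERT_FILES = ("localhost_key.pem", "localhost_cert.pem")


def files_to_copy(first_node, other_nodes):
    """Step 1: list (source node, file path, destination node) copies."""
    return [
        (first_node, CERT_DIR + "\\" + name, dest)
        for dest in other_nodes
        for name in CERT_FILES
    ]


def configure_peer_commands(all_nodes):
    """Step 2: one omnicc call per physical or virtual node, run on the Cell Manager."""
    return [
        ["omnicc", "-secure_comm", "-configure_peer", fqdn, "-overwrite"]
        for fqdn in all_nodes
    ]


if __name__ == "__main__":
    physical = ["exch01.example.com", "exch02.example.com"]  # hypothetical
    virtual = ["dag01.example.com"]  # hypothetical DAG name
    for src, path, dest in files_to_copy(physical[0], physical[1:]):
        print(f"copy {path} from {src} to {dest}")
    for cmd in configure_peer_commands(physical + virtual):
        print(" ".join(cmd))
```

How the two .pem files are actually copied between nodes is left open here, since the post does not prescribe a method.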

Regards,
Sebastian Koehler

---
Please use the Like button below, if you find this post useful.
Micro Focus Expert

Re: DP 10.30 Exchange 2016 Random Connection Errors

Hello @michael_d1

This makes complete sense. The databases are moving between hosts, so the active node is changing. @Sebastian.Koehler gave a good idea to try: all the servers will have the same certificate, which should keep the configuration valid when that happens.

Regards, 

Andres Fallas Salazar
Customer Support Engineer

Andy___H1 Super Contributor.

Re: DP 10.30 Exchange 2016 Random Connection Errors

Hi guys,

We've implemented the change to make all the keys the same in Non-Prod and tested it, and it works as expected: the keys are the same for all the node names and the DAG cluster name, and the test backup worked OK.

We've made the same change to the keys in our PROD environment, but have to wait until tonight to run the backup again. We're not "allowed" to run the backup during the day due to the potential impact on Exchange for end users!

So we'll report back when we've had our next backup.

Thanks,

Andy

Andy___H1 Super Contributor.

Re: DP 10.30 Exchange 2016 Random Connection Errors

Unfortunately the news is not good for the above. We implemented the change to make the keys the same on every Exchange server, ran secure_comm across them all and the cell manager, and after a single day of backups with no secure_comm errors (only a single VSS API error), the secure_comm errors have returned every night since.

We have now raised a case with Micro Focus for support. Always very difficult for our customer given we can't send logs, but we will at least try.

So our obvious options, as we see them whilst waiting for MF support, are to:

  • Upgrade from DP 10.30 to DP 10.50 in the hope this will fix the Exchange secure_comm errors. But this is a long-winded process as it has to go through our test environment, and we have 300+ clients to upgrade in our LIVE environment, so it can take a couple of months!
  • Or downgrade just the Exchange DP clients back to DP 9.09, which seemed to work for a few days when the cell was at 10.30 and the Exchange clients were at DP 9.09. We still have the 9.00/9.09 code onsite so could do that, and it is obviously quicker to action than the above, just to get us a stable backup until we do go to DP 10.50 (or another version!)
  • Or, if tonight's backup works now that we have reset the secure_comm configuration on all clients, automate this reset to run every day during the day, until we upgrade in the hope that fixes our issues.
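For the third option, a daily reset could be planned along these lines. This is a sketch under assumptions: the thread does not state the exact omnicc arguments used for the reset, so the `-configure_peer ... -overwrite` form from Sebastian's post is assumed, and the host names are hypothetical. It only builds a plan of (where to run, command) pairs and executes nothing.

```python
from itertools import permutations


def peer_command(target_fqdn):
    # Assumption: same form as the command quoted earlier in this thread.
    return ["omnicc", "-secure_comm", "-configure_peer", target_fqdn, "-overwrite"]


def daily_reset_plan(exchange_hosts, cell_manager):
    """Return (run_on, command) tuples: host-to-host first, then Cell Manager to each host."""
    plan = []
    # Every Exchange host references every other Exchange host.
    for source, target in permutations(exchange_hosts, 2):
        plan.append((source, peer_command(target)))
    # Then the Cell Manager references each Exchange host.
    for target in exchange_hosts:
        plan.append((cell_manager, peer_command(target)))
    return plan


if __name__ == "__main__":
    hosts = ["exch01.example.com", "exch02.example.com", "exch03.example.com"]  # hypothetical
    for run_on, cmd in daily_reset_plan(hosts, "cm.example.com"):
        print(f"on {run_on}: {' '.join(cmd)}")
```

With 16 hosts this plan has 16 × 15 = 240 host-to-host entries plus 16 from the Cell Manager, which gives an idea of how much a scheduled job would have to run each day.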

My real query is: are the secure_comm errors we are getting actually a red herring, and is this simply a communications error between Exchange hosts rather than an actual secure_comm issue?

Anyway, let's see how tonight's backup goes, and we'll also see what response we get from MF on our case.

To confirm, we would love to send logs to help with this, but it's our customer's choice, not ours.

Micro Focus Expert

Re: DP 10.30 Exchange 2016 Random Connection Errors

Hello @Andy___H1,

Thanks for the update.


@Andy___H1 wrote:

Unfortunately the news is not good for the above. We implemented the change to make the keys the same on every Exchange server, ran secure_comm across them all and the cell manager, and after a single day of backups with no secure_comm errors (only a single VSS API error), the secure_comm errors have returned every night since.

What was the VSS error? Could you share the session report where it appeared? It may help us.

@Andy___H1 wrote:

My real query is: are the secure_comm errors we are getting actually a red herring, and is this simply a communications error between Exchange hosts rather than an actual secure_comm issue?

I have seen secure communication errors caused by various network-related problems, so the two can be hard to tell apart. But as long as you can run omnicheck -patches -host <FQDN> from the CM against all the virtual and physical nodes and receive no errors, the certificate trust is fine.
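A check like that could be looped over all nodes with something like the sketch below. It uses only the omnicheck options quoted above. Two things here are assumptions, not documented behaviour: that a non-zero exit code indicates a failed check, and the host names in the example. The output should still be read by a person, since errors may only show up in the text.

```python
import subprocess


def omnicheck_command(fqdn):
    """Build the check quoted above for one host."""
    return ["omnicheck", "-patches", "-host", fqdn]


def check_hosts(hosts, run=subprocess.run):
    """Run the check for each host and return those that did not exit with code 0.

    Assumption: a non-zero exit code means the check failed. Verify this
    against the real tool before relying on it.
    """
    failed = []
    for fqdn in hosts:
        result = run(omnicheck_command(fqdn), capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(fqdn)
    return failed


if __name__ == "__main__":
    nodes = ["exch01.example.com", "dag01.example.com"]  # hypothetical names
    for cmd in (omnicheck_command(n) for n in nodes):
        print(" ".join(cmd))  # printed only; call check_hosts(nodes) on the CM to run
```

The `run` parameter exists so the loop can be tried with a stand-in before pointing it at the real command on the Cell Manager.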

Regards,
Sebastian Koehler

Andy___H1 Super Contributor.

Re: DP 10.30 Exchange 2016 Random Connection Errors

Hi Sebastian,

Thanks for the response. With regard to the VSS error, this was:

"Error 3 returned from VSS API"

However even with this "error" the job looks to have backed up everything ok, and the VSS error was towards the end of the job, so we ignored it.

With regard to your second query, running the omnicheck command works fine between the cell manager and each of the hosts. But to confirm, the secure_comm errors we get in the failed jobs are always between individual Exchange hosts rather than between Exchange and the CM.

As we use StoreOnce Catalyst devices, we have the backup job set up to write to two different StoreOnce devices, so we have two DP Catalyst devices per Exchange VM, 36 devices in total. With 144 Exchange MDBs in total, we often have an MDB on one Exchange host using a device on another Exchange host when backing up. This creates the cross-communication requirement that randomly fails. So in the same job host1 can talk happily to host2 to back up MDBx, but then for MDBy on the same host1 it fails to talk to host2.
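One way to make that pattern visible is to tally outcomes per host pair for a single session. The sketch below assumes the (client host, device host, succeeded) records have been collected by hand from the session report; it does not parse any Data Protector output, and the host names are hypothetical. A pair that both succeeds and fails within the same job would point away from a static certificate trust problem, though it does not prove anything on its own.

```python
from collections import defaultdict


def intermittent_pairs(records):
    """Return host pairs that both succeeded and failed.

    records: iterable of (client_host, device_host, succeeded) tuples,
    one per backed-up object, gathered manually from a session report.
    """
    outcomes = defaultdict(set)
    for client, device_host, succeeded in records:
        outcomes[(client, device_host)].add(bool(succeeded))
    return sorted(pair for pair, seen in outcomes.items() if seen == {True, False})


if __name__ == "__main__":
    # Hypothetical records mirroring the MDBx / MDBy example above.
    session = [
        ("host1", "host2", True),   # MDBx backed up fine
        ("host1", "host2", False),  # MDBy failed with the secure_comm error
        ("host3", "host4", True),
    ]
    print(intermittent_pairs(session))
```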

As in most production environments, we have virus checkers, firewalls and device control software in place and running on all hosts. These may be "hiding" the real issue, so that when a backup reports a secure_comm error it's actually something else, but we've not found any problems with this software.

As an update, some good news: last night's job worked successfully, with no errors at all! To confirm, after we ran all secure comms between the Exchange hosts, and then from the cell manager to each host, during the day, the overnight backup worked without errors. We will continue to do this until we get a more permanent fix, or until this "fix" stops working altogether.

We have opened a case with Micro Focus, so hopefully between us we can come up with a solution. If anyone here has any ideas, we're more than happy to try them, albeit we can only test overnight, as we are not allowed to run the PROD backup during the day, and it's only our large PROD backup which has had problems on a regular basis. NP is so small we don't see the issues there.

Thanks

Andy

Micro Focus Expert

Re: DP 10.30 Exchange 2016 Random Connection Errors

Hello @Andy___H1,

I was recently working with a customer where a misconfigured load balancer was causing similar problems. Do you have an additional piece of hardware/software doing load balancing here or is it just basic DNS round robin? It's worth double checking the DNS configuration for all virtual and physical hosts.
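A quick way to double check the DNS side is to resolve every name and look for ones that return no address or more than one. The sketch below uses only Python's standard `socket` module; the host names are hypothetical. More than one address is not necessarily wrong (for example IPv4 plus IPv6, or a deliberately multi-homed name), so the result is a list of names worth a closer look, not a verdict.

```python
import socket


def forward_addresses(name, resolve=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses a name resolves to."""
    infos = resolve(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def dns_report(names, resolve=socket.getaddrinfo):
    """Map each name to its addresses; unresolvable names map to an empty list."""
    report = {}
    for name in names:
        try:
            report[name] = forward_addresses(name, resolve)
        except socket.gaierror:
            report[name] = []
    return report


def suspicious(report):
    """Names that resolve to no address or to more than one."""
    return sorted(name for name, addrs in report.items() if len(addrs) != 1)


if __name__ == "__main__":
    names = ["exch01.example.com", "dag01.example.com"]  # hypothetical names
    report = dns_report(names)
    for name in names:
        print(name, report[name])
    print("check these:", suspicious(report))
```

Running it from the Cell Manager and from a couple of Exchange hosts and comparing the output would show whether they all see the same answers.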

Regards,
Sebastian Koehler

The opinions expressed above are the personal opinions of the authors, not of Micro Focus. By using this site, you accept the Terms of Use and Rules of Participation. Certain versions of content ("Material") accessible here may contain branding from Hewlett-Packard Company (now HP Inc.) and Hewlett Packard Enterprise Company. As of September 1, 2017, the Material is now offered by Micro Focus, a separately owned and operated company. Any reference to the HP and Hewlett Packard Enterprise/HPE marks is historical in nature, and the HP and Hewlett Packard Enterprise/HPE marks are the property of their respective owners.