AFN COE success story
Earlier this year, one of our big clients implemented a new video and conferencing technology. Shortly thereafter, however, the client started to experience continuous issues with maintaining sign-on connectivity to MS O365/Teams.
This issue continued for an extended period and, in this time, the only workaround was to re-route the MPLS traffic across the SD-WAN.
This issue became more prominent at the head office campus, where the SD-WAN was not in use. The technical teams had been unable to figure out what was causing this intermittent problem with renewing the public certificate, and the added challenge was that the executives were moving back to campus and needed connectivity.
The offline captures were done at the customer’s headquarters. The VC technical team connected the affected units to an SM server via SPANs; this was done to eliminate any possible laptop NIC issues, which would skew the results.
The tests were done using the corporate network. It was connected directly to the internet (LTE) because, as with the SD-WAN, this would eliminate the central FW/proxies as a possible cause.
During the corporate network tests, despite the direct internet access, the connection to O365 and the single sign-on failed. To complete the test, we required a sample from a working device; we also had to move the investigation to another regional office that was experiencing similar challenges.
However, when on direct internet, this unit, and the captures, performed successfully. Through this, we learned that, when the unit was connected directly to the corporate network, the connection to the ‘certificate update’ domain, which the unit needed to initiate, was failing.
This led us to ask for a review of the firewall security rules for the devices. These were all locked down to source IPs and specific ports, not to specific external paths or URIs.
During our troubleshooting phase, we kept the customer continuously updated on the progress, as well as on the action plan and associated activities.
Upon further investigation, we discovered that, higher up the priority tree in the firewall access control lists, there was a rule blocking the certificate domain.
This meant that the devices and service ports were allowed further down the tree, but those rules never took effect, because most of the rules were evaluated as a top-down priority list. In essence, if a rule higher up blocks the traffic, the lower rules will not be honoured.
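The top-down behaviour described above can be illustrated with a minimal sketch. It assumes a simple first-match model, in which the first rule that matches a connection decides its fate and later rules are never consulted. The rule names, IP ranges and domain names below are hypothetical and are not taken from the customer’s firewall; real firewalls differ in how they match and order rules.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """One hypothetical access-control entry (illustrative only)."""
    name: str
    action: str                 # "allow" or "deny"
    source: str                 # glob pattern for the source IP
    destination: str            # glob pattern for the destination domain
    port: Optional[int] = None  # None matches any port


def first_match(rules: List[Rule], source: str, destination: str,
                port: int) -> Optional[Rule]:
    """Walk the list top-down and return the first rule that matches."""
    for rule in rules:
        if (fnmatch(source, rule.source)
                and fnmatch(destination, rule.destination)
                and (rule.port is None or rule.port == port)):
            return rule
    return None  # nothing matched: treated as an implicit deny


def is_allowed(rules: List[Rule], source: str, destination: str,
               port: int) -> bool:
    rule = first_match(rules, source, destination, port)
    return rule is not None and rule.action == "allow"


# Hypothetical rule set: a broad deny sits above the device allow rule.
rules = [
    Rule("block-cert-domain", "deny", "*", "cert-update.example.com"),
    Rule("allow-vc-units-https", "allow", "10.1.20.*", "*", 443),
]

# The device is allowed further down, but the deny rule matches first.
print(is_allowed(rules, "10.1.20.15", "cert-update.example.com", 443))
# Other destinations skip the deny rule and reach the allow rule.
print(is_allowed(rules, "10.1.20.15", "login.example.com", 443))
```

In this sketch, reversing the order of the two rules lets the certificate traffic through, which is why the position of a rule in the list matters as much as its content.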
The security team was asked to check the rules on the firewall again and confirm whether there were any rules in place that were blocking anomalies from the source devices. They checked and confirmed that there was a rule on the firewall that was, in fact, blocking this traffic. Unfortunately, this had been overlooked in the troubleshooting process.
The firewall rule was then corrected and we were able to establish the root cause. As a final test, we went back to the site to confirm whether there were still issues on the corporate network, where the tests had failed previously.
We ascertained that there were no corporate network failures, meaning that the device could successfully do its certificate ‘call-home’ and proceed to the sign-on page.
In this case, the rules on the firewall were overlooked, causing major issues internally for the customer. However, we are glad to report that the problem was resolved.
The takeaway here is to make sure that teams follow a strict troubleshooting process and that nothing is ruled out until all avenues of investigation have been exhausted.
This process must include all teams involved, and communication and feedback are of the utmost importance. Once again, we have proved that teamwork truly pays off.