An Outage Avoided
I recently received a note from David de’Marsi, a Senior Network Engineer and an Alcatel-Lucent Certified Field Expert (ACFE). Dave works for one of our business partners, Pinnacle Communications Corporation. Pinnacle is a specialist in the hospitality market, and they have a nationwide network to offer cloud-based services to their customers. You can read more about Dave below...
Pinnacle’s data center switching solution is based on the OmniSwitch 6900, and their hosted customers use OmniSwitch 6860s to terminate connections from Pinnacle. These hosted customers include Pinnacle’s own employees, so you already know Dave’s dealing with a very demanding clientele!
Dave’s note was unusual because it involved a story, a story with a happy ending. Let’s hear it from Dave:
“On November 5th at around 9:55am, an Ethernet patch cord that connects our Alcatel-Lucent OS6900 virtual chassis to an Alcatel-Lucent OS6860 switch which feeds our primary customer VPN gear shorted out. This short caused a loop in our Hosted production VLAN.”
Now, normally, a loop in your network, particularly a loop at the heart of your network – that’s the stuff of network engineering nightmares. Angry mobs of users are imminent.
“Our core OS6900s realized they had a redundant path, so they shut down the problematic interface. All traffic automatically shifted to the other functional cable path and services were restored. This was all done without my intervention and before I received email notification for the issue.”
Now those of you in the know might be thinking “big deal, that’s what spanning tree does”. Dave explains…
“Normally Spanning Tree (which is configured on the OS6900s) would also catch this issue, but would take between 30 seconds and 5 minutes to re-converge, and during this time the Hosted production VLAN would be unreachable.”
30 seconds to 5 minutes of downtime for all connected hosted users – both paying customers and Dave’s own colleagues – that’s a BIG deal. For a hosted service provider, this is the kind of downtime that can cause financial penalties or even customer losses.
“I reviewed the logs and identified the specific Ethernet patch cable that was the issue and replaced it. Once I did, the OS6900 cores automatically re-enabled the redundant cable path and balanced the current traffic load across the two links. Luckily, I was onsite, but even if I wasn’t, the network would have continued to pass traffic without human intervention. Kudos for making intelligent gear that saved us from a larger outage!”
The “intelligent gear” Dave refers to is courtesy of our Intelligent Fabric technology – a “Best of Interop” winner earlier this year. Self-configuring, self-healing and self-attaching, it’s become the technology of choice for Pinnacle’s data center. And Dave’s tale is a great real-world example of how it’s making their business more agile and responsive to the needs of their users.
Thanks for the story, Dave. And thanks to Pinnacle Communications Corporation for trusting your network to us.