Outage this morning

Demon_Skeith

According to my email reports from Pingdom, our hosting was down from 3:40 to 4:40 AM CST today, which of course means GF was down as well. Since it was so early in the morning, I doubt many of you were trying to access the site, but here is the report on what happened:

Below is the official RFO from the Tampa Data Center:


=====
This morning at roughly 4am EST we recognized a portion of our network was unreachable. Most of our customers were not impacted, but those that were impacted were effectively offline. We isolated the issue to the failure of a layer 3 device.

This device failure caused an uncontrollable layer 2 network loop.  Our network is designed with protected paths but this device failure resulted in a loop pattern that had unexpected results.  

The physical ports towards the failed device required manual intervention to restore network services.   Once we took manual action to correct the situation we then went ahead and replaced the failed device.  Most customers began seeing restored connectivity around 5:30am with 100% of our customers being back to normal around 7:45am.

Now that we have had an opportunity to study the situation, we have immediate plans to deploy control mechanisms, using UDLD (unidirectional link detection), that will ensure there is always bi-directional communication on the physical ports. This will ensure that a similar failure in the future will be rectified within seconds.

Additionally, we are aware that our lines of communication were less than acceptable during this event.  It has been several years since we have had a network event of any magnitude and the fact is our outreach protocols have not needed to be put into practice in a long time. 

This morning’s events will result in much better communication from us in the future at the times you need it most.

We apologize for any inconvenience this disruption in service may have caused you. Please open a trouble ticket, start a live chat, or phone us if you have any questions whatsoever.

=====

When we're dealing with issues that pertain to our servers specifically, we are able to communicate issues effectively and as they happen. However, with a network issue of this magnitude, we're limited in what we can report in a timely fashion based on what we hear from the data center. We've been working with this Data Center since 2010. Their service has always been stellar with a very solid network. We sent techs to the data center as soon as we started seeing timeouts but they quickly determined the issue was not with our hardware, at which point we were awaiting word from the facility techs at the data center. We're certain the data center will better handle an issue of this magnitude in the future which in turn will allow us to maintain our normal level of communication.

We sincerely apologize for the situation this morning.

If you know of any issues happening today with our hosting, please let me know.
 