Intercom


We experienced a major network outage from about 8:30 on Friday morning to about 12:30 on Saturday afternoon. Most, though not all, of our network-based servers were inaccessible during that time. The servers themselves were fine, but there was so much error traffic on the network that it blocked access to them. We want to tell you what caused the problem, what steps we are taking to minimize the chance of it recurring, and what you can do to help us prevent this kind of thing.

 

The culprit turned out to be one small four-foot cable in a vacant office that was tampered with on Friday morning, most likely just to get it out of the way or off the floor. Without going into great technical detail, the effect was to create a "data storm" of error traffic on the network. That traffic consumed the processors of many of our network switches, leaving them unable to process "legitimate" data packets.

The diagnostic process involves isolating segments of the network one at a time until you locate the segment where the error traffic originates. It's a laborious effort. ITS staff, with support from our network vendor, worked throughout the day and night on Friday and were finally able to narrow it down to a network segment that included a main data center room in Philips Hall and all of Muller Center, and from there to somewhere in Muller. At that point we physically cut Muller Center off from the network, and all services began to come back up as normal. A subsequent physical inspection of all ports in Muller Center turned up the cable that was causing the problem.
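For readers curious about what "isolating segments" looks like as a procedure, here is a rough sketch in Python. It is only an illustration of the isolate-and-check idea, not a description of the tools ITS actually used. The topology, the names "segment-1" and "segment-2", and the functions `locate_faulty_segment`, `contains`, and `probe` are all hypothetical. In real life the "probe" step is a person unplugging a segment and watching whether the error traffic stops.

```python
from typing import Callable, Dict, List


def locate_faulty_segment(
    root: str,
    children: Dict[str, List[str]],
    storm_stops_when_isolated: Callable[[str], bool],
) -> str:
    """Drill down from `root`, isolating one sub-segment at a time.

    At each level, cut off each child segment in turn. If the error
    traffic stops, the fault is inside that child, so repeat there.
    Stop when no child accounts for the fault.
    """
    current = root
    while True:
        culprit = None
        for child in children.get(current, []):
            if storm_stops_when_isolated(child):
                culprit = child
                break
        if culprit is None:
            return current  # cannot narrow any further
        current = culprit


# Hypothetical topology, loosely modeled on the narrative above.
topology = {
    "campus": ["segment-1", "segment-2"],
    "segment-2": ["Philips Hall data center", "Muller Center"],
}
FAULT = "Muller Center"  # assumed location of the bad cable


def contains(segment: str, target: str) -> bool:
    """True if `target` is `segment` itself or lies somewhere beneath it."""
    if segment == target:
        return True
    return any(contains(c, target) for c in topology.get(segment, []))


def probe(segment: str) -> bool:
    """Simulated check: isolating a segment stops the storm only if
    the fault is inside that segment."""
    return contains(segment, FAULT)


print(locate_faulty_segment("campus", topology, probe))
```

Each check in the sketch is instant. On a real network every check means physically disconnecting equipment and waiting to see the result, which is why the search took so long.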

We are conducting a thorough post-mortem technical analysis of our options for preventing an occurrence like this from affecting our entire network in the future, and we are pretty confident that there are some key steps we can take quickly.

What you can do is help spread the word: call ITS if a cable, a computer, or anything else that might be network related is in your way or needs to be moved for whatever reason. It's kind of like the New York state "Call Before You Dig" campaign. Call us. We'll be more than happy to address your problem in a way that does not disrupt the network. This incident brought virtually the entire campus to a standstill for a full business day and produced a sleepless night for a number of ITS staff. There are things we'll be doing to help prevent this, but you can help too! Just call the ITS Help Desk at 274-1000 if you need to change or move something that you think might even possibly be connected to the network (a device, a cable, anything).

We apologize for the inconvenience of this outage. No data network can be made 100% "bulletproof." Equipment can fail, and human errors will occur. But with our technical efforts and your cooperation, we can minimize the chances that this happens again. Thank you for listening, and for your patience.

 

Friday's Network Outage: What Happened? | 2 Comments
The following comments are the opinions of the individuals who posted them. They do not necessarily represent the position of Intercom or Ithaca College, and the editors reserve the right to monitor and delete comments that violate College policies.
Comment from ntucker1 on 08/02/11
i was probably one of many who called (not to complain, just
to ask about the cause and estimated downtime) and i was
treated very nicely by ITS. this post proves the good nature
of the entire ITS staff. a big thank you for doing the best
you could in the situation!
Comment from kreeter on 08/03/11
Thank you for sharing this information and for the collective efforts to diagnose and remedy the outage and communicate updates to the users. I will spread the word via staff council for people to call ITS [x3282] before moving a cable or computer or un/plugging one.