Runt Post: Quality troubleshooting, what it looks like

In my previous post, I shared some of the cool stuff ThousandEyes is doing with VoIP.  I also wanted to draw attention to this cool video of Mohit Lad, co-founder and CTO of ThousandEyes, using his own product to troubleshoot an outage event on the fly: http://vimeo.com/105805525

There are very few ways to show off your product better than this type of demonstration. Mohit troubleshoots with expertise, clearly in his element. The tools cater well to his methodical troubleshooting process and both are quite impressive. Plus the routing loop he finds is just darn cool.


Watch it. You’ll love watching a master at work; I know I did.

Published: 9/26/2014

Disclaimer: While Networking Field Day, which is sponsored by the companies that present, was very generous to invite me to this fantastic event and I am very grateful for it, my opinions are totally my own, as all redheads are far too stubborn to have it any other way.

 

When switch ports lie…

Ah, that moment when you solve some weird IT issue in record time while a suitably impressed customer watches. The customer thinks you’re a genius, yet you know you homed in on the solution so quickly not because of your crazy amount of talent and good looks (although these helped of course), but because you instantly recognized the peculiar, tell-tale signs of the oddball problem at hand. Somewhere along the line you had seen it before and the memory stuck with you. And likely keeps you up at night, but that’s another story…

For that reason, I offer you this: a short recollection of symptoms I experienced when troubleshooting a data cable that turned out to be *mostly* plugged in.

After some re-cabling recently, my team was tasked with testing to confirm everything was coming up roses.  All was good except for one thin client workstation out of 12. This one thin client couldn’t get a DHCP address, but every other device plugged into the same switch was good and chugging along.

I initially checked that the switch port was up and confirmed a physical status light of green as well.  I also checked to see that a MAC address was being learned off the port in question and looked at the interface statistics. From a physical perspective, things were looking good. I did try another port on the switch, but the issue persisted. At this point, some engineers would be inclined to look higher up the network stack or at a misconfiguration of the client. The single reason I hesitated to pursue this line of inquiry: I could reliably confirm the client worked before and that only the wiring had changed.

Now my regular blog readers can probably predict what I did next: yep, Wireshark to the rescue!  In short order I had a SPAN port set up on the switch and within minutes I was looking at packet capture goodness.  Want to take a guess what I didn’t see in the capture?  A DHCP request from the client.  In fact, I wasn’t seeing any packets from the host in question. Not what I was expecting.
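If you'd rather script that "is the client saying anything at all?" check than eyeball it, here is a minimal sketch in Python. It is not what I ran that day, just an illustration. It assumes you already have the captured frames as raw Ethernet bytes (how you load them from a capture file is up to you), that the frames are untagged IPv4, and the names `is_dhcp_from_client` and `summarize` are made up for this example.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
DHCP_CLIENT_PORT = 68  # DHCP clients send from UDP port 68...
DHCP_SERVER_PORT = 67  # ...to UDP port 67


def mac_to_bytes(mac: str) -> bytes:
    """Convert '00:11:22:33:44:55' (or dash-separated) to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", "").replace("-", ""))


def is_dhcp_from_client(frame: bytes, client_mac: str) -> bool:
    """True if this raw Ethernet frame is client-to-server DHCP from client_mac.

    Assumption: untagged Ethernet II carrying IPv4 (no 802.1Q VLAN tag).
    """
    if len(frame) < 14:
        return False
    src_mac = frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if src_mac != mac_to_bytes(client_mac) or ethertype != ETHERTYPE_IPV4:
        return False

    ip = frame[14:]
    if len(ip) < 20 or ip[9] != IPPROTO_UDP:
        return False
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or len(ip) < ihl + 4:
        return False

    src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
    return src_port == DHCP_CLIENT_PORT and dst_port == DHCP_SERVER_PORT


def summarize(frames, client_mac: str) -> dict:
    """Count frames sourced by the client, and how many of those are DHCP."""
    mac = mac_to_bytes(client_mac)
    from_client = [f for f in frames if len(f) >= 12 and f[6:12] == mac]
    dhcp = [f for f in from_client if is_dhcp_from_client(f, client_mac)]
    return {"frames_from_client": len(from_client), "dhcp_requests": len(dhcp)}
```

A result of zero for both counts is the same picture I got from the capture: nothing at all arriving from the host, which points below the DHCP layer rather than at it.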

Knowing packets don’t lie, I had to assume the issue was lower in the OSI stack and sure enough, upon closer investigation, the wall cable to the device was just a wee bit loose.  Now admittedly, I could have gone over to that room and checked the cable first.  A loose or damaged cable would definitely have been a likely cause given the re-cabling scenario. Let’s be honest, however, I’ll use any excuse to fire up a packet capture.  I highly encourage this habit too because seeing what “normal” and “abnormal” traffic looks like will go a long way toward making you a better engineer in the long run.

So there you have it.  If the port looks up from a physical perspective and the switch sees something alive on the port, but the device can’t get a DHCP address, be sure to check that the cable is seated properly. For some of us, this is old habit anyway, but now you can add this to the list of symptoms for this type of hardware issue. Just another helpful way to streamline your troubleshooting process.

Published 10/15/2013