Voice basics: troubleshooting a failed outbound fax

Faxing is a technology that, instead of nuking from orbit (the only way to be sure), we’ve propped up and tried to make part of the VoIP world, resulting in a whole lot of troubleshooting and a whole lot of bang-head-here moments for voice engineers.

While time, variances in equipment, and sheer PTSD keep me from exploring all the ways in which faxing can suck (er, go wrong), I thought I’d throw out a recent example of an all-too-common occurrence – proving your fax machine isn’t the (biggest) offender in an outbound communication failure.

Specifically, this example deals with an XMedius fax server, a Cisco voice gateway with PRI, and a who-knows-what fax endpoint on the other side.  Your mileage in fax troubleshooting may, and likely will, vary. Just keep that in mind and a drink at hand.

The first step in dealing with one of these reported issues (after cursing, of course) is to determine if it’s an isolated incident or possibly a dialing issue.  Besides calling and confirming* that a fax machine actually picks up, checking your inbound and outbound logs on the fax server can quickly quell those reports that the server is down when someone really just forgot to dial a 9 when sending the fax. Happens all the time.
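
To make that log check concrete, here is a minimal Python sketch. Everything about it is an assumption for illustration: the `(dialed, status)` record layout, the function names, and the idea that external numbers follow North American 10-digit numbering and need an access code such as 9 dialed in front. Your fax server’s real log export will look different.

```python
def looks_bare(dialed):
    """Return True if a dialed string looks like an external number
    with no outside-line access code in front of it.

    Assumption: external numbers are 10 digits, optionally preceded by 1.
    """
    digits = "".join(ch for ch in dialed if ch.isdigit())
    if len(digits) == 10:
        return True  # bare 10-digit number, nothing dialed in front
    if len(digits) == 11 and digits.startswith("1"):
        return True  # 1 + 10 digits, still no access code
    return False


def suspect_dialing_errors(log):
    """From hypothetical (dialed, status) records, return the failed
    numbers that look like they were dialed without the access code."""
    return [dialed for dialed, status in log
            if status == "failed" and looks_bare(dialed)]
```

If most of the failures in the log show up in that list, the problem is probably the person dialing rather than the server.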

In my case, I had plenty of inbound/outbound successes to determine this was an isolated case.  I also had the packet capture feature of XMedius turned on.**

This feature is brilliant, and that’s truly not an overstatement.

I opened the packet capture for one of the failed attempts, navigated to Telephony -> VoIP Calls, and then selected Flow for my call.  When you do this, there will be quite a bit of information presented in graph form.

You should be looking for a few basic things in particular:

  • Do you see the call ever connect?
  • Do you see the sender’s CNG (calling tone) packet?
  • Do you see a DIS (Digital Identification Signal) from the remote endpoint?
  • Do you see the sender’s training message?
  • Do you see the remote endpoint’s CFR (confirmation to receive)?
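
If you end up walking this checklist often, it can be sketched as a small Python helper. This is only an illustration: it assumes you have already copied the signal names out of the flow graph into a list by hand, and the labels `CONNECT` and `TCF` (for the training message) are names chosen for this sketch, not something the capture tool is guaranteed to print.

```python
# Checklist milestones in the order they should appear in the flow graph.
# Labels are assumptions for this sketch; match them to your own capture.
MILESTONES = ["CONNECT", "CNG", "DIS", "TCF", "CFR"]


def first_missing_milestone(events):
    """Return the first checklist milestone that never shows up in
    `events` (a list of signal names), or None if all are present."""
    seen = {event.upper() for event in events}
    for milestone in MILESTONES:
        if milestone not in seen:
            return milestone
    return None
```

Calling `first_missing_milestone(["CONNECT", "CNG", "DIS", "TCF", "TCF", "TCF"])` returns `"CFR"`, which tells you where in the conversation things went quiet.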

In my flow graph of the not-so-happy fax, I noticed that even though I’d made contact with the (whiny) fax machine on the other side and negotiations had been successful, the remote endpoint never sent a CFR, so the server would not send the fax data.

The fax server tries again and again to elicit a response, but there’s only silence from the other side.  I assume because the remote endpoint realized that for every successful fax, a puppy dies.  Well, that’s the rumor I’ve heard (or started).

Here’s an excerpt from the flow graph, showing a definite lack of CFR.

[Image: flow graph excerpt with no CFR]

Below is the flow graph of a fax that the server sent successfully to another number.  While there are differences, you can see the CFR goodness that the flow graph above is missing.

[Image: flow graph of a successful fax]

After reviewing this information, I moved on to finding out whether the voice gateway ever saw the CFR and maybe just forgot to send it along.

After working with TAC and doing a PCM capture on the gateway, I was able to confirm that the remote endpoint never sends the CFR, which meant I could declare with some relative certainty that this was a whole lot of not-my-problem.***

TAC even provided me with this handy-dandy flow graph built from the captures we took on the gateway. You can see that the fax server tries three times (TCF (9600)) to get the remote end to cough up a CFR, but no dice.

[Image: outbound fax flow graph from the gateway captures]

While this just scratches the surface, these basics, along with a formidable hammer, should get you started in your fax-fighting mission. Just remember: to really effectively troubleshoot a fax machine, it’s all in the swing…

 

Published 4/10/2015

*Do not skip this step. Never assume a user is asking you about problems with a working telephone number.  Always test from outside your phone system to confirm that the phone number in question hasn’t been disconnected or written down wrong by the user. This will save you countless hours and possibly what’s left of your sanity.

**Check your XMedius Administrator’s guide or call their support for steps to turn on this feature. It’s a pretty straightforward process and well worth the time.

 ***Trust me, there are no absolutes in fax, unless you’re talking about frustration. That part is guaranteed.

Runt Post: Quality troubleshooting, what it looks like

In my previous post, I shared some of the cool stuff ThousandEyes is doing with VoIP.  I also wanted to draw attention to this cool video of Mohit Lad, co-founder and CTO of ThousandEyes, using his own product to troubleshoot an outage event on the fly: http://vimeo.com/105805525

There are very few ways to show off your product better than this type of demonstration. Mohit troubleshoots with expertise, clearly in his element. The tools cater well to his methodical troubleshooting process and both are quite impressive. Plus the routing loop he finds is just darn cool.

Watch it. You’ll love watching a master at work; I know I did.

Published: 9/26/2014

Disclaimer: While Networking Field Day, which is sponsored by the companies that present, was very generous to invite me to this fantastic event and I am very grateful for it, my opinions are totally my own, as all redheads are far too stubborn to have it any other way.

 

When switch ports lie…

Ah, that moment when you solve some weird IT issue in record time while a suitably impressed customer watches. The customer thinks you’re a genius, yet you know you homed in on the solution so quickly not because of your crazy amount of talent and good looks (although these helped, of course), but because you instantly recognized the peculiar, tell-tale signs of the oddball problem at hand. Somewhere along the line you had seen it before and the memory stuck with you. And likely keeps you up at night, but that’s another story…

For that reason, I offer you this: a short recollection of symptoms I experienced when troubleshooting a data cable that turned out to be *mostly* plugged in.

After some re-cabling recently, my team was tasked with testing to confirm everything was coming up roses.  All was good except for one thin client workstation out of 12. This one thin client couldn’t get a DHCP address, but every other device plugged into the same switch was good and chugging along.

I initially checked that the switch port was up and confirmed a physical status light of green as well.  I also checked to see that a MAC address was being learned off the port in question and looked at the interface statistics. From a physical perspective, things were looking good. I did try another port on the switch, but the issue persisted. At this point, some engineers would be inclined to look higher up the network stack or at a misconfiguration of the client. The single reason I hesitated to pursue this line of inquiry: I could reliably confirm the client worked before and that only the wiring had changed.

Now my regular blog readers can probably predict what I did next – yep, Wireshark to the rescue!  In short order I had a SPAN port set up on the switch and within minutes I was looking at packet capture goodness.  Want to take a guess what I didn’t see in the capture?  A DHCP request from the client.  In fact, I wasn’t seeing any packets from the host in question. Not what I was expecting.
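
For anyone who prefers to script that “what’s missing?” check, here is a rough Python sketch. It assumes the capture has already been exported into simple records with a source MAC and a protocol name; the record layout, the function name, and the three result labels are all made up for illustration and are not anything Wireshark produces on its own.

```python
def classify_client_traffic(frames, client_mac):
    """Classify what the client sent, given hypothetical exported frames.

    `frames` is a list of dicts with 'src' (MAC string) and 'protocol'.
    Returns one of three labels invented for this sketch:
      'no_frames' - nothing at all from the client (the case in this post)
      'no_dhcp'   - client is sending frames, but none of them are DHCP
      'dhcp_seen' - client sent DHCP, so the problem is likely elsewhere
    """
    mac = client_mac.lower()
    from_client = [f for f in frames if f["src"].lower() == mac]
    if not from_client:
        return "no_frames"
    if not any(f["protocol"].upper() == "DHCP" for f in from_client):
        return "no_dhcp"
    return "dhcp_seen"
```

A `no_frames` result is the one that should send you back down the stack to the physical layer.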

Knowing packets don’t lie, I had to assume the issue was lower in the OSI stack and sure enough, upon closer investigation, the wall cable to the device was just a wee bit loose.  Now admittedly, I could have gone over to that room and checked the cable first.  A loose or damaged cable would definitely have been a likely cause given the re-cabling scenario. Let’s be honest, however, I’ll use any excuse to fire up a packet capture.  I highly encourage this habit too, because seeing what “normal” and “abnormal” traffic looks like will go a long way toward making you a better engineer in the long run.

So there you have it.  If the port looks up from a physical perspective and the switch sees something alive on the port, but the device can’t get a DHCP address, be sure to check that the cable is seated properly. For some of us, this is old habit anyway, but now you can add this to the list of symptoms for this type of hardware issue. Just another helpful way to streamline your troubleshooting process.

Published 10/15/2013