When traces lie…

Interesting issue pops into my email box – when calling India, the call goes through to local 911 emergency services instead.  Not surprisingly, this email is marked with high priority.

So diving in, I have the user make test calls and we prove that calls to England, France and other international destinations work splendidly.  Not the same story with calls to a certain number in India- let’s just say local emergency dispatchers aren’t looking to be friends with voice engineers making test calls, even ones with charming southern accents.

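In this case, all the calls are dialed the same way: 9011[country code][number], where 9 grabs an outside line and 011 is the US international prefix. India’s country code is 91, though, so the number to India happens to be 90119111XXXXXXXX. As you may have noticed, 911, emergency services in the US, is part of the dialed number, straddling the country code and the first digit of the number.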

So what would make the Call Manager or the router- not sure where to lay the blame at this point since a PBX isn’t involved- ditch the 9011 and send 911 out to the PSTN? Good question.

Time for the Dialed Number Analyzer to save the day! Punch in the digits, click “Do Analysis,” and get back 9@ as the matching route pattern. Cue icky feeling in stomach. For those who aren’t familiar with why 9@ (a 9 followed by the @ wildcard, which expands to the entire North American Numbering Plan) just sucks in your dial plan, please click on over to @networkingnerd’s blog post: http://networkingnerd.net/2011/05/26/9-must-die/ for a nice write-up on the tawdry subject. If that doesn’t convince you, know that if you use it, I will hunt you down and…uh, let’s get back to the story…

In an attempt to thwart 9@, I create my own international route pattern, built the way god intended international route patterns to be, making sure my CSS/partition trumps that of the pathetic 9@ pattern. Testing commences, and the user’s test call goes through successfully! Huzzah! No more making crank calls to grumpy 911 operators.
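For anyone newer to CUCM route patterns, here is a toy Python model of how overlapping patterns compete. As I understand Cisco’s documented “closest match” behavior, Call Manager picks the matching pattern that could match the fewest possible digit strings, and partition order in the calling device’s CSS only breaks ties between equally specific patterns. This is a simplified sketch, not Cisco’s implementation: it supports only digits, X, [ranges], a single !, and the . discard delimiter, it counts candidate strings only at the dialed length, and it does not model the @ macro at all. The pattern set and the dialed digits are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from math import prod

# Toy model only: a heavily simplified take on CUCM-style route patterns.
# Supported: digits, X (any digit), [..] ranges like [2-9], ! (one or more
# digits, assumed at most once per pattern) and '.' (discard delimiter).
# The @ macro (the full North American Numbering Plan) is NOT modeled.


@dataclass
class Token:
    regex: str    # regex fragment for this position
    choices: int  # digits accepted at this position; 0 marks the "!" wildcard


def count_range(body: str) -> int:
    """Count the digits in a bracket body such as '2-9' or '135'."""
    digits = set()
    j = 0
    while j < len(body):
        if j + 2 < len(body) and body[j + 1] == "-":
            digits.update(str(d) for d in range(int(body[j]), int(body[j + 2]) + 1))
            j += 3
        else:
            digits.add(body[j])
            j += 1
    return len(digits)


def tokenize(pattern: str) -> list[Token]:
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch.isdigit():
            tokens.append(Token(ch, 1))
        elif ch == "X":
            tokens.append(Token("[0-9]", 10))
        elif ch == "[":
            end = pattern.index("]", i)
            body = pattern[i + 1:end]
            tokens.append(Token(f"[{body}]", count_range(body)))
            i = end  # skip past the closing bracket below
        elif ch == "!":
            tokens.append(Token("[0-9]+", 0))
        elif ch == ".":
            pass  # '.' marks where digits get discarded; it does not affect matching
        else:
            raise ValueError(f"unsupported in this toy model: {ch!r}")
        i += 1
    return tokens


def matches(pattern: str, dialed: str) -> bool:
    regex = "".join(t.regex for t in tokenize(pattern))
    return re.fullmatch(regex, dialed) is not None


def strings_matched(pattern: str, length: int) -> int:
    """How many digit strings of this length the pattern matches (smaller = closer)."""
    tokens = tokenize(pattern)
    fixed = [t.choices for t in tokens if t.choices > 0]
    if len(fixed) < len(tokens):  # pattern contains "!"
        extra = length - len(fixed)  # digits soaked up by "!"
        return prod(fixed) * 10 ** extra if extra >= 1 else 0
    return prod(fixed) if len(fixed) == length else 0


def closest_match(patterns: list[str], dialed: str) -> str | None:
    candidates = [p for p in patterns if matches(p, dialed)]
    # Fewest possible strings wins. Real ties would fall back to CSS partition
    # order, which this sketch does not model (min() keeps the first tie).
    return min(candidates, key=lambda p: strings_matched(p, len(dialed)), default=None)


if __name__ == "__main__":
    patterns = ["9.[2-9]XX[2-9]XXXXXX", "9.011!", "9.01191!"]  # hypothetical set
    print(closest_match(patterns, "9011911112345678"))  # made-up India-style digits
    print(closest_match(patterns, "92125551234"))       # made-up US 10-digit call
```

Run as-is, the India-style string lands on the more specific 9.01191! rather than the broader 9.011!, and the ten-digit US number lands on the NANP-style pattern. Counting matches only at the dialed length is a deliberate shortcut to keep the sketch small.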

Just for good measure, I have the user do one more test as I quietly pat myself on the back. This time, though, instead of hitting redial (which, unbeknownst to me, he had been doing on all the previous calls), the user dials the number digit by digit. I then hear the melodious “your call can not be completed as dialed…” message. Huh?

With my self-congratulatory speech on hold, it’s time for more debugs and log collection. At this point, things go from slightly askew to downright wonky. The DNA tool says I’m still matching 9@. *Gasp* – the DNA tool is lying to me! The router debugs show that my pattern has changed what the router sends out to the PSTN from 911 to 011911, which, while not actually routable, is solid proof that my new route pattern in Call Manager is being hit.

Then TAC tells me the trace files show that Call Manager quits collecting digits after the 9 and 0 are dialed for calls placed to the 9011911XXXXXXXX destination, but collects all the digits dialed for any other international destination. Wait, what? How does it know, after my 9 and 0, whether I am going to dial India or Timbuktu? According to the trace files, though, Call Manager can predict that I’m going to call India before I even dial it. I know the system is good, but I didn’t think it had progressed to mind reading yet.

And what about using redial? The system apparently collects all the digits there too. Somehow Call Manager *knows* when I’m going to dial India from the keypad, but if you hit redial, its magical predictive powers are temporarily suspended and the call sneaks on by.

To quote one of my favorite shows of all time: “this is all making a kind of sense that’s… not.”*

Feeling betrayed by my trusty tools and trace files, I am left to conclude that the system is as utterly confused as we are about what is actually going on under the hood. So it’s back to basics. Call routing appears to be the issue, which means reviewing the system’s infernal route patterns yet again.

At this point, I’ll note that in addition to 9@, there is also a 9011@ pattern. We had all blown this pattern off earlier because all the evidence indicated it was never being matched by anything. Now that the evidence is suspect at best, a closer look is warranted. We put the 9011@ pattern in a partition nothing has access to, test again, and at last we have true success.
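Parking a pattern in an unreachable partition works because, in CUCM, digit analysis only considers patterns whose partition appears in the calling device’s calling search space (patterns in the <None> partition being the exception, since every CSS can reach them). Here is a minimal sketch of that visibility rule; the partition and CSS names are hypothetical, and it assumes the phone’s CSS contains both the new international partition and the legacy one holding 9@ and 9011@.

```python
from __future__ import annotations

# Toy model: which route patterns a phone can "see" through its calling search space.
# Partition, CSS, and pattern placements are hypothetical. A CSS is an ordered list
# of partitions, and a pattern is only considered if its partition is in the CSS.


def visible_patterns(css: list[str], partitions: dict[str, list[str]]) -> list[str]:
    """Return the patterns reachable through the CSS, in CSS partition order."""
    return [pattern for pt in css for pattern in partitions.get(pt, [])]


partitions = {
    "PT_Legacy": ["9@", "9011@"],
    "PT_Intl": ["9.011!"],
}
phone_css = ["PT_Intl", "PT_Legacy"]

before = visible_patterns(phone_css, partitions)

# The fix from the story: park 9011@ in a partition that no CSS includes.
partitions["PT_Legacy"].remove("9011@")
partitions["PT_Blocked"] = ["9011@"]

after = visible_patterns(phone_css, partitions)
print(before)  # ['9.011!', '9@', '9011@']
print(after)   # ['9.011!', '9@']
```

The sketch only answers “which patterns are candidates”; which visible pattern actually wins is the separate closest-match question, and list order alone does not decide it.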

So what to make of this?

Number one and most importantly: never, ever use @ in your route patterns if you can help it.  It’s just wrong, wrong, dirty and wrong.  Also, it appears to completely goof up the Dialed Number Analyzer, so keep that in mind when troubleshooting such patterns.

Number two: tools are useful, but not always accurate. Corollary: trust, but verify. Pull output from as many sources as you can to build a full picture of the puzzle, especially when one or more of the tools at hand are spitting out results that defy logic.

Number three: some clues throw you off track. In this case, the fact that redial worked pointed to a digit timeout issue, but other international calls were fine, so we put that theory on the back burner. That turned out to be a good decision.

So one mystery still remains: why the heck did redial work? If you have thoughts or theories, please feel free to comment; I’d love to hear your ideas on the subject… I wouldn’t rule out black magic and powers of unspeakable darkness…

*In case you were wondering, the quote is from Buffy the Vampire Slayer, episode “Becoming, Part 2,” a series chock-full o’ excellent one-liners…