When traces lie…

Interesting issue pops into my email box – when calling India, the call goes through to local 911 emergency services instead.  Not surprisingly, this email is marked with high priority.

So diving in, I have the user make test calls and we prove that calls to England, France and other international destinations work splendidly.  Not the same story with calls to a certain number in India- let’s just say local emergency dispatchers aren’t looking to be friends with voice engineers making test calls, even ones with charming southern accents.

In this case, all the calls are dialed the same way: 9011[country code][number], but the number to India happens to be 90119111XXXXXXXX.  As you may have noticed- 911, emergency services in the US, is part of the dialed number.
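
To make the overlap concrete, here is a minimal Python sketch that assembles the dial string and looks for 911 inside it. The helper names are made up for illustration, and the split of the India number into "91" plus "11XXXXXXXX" is only an assumption chosen so the result reproduces the masked string above; the England number is masked and invented in the same way.

```python
ACCESS_CODE = "9"      # outside-line digit
INTL_PREFIX = "011"    # US international prefix
EMERGENCY = "911"

def build_dial_string(country_code: str, national_number: str) -> str:
    """Assemble 9011[country code][number] as the users dial it."""
    return ACCESS_CODE + INTL_PREFIX + country_code + national_number

def emergency_offsets(dialed: str) -> list:
    """Return every index at which '911' starts inside the dialed digits."""
    return [i for i in range(len(dialed) - 2) if dialed[i:i + 3] == EMERGENCY]

india = build_dial_string("91", "11XXXXXXXX")    # masked, as in the post
england = build_dial_string("44", "20XXXXXXXX")  # hypothetical masked number

print(india, emergency_offsets(india))
print(england, emergency_offsets(england))
```

The only point of the sketch is that once the leading 9011 is consumed, the very next digits of the India number are 911. It says nothing about what Call Manager or the router actually did with them.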

So what would make the Call Manager or the router- not sure where to lay the blame at this point since a PBX isn’t involved- ditch the 9011 and send 911 out to the PSTN? Good question.

Time for the Dialed Number Analyzer to save the day! Punch in the digits, click “Do Analysis” and get back 9@ as the matching route pattern. Cue icky feeling in stomach.  For those who aren’t familiar with why 9@ just sucks in your dial plan, please click on over to @networkingnerd’s blog post: http://networkingnerd.net/2011/05/26/9-must-die/ for a nice write up on the tawdry subject. If that doesn’t convince you, know that if you use it, I will hunt you down and…uh, let’s get back to the story…

In an attempt to thwart 9@, I create my own international dialing pattern the way god intended international route patterns to be, making sure my CSS/partition trumped that of the pathetic 9@ pattern.  Testing commences and the user’s test call goes through successfully! Huzzah! No more making crank calls to grumpy 911 operators.

Just for good measure, I have the user do one more test as I quietly pat myself on the back. This time, instead of hitting redial (which, unbeknownst to me, he had been doing for the previous calls), the user dials the number digit by digit.  I then hear the melodious “your call cannot be completed as dialed…” message. Huh?

Having put my self-congratulatory speech on hold, it’s time for more debug and log collections.  At this point things go from slightly askew to downright wonky.  DNA tool says I’m still matching 9@.  *Gasp* – the DNA tool is lying to me! Viewing the router debugs I can see that my pattern has changed what the router was sending out to the PSTN from 911 to 011911- which, while not actually routable, is solid proof my new route pattern in Call Manager is being hit.

Then TAC tells me the trace files show that Call Manager quits collecting digits after the 9 and 0 are dialed for calls placed to the 9011911XXXXXXXX destination, but that the Call Manager collects all the digits dialed for any other international destination. Wait, what?  How does it know after my 9 and 0 whether I am going to dial India or Timbuktu? According to the trace files, though, Call Manager can predict if I’m going to call India before I even dial it. I know the system is good, but I didn’t think it had progressed to mind reading yet.

And what about using redial?  The system apparently collects all the digits there too. Somehow Call Manager *knows* when I’m going to dial India using the keypad, but if you hit redial its magical predictive powers are somehow temporarily suspended and the call sneaks on by.

To quote one of my favorite shows of all time: “this is all making a kind of sense that’s… not.”*

Feeling betrayed by my trusty tools and trace files, I am left to conclude that the system is as utterly confused as we are about what is actually going on under the hood.  So it’s back to basics- call routing appears to be the issue, time to review the system’s infernal route patterns yet again.

At this point, I’ll note that in addition to 9@, there is also a 9011@ pattern present. Previously we all blew this pattern off because all the evidence indicated it wasn’t ever being matched by anything. Now that the evidence is suspect at best, a closer look is warranted. We proceed to put the 9011@ pattern in a partition nothing has access to. We test and, at last, we have true success.
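
One possible way to picture why 9011@ could cause this, offered as a guess rather than a diagnosis: assume the @ wildcard stands in for a whole national numbering plan, with an emergency 911 entry among its members. Nothing in the traces above confirms that this is what happened. The three-entry plan, the labels, and the function names below are all hypothetical stand-ins, and the X and ! handling is a heavy simplification of real route pattern matching.

```python
import re

# ASSUMPTION: a tiny hand-picked stand-in for what "@" might expand to.
# These entries are illustrative only, not the real expansion.
PLAN_SAMPLE = [
    ("911", "emergency service"),
    ("1[2-9]XX[2-9]XXXXXX", "long distance"),
    ("011!", "international"),
]

def to_regex(pattern: str) -> str:
    # X = any single digit, ! = one or more digits; the rest is literal.
    return pattern.replace("X", r"\d").replace("!", r"\d+")

def complete_match(prefix: str, dialed: str):
    """Return the label of the first sample entry that prefix+entry fully matches."""
    for pattern, label in PLAN_SAMPLE:
        if re.fullmatch(prefix + to_regex(pattern), dialed):
            return label
    return None

print(complete_match("9011", "9011911"))  # first seven digits of the India call
print(complete_match("9011", "9011442"))  # first seven digits of a UK call
print(complete_match("9", "9011442"))
```

Under that assumption, the first seven digits of the India call are already a complete match for 9011 followed by the plan's 911 entry, while the same seven digits of a call to England match nothing under 9011@ and only look like an international call under 9@. Treat it as a thought experiment to test against your own traces.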

So what to make of this?

Number one and most importantly: never, ever use @ in your route patterns if you can help it.  It’s just wrong, wrong, dirty and wrong.  Also, it appears to completely goof up the Dialed Number Analyzer, so keep that in mind when troubleshooting such patterns.

Number two: tools are useful, but not always accurate. Corollary: trust, but verify. Take output from as many sources as you can to build a full picture of the puzzle, especially if one or more of the tools at hand are spitting out results that defy logic.

Number three: some clues throw you off track. In this case, the redial working pointed to a digit timeout issue, but other international calls were fine, so we put this on the back burner. Turned out to be a good decision.

So one mystery still remains: why the heck did the redial work? Anyone with thoughts/theories please feel free to comment, I’d love to hear your ideas on the subject…I wouldn’t rule out black magic and powers of unspeakable darkness…

*In case you were wondering, the quote is from Buffy the Vampire Slayer, episode Becoming, Part 2, a series chock-full o’ excellent one-liners…

Translating nothing into nothing…

Wanna confuse a just-starting-out voice engineer quickly? Just show them voice translation rules. Seemingly simple on the surface, black magic voodoo underneath.  At least it can seem that way to someone new to voice…

The most recent dark magic I learned to perform came about on an issue I was 90% sure was a carrier issue – I like to hold out a 10% chance that the carrier actually did get it right, it’s only fair.

So a user reports that international calls to Great Britain are failing- no other international calls are failing, just those.  Now, I don’t know about your users, but mine *often* have trouble even figuring out the digits to dial to make a long distance call, so my confidence in them being able to accurately enter an international access code is low. Okay, non-existent.

So we fire up the good ole “debug isdn q931” and to my surprise the user is actually right. Surprise being the appropriate emotion since, let’s face it, that doesn’t happen every day.  I take a capture of the call failure to Great Britain and a capture of the successful international call and conclude that the carrier must be goofing something up somewhere.

Now, I’m really not a blame-it-on-the-other-guy type of gal, but come on: the dial strings are hitting the same route pattern, sent to the same gateway, to the same dial-peer, and out the same voice port.  Only Great Britain numbers fail, and since all things international get equal treatment on this end, my system seems an unlikely culprit. I reasonably conclude the carrier switch must have some super special, surely unintentional, non-routing going on.

Arming the user with debugs, I send him on his way to confront the carrier with the proof of their Anglophobic ways. That’s when I learn I have overlooked something significant in the debugs- something the lovely carrier technician pointed out – likely with a smirk on his I-know-I’m-right face.

The q931 debugs showed the Great Britain calls being marked with a type of “International,” whereas the calls to other international destinations were being marked with a type of “Unknown.”  Why is this significant?  Well, the “International” designation, when received by a carrier switch, causes that switch to prepend 011 to the dialed string.  In this case, that’s extremely detrimental since 011 was already part of the digits placed on the line.
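
A rough Python sketch of that behavior, modeled solely on the carrier technician's explanation (real switches vary, and the function name is made up), using the masked called numbers from the debug excerpts at the end of this post:

```python
def carrier_view(called_digits: str, type_of_number: str) -> str:
    """Digits the carrier switch would try to route, per the technician's description."""
    if type_of_number == "International":
        # The switch assumes the international prefix is missing and adds it.
        return "011" + called_digits
    # With type Unknown the digits are taken exactly as sent.
    return called_digits

failed = carrier_view("01144XX80212223", "International")
worked = carrier_view("01133XX2087574", "Unknown")

print(failed)
print(worked)
```

The failing call ends up with 011 twice at the front, which is not a number any switch is going to route, while the call tagged Unknown goes out exactly as it was sent.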

There are many ways to fix this issue. The one I liked best, as you may have guessed, involves a voice translation rule and was suggested by one of my brilliant coworkers.

It goes like this:

voice translation-rule 1
  rule 1 // // type any unknown plan any unknown

This rule will take anything that hits it and change any “type” to Unknown and any “plan” to Unknown. The empty match and replace patterns (// //) leave the digits themselves untouched.

It then needs to be added to a translation profile that will catch the called number:

voice translation-profile SET_UNKNOWN
  translate called 1

This then gets applied to the outgoing international dial peer:

dial-peer voice 10000 pots
  translation-profile outgoing SET_UNKNOWN
  destination-pattern 9011T
  prefix 011
  port 0/0/0:23
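
For anyone who wants to trace the digits through that dial peer, here is a small Python model of it. It is a sketch under assumptions: the dialed string 901144XX80212223 is reconstructed from the debug excerpt further down rather than taken from a capture, the class and function names are invented, and the order in which IOS applies the steps is not modeled (it does not matter here because the rule leaves the digits alone).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CalledNumber:
    digits: str
    number_type: str
    plan: str

EXPLICIT_DIGITS = "9011"  # literal part of destination-pattern 9011T
PREFIX = "011"            # prefix 011

def send_out_pots_peer(call: CalledNumber) -> Optional[CalledNumber]:
    """Model the dial peer: strip matched digits, re-add 011, reset type and plan."""
    if not call.digits.startswith(EXPLICIT_DIGITS):
        return None  # this dial peer would not match the call
    # A POTS dial peer strips the explicitly matched digits by default.
    remaining = call.digits[len(EXPLICIT_DIGITS):]
    # rule 1 // // type any unknown plan any unknown: digits untouched,
    # type and plan both forced to Unknown.
    return CalledNumber(PREFIX + remaining, "Unknown", "Unknown")

before = CalledNumber("901144XX80212223", "International", "ISDN")
after = send_out_pots_peer(before)
print(after)
```

The digits that leave the gateway are the same ones seen in the failing debug; only the type and plan tags differ.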

And there you have it.  Calls to the Queen Mother can now commence and users can rejoice!

In case you are still reading this and are interested in the debugs, here are some pertinent excerpts:

From the unsuccessful call (X’s added to protect calling/called parties). Note Plan:ISDN, Type:International:

Bearer Capability i = 0x8090A2
Standard = CCITT
Transfer Capability = Speech
Transfer Mode = Circuit
Transfer Rate = 64 kbit/s
Channel ID i = 0xA98396
Exclusive, Channel 22
Calling Party Number i = 0x2181, ‘XXXXXX3547’
Plan:ISDN, Type:National
Called Party Number i = 0x91, ‘01144XX80212223’
Plan:ISDN, Type:International

From the successful call (X’s added to protect calling/called parties) – Note, Plan: Unknown, Type:Unknown:

Bearer Capability i = 0x8090A2
Standard = CCITT
Transfer Capability = Speech
Transfer Mode = Circuit
Transfer Rate = 64 kbit/s
Channel ID i = 0xA98395
Exclusive, Channel 21
Calling Party Number i = 0x2181, ‘XXXXXX3547’
Plan:ISDN, Type:National
Called Party Number i = 0x80, ‘01133XX2087574’
Plan:Unknown, Type:Unknown
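
The `i = 0x91` and `i = 0x80` values in the excerpts above are where the plan and type actually live. Below is a short Python sketch that decodes that octet the way Q.931 lays it out (bits 7 to 5 are the type of number, bits 4 to 1 are the numbering plan). The function name and the shortened labels are mine, chosen to match the wording of the debug output.

```python
TYPE_OF_NUMBER = {
    0: "Unknown", 1: "International", 2: "National",
    3: "Network-specific", 4: "Subscriber", 6: "Abbreviated",
}
NUMBERING_PLAN = {
    0: "Unknown", 1: "ISDN", 3: "Data", 4: "Telex",
    8: "National", 9: "Private",
}

def decode_plan_and_type(octet: int):
    """Split the first octet of a party-number element into (plan, type)."""
    type_bits = (octet >> 4) & 0b111   # bits 7-5; bit 8 is the extension bit
    plan_bits = octet & 0b1111         # bits 4-1
    return (NUMBERING_PLAN.get(plan_bits, "Reserved"),
            TYPE_OF_NUMBER.get(type_bits, "Reserved"))

print(decode_plan_and_type(0x91))  # called party, failing call
print(decode_plan_and_type(0x80))  # called party, successful call
print(decode_plan_and_type(0x21))  # first octet of the calling party's 0x2181
```

Handy when a debug shows only the hex, or when you want to double-check what a translation rule really put on the wire.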