When traces lie…

Interesting issue pops into my email box – when calling India, the call goes through to local 911 emergency services instead.  Not surprisingly, this email is marked with high priority.

So diving in, I have the user make test calls and we prove that calls to England, France and other international destinations work splendidly.  Not the same story with calls to a certain number in India- let’s just say local emergency dispatchers aren’t looking to be friends with voice engineers making test calls, even ones with charming southern accents.

In this case, all the calls are dialed the same way: 9011[country code][number], but the number to India happens to be 90119111XXXXXXXX.  As you may have noticed- 911, emergency services in the US, is part of the dialed number.
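To make the overlap concrete, here is a minimal sketch that splits the dial string from this post into the parts the caller intended and then shows the unintended reading. The slice positions are simply my reading of the 9 + 011 + country code layout described above, and the X's are the digits elided in the post, left as-is.

```python
# Split the dial string from the post into the parts the caller intended.
dialed = "90119111XXXXXXXX"   # X's are the elided digits, kept as-is

access_code = dialed[:1]      # "9"   outside-line access code
intl_prefix = dialed[1:4]     # "011" US international dialing prefix
country_code = dialed[4:6]    # "91"  India
national_number = dialed[6:]  # the rest of the number

# The same digits read the unintended way: the country code plus the
# next digit spell out the US emergency number.
accidental = dialed[4:7]

print(access_code, intl_prefix, country_code, national_number)
print("accidental run:", accidental, "starting at index", dialed.find("911"))
```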

So what would make the Call Manager or the router- not sure where to lay the blame at this point since a PBX isn’t involved- ditch the 9011 and send 911 out to the PSTN? Good question.

Time for the Dialed Number Analyzer to save the day! Punch in the digits, click “Do Analysis” and get back 9@ as the matching route pattern. Cue icky feeling in stomach.  For those who aren’t familiar with why 9@ just sucks in your dial plan, please click on over to @networkingnerd’s blog post: http://networkingnerd.net/2011/05/26/9-must-die/ for a nice write up on the tawdry subject. If that doesn’t convince you, know that if you use it, I will hunt you down and…uh, let’s get back to the story…

In an attempt to thwart 9@, I create my own international dialing pattern the way god intended international route patterns to be, making sure my CSS/partition trumps that of the pathetic 9@ pattern.  Testing commences and the user’s test call goes through successfully! Huzzah! No more making crank calls to grumpy 911 operators.
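For readers who have not built one, an explicit international pattern can be pictured with a toy matcher. This is a sketch only: the pattern 9.011! is an assumed example (the post does not say which pattern was actually created), the helper names are made up for illustration, and only a small subset of the wildcard syntax is modeled (X for one digit, ! for one or more digits, and the dot as a separator that is ignored when matching). The test digits after 9011911 are placeholders, not the real number.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a small subset of route-pattern syntax into a regex."""
    parts = []
    for ch in pattern:
        if ch == "X":
            parts.append(r"\d")      # exactly one digit
        elif ch == "!":
            parts.append(r"\d+")     # one or more digits
        elif ch == ".":
            continue                 # separator only, not a dialed digit
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matches(pattern: str, dialed: str) -> bool:
    """True if the whole dialed string fits the pattern."""
    return pattern_to_regex(pattern).fullmatch(dialed) is not None

def strip_predot(pattern: str, dialed: str) -> str:
    """Drop the digits that sit before the dot (here, the access code)."""
    return dialed[pattern.index("."):]

INTL = "9.011!"                      # assumed explicit international pattern
india_test = "901191112345678"       # placeholder digits for illustration

print(matches(INTL, india_test))       # the international call matches
print(matches(INTL, "912125551234"))   # a national call does not
print(strip_predot(INTL, india_test))  # what would be sent on after stripping the 9
```

The point of a pattern like this is that it says exactly what it matches, which is what makes it easy to reason about when placed in a partition that wins over 9@.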

Just for good measure, I have the user do one more test as I quietly pat myself on the back. This time, though, instead of hitting redial (which, unbeknownst to me, he had been doing for the previous calls), the user dials the number digit by digit.  I then hear the melodious “your call cannot be completed as dialed…” message. Huh?

Having put my self-congratulatory speech on hold, it’s time for more debug and log collections.  At this point things go from slightly askew to downright wonky.  DNA tool says I’m still matching 9@.  *Gasp* – the DNA tool is lying to me! Viewing the router debugs I can see that my pattern has changed what the router was sending out to the PSTN from 911 to 011911- which, while not actually routable, is solid proof my new route pattern in Call Manager is being hit.

Then TAC tells me the trace files show that Call Manager quits collecting digits after the 9 and 0 are dialed for calls placed to the 9011911XXXXXXXX destination, but that the Call Manager collects all the digits dialed for any other international destination. Wait, what?  How does it know after my 9 and 0 whether I am going to dial India or Timbuktu? According to the trace files, though, Call Manager can predict if I’m going to call India before I even dial it. I know the system is good, but I didn’t think it had progressed to mind reading yet.

And what about using redial?  The system apparently collects all the digits there too. Somehow Call Manager *knows* when I’m going to dial India using the keypad, but if you hit redial its magical predictive powers are somehow temporarily suspended and the call sneaks on by.

To quote one of my favorite shows of all time: “this is all making a kind of sense that’s… not.”*

Feeling betrayed by my trusty tools and trace files, I am left to conclude that the system is as utterly confused as we are about what is actually going on under the hood.  So it’s back to basics- call routing appears to be the issue, time to review the system’s infernal route patterns yet again.

At this point, I’ll note that in addition to 9@, there is also a 9011@ pattern in the system. Previously we all blew this pattern off because all the evidence indicated it was never being matched by anything. Now that the evidence is suspect at best, a closer look is warranted. We proceed to put the 9011@ pattern in a partition nothing has access to. We test and, at last, we have true success.

So what to make of this?

Number one and most importantly: never, ever use @ in your route patterns if you can help it.  It’s just wrong, wrong, dirty and wrong.  Also, it appears to completely goof up the Dialed Number Analyzer, so keep that in mind when troubleshooting such patterns.

Number two: tools are useful, but not always accurate. Corollary: trust, but verify. Take output from as many sources as you can to build a full picture of the puzzle, especially if one or more of the tools at hand are spitting out results that defy logic.

Number three: some clues throw you off track. In this case, the redial working pointed to a digit timeout issue, but other international calls were fine, so we put this on the back burner. Turned out to be a good decision.

So one mystery still remains: why the heck did the redial work? Anyone with thoughts/theories please feel free to comment, I’d love to hear your ideas on the subject…I wouldn’t rule out black magic and powers of unspeakable darkness…
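One theory offered in the comments below is that redial hands over the whole string at once, while keypad dialing is analyzed digit by digit and stops as soon as an urgent pattern matches. Here is a toy model of that idea and nothing more. The pattern table is hypothetical: treating 9011@ as if it contained an urgent "9011911" entry is an assumption made purely for illustration, not something the traces or the post confirm, and the function names and test digits are made up.

```python
import re

# Hypothetical pattern table: (pattern, urgent, label). "!" = one or more digits.
PATTERNS = [
    ("9011911", True, "emergency"),      # ASSUMPTION: what 9011@ might imply
    ("9011!", False, "international"),   # explicit international pattern
]

def full_match(pattern: str, digits: str) -> bool:
    """True if the digits collected so far fit the whole pattern."""
    regex = "".join(r"\d+" if c == "!" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, digits) is not None

def route_en_bloc(digits: str):
    """Analyze the complete string in one go (the redial theory)."""
    for pattern, _urgent, label in PATTERNS:
        if full_match(pattern, digits):
            return label
    return None

def route_digit_by_digit(digits: str):
    """Analyze after every keypress; an urgent match ends collection early."""
    collected = ""
    for d in digits:
        collected += d
        for pattern, urgent, label in PATTERNS:
            if urgent and full_match(pattern, collected):
                return label
    # No urgent pattern fired, so decide on everything that was collected.
    return route_en_bloc(collected)

india_test = "901191112345678"   # placeholder digits, not the real number
uk_test = "9011442012345678"     # placeholder digits, not a real number

print(route_en_bloc(india_test))         # international
print(route_digit_by_digit(india_test))  # emergency
print(route_digit_by_digit(uk_test))     # international
```

In this model the same number lands in two different places depending only on how the digits arrive, which is at least consistent with what we saw. Whether the real system works this way is exactly the open question.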

*In case you were wondering, the quote is from Buffy the Vampire Slayer, episode Becoming- Part 2, a series chock-full o’ excellent one-liners…

10 thoughts on “When traces lie…”

  1. Nice post. I’m not a VoIP guy (or girl) but it was interesting. I’d say those lessons apply much more broadly than just to VoIP problems.

  2. I am a fan of the 9.@ and think it gets a bad rap for being misunderstood (thus improperly or not completely set up) but I wouldn’t say not using 9.@ is wrong either.

    1. I can’t say I’m a fan of the way 9@ & 9011@ obscure troubleshooting, but the real limitation is not being able to implement classes of restriction when using such generic patterns. I would agree, though, that my not understanding all the caveats of 9@ certainly worked against me in this situation. Definitely a learning experience!

  3. 9.@ is a holdover from the Call Manager 3x days. In typical Cisco fashion, upgrades never remove “features”.

    I agree with Amy. Not only is it a slack way of setting up a dial plan, it is wrong. I believe anyone who sets up a new CUCM with 9.@ should be beaten with a stick. (Except for Jeremy C of course)

    When using 9.@ you also lose granularity of call routing. What if I want local calls to go out a different gateway than international calls? With 9.@ all calls that match 9.@ go out the same route group.

    Nice post Amy.

    1. Agreed! The real drawback is the loss of classes of restriction when using generic patterns – this company didn’t need/want to control what users could dial what, but most do…

  4. I agree, nice post, Amy. I’ll share a couple related lessons I’ve learned the hard way.

    The Redial button is bad news when doing dial plan testing. It’s like dialing on your cell (or an old SIP phone without SIP dial rules) and then pressing dial. CUCM doesn’t process digit-by-digit, it analyzes the entire string at once. Same idea if you dial without going offhook and then lift the handset or press the speaker button. I always catch myself doing this during testing and have to remind myself to hit speaker first, then dial my digit string. I think the prescient nature of UCM here comes from our patterns marked Urgent Priority as it no longer looks for additional matches when it bumps up against these patterns.

    And you’re right. DNA is a mess. I find it lying to me all the time, so at least you know you’re not alone. It’s a nice first step, but I’ve learned not to put much stock in what it tells me. It becomes extra worthless when you’re doing a lot with Device Mobility or Local Route Groups.

    1. I thought that might be the case with the redial- that it was processing the string all at once and not digit by digit. Though I still cannot understand why dialing 9011911XXXXXXX was ending up as 911 when dialed digit by digit. The 911 route patterns in the system were not marked with urgent priority. Is that a “feature” of 9@ and 9011@? They see 911 anywhere in the string and immediately send it out?

  5. I disagree about not using @ in route patterns. The @ combined with PROPERLY CONFIGURED ROUTE FILTERS allows for a very clean dial plan. The problems I have seen with people using the @ arise when they do not properly configure the corresponding route filters, either because of an oversight or because they do not understand how to use them.

    As an example, for US toll-free dialing, I would much rather have a single route pattern with the five toll-free area codes in a route filter than having to configure five separate route patterns.

    When using the @ pattern with route filters there is absolutely NO loss in class-of-restriction capability. I have installed many systems in this configuration where COR was an absolute requirement and I can tell you for 100% certainty it works great when properly configured.

    I also disagree with Ted that the @ is a “holdover” because it is not. It is an integral part of properly configured dial plans that should be used.

  6. Yesterday, I found that the Redial button seems to behave differently from just typing in the number. Same applies for the call history list. Dialing the number by hand didn’t work whereas hitting the redial button did. This evil thing seems to cache a lot more than just the number.
