Runt post: A little VG224, MGCP annoyance…

While I am not a fan of MGCP for general gateway setup, I agree that it’s a good protocol to go with when setting up a VG224 device.  For those not familiar, you can use a VG224 device to connect multiple analog devices to your VoIP network.  Keep in mind that each analog device you add to your IP network makes a baby cry, but if you’re going to do it anyway and have quite a few analog stations, these devices make sense.

Up until recently, I had never had the privilege of configuring one of these guys from scratch.  Like adding any MGCP gateway, though, it’s a pretty straightforward process, so imagine my surprise when my VG224 wouldn’t register with Call Manager.

I saw the Unregistered status in Call Manager, and I also saw this on the gateway when I did a #show ccm-manager from the CLI. The status stayed in Registering toward the primary, then would occasionally swap to Registering toward the secondary. What had I forgotten?  The most common mistake is forgetting to use the fully qualified domain name as the device name in Call Manager. If “ip domain name” is set on the router, be sure to include the domain name as part of your device name when adding your VG224 to CUCM.  (Pro tip: You can see what the FQDN should be in the #show ccm-manager output.)

This wasn’t my mistake though.  My “mistake” was simply this – I was trying to get the VG224 to register, but I hadn’t added any configuration to the ports.  My thought was “let’s get this thing registered, then I’ll go back in and add the directory numbers to the ports” – the VG224’s thought, however, was “I’m not registering until this chick defines some ports.”

Alas, after about 20 minutes of checking the configuration against known good configurations, I decided to proceed with adding the directory number information to the ports and worry about the registration at the end.  After I configured the first port, though, the VG224 started showing as Registered in Call Manager and I proceeded to kick myself for wasting my own time.

Here’s what you *should* see in the output of a properly registered VG224 using MGCP. Notice the Domain Name; this is what needs to be entered in Call Manager as the device name:

My_VG224_#show ccm-manager
MGCP Domain Name: My_VG_224_my.lab.com
Priority        Status                   Host
============================================================
Primary         Registered               10.10.10.1
First Backup    Backup Ready             10.10.10.2
Second Backup   None                     

Current active Call Manager:    10.10.10.1
Backhaul/Redundant link port:   2428
Failover Interval:              30 seconds
Keepalive Interval:             15 seconds
Last keepalive sent:            14:12:27 CDT Apr 25 2002 (elapsed time: 00:00:14)
Last MGCP traffic time:         14:12:27 CDT Apr 25 2002 (elapsed time: 00:00:14)
Last failover time:             15:33:26 CDT Apr 11 2002 from (10.10.10.2)
Last switchback time:           15:35:01 CDT Apr 11 2002 from (10.10.10.1)
Switchback mode:                Graceful
MGCP Fallback mode:             Not Selected
Last MGCP Fallback start time:  None
Last MGCP Fallback end time:    None
MGCP Download Tones:            Disabled
TFTP retry count to shut Ports: 2
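
If you'd rather not eyeball that output every time, here is a minimal Python sketch that pulls the MGCP Domain Name and the per-priority status out of pasted `show ccm-manager` text. It assumes the column layout shown above (fields separated by runs of two or more spaces); the function names are made up for illustration and other IOS versions may format the table differently.

```python
import re

# Matches the three priority rows of the table; layout is assumed from the sample above.
ROW = re.compile(r"^(Primary|First Backup|Second Backup)\s{2,}(.+)$")

def parse_ccm_manager(output):
    """Return the MGCP domain name and a {priority: (status, host)} mapping."""
    result = {"domain_name": None, "hosts": {}}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("MGCP Domain Name:"):
            result["domain_name"] = line.split(":", 1)[1].strip()
            continue
        match = ROW.match(line)
        if match:
            # Status and host are separated by two or more spaces.
            parts = re.split(r"\s{2,}", match.group(2).strip())
            host = parts[1] if len(parts) > 1 else None
            result["hosts"][match.group(1)] = (parts[0], host)
    return result

def primary_registered(parsed):
    """True only when the Primary row reports 'Registered'."""
    status, _host = parsed["hosts"].get("Primary", (None, None))
    return status == "Registered"

sample = """\
MGCP Domain Name: My_VG_224_my.lab.com
Priority        Status                   Host
============================================================
Primary         Registered               10.10.10.1
First Backup    Backup Ready             10.10.10.2
Second Backup   None
"""
parsed = parse_ccm_manager(sample)
print(parsed["domain_name"], primary_registered(parsed))
```

The domain name it prints is the string to compare against the device name configured in Call Manager.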

Ways Contact Center Makes Me Cry – chapter 1

One of the fun yet often excruciating parts of working in voice is the opportunity to troubleshoot a wide variety of devices and applications.  For me personally, Cisco Unified Contact Center Express (UCCX) proves to be one of the more challenging pieces of software I encounter. This is probably because I started out in voice completely unfamiliar with the product- and also because the application developers may or may not have been smoking crack.  Alas, I jest with that last part- UCCX (formerly IPCC) does what it does well. When something goes wrong, however, that box is like a wounded wild animal – there’s a very good chance it will bite your hand off when you try to help it.

Let me relay just one of many anecdotes I have accumulated that illustrate my point.

Executing a simple assignment: to change out the memory in an elderly IPCC server so that it will be ready for its upcoming upgrade to new, snazzy Linux-based UCCX goodness.  For those not aware, starting with version 8 of UCCX, the operating system changes from Windows Server to a Red Hat version of Linux. And there was much rejoicing.

Back to the story though- I gracefully take down the secondary server and pull out 2 sticks of 1 gig memory and put in 2 sticks of 2 gig memory. We’re not talking rocket science here, just your basic hardware swap, expected to be so simple and quick I’ve left the car engine running.

When I power on the what-should-be-happier server, however, I am woefully disappointed when all is not right in Whoville.  I’m greeted with this message of awesomeness:

c:\program files\wfavvid\ClusterData\profile.ini cannot be found and you won’t be making it out of here on time tonight, if ever.

That last part doesn’t come standard with this error, but it should.

The effect of a profile.ini file gone missing – the server loses all willingness to be social.  It refuses to join in the UCCX game the other server is playing and instead pouts in the corner.  At this point I’m left with no other option but to figure out what it is going to take to reconcile this sullen server with its former BFF.

Browsing to the aforementioned folder, I find that this server has overstated its case just a bit in that the file is most certainly there.  But it’s blank. Just a bunch of whitespace with no clues whatsoever as to what dark magical incantations used to abide there. Since 5 minutes ago I didn’t know this now vacant profile.ini file existed and my maintenance window was closing fast, I determined it was time to call up my buddies at TAC and let them enlighten me on this peculiar set of circumstances.

Sure enough, the engineer knew exactly what to do.  If at the time I had been a little more familiar with IPCC on a Windows platform, I likely would have guessed the solution as well since it oozes simplicity. Just copy the contents from the same named file located in the same named directory on the primary server into the now blank file on the secondary server.  Yep, that’s it! Just reboot, and voilà!  Two joyfully reunited IPCC servers strolling hand in hand down the boardwalk. Wonder if I qualify for a Nobel Peace Prize or something…

As a bonus for listening to this tale of woe, here’s what the profile.ini file generally looks like (IP addresses have been changed to protect the innocent):

#Tue Oct 18 18:05:25 CDT 2011
port=1099
clusterProfile=default
ipAddress=172.10.10.10;172.10.10.1
latestUpdateTimeStamp=1318979125096
Completed=true
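
For anyone who wants to catch a blanked-out profile.ini before the server starts pouting, here is a rough Python sketch that parses the key=value format shown above and flags an empty file. The list of "expected" keys is my assumption, taken only from this one sample, and the function names are hypothetical; this is not a Cisco tool.

```python
# Keys assumed to matter, based solely on the sample file above.
EXPECTED_KEYS = {"port", "clusterProfile", "ipAddress"}

def parse_profile(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries

def profile_problems(text):
    """Return a list of human-readable problems; an empty list means it looks sane."""
    entries = parse_profile(text)
    if not entries:
        return ["file is blank or contains only comments"]
    missing = sorted(EXPECTED_KEYS - entries.keys())
    return ["missing key: " + key for key in missing]

sample = """\
#Tue Oct 18 18:05:25 CDT 2011
port=1099
clusterProfile=default
ipAddress=172.10.10.10;172.10.10.1
latestUpdateTimeStamp=1318979125096
Completed=true
"""
print(profile_problems(sample))      # healthy file
print(profile_problems("   \n\n"))   # the whitespace-only case from the story
```

Running the same check against the file on both servers before and after maintenance gives a quick comparison of the two.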

Note: In case you were wondering, this particular server was MCS-7825-H3-CCX1-DL380, the version of IPCC was 7 with the latest SUs.

Simplicity

Urban sprawl – New York City is the quintessential example of this phenomenon. Why do I bring it up?  One, because I’m currently writing this from a not-so-cushy chair in the bloggers lounge of Interop, hosted in New York.  Two, because it’s the image that for a couple of reasons comes to mind while processing all of the information that has been dumped into my overly saturated brain this week.

Reason one this comes to mind: network sprawl. Networks just keep growing and growing, constantly bombarded with changes that risk the comforting hierarchical design allowing us OCD geeks to sleep at night.  Every time we wake someone else is demanding we modify our rock solid architecture to incorporate some newfangled something or other.  We grudgingly graft these new devices/endpoints/services into our designs but at a cost. In not too long, our once pristine work of art starts to strongly resemble the monster in Shelley’s Frankenstein – and frankly, we as geeks resent it.

Reason two this comes to mind: networking company sprawl.  Sounds odd, I know, but it’s an apt description when pondering the large, big name, been-around-forever, companies that we’re all familiar with in the networking community. These large companies are all faced with an exigent need to be innovative and encumbered by the weight of supporting past business decisions.  The sheer extent of the empire often results in a series of disjointed business units, complex product lines, incomprehensible licensing models, isolated pools of talent, and a customer base sitting on the edge of their chairs waiting anxiously to see how it all falls out. For the record, we geeks intensely resent this as well.

So when companies like HP Networking announce they are simplifying their product names, I perk up. It’s an immediate sign that someone, somewhere realized that sprawl has gone unchecked for too long and monster creation needs to be mitigated. Hints of such recognition have also been made by other big players, including Cisco, and every time I hear it I get giddy.  I dream of a world with simplified licensing models, BOMs that don’t take a PhD to comprehend, and companies with clearly articulated, streamlined direction. In a word, focus.

I’ve only seen hints though.  I want to see more.  Simplifying product names represents an awesome step in the right direction.  Now how about eliminating confusing redundant products? Cisco’s stance on getting back to core competencies sets my heart aflutter; now how about eliminating crippling complexities in the licensing models?

I love that HP Networking invited me and other front line engineers to their briefings and that honest feedback was both requested and given.  I’m sure they are not the only company doing so and for good reason. Listening to the folks doing the implementations can only help in the attempts to narrow focus and reclaim simplicity in the business.

Letting geeks in on company direction is a total win as well.  As geeks we know that change is constant, technology is always in flux, and everyone is just guessing at the next big thing.  We can handle that.  What we can’t abide is a lack of direction, goals, a sense of purpose in all the chaos.  In the words of Douglas Adams “we demand rigidly defined areas of doubt and uncertainty.” So bring us in, spill the beans, and we’ll be more than happy to help you sort through it all.  It’s what we do every day, it’s in our nature, and the results are a windfall for those who seek us out. Leave us in the dark, make us guess, send mixed messages and we’ll drop you like a bad habit. It’s what we do.

For some more great coverage of HP and Interop, check out these bloggers whom I had the great honor of meeting this week as well. I can confirm they are all fabulously awesome in meatspace too:

Aaron Paxson http://teneo.wordpress.com

Andrew VonNagy http://revolutionwifi.blogspot.com/

Brad Casemore http://nerdtwilight.wordpress.com/

Ethan Banks  http://packetpushers.net formerly at http://packetattack.org/

Matthew Norwood http://insearchoftech.com

Stu Miniman http://blogstu.wordpress.com

*A special thanks to @hp_networking who took excellent care of us bloggers, always keeping us fed and in constant supply of caffeine. 

It’s not me it’s you…

One of the niftiest features Call Manager offers is Single Number Reach (SNR). Configuring SNR takes a wee bit of work, but once it’s set up you’ll never love your life more. Why? Because SNR ends the tradition of giving out a work number *and* a cell phone number just so customers, end users, and sales piranhas can abuse you all hours of the day. With SNR, you can send your work calls to your cell phone on your terms, which pretty much rocks my socks.

Recently I was troubleshooting this feature of awesomeness trying to determine why a user’s mobile phone wasn’t ringing when the work number was called. A few obvious things to look for when troubleshooting this:

  • Make sure there’s a Remote Destination Profile AND a Remote Destination configured – I know they sound like the same thing, but they are not and you’ll need both
  • Make sure the Calling Search Space on the Remote Destination Profile permits a call to the cell phone- meaning if it’s a long distance call to the cell make sure your chosen CSS allows long distance
  • Make sure that the user is associated with his/her Remote Destination Profile
  • Last but not least, tweak the timer settings- Answer Too Soon, Answer Too Late, Delay Before Ringing, and Just Hang Up the Damn Call Now are timers that are all extremely helpful when customizing this feature for Mr or Mrs Picky Pants. (Yes, I made up that last one, but it sounds like an option that should be there)

So, what if you’ve got it all configured correctly and the user’s mobile phone still doesn’t ring? Time for a sanity check – if you’re me, you put your own cell phone number in the Remote Destination and give it a whirl.  Now this may sound less like sanity and more like madness, but it means you’ve injected a known working quantity into the equation.  If there’s one thing I know my cell phone does, it’s ring. All. the. time.

But back to the story. Given the new parameter, SNR works like a charm.  So why did SNR like my phone so much better than the user’s? Mine does have a pretty cool case, I’ll give you that. However, doubting that this was the magic factor, I did some research and dug up some interesting information. I found that her model of phone had a TON of complaints posted online.  The complaints: that model – an EVO on the Sprint network- apparently has a nasty habit of not ringing when people call.  Fortunately, there were lots of suggestions on how to resolve this particular issue and my customer’s phone now rings all the time. Especially since I forwarded my work number there.

The moral of the story – voice is a murky, complex world in which we engineers often find ourselves trying to manipulate devices that lie just outside of our influence or control. Whether it’s an ancient fax machine located halfway across the country with ECM turned on, a cell phone running a buggy version of code and a bad PRL, or perhaps even another vendor’s video endpoint that refuses to make nice with the expensive equipment on your end of the conversation- we’ve got to hone our skills to narrow down the issue and face the fact that sometimes the fix will be out of our hands.

In case you’ve got an EVO phone with a ringing disorder, check out this forum: http://forums.androidcentral.com/htc-evo-4g/41825-does-your-evo-rings-4-times-before-you-hear-fix-your-slot-index-cycle.html

And if you need to configure SNR from scratch, I’d recommend starting here: http://tek-tips.com/viewthread.cfm?qid=1575474

And an official Cisco doc, complete with judgmental dude looking down at you while you read, can be found here: http://www.cisco.com/en/US/docs/voice_ip_comm/hucs/7.1a/provision/CH13_HUCS.pdf

Voice Girl Goes to Storage Day

Who has two thumbs and got to attend the last Tech Field Day?  This girl!

In case you don’t know what Tech Field Day is, go here and check it out:  http://techfieldday.com/   In case you don’t care what Tech Field Day is, I suggest you stop reading or make sure you have copious amounts of alcohol handy.  Actually, that last suggestion could improve the reading of any of my posts, so feel free to get started, you have my blessing.

Now, I’m sure we’ve all had that friend who goes on a vacation and brings back 10,000 pictures and insists on narrating them all in great, painstaking detail.  Fear not – I want to smack that guy as much as you do – so I will just be hitting the highlights of this expedition in this post.

So, without further ado, awesome thing number 1: hanging out with server admins.  I know, I know, for network and/or voice guys this hardly sounds like something that would make the list of awesome- unless that list were titled Ways In Which My Day Could Awesomely Suck – but it’s true and let me tell you why.

With roles in IT becoming less and less siloed, it’s clear us folks guarding the layer 2 and 3 keys to the castle are going to have to make nice with those folks rocking the upper layer data center knowledge.  As distasteful as that may initially sound to both parties involved, we all earn huge rewards.

Think about it- do you really want that server guy vMotioning all those production boxes across your precious WAN without any clue as to the implications?  I’m certain that server guy with the ponytail doesn’t want us well-intentioned network junkies screwing with SAN infrastructure when he/she thinks we don’t even know what random IO is. Of course we do know what it is, but that’s not the point…

Contrary to popular sysadmin belief, we network folks are capable of reading and do in fact know what a manual looks like.  Contrary to network admin belief, server guys do know what they are doing and don’t just break crap on purpose.  Given shrinking IT budgets, device consolidation, and technology overlap, our tiny sandbox has only gotten tinier and now it looks like we’re going to have to share the dump truck and not just the buckets.  (the dump truck was always my favorite)

So awesome thing number two:  presentations! Companies solving problems I was vaguely aware existed in ways I only wish I had imagined because retirement would be nice about now.  The quality of presentations was generally high and the technical level generally deep.  Perfect combination.

Let me offer a few brief take-aways from what I saw. You can catch the presentations here: http://vimeo.com/groups/techfieldday

  • Nutanix: Putting your VMs and storage on the same devices and having them utilize the same resources.  It has a kind of eggs in one basket feel – but the basket is really nice.  Interesting implications for the necessity of a SAN administrator. http://www.nutanix.com/
  • Nasuni: If you ever want tips on how to deliver a presentation, watch this one. The send-your-files-in-the-cloud-and-see-them-at-your-other-sites product was wicked cool. Matt Simmons had the product up and running during the time of the demo. Sweet. http://www.nasuni.com/
  • Symantec Storage Foundation 6.0: Least favorite presentation style. So. many. power. point. slides. Clearly this product has some significant improvements over the previous version but the demo certainly wasn’t showing off this product’s nice curves, so to speak. http://www.symantec.com/business/storage-foundation
  • Data Direct Web Object Scaler: large-scale cloud storage wow-ness.  Keeping track of your massive amounts of cloud data using a custom filing system to store and replicate data. Demo was super neat, product super fast.   http://www.ddn.com/products/web-object-scaler-wos
  • Pure Storage- all SSD storage, forget tiering.  They wrote their own software to talk/write to SSD drives in a way that makes SSD drives very happy. In fact, drives never fail for Pure Storage, or so was said- a concept our little group of skeptics had some trouble with. Pure Storage stuck to their guns though and a promise was made to tweet the first drive failure. http://www.purestorage.com/
  • Arista EOS:  Command line goodness. In the demo, the guy added the XMPP package to the Linux-based software running the switch, then chatted with the switch. Totally cool. Who doesn’t want to ask a switch how its day is going? http://www.aristanetworks.com/
  • SolidFire- All SSD storage, optimized for providers who want to limit compute and/or storage on a per customer basis. If you are a cloud provider of storage, being able to establish very specific SLAs for customers I’m sure is extremely appealing.  http://solidfire.com/
  • Arkeia- backup goodness.  Presentation went into detail on their particular brand of deduplication which provides quite a lot of benefit when backing up large amounts of data. http://www.arkeia.com/

Last but not least, awesome thing number three: Stephen Foskett and Matt Simmons are freaking fantastic!  As the organizers, they coordinated every intricate detail and then made it look easy to the rest of us.  A very special thanks to those guys for making all of this happen, wishing them happy times in therapy as they attempt to recover.

For links to all things Tech Field Day 8: http://techfieldday.com/2011/tfd8/

Just want to be heard…

One of the things call center supervisors really like to do is listen in on agent calls.  I’m sure it’s not *just* because they are nosy-type people, [insert business justification here], so part of my job is to make sure their eavesdropping is configured and working properly in Cisco Unified Contact Center Express (UCCX).

Now there are about 11 ways to Sunday monitoring and recording can be jacked up by various elements, not the least of these being the voice engineer at the configuration helm. So when, during a deployment, it was found that calls were not able to be monitored or recorded, I skipped right past the look of surprise and moved straight into the what-is-it-this-time expression.

First, the symptoms.  Agents were getting calls and their supervisors were recording these calls. This means a whole bunch of agent/supervisor/phone setup tasks were completed correctly. Plus one point to the competent voice engineer with the mad skills. The recorded files were then being played back, however, and the tracks contained no audio.  Minus one point to the slightly less competent voice engineer who may, in fact, just be mad.

This not being my first rodeo, I initially suspect a codec issue, quickly confirmed by using the question mark button on the phones when the calls are made.  The display on the phone shows me the codec the calls are using is g.722 which, while a lovely codec, is not actually supported by UCCX.  It having been a long day, I decide to take a hatchet to g.722 and disable it in the Call Manager system wide parameters – ensuring no more g.722 EVER. Or at least not in this cluster.

Fully expecting the new rounds of tests to be successful, I get to use my surprised look after all when, once again, the recorded tracks lack audio. Grrr.

Firing up trusty Wireshark shows something very interesting – there is no RTP traffic from the PC to the UCCX server. For those who don’t eat, sleep, and breathe voice, RTP is the transport for the audio portion of the call.  All the setup/control messages will generally use SCCP, SIP, or H.323, but the packetized voice uses RTP over UDP. The fact that it is completely absent from my capture file is more than a bit disconcerting.

After a nice talk with my buddies at TAC, they inform me that this is commonly seen with the particular brand of antivirus being run on the client workstations. After uninstalling the antivirus product and running the capture again, RTP packets make an appearance and victory is declared in my favor.  100 points to the cheeky voice engineer from Dallas.

In case you were wondering what this RTP traffic looks like in Wireshark, you are looking for something like this:

[Screenshot: RTP traffic in a Wireshark capture]
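
For a rough idea of what Wireshark is decoding in those packets, here is a minimal Python sketch that reads the 12-byte fixed RTP header out of a UDP payload. The field layout follows the RTP specification (RFC 3550) and the payload type numbers are the static assignments from RFC 3551; the function name and the tiny codec table are just for illustration. Treat it as a heuristic: a UDP payload that happens to start with version bits of 2 is not guaranteed to be RTP.

```python
import struct

# A few static RTP payload types (RFC 3551); not an exhaustive list.
PAYLOAD_TYPES = {0: "PCMU (G.711 mu-law)", 8: "PCMA (G.711 A-law)", 9: "G.722", 18: "G.729"}

def parse_rtp_header(payload):
    """Decode the fixed RTP header, or return None if it does not look like RTP v2."""
    if len(payload) < 12:
        return None
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", payload[:12])
    version = first >> 6
    if version != 2:
        return None
    return {
        "version": version,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Hand-built example header: version 2, payload type 9, sequence 1234.
example = bytes([0x80, 0x09]) + (1234).to_bytes(2, "big") \
    + (160).to_bytes(4, "big") + (0xDEADBEEF).to_bytes(4, "big")
header = parse_rtp_header(example)
print(header["sequence"], PAYLOAD_TYPES.get(header["payload_type"], "unknown"))
```

The payload type field is also a quick way to confirm which codec a stream is actually using, which would have flagged the G.722 issue from earlier in this post.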

As an added bonus for making it to the end of this post, here are a few other things you should check on the phone device configuration page in Call Manager when having issues with recording and monitoring. They are pulled from this document: Cisco CAD Troubleshooting Guide, CAD 8.0 for Cisco Unified Contact Center Express Release 8.0, Cisco Unified Communications Manager Edition, revised April 2011

  • PC Port—Enabled. If the PC Port is not enabled, the agent PC that is connected to the port will not have network access. No voice streams will be seen by the desktop monitor module.
  • PC Voice VLAN Access—Enabled. If the PC Voice VLAN Access is not enabled, no voice streams will be seen by the desktop if the desktop is not a member of the same VLAN as the phone.
  • Span to PC Port—Enabled. If the Span to PC Port is not enabled, the voice streams seen by the phone will not be seen by the desktop monitor module.
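
If you keep notes on phone settings while troubleshooting, the three checks above are easy to turn into a tiny Python sanity check. This is only a sketch: the dictionary is something you would type in by hand from the phone configuration page, not the output of any Call Manager API, and the function name is made up.

```python
# The three settings named in the checklist above; each must read "Enabled".
REQUIRED_SETTINGS = ("PC Port", "PC Voice VLAN Access", "Span to PC Port")

def monitoring_blockers(phone_settings):
    """Return the required settings that are missing or not set to 'Enabled'."""
    return [name for name in REQUIRED_SETTINGS
            if phone_settings.get(name) != "Enabled"]

# Hypothetical values copied by hand from one phone's configuration page.
agent_phone = {
    "PC Port": "Enabled",
    "PC Voice VLAN Access": "Enabled",
    "Span to PC Port": "Disabled",
}
print(monitoring_blockers(agent_phone))
```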

When traces lie…

Interesting issue pops into my email box – when calling India, the call goes through to local 911 emergency services instead.  Not surprisingly, this email is marked with high priority.

So diving in, I have the user make test calls and we prove that calls to England, France and other international destinations work splendidly.  Not the same story with calls to a certain number in India- let’s just say local emergency dispatchers aren’t looking to be friends with voice engineers making test calls, even ones with charming southern accents.

In this case, all the calls are dialed the same way: 9011[country code][number], but the number to India happens to be 90119111XXXXXXXX.  As you may have noticed- 911, emergency services in the US, is part of the dialed number.

So what would make the Call Manager or the router- not sure where to lay the blame at this point since a PBX isn’t involved- ditch the 9011 and send 911 out to the PSTN? Good question.

Time for the Dialed Number Analyzer to save the day! Punch in the digits, click “Do Analysis” and get back 9@ as the matching route pattern. Cue icky feeling in stomach.  For those who aren’t familiar with why 9@ just sucks in your dial plan, please click on over to @networkingnerd’s blog post: http://networkingnerd.net/2011/05/26/9-must-die/ for a nice write up on the tawdry subject. If that doesn’t convince you, know that if you use it, I will hunt you down and…uh, let’s get back to the story…

In an attempt to thwart 9@, I create my own international dialing pattern the way god intended international route patterns to be, making sure my CSS/partition trumped that of the pathetic 9@ pattern.  Testing commences and the user’s test call goes through successfully! Huzzah! No more making crank calls to grumpy 911 operators.
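
To make the idea of an explicit pattern concrete, here is a simplified Python sketch that turns a Call Manager style route pattern into a regular expression and applies a pre-dot digit discard. It only understands literal digits, X (one digit), ! (one or more digits) and the dot separator. It does not model @, closest-match selection, or partitions, and the 9.011! pattern and the sample digits are hypothetical stand-ins rather than the actual pattern or number from this story.

```python
import re

def pattern_to_regex(pattern):
    """Translate a simplified route pattern into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "X":
            parts.append(r"\d")        # exactly one digit
        elif ch == "!":
            parts.append(r"\d+")       # one or more digits
        elif ch == ".":
            continue                   # discard marker, matches nothing
        elif ch.isdigit() or ch in "*#":
            parts.append(re.escape(ch))
        else:
            raise ValueError("unsupported pattern character: " + ch)
    return re.compile("".join(parts))

def strip_predot(pattern, dialed):
    """Return the digits left after a pre-dot discard, or None if there is no match."""
    if not pattern_to_regex(pattern).fullmatch(dialed):
        return None
    prefix = pattern.split(".", 1)[0] if "." in pattern else ""
    if "!" in prefix:
        raise ValueError("variable-length prefix before the dot is not supported here")
    return dialed[len(prefix):]

# 9 (outside line) + 011 (international) + 91 (India) + a made-up 10-digit number.
dialed = "9011911112345678"
sent = strip_predot("9.011!", dialed)
print(sent, "911" in sent)
```

Notice that the digits 911 still appear inside the string that gets sent on, which is exactly why a sloppy match anywhere along the path is so dangerous with this destination.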

Just for good measure, I have the user do one more test as I quietly pat myself on the back. This time, though, instead of hitting redial (which, unbeknownst to me, he had been doing with the previous calls), the user dials the number digit by digit.  I then hear the melodious “your call cannot be completed as dialed…” message. Huh?

Having put my self-congratulatory speech on hold, it’s time for more debug and log collections.  At this point things go from slightly askew to downright wonky.  DNA tool says I’m still matching 9@.  *Gasp* – the DNA tool is lying to me! Viewing the router debugs I can see that my pattern has changed what the router was sending out to the PSTN from 911 to 011911- which, while not actually routable, is solid proof my new route pattern in Call Manager is being hit.

Then TAC tells me the trace files show that Call Manager quits collecting digits after the 9 and 0 are dialed for calls placed to the 9011911XXXXXXXX destination, but that the Call Manager collects all the digits dialed for any other international destination. Wait, what?  How does it know after my 9 and 0 whether I am going to dial India or Timbuktu? According to the trace files, though, Call Manager can predict if I’m going to call India before I even dial it. I know the system is good, but I didn’t think it had progressed to mind reading yet.

And what about using redial?  The system apparently collects all the digits there too. Somehow Call Manager *knows* when I’m going to dial India using the keypad, but if you hit redial its magical predictive powers are somehow temporarily suspended and the call sneaks on by.

To quote one of my favorite shows of all time: “this is all making a kind of sense that’s… not.”*

Feeling betrayed by my trusty tools and trace files, I am left to conclude that the system is as utterly confused as we are about what is actually going on under the hood.  So it’s back to basics- call routing appears to be the issue, time to review the system’s infernal route patterns yet again.

At this point, I’ll note that in addition to 9@, there is also present a 9011@ pattern. Previously we all blew this pattern off because all the evidence indicated this wasn’t ever being matched by anything. Now that the evidence is suspect at best, a closer look is warranted. We proceed to put the 9011@ pattern in a partition nothing has access to. We test and alas we have true success.

So what to make of this?

Number one and most importantly: never, ever use @ in your route patterns if you can help it.  It’s just wrong, wrong, dirty and wrong.  Also, it appears to completely goof up the Dialed Number Analyzer, so keep that in mind when troubleshooting such patterns.

Number two: tools are useful, but not always accurate. Corollary, trust – but verify. Take output from as many sources as you can to build a full picture of the puzzle, especially if one or more of the tools at hand are spitting out results that defy logic.

Number three: some clues throw you off track. In this case, the redial working pointed to a digit timeout issue, but other international calls were fine, so we put this on the back burner. Turned out to be a good decision.

So one mystery still remains: why the heck did the redial work? Anyone with thoughts/theories please feel free to comment, I’d love to hear your ideas on the subject…I wouldn’t rule out black magic and powers of unspeakable darkness…

*in case you were wondering, the quote is from Buffy the Vampire Slayer, episode Becoming- Part 2, a series chock-full o’ excellent one-liners…

A tale of two phones

Manager comes in and asks me if I have any tasks the shiny new intern can observe, my response: I’m about to troubleshoot a Call Forward All issue- if the intern doesn’t have anything more interesting, he’s welcome to watch.  And to my complete dismay, he really doesn’t have anything more interesting and I have an audience.

First step: gather the facts.  Call the user’s extension, the phone rings- super.  Set the call forward all- super.  Call the user’s extension- get busy signal.  Not so super. Okay, Houston, we’ve verified there is a problem.

Second step: check the obvious.  In this case, checking the CSS of the Call Forward All of the Directory Number is the place to start.  If no CSS or the wrong CSS is set, the call that’s supposed to be forwarded to Timbuktu isn’t going to get there no matter how much you will it to do so. So I verify the CSS is correct, and I give a lengthy explanation of CSSs to the intern who, once again to my dismay, has not fallen asleep yet.

Third step: isolate the issue- This is also known as the simplify-the-issue stage or eliminate-all-the-factors-you-can stage.  In this situation, I whittle down the problem to its most basic parts- taking the directory number, which was a shared line, and isolating it to a known good test phone. Also, I confirm the number I’m forwarding to is an internal, reachable, working number. I repeat the tests. And for the love of Pete, it still doesn’t work.

Soooo, thinking I’d like the intern to think I’m smarter than a potato, I blow through the rest of my bag of tricks, including using the Dialed Number Analyzer to confirm my call is taking the expected path.  I also proceed to rule out any PBX conflicts.  As a side note, if you have customers that run a Cisco voice system integrated with a legacy PBX, it is perfectly acceptable to blame the PBX for all issues.  They are like telco carriers, it’s the right thing to do.

While I still have what is left of your attention, let’s move along.  At this point, I hang my head in shame, admit defeat, and do what all good mentors do – show the intern how to collect logs and open a TAC case.

And TAC’s findings on the source of my malcontent?  Bug ID CSCse19548.  For those who would rather I google that for them, the essence is that there’s a counter that is supposed to prevent looping calls and it has been triggered.  This counter tells the Directory Number “you are a loop!! no call forward for you!!” (call forward Nazi…)

The fix: reset the counter.  You can do this two ways: the take-a-hammer-to-it approach and stop/start the CCM Service, temporarily taking down call processing on your CUCM server, OR the slightly milder approach, you can reset the max forwards counter using the phone itself:

-Go off hook on a phone that is registered to CCM and has the high counter; make sure the CallFwdAll is set
-enter **##*30 (enables codes)
-Go off hook again
-enter **##*35 (clears the max hop counters)

By the way, if you are wondering what this issue looks like in the log files, you are looking for something like this:

06/17/2011 11:25:21.637 CCM|Forwarding - processCFA - ERROR -
Forward loop detected.  -- Clear the call with USER_BUSY. callKey=
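
If you would rather not eyeball trace files by hand, a few lines of Python can do the hunting for you. This is only a sketch: it assumes you have already collected the CCM traces as plain text, and the function name and the second sample line are mine, made up for illustration, not anything CUCM ships with.

```python
from typing import Iterable, List

# The phrase from the log excerpt above; matching on it avoids
# depending on the exact timestamp or dash formatting.
LOOP_MARKER = "Forward loop detected"


def find_forward_loops(lines: Iterable[str]) -> List[str]:
    """Return every trace line that reports a call forward loop."""
    return [line.rstrip("\n") for line in lines if LOOP_MARKER in line]


# First line is adapted from the excerpt above; the second is invented.
sample = [
    "06/17/2011 11:25:21.637 CCM|Forwarding - processCFA - ERROR - Forward loop detected.\n",
    "06/17/2011 11:25:22.001 CCM|unrelated trace line (hypothetical)\n",
]

hits = find_forward_loops(sample)
print(len(hits), "forward loop line(s) found")
```

To run it against a real trace, pass an open file instead of the sample list, for example `find_forward_loops(open(path))` where `path` points at whichever trace file you pulled down.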

Final thoughts:

The last thing I will mention that I learned from this situation is that users rarely tell you the whole story.  If you’ve ever seen House and heard his rant on “all patients lie,” it’s similar in concept. In this case, I later found that the directory number in question was once part of a group of two directory numbers in which two very nice, very old ladies kept call forwarding their phones to each other.  You guessed it, creating a call forwarding loop!  Shortly before this CallFwdAll issue was discovered, the hooligans had been assigned a single directory number to share, which put a stop to their shenanigans. It was a reminder to always, always ask questions.  And lots of them.

Customizing CAS…

Ever been troubleshooting an issue only to find the problem was that you left out a single line of code – code you never knew needed to be there?  I’m betting anyone who’s been in IT for any length of time has been there, done that, and nearly pulled his/her hair out in the process. (Why do you think there are so many balding guys in IT?)

Case in point – team lead and I are bringing up an E1 circuit in Mexico, resting comfortably in the knowledge that we’ve done this before. The fact that we have no idea how many digits the carrier is going to be sending doesn’t even faze us. We’ve got mad translating skills – bring it on.

That is, until it’s clear that the router is less than thrilled with whatever digits the carrier is sending. No modification of the incoming translation pattern appeases the angry stream of incoming digits, whatever they may be.

Fast forward about two hours and quite a number of debugs later and say hi to the Australian TAC engineer, who is now on the line with us, two IT guys at the site, and a Mexican telco engineer.  Only problem – no one but the telco engineer speaks Spanish – and we’re pretty sure everything that’s wrong is his fault.  He is the telco guy after all.

In the bizarre world of coincidences, the Australian TAC guy (with the really great accent, btw) pipes up with  “hey, the guy in the cube next to me happens to speak Spanish” – and with that our international summit gains traction.  Shortly after, we are staring in awe at the magic line of code that makes everything in this particularly odd universe super happy.

What was missing? This line: groupa-callerid-end. Yep, all this madness and mayhem over that one single line.  It goes here:

controller E1 0/0/0
framing NO-CRC4
ds0-group 1 timeslots 1-15,17-30 type r2-digital r2-compelled ani
cas-custom 1
country telmex
category 2
answer-signal group-b 1
groupa-callerid-end  

Now, will you always need this command when bringing up E1s?  Nope.  Will this fix all your issues with telcos in Mexico?  Not likely.  But it’s definitely something to make note of, especially when you find yourself doing a bit of guesswork due to a certain lack of information and a relatively huge language gap with the carrier.

As an added bonus and completely unrelated to the issue above – here are some dial-peers for common patterns in Mexico that might prove useful if you are planning on bringing up a site there.  Think of it as your treat for making it to the end of this post.

dial-peer voice 2 pots
description Local Dialing
destination-pattern 9[1-9].......
port 0/1/0:1
forward-digits 8
!
dial-peer voice 91 pots
description Long distance
destination-pattern 901..........
port 0/1/0:1
forward-digits 12
!
dial-peer voice 9011 pots
description International Dialing
destination-pattern 900T
port 0/1/0:1
prefix 00
!
dial-peer voice 44 pots
description Local Cell Phone
destination-pattern 9044..........$
port 0/1/0:1
forward-digits 13
!
dial-peer voice 45 pots
description Long Distance Cell Phone
destination-pattern 9045..........$
port 0/1/0:1
forward-digits 13
!
dial-peer voice 60 pots
description Emergency Services
destination-pattern 060
port 0/1/0:1
forward-digits all
!
dial-peer voice 9060 pots
description Emergency Services
destination-pattern 906.$
port 0/1/0:1
forward-digits 3
!
dial-peer voice 9070 pots
description Information & Electric Repairs
destination-pattern 907[01]
port 0/1/0:1
forward-digits 3
!
dial-peer voice 9050 pots
description Telephone Repair
destination-pattern 9050
port 0/1/0:1
forward-digits 3
!
dial-peer voice 9040 pots
description Information
destination-pattern 9040$
port 0/1/0:1
forward-digits 3
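
If the forward-digits values above look arbitrary, here is a rough Python model of what they are doing: the dial peer matches on the dialed string, and forward-digits sends only the rightmost N digits out the port, which is how the leading 9 gets dropped. This is a simplification of my own, not how IOS implements matching. It assumes "." is exactly one digit, "T" is any number of further digits, and the whole dialed string has to match. The function names and the dialed numbers are made up for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a destination-pattern into a regex (simplified model)."""
    parts = []
    for ch in pattern:
        if ch == ".":
            parts.append("[0-9]")    # one wildcard digit
        elif ch == "T":
            parts.append("[0-9]*")   # variable-length remainder
        elif ch == "$":
            continue                 # end anchor; fullmatch() already anchors
        else:
            parts.append(ch)         # literal digits and [ranges] pass through
    return re.compile("".join(parts))


def forward_digits(dialed, count):
    """Return the rightmost `count` digits, or everything for 'all'."""
    if count == "all":
        return dialed
    if count <= 0:
        return ""
    return dialed[-count:]


# (description, destination-pattern, forward-digits, made-up dialed number)
examples = [
    ("Local", "9[1-9].......", 8, "955512345"),
    ("Long distance", "901..........", 12, "9015512345678"),
    ("Local cell", "9044..........$", 13, "90445512345678"),
    ("Emergency", "906.$", 3, "9060"),
]

for description, pattern, count, dialed in examples:
    matched = pattern_to_regex(pattern).fullmatch(dialed) is not None
    print(description, dialed, matched, "->", forward_digits(dialed, count))
```

In every case the number sent to the carrier is the dialed string minus the leading 9 access code, which is the whole point of picking 8, 12, 13, and 3.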

Translating nothing into nothing…

Wanna confuse a just-starting-out voice engineer quickly? Just show them voice translation rules. Seemingly simple on the surface, black magic voodoo underneath.  At least it can seem that way to someone new to voice…

The most recent dark magic I learned to perform came about on an issue I was 90% sure was a carrier issue – I like to hold out a 10% chance that the carrier actually did get it right, it’s only fair.

So a user reports that international calls to Great Britain are failing. No other international calls are failing, just those.  Now, I don’t know about your users, but mine *often* have trouble even figuring out the digits to dial to make a long distance call, so my confidence in them being able to accurately enter an international access code is low. Okay, non-existent.

So we fire up the good ole “debug isdn q931” and to my surprise the user is actually right. Surprise being the appropriate emotion since, let’s face it, that doesn’t happen every day.  I take a capture of the call failure to Great Britain and a capture of the successful international call and conclude that the carrier must be goofing something up somewhere.

Now, I’m really not a blame-it-on-the-other-guy type of gal, but come on: the dial strings are hitting the same route pattern, sent to the same gateway, to the same dial-peer, and out the same voice port.  And only Great Britain numbers fail, so it’s not likely my system, seeing that there’s equal treatment of all things international on this end. I reasonably conclude the carrier switch must have some super special, surely unintentional, non-routing going on.

Arming the user with debugs, I send him on his way to confront the carrier with the proof of their Anglophobic ways. That’s when I learn I have overlooked something significant in the debugs- something the lovely carrier technician pointed out – likely with a smirk on his I-know-I’m-right face.

The q931 debugs showed the Great Britain calls being marked with a type of “International,” whereas the calls to other international destinations were being marked with a type of “Unknown.”  Why is this significant?  Well, the “International” designation, when received by a carrier switch, causes that switch to prepend a 011 to the dialed string.  In this case, it’s extremely detrimental since 011 was already part of the digits placed on the line.

There are many ways to fix this issue; the one I liked best, as you may have guessed, involves a translation rule and was suggested by one of my brilliant coworkers.

It goes like this:

voice translation-rule 1
  rule 1 // // type any unknown plan any unknown

This rule takes anything that hits it, leaves the digits alone (the empty // // matches nothing and replaces it with nothing), and changes any “type” to Unknown and any “plan” to Unknown.

It then needs to be added to a translation profile that will catch the called number:

voice translation-profile SET_UNKNOWN
  translate called 1

This then gets applied to the outgoing international dial peer:

dial-peer voice 10000 pots
translation-profile outgoing SET_UNKNOWN
destination-pattern 9011T
prefix 011
port 0/0/0:23

And there you have it.  Calls to the Queen Mother can now commence and users can rejoice!
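
If the rule still feels like voodoo, here is a toy Python model of its effect on a called number. It is purely illustrative and assumes nothing about IOS internals: the class, field names, and function name are mine, and the digits are borrowed from the failed call in the debugs.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CalledNumber:
    digits: str
    number_type: str
    plan: str


def set_unknown(number):
    """Model of: rule 1 // // type any unknown plan any unknown.

    The digits pass through untouched; only type and plan are rewritten.
    """
    return replace(number, number_type="unknown", plan="unknown")


before = CalledNumber("01144XX80212223", "international", "isdn")
after = set_unknown(before)
print(before)
print(after)
```

Same digits in, same digits out. The only thing that changes is the labeling the carrier switch sees, which is exactly what stops it from tacking on an extra 011.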

In case you are still reading this and are interested in the debugs, here are some pertinent excerpts:

From the unsuccessful call (X’s added to protect calling/called parties) – Note, Plan:ISDN, Type:International:

Bearer Capability i = 0x8090A2
Standard = CCITT
Transfer Capability = Speech
Transfer Mode = Circuit
Transfer Rate = 64 kbit/s
Channel ID i = 0xA98396
Exclusive, Channel 22
Calling Party Number i = 0x2181, ‘XXXXXX3547’
Plan:ISDN, Type:National
Called Party Number i = 0x91, ‘01144XX80212223’
Plan:ISDN, Type:International

From the successful call (X’s added to protect calling/called parties) – Note, Plan: Unknown, Type:Unknown:

Bearer Capability i = 0x8090A2
Standard = CCITT
Transfer Capability = Speech
Transfer Mode = Circuit
Transfer Rate = 64 kbit/s
Channel ID i = 0xA98395
Exclusive, Channel 21
Calling Party Number i = 0x2181, ‘XXXXXX3547’
Plan:ISDN, Type:National
Called Party Number i = 0x80, ‘01133XX2087574’
Plan:Unknown, Type:Unknown
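
One last nerdy aside: the hex value after "i =" in those number lines is where the Plan and Type text comes from. In Q.931, the first octet of a called or calling party number carries an extension bit, three bits of type of number, and four bits of numbering plan. The Python below is a minimal sketch of that decoding. The function name is mine, and it only looks at that first octet (for the calling party value 0x2181, that means the 0x21).

```python
# Type of number: bits 7-5 of the octet
TYPE_OF_NUMBER = {
    0: "Unknown",
    1: "International",
    2: "National",
    3: "Network specific",
    4: "Subscriber",
    6: "Abbreviated",
}

# Numbering plan: bits 4-1 of the octet
NUMBERING_PLAN = {
    0: "Unknown",
    1: "ISDN",
    3: "Data",
    4: "Telex",
    8: "National standard",
    9: "Private",
}


def decode_type_and_plan(octet):
    """Split a Q.931 type/plan octet into (plan, type) names."""
    type_bits = (octet >> 4) & 0x07   # drop the extension bit, keep 3 bits
    plan_bits = octet & 0x0F          # low 4 bits
    return (
        NUMBERING_PLAN.get(plan_bits, "Reserved"),
        TYPE_OF_NUMBER.get(type_bits, "Reserved"),
    )


for label, octet in [("failed call", 0x91), ("good call", 0x80), ("calling party", 0x21)]:
    plan, number_type = decode_type_and_plan(octet)
    print(f"{label}: 0x{octet:02X} -> Plan:{plan}, Type:{number_type}")
```

So 0x91 is the troublemaker (ISDN plan, International type), and 0x80 is what the translation rule gets you (Unknown, Unknown).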