Today’s topic grew out of mesh troubleshooting. If you’ve worked with mesh much, you may have just thrown up a little and are frantically trying to close the browser tab that led you here, and that’s totally understandable. For my voice engineer readers, mesh troubleshooting is one of the few things in the universe that can generate pain and suffering in levels akin to troubleshooting fax machines.
Given the typically vague rumors and incomplete reports of intermittent connectivity issues at a mesh site, my amazing coworker was able to home in on the root problem: various APs in the mesh intermittently stopped accepting clients on their 2.4 GHz radios. Since 5 GHz was limited to backhaul traffic only, this was definitely wreaking some havoc on the overall deployment.
From the controller, disabling and re-enabling the radio in the 802.11g radio profile* for the affected APs served as a workaround while TAC was consulted. Mysteriously, other mesh sites with the same AP model and code were not seeing this issue. For the record, these APs were all Aruba 274s running controller code version 220.127.116.11, but spoiler alert: the AP model wasn’t the issue.
Fast forward through a TAC case and a few show/debug commands, and it turned out the APs that randomly stopped accepting clients had enormous numbers of 2.4 GHz radio resets, pointing to the bug discussed here.
This issue affects not only 274s but other access point models as well. The bug does not appear to affect every AP of a given model, just a small percentage of them when it does show up.
If you think you might be experiencing this issue, take a look at the output of these commands and look for a crazy high number of radio resets. How high? Since the radio appears to be resetting practically every second, the radio reset number is noticeably and ridiculously large.**
show ap debug radio-stats ap-name NAME-OF-AP radio 0 advanced
show ap debug radio-stats ap-name NAME-OF-AP radio 1 advanced
show ap debug system-status ap-name NAME-OF-AP
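If you have a lot of APs to check, a small script can do the eyeballing for you. The Python below is only a sketch under a few assumptions: you have already captured the command output as text (collecting it is up to you), Resets is the last column of each wifiN row as in the sample output at the end of this post, and the function names and the 1000-reset threshold are hypothetical values for illustration, not anything Aruba defines.

```python
import re

# Hypothetical threshold: the post only says the count is "ridiculously large",
# so tune this for your own environment.
RESET_THRESHOLD = 1000


def parse_resets(text):
    """Map each wifiN interface to its Resets counter.

    Assumes (as in the sample output in this post) that Resets is the last
    whitespace-separated column of every interface row. Header and dashed
    separator lines are skipped because they don't start with 'wifi<digits>'.
    """
    resets = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and re.fullmatch(r"wifi\d+", fields[0]):
            resets[fields[0]] = int(fields[-1])
    return resets


def flag_resetting_radios(resets, threshold=RESET_THRESHOLD):
    """Return the interfaces whose reset count exceeds the threshold, sorted."""
    return sorted(name for name, count in resets.items() if count > threshold)


def resets_per_second(before, after, interval_s):
    """Estimate the reset rate from two snapshots taken interval_s seconds apart."""
    return {
        name: (after[name] - before[name]) / interval_s
        for name in after
        if name in before
    }


# Counters copied from the affected AP shown at the end of this post.
SAMPLE = """\
Interface  Rx_pkts    Rx_errors  Rx drops  Tx_pkts    Tx_errors  Tx_drops  Resets
---------  ---------  ---------  --------  ---------  ---------  --------  ------
wifi0      174210795  15727807   0         451900531  103        0         9
wifi1      9166677    133711103  0         32655175   842870     0         211094
"""

if __name__ == "__main__":
    counts = parse_resets(SAMPLE)
    print(counts)
    print(flag_resetting_radios(counts))
```

A single snapshot only tells you the counter is big; taking two snapshots a known number of seconds apart and passing them to resets_per_second shows whether the radio is still resetting right now, which fits the "practically every second" behavior described above.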
The fix is in code versions 18.104.22.168 or 22.214.171.124 or later. We landed on 126.96.36.199, and the issue looks to be properly squashed. The upgrade process, which I’ll outline in another brief post, was simple and straightforward; as a veteran of many a lengthy and challenging voice upgrade, I found it refreshingly delightful and far less sanity-taxing.
* Disabling and re-enabling the radio can be done in the 802.11g radio profile, on the Basic tab: uncheck the Radio Enable box and click Apply, then check the Radio Enable box and click Apply again. These mesh APs each have their own AP-specific profile, so this action only affects the individual AP. If your AP doesn’t have an AP-specific profile, be sure you know which APs you are impacting before you do this. Also of note: some people experiencing this issue found that disabling ARM provided temporary relief, but that didn’t do the trick in this deployment, since ARM was already disabled and the issue was still occurring.
**Below is an example of the number of resets seen for one of the affected APs:
Interface  Rx_pkts    Rx_errors  Rx drops  Tx_pkts    Tx_errors  Tx_drops  Resets
---------  ---------  ---------  --------  ---------  ---------  --------  ------
wifi0      174210795  15727807   0         451900531  103        0         9
wifi1      9166677    133711103  0         32655175   842870     0         211094