
Seeing Tetration in action – NFD16

One of the highlights of Network Field Day 16 was a Cisco Tetration presentation by Tim Garner. Launched by Cisco last June, Tetration is a heavy-lifting, data-crunching platform that soaks up telemetry on all your network packets, runs machine learning algorithms on that data, and produces security policy templates based on the flow information received. This process gives engineers in-depth analytics and an impressive level of visibility, and supplies automagically crafted baseline security policies. The latter truly shines when you are working with developers and application owners who have absolutely no clue what server needs to talk to what other server(s), much less what ports are required to do so securely.

With Tetration, you can use hardware sensors in the form of Nexus 9K switches with an -X in the SKU, or you can use software agents that can be installed just about anywhere, or you can use a combination of both. These sensors look at every single packet going in and out and generate telemetry packets that get shuffled off to Tetration, where the real magic happens.

In addition to software agents and hardware sensors that natively generate Tetration metadata packets, you can also stream data from load balancers, firewalls, and other networking devices. Some devices, such as Citrix and F5, are natively supported, but others might require a little work on your part to get the data into a format that Tetration will accept – JSON being one of the acceptable formats.
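As a sketch of what a normalized flow record might look like (the field names here are illustrative only, not Tetration's actual ingest schema), a flow exported as JSON could be as simple as:

```python
import json

# Hypothetical flow record from a load balancer or firewall,
# normalized into JSON before being streamed to the Tetration cluster.
# Field names are made up for illustration; consult the actual ingest
# documentation for the expected schema.
flow_record = {
    "timestamp": "2017-10-06T14:22:31Z",
    "src_ip": "10.1.20.15",
    "src_port": 44312,
    "dst_ip": "10.1.30.8",
    "dst_port": 443,
    "protocol": "tcp",
    "bytes": 18342,
    "packets": 27,
}

# Serialize to JSON for transport to the cluster
payload = json.dumps(flow_record)
print(payload)
```

The point is just that once your device's export lands in a shape like this, Tetration has the who-talks-to-whom details it needs.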

Another interesting option for getting metadata into Tetration is the use of virtual machines set up as ERSPAN destinations. Each VM can take in up to 40 Gbps of traffic, generate telemetry data for that traffic, and stream the data to the Tetration cluster. Tetration can also take in NetFlow data using this VM method as a NetFlow receiver. NetFlow data is sampled, though, so Tetration would not see metadata on every packet as it does with the other options listed.

Once the data gets to the Tetration cluster, the snazzy machine learning algorithms built into the box start telling you cool things like what hosts are talking to what hosts and what “normal” network behavior looks like, and thereby, what abnormal network behavior would look like.

If your development servers should never be talking to your production servers, Tetration can tell you not only whether that's happening now, but also whether that behavior changes in the future. Using a Kafka broker*, you can have Tetration feed notifications to applications such as Splunk or Phantom, which can in turn communicate with hardware and software devices that perform actions such as host isolation when anomalous traffic is detected.
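As an illustrative sketch of the receiving end (the alert fields and the isolation rule below are invented for the example, not Tetration's actual notification format), an app consuming that feed might decide on isolation like this:

```python
import json

def should_isolate(alert_json):
    """Decide whether an alert warrants host isolation.

    `alert_json` is a hypothetical Tetration-style notification.
    In production this would arrive via a Kafka consumer (e.g.
    kafka-python's KafkaConsumer subscribed to the alerts topic)
    rather than as a literal string.
    """
    alert = json.loads(alert_json)
    # Hypothetical rule: isolate when a dev host is caught talking to prod.
    return (alert.get("severity") == "high"
            and alert.get("src_scope") == "dev"
            and alert.get("dst_scope") == "prod")

# Example alert: a dev box talking to production
sample = json.dumps({
    "severity": "high",
    "src_scope": "dev",
    "dst_scope": "prod",
    "src_ip": "10.1.20.15",
})
print(should_isolate(sample))  # → True
```

Splunk or Phantom would sit where `should_isolate` does here, with the actual isolation pushed to your switches or firewalls.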

The automatic whitelists built by Tetration will require some care and feeding by an engineer. Importing policies from ACI is also an option. Tetration-generated whitelists can be reviewed and tweaked, and auditing what will be blocked when implementing or changing policies is an excellent job-preserving idea. Checking policies against the four to six months of network traffic data stored by the cluster gives you a good sense of what to expect when enforcement is actually turned on. That being said, you can also run your policies in audit mode for a few months to see what traffic hits the crafted policies.

If you want to see Tetration in action, I highly recommend this video below. The demo starts at about 16 minutes, but Tim Garner is such an excellent presenter, you’ll be glad you watched the whole thing.

 

*The Kafka broker service was new to me; it's basically a notification message bus. I used a few of the links below to get the idea:

https://sookocheff.com/post/kafka/kafka-in-a-nutshell/

https://kafka.apache.org/quickstart

https://www.cloudkarafka.com/blog/2016-11-30-part1-kafka-for-beginners-what-is-apache-kafka.html

 

Disclaimer: While Networking Field Day, which is sponsored by the companies that present, was very generous to invite me to this fantastic event and I am very grateful for it, my opinions are totally my own, as all redheads are far too stubborn to have it any other way.

 

Published 10/6/2017


Posted on 2017/10/06 in Cisco, Network Field Day 16

 


Preserving and managing intent using Apstra AOS

Apstra's Networking Field Day 16 presentations highlighted key issues engineers face every day. The traditional ways of spinning up configurations and validating expected forwarding behavior fall short of the needs of networks of any size. For anyone who has encountered a network where the documentation was woefully outdated or practically non-existent, and whose design requirements and purpose were deduced purely from rumors and vague supposition, Apstra offers AOS and its Intent Store.

More than just configuration management, Apstra not only builds consistent configurations abstracted from the specific hardware, but also provides a controlled manner in which to address and manage network design revisions throughout the network's life cycle.

Changing business needs impart necessary modifications to device dependencies and performance; Apstra addresses these revisions by maintaining a single source of truth – the documented intent* – and providing tools to validate this intent. As Derick Winkworth said, "it's about moving the network [design] from being all in someone's head" to making the design something consistent, tangible, and best of all, something that can be queried, and by extension, verified.

Under the covers, Apstra makes use of graph theory, and for those who'd rather not Google that, the upshot is that nodes get added and relationships get tied to those nodes. The structure allows for a flexible schema that lends itself to ever-changing quantities of connections and to new types of inter-dependencies between objects.
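A toy sketch of the graph idea (this is purely illustrative, nothing to do with AOS's actual data model): represent the network as nodes plus typed relationships, then answer questions by walking the edges:

```python
# Minimal graph model: a set of nodes plus typed edges.
# Node names and relationship types are invented for the example.
nodes = {"leaf1", "leaf2", "spine1", "app-db"}
edges = [
    ("leaf1", "spine1", "uplink"),
    ("leaf2", "spine1", "uplink"),
    ("app-db", "leaf1", "hosted-on"),  # a new relationship type, added later
]

def neighbors(node, relation=None):
    """Return nodes connected to `node`, optionally filtered by relation type."""
    out = []
    for a, b, rel in edges:
        if relation and rel != relation:
            continue
        if a == node:
            out.append(b)
        elif b == node:
            out.append(a)
    return out

# Query: which leaf does app-db depend on, and what is that leaf's uplink?
leaf = neighbors("app-db", "hosted-on")[0]
print(leaf, neighbors(leaf, "uplink"))  # → leaf1 ['spine1']
```

The flexibility comes from the fact that adding a brand-new relationship type (like "hosted-on" above) requires no schema migration, just new edges.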

For example, Apstra added the ability to create links between network nodes and the applications that run through them. This is done through some DevOps wizardry, which this video highlights well, and the additional relationship mappings allow the network operator to query for application paths and diagnose traffic flow issues.

For a short how-is-this-useful-to-me, I highly recommend this explanation by Damien Garros on using Apstra to shorten the time it takes to deploy crazy amounts of security zones, validate them, and monitor them. Snazzy stuff for any engineer who has ever faced a security auditor.

 

Disclaimer: While Networking Field Day, which is sponsored by the companies that present, was very generous to invite me to this fantastic event and I am very grateful for it, my opinions are totally my own, as all redheads are far too stubborn to have it any other way.

 

*Intent is all the buzz these days; back in my day we called it policy or design requirements. But I'll try to avoid waving my arms and shouting get off my LAN… 🙂

Published 9/25/2017

 

Posted on 2017/09/25 in Apstra, Network Field Day 16

 


Minor version upgrades for Aruba controllers, version 6.5.x

If you tuned in for the last post, you’ll remember that in all the wireless mesh troubleshooting fun, a wireless controller upgrade was required.  Today’s post outlines the upgrade process from 6.5.0.1 to 6.5.1.7 with a primary and secondary controller. As always when dealing with upgrades, your mileage may vary. Never forget to read the release notes, and when in doubt, contact TAC.

As with any upgrade, the release notes often contain subtle details, and these are typically the details that bite back if you miss them. The notes for 6.5.1.7 are pretty straightforward, but they do include exciting, not-to-be-missed caveats if you're upgrading from 6.4.x, as well as some solid tips for the upgrade.

The best practice advice in the notes includes steps such as confirming enough memory and storage space for the controller, making a backup of the configuration, and noting the number of active access points before upgrading. All of these suggestions make prudent sense and the commands to do so are listed in the guide.

You can use FTP, TFTP, SCP, a local file, or USB to do this upgrade, but the guide warns against using a Windows-based TFTP server. I used the FileZilla FTP server.

Once you've downloaded the image file from Aruba and your pre-upgrade checklist is complete, navigate to Maintenance > Controller > Image Management > Master Configuration.

Pick the file loading option you want to use for the upgrade, then fill in the required details for the transfer. Choose the non-boot partition for Partition to Upgrade; this will load the new software image to the inactive partition. If you are uncertain which one is the boot partition, look under Current Partition Usage: one partition will be marked default boot, and you will want the other one.

Be sure that Reboot Controller after Upgrade is set to No, unless you have a single controller and eat danger for breakfast. Select Yes for Save Current Configuration Before Reboot, and click Upgrade.
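If you prefer the CLI, the GUI steps above roughly correspond to something like the following sketch. The FTP server address, username, and image filename are placeholders, and you should verify the exact syntax for your version against the release notes:

```
! Load the new image from the FTP server to the non-boot partition (here, partition 1)
copy ftp: 10.1.1.50 admin ArubaOS_65x_6.5.1.7 system: partition 1

! Confirm which image version landed on which partition
show image version
```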

At this point, you rinse and repeat for the other controller(s). Once the controllers have the upgrade version loaded, you reboot the master and simultaneously reboot the other controller. In voice upgrade world, you have been well trained to wait FOREVER for all the services to come back up on the primary before even considering a reboot of secondaries, but in Aruba wireless world, simultaneous is what's written in the guide, per the 6.5.1.7 Release Notes available on the Aruba support site.

TAC did ease my anxiety over this simultaneous reboot thing by letting me know no harm would be caused if I wanted to wait for the master to come back online completely before proceeding.

After the controllers reboot and are back online, access points begin getting their new firmware and rebooting. Once the dust settles, you should end up with the same number of active APs as you noted in the beginning. Then it's all testing and confirming that access points are happy, clients are connecting, and all is well in your WiFi world.

Published 9/11/2017

 

Posted on 2017/09/11 in Controller Upgrades, Wireless

 


Wireless troubleshooting – mesh access points stop accepting clients

Today’s topic initially spawned from mesh troubleshooting.  If you’ve worked with mesh much, you may have just thrown up a little and are frantically trying to close the browser tab that led you here, and that’s totally understandable.  For my voice engineer readers, mesh troubleshooting is one of the few things in the universe that can generate pain and suffering in levels akin to troubleshooting fax machines.

Given the typically vague rumors and incomplete reports of intermittent connectivity issues at a mesh site, my amazing coworker was able to home in on the root problem: various APs in the mesh intermittently stopped accepting clients on their 2.4 GHz radios. Since 5 GHz was limited to backhaul traffic only, this was definitely wreaking some havoc on the overall deployment.

From the controller, disabling and re-enabling the radio on the 802.11g radio profile* for the affected APs served as a workaround while TAC was consulted. Mysteriously, other mesh deployment sites with the same AP model and code were not seeing this issue. As a note, these APs were all Aruba 274s on controller code version 6.5.0.1, but spoiler alert: the AP model wasn't the issue.

Fast forward to TAC and some show/debug commands later: the APs that randomly stopped accepting clients had enormous numbers of 2.4 GHz radio resets, indicating the bug discussed here.

This issue affects not only 274s but other models of access points as well. The bug does not appear to affect every AP of a given model, just a small percentage of access points when it does show up.

If you think you might be experiencing this issue, take a look at the output of these commands and look for a crazy high number of radio resets.  How high?  Since the radio appears to be resetting practically every second, the radio reset number is noticeably and ridiculously large.**

show ap debug radio-stats ap-name NAME-OF-AP radio 0 advanced
show ap debug radio-stats ap-name NAME-OF-AP radio 1 advanced
show ap debug system-status ap-name NAME-OF-AP

The fix is in code versions 6.5.0.4 or 6.5.1.3 and later. We landed on 6.5.1.7 and the issue looks to be properly squashed. The upgrade process, which I'll outline in another brief post, was simple and straightforward, and being a veteran of many a lengthy and challenging voice upgrade, I found this refreshingly delightful and far less sanity-taxing.

* Re-enabling the radio can be done on the 802.11g radio profile, on the Basic tab. Uncheck the Radio Enable box and click Apply; then check the Radio Enable box and click Apply again. These mesh APs each have their own AP-specific profile, so this action only affects the individual AP. If your AP doesn't have an AP-specific profile, be sure to know which APs you are impacting when you do this. Also of note for this case: some people experiencing this issue found that disabling ARM provided temporary relief, but that didn't do the trick in this deployment, as ARM was already disabled and the issue was still occurring.

[Screenshot: the Radio Enable checkbox on the 802.11g radio profile's Basic tab]

**Below is an example of the number of resets seen for one of the affected APs:

Interface  Rx_pkts    Rx_errors  Rx_drops  Tx_pkts    Tx_errors  Tx_drops  Resets
---------  -------    ---------  --------  -------    ---------  --------  ------
wifi0      174210795  15727807   0         451900531  103        0         9
wifi1      9166677    133711103  0         32655175   842870     0         211094
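If you have many APs to check, a quick way to eyeball this is to script the comparison. Below is a rough sketch; the threshold and the column parsing are assumptions based on the output format shown, so adjust for what your code version actually prints:

```python
RESET_THRESHOLD = 1000  # assumption: healthy counts are tiny, buggy counts are huge

def flag_high_resets(stats_text, threshold=RESET_THRESHOLD):
    """Parse 'show ap debug radio-stats'-style table text and return
    (interface, resets) pairs whose Resets column exceeds the threshold."""
    flagged = []
    for line in stats_text.splitlines():
        parts = line.split()
        # Data rows start with an interface name like wifi0/wifi1
        if len(parts) == 8 and parts[0].startswith("wifi"):
            resets = int(parts[-1])
            if resets > threshold:
                flagged.append((parts[0], resets))
    return flagged

# Sample output shaped like the table above
sample = """\
Interface  Rx_pkts    Rx_errors  Rx_drops  Tx_pkts    Tx_errors  Tx_drops  Resets
wifi0      174210795  15727807   0         451900531  103        0         9
wifi1      9166677    133711103  0         32655175   842870     0         211094
"""
print(flag_high_resets(sample))  # → [('wifi1', 211094)]
```

Collect the command output per AP, feed it through, and anything flagged is a candidate for the bug.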

 

Published 09/05/2017

 


Cisco Live 2017, engineering awesomeness.

Spending a week with amazing engineers always ranks high on my list of reasons to attend Cisco Live. The networking community and the behind-the-scenes work of the Cisco Live team make this event truly fantastic every year, and 2017 was definitely a hit.

I especially enjoyed participating in Tech Field Day once again. OpenGear presented Lighthouse 5, which focuses on automating setup and maintenance by leveraging new API goodness. OpenGear's API aims to enhance the scale of deployments while streamlining workflows. I found it especially fun watching Slack be leveraged to enroll and communicate with the OpenGear device. Nerdy goodness I recommend checking out.

If you are looking for a monitoring solution, I highly recommend you check out this excellent PRTG demo by Benjamin Day of Paessler, who not only knows his stuff, but refuses to use even one PowerPoint slide for his Tech Field Day presentation. The man is a genius. The PRTG notification enhancements, maps, and overall flexibility really stood out. Definitely cool stuff; you won't be sad you watched.

And in the final bit of Tech Field Day learning for me, NetApp’s presentation on their FlexPod SF solution took a room full of network engineers and captivated their attention on storage. I know it sounds hard to believe that network engineers could find a storage presentation fascinating, but Jeremiah Dooley managed to pull off this incredible feat, and I highly recommend checking out this session.  He covers all the important details of the FlexPod SF announcement, including the available architectures, in a way that makes network engineers forget that this is a solution focused on storing bits, and not just moving them.

The return of Engineering Deathmatch to Cisco Live featured several episodes with some of my fabulous (and lovingly voluntold for EDM) friends, who couldn’t be more amazing. I’m excited to check out the Engineering Deathmatch site as the episodes air over the coming weeks.

And lastly, my favorite annual tradition of Cisco Live wrap up blogging, the photo gallery of crazy, brilliant, hilarious engineers being remarkably phenomenal. I heart you all.

Published 07/04/2017

 

Posted on 2017/07/04 in Cisco Live 2017

 


HPE Discover 2017, Las Vegas

Attending HPE Discover 2017 did not disappoint. It was a fabulous week filled with presentations from subject matter experts on cool new tech, conversations with incredibly talented engineers and bloggers, and maximum levels of geeking out with other geeks.

I suspect this blog audience would be super interested to hear more about the new 8400 Aruba core switch announced at HPE Discover this year.

The speeds and feeds, along with all the usual data sheet info, are here, but what really stands out is the emphasis on telemetry data and programmability. Much of the focus on visibility and automation has been leveraged to make troubleshooting easier for the engineer.

The demonstration I saw up close was a simple script that allowed for monitoring of the priority voice queue. The script automagically detected any issues with the queue, captured offending packets when there was an issue, and presented the info to the user.  The Network Analytics Engine even gave some guesses as to why the issue occurred.  The demo I saw is pretty similar to what you can see in this short demo.

The 8400 is the first core switch Aruba has come out with, and it touts a new OS based on the existing Aruba switch OS. Yes, the thought of a new OS makes me a tad nervous when talking core switching, so be sure to check out Coffee Talk Day 2's first session, in which the thoroughness of the OS testing process is discussed. If you'd rather not watch the whole thing, just know that code quality is a focus of the developers involved.

Other cool HPE Discover announcements included Aruba Asset Tracking, which leverages BLE-enabled tags and Meridian Location Services to keep up with your stuff in real time. Data sheet goodness is here; the data sheet lists the APs that support Asset Tracking.

For more HPE Discover 2017 goodness, check out these recorded sessions; I especially recommend Day Three's talk on machine learning algorithms and the state of AI. Completely fascinating, totally nerdy goodness.

Coffee Talks Day 1
Coffee Talks Day 2
Coffee Talks Day 3

Disclaimer: While HPE was very generous to invite me to this great event, my opinions are totally my own, as all redheads are far too stubborn to have it any other way.  Also, special thanks to Pegah, Laura, and Becca for doing such a great job organizing this event.

 

Posted on 2017/06/20 in HP Networking

 


Oracle Ravello Blogger Day, 2017

Attending Oracle Ravello Blogger Day last month gave me deep insight into two products I knew little about before attending: Oracle Cloud and Ravello. After the excellent deep dive provided, and the basic melting of my brain on all things hypervisor, virtualization, and cloud, crafting an intelligible post seems a formidable challenge. But here we go:

Oracle has a cloud?! Yup. And they are pretty serious about where they are taking this. Over the last three years, there's been a serious commitment of time and resources to build this thing and to build it right. Clay Magouyrk, VP of Oracle Cloud Infrastructure, jokingly commented that one of the best things about being late to the cloud game is learning from other people's mistakes. Cloud isn't new, and watching what is working for the market leaders while avoiding their pitfalls is practically industry tradition. But there's differentiation here as well, with Oracle touting no over-subscription, predictable latency, bare metal access, and competitive pricing. The Oracle cloud still has construction work to be done – only two US regions (think availability zones) are available at this time, but a European region is soon to be established.

Ravello, what is it? Ravello uses nested virtualization to allow you to bring your VMware-based applications into the cloud without changing anything about them. It reads the metadata of your virtual machines, sets up your virtual networking for you, and presto! You have your VMware environment running on cloud infrastructure. Why is this handy? Well, lots of vExperts have already leveraged this for their studies and lab environments. Being able to test large-scale scenarios without laying out great big wads of cash on your own virtual infrastructure is huge. For you networkers, this reminds me of Forward Networks, where you basically have an accurate running copy of your network that you can break as you will. My favorite case study presented at Oracle Ravello Blogger Day was a network security company whose Ravello template, comprised of hundreds of endpoints and servers, is used to train engineers using true-to-life malware incidents.

Why Ravello and Oracle Cloud together? Ravello has in the past been cloud agnostic and still plans to stay that way, but there will be added benefits if you choose to run Ravello on Oracle Cloud – those benefits stemming from the ability of Ravello developers to tap into the underlying infrastructure and eke out that extra bit of performance. I would try to explain the hypervisor intricacies that allow this dark magic to happen, but I would quickly resort to words like abracadabra and shibboleet.

Fortunately, many of my vExpert friends have already blogged on the finer details of Oracle Cloud and the Ravello announcements and I highly encourage you to check these out:

Chris Wahl (@chriswahl): Getting Nerdy on the Oracle Ravello Cloud Service

Ather Beg (@atherbeg): Oracle Ravello Blogger Day – Part 1: Oracle Cloud and Oracle Ravello Blogger Day – Part 2: Ravello

Gareth Edwards (@garethedwards86): Ravello 2017 Bloggers Conference – Opening Post #RBD1

Max Mortillaro (@darkkavenger): RT1 – Oracle Cloud Strategy: Part 1 – Oracle Ravello Cloud Service

Matt Leib (@MBLeib): Ravello Systems Changing the Game

James Green (@jdgreen): Can Oracle Build a Viable Public Cloud

Keith Townsend (@CTOAdvisor): Oracle’s Cloud Investment is Real

Tim Smith (@tsmith_co): Ravello and the Oracle Cloud Journey

 

Disclaimer: While Mark Troyer and the awesome folks at Tech Reckoning were very generous to invite me to this fantastic event and I am very grateful for it, my opinions are totally my own, as all redheads are far too stubborn to have it any other way.

 

Published 06/02/2017

 

 

 

Posted on 2017/06/02 in Oracle Ravello Blogger Day

 
