Extreme Networks VOSS (Fabric) Edge Port Delay Reason and Remediation


We’ve recently started moving to an Extreme Networks VOSS Fabric based network. Due to the scale of the estate, however, we can’t reconfigure everything attached to it to work more seamlessly with the way a VOSS fabric operates.

For example, we have a large number of hosts that use Active-Backup bonding on their NICs, i.e. one NIC on the host connects to switch A and the other NIC connects to switch B. If the link to switch A fails, or switch A itself fails or needs to be rebooted (for an upgrade, say), the host switches to the backup link in the bond, which is connected to switch B.

We also have some storage arrays and firewalls with active and inactive links that are connected all the time. In the event of a failover, the active IP addresses move to the surviving ports.

In both cases the behaviour needs to be consistent and, most importantly, very brisk: a fail-over or fall-back should take a second or two at most, so that there is no discernible effect on service.

Symptoms

With the existing Extreme Networks XOS based switches, this was never a problem. If a host’s link failed, a client would see an interruption equivalent to a single dropped ping, nothing more.

However, when we repeated the tests on a host that had been moved to VOSS fabric based switches, network connectivity was interrupted for about 6 ping drops, or around 5-6 seconds, and occasionally slightly more.
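
To compare results like these consistently between test runs, the dropped-ping counts can be turned into an outage window with a small helper. The following is only a sketch: the function name `longest_outage` and the sample data are hypothetical, and it assumes you have already recorded each probe as a (send time in seconds, reply received) pair, for example by parsing the output of a once-per-second ping.

```python
from typing import Iterable, Tuple


def longest_outage(probes: Iterable[Tuple[float, bool]]) -> Tuple[int, float]:
    """Return (lost probes, seconds between the surrounding good replies)
    for the longest run of consecutive lost probes.

    The time value is an upper bound on the real outage, because the link
    may have failed just after the last good probe and recovered just
    before the next good one. Runs that start before the first good reply
    or never recover within the sample are ignored.
    """
    worst = (0, 0.0)
    lost = 0          # consecutive lost probes in the current run
    last_ok = None    # send time of the most recent answered probe
    for sent_at, replied in probes:
        if not replied:
            lost += 1
            continue
        if lost and last_ok is not None:
            worst = max(worst, (lost, sent_at - last_ok))
        lost = 0
        last_ok = sent_at
    return worst


# Made-up sample, not a real measurement: one probe per second,
# with probes 3 to 8 going unanswered.
sample = [(float(t), not (3 <= t <= 8)) for t in range(12)]
print(longest_outage(sample))  # (6, 7.0)
```

Six lost probes at a one-second interval bound the outage between roughly 5 and 7 seconds, which is consistent with the "around 5-6 seconds" observed above.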

Cause

Obviously this was neither expected nor ideal. After some investigation, the issue appeared to be down to spanning-tree, which is enabled by default on all edge ports.

Spanning-tree ensures there are no loops in your network by identifying loops and blocking ports automatically. This process of discovering loops takes time, although vastly less time than it did in earlier spanning-tree versions.

Resolution

In this case, once spanning-tree was disabled on both of the host’s switch ports (i.e. the active and the backup switch port) and the test was repeated, the fail-over time went down to about a second (or the equivalent of 1 ping drop).

interface gigabitethernet 1/3    
    no spanning-tree mstp force-port-state enable

The example above means that interface gigabitethernet 1/3 will not wait for spanning-tree to complete before allowing traffic to flow, which is exactly what you need in this type of host configuration.

It is worth noting, however, that this should not be used on user-facing edge ports, i.e. ports where an end user may accidentally plug a loop into your network.

In a data centre location it is also recommended to configure something like SLPP, which plays a similar role to Cisco’s BPDU Guard: https://extremeportal.force.com/ExtrArticleDetail?an=000084288 Configurations such as this (as well as careful cabling) let you keep the benefits of loop protection/prevention without the risk of loops in your network causing availability issues.
