Adventures In Networking: Hardships In Finding The Longest Match

Networking, Router, VMware

Sometimes in life you have to learn things the hard way. Recently I learned why the Longest Match Rule (Longest Match Algorithm) works and why it is applied not only to routing, but to other situations as well.

I was adding a new storage array and datastores to an existing VMware cluster using iSCSI. The existing VMware environment was laid out as follows:
[framed_box bgColor=”#F0F0F0″ textColor=”undefined” rounded=”true”]vSwitch0 = VM Network & Service Console (10.1.1.0/16)
vSwitch1 = iSCSI (10.12.1.0/16)
vSwitch2 = vMotion (10.12.1.0/16)
vSwitch3 = Testing
[/framed_box]
The new storage array and iSCSI targets landed on a new vSwitch (vSwitch4). The old environment had both iSCSI and vMotion on the same network (10.12.1.0/16). For the new environment I wanted to completely separate the iSCSI and vMotion traffic by assigning them to different networks. Both iSCSI networks needed to stay up while the migrations took place, so the new environment was planned as follows:
[framed_box bgColor=”#F0F0F0″ textColor=”undefined” rounded=”true”] vSwitch0 = VM Network & Service Console (10.1.1.0/16)
vSwitch1 = iSCSI (10.12.1.0/16)
vSwitch2 = vMotion (10.12.1.0/24)
vSwitch3 = Testing
vSwitch4 = iSCSI (10.12.2.0/24)
[/framed_box]

First, vSwitch4 was created, and the new storage was configured and presented to VMware, just as planned. The problem occurred when the subnet mask on vSwitch2 was modified from /16 to /24. As soon as that change was made, access to all of the VMs went down. After scrambling for about 5 minutes to retrace the steps prior to the problem, I was able to determine that the subnet mask change had caused the outage. Changing the subnet mask on vSwitch2 back to /16 slowly brought everything back online.

What caused this outage?

One simple mistake!

When the subnet mask was changed from /16 to /24, the third octet also needed to be changed so that the iSCSI and vMotion networks no longer overlapped. Instead, vSwitch2 kept the 10.12.1.0 network, so its /24 sat inside the /16 still used by vSwitch1. Under the Longest Match Rule the longer, more specific network prefix wins, so traffic to any address in 10.12.1.0/24 left through the vMotion network on vSwitch2 instead of the iSCSI network on vSwitch1, dropping the iSCSI targets and all of their datastores.

A network with a longer prefix describes a smaller set of IP addresses than a network with a shorter prefix, which makes the longer match more specific than the shorter one. For a destination inside 10.12.1.0/24, that network is the selected path because it matches the greatest number of leading bits in the destination IP address of the packets (see below).

[framed_box bgColor=”#F0F0F0″ textColor=”undefined” rounded=”true”] #1, 10.12.1.0/24 = 00001010.00001100.00000001.00000000
#2, 10.12.0.0/16 = 00001010.00001100.00000000.00000000
#3, 10.0.0.0/8 = 00001010.00000000.00000000.00000000
[/framed_box]
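The selection described above can be reproduced with a few lines of Python. This is only a minimal sketch using the standard `ipaddress` module, not how ESX or any router implements its lookup; the function name `longest_match` and the sample destination addresses are made up for illustration.

```python
import ipaddress

def longest_match(destination, prefixes):
    """Return the most specific prefix that contains destination, or None."""
    addr = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(p) for p in prefixes]
    # Keep only the networks that actually contain the destination address.
    matching = [net for net in networks if addr in net]
    # The longest prefix length is the most specific match.
    return max(matching, key=lambda net: net.prefixlen, default=None)

# The three prefixes from the box above.
routes = ["10.12.1.0/24", "10.12.0.0/16", "10.0.0.0/8"]

# Hypothetical destination addresses, chosen to hit each prefix in turn.
for dest in ("10.12.1.50", "10.12.2.50", "10.200.0.1"):
    print(dest, "->", longest_match(dest, routes))
```

A destination in 10.12.1.x matches all three prefixes, and the /24 is chosen because it has the longest prefix length. A destination in 10.12.2.x falls back to the /16, and anything else in 10.x.x.x falls back to the /8.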
By also changing the third octet on vSwitch2, I was able to move vMotion to its own /24 network.
The final, working configuration was laid out as follows:
[framed_box bgColor=”#F0F0F0″ textColor=”undefined” rounded=”true”] vSwitch0 = VM Network & Service Console (10.1.1.0/16)
vSwitch1 = iSCSI (10.12.1.0/16): left for migration
vSwitch2 = vMotion (10.12.3.0/24)
vSwitch3 = Testing
vSwitch4 = iSCSI (10.12.2.0/24)
[/framed_box]
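To see why this layout works where the first attempt did not, the two layouts can be compared for a single destination. The sketch below rests on assumptions: `10.12.1.50` is a hypothetical address for one of the old iSCSI targets (the real addresses are not listed in this post), `pick_vswitch` is an illustrative helper, and the host's real route selection may involve more than prefix length.

```python
import ipaddress

def pick_vswitch(destination, layout):
    """Return the name of the vSwitch whose network is the longest match."""
    addr = ipaddress.ip_address(destination)
    best_name, best_net = None, None
    for name, cidr in layout.items():
        # strict=False accepts 10.12.1.0/16 and normalizes it to 10.12.0.0/16.
        net = ipaddress.ip_network(cidr, strict=False)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_name, best_net = name, net
    return best_name

# Layout that caused the outage: vMotion /24 overlaps the old iSCSI addresses.
broken = {
    "vSwitch1 (iSCSI)": "10.12.1.0/16",
    "vSwitch2 (vMotion)": "10.12.1.0/24",
    "vSwitch4 (iSCSI)": "10.12.2.0/24",
}

# Final layout: vMotion moved to 10.12.3.0/24.
final = dict(broken, **{"vSwitch2 (vMotion)": "10.12.3.0/24"})

target = "10.12.1.50"  # hypothetical old iSCSI target address
print("broken:", pick_vswitch(target, broken))
print("final: ", pick_vswitch(target, final))
```

Under these assumptions the broken layout sends the target's traffic to vSwitch2 (vMotion), while the final layout sends it to vSwitch1 (iSCSI). The /16 on vSwitch1 still contains both /24 networks, but neither /24 covers the 10.12.1.x range any longer.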

Photo From: maximilian.haack