Mellanox `nmlx5_core` driver `4.23` issues on ESXi 8.0 Update 1

After upgrading to ESXi 8.0 Update 1, the following issues start to appear on adapters that use the `nmlx5_core` driver:
- Delayed or failed IP discovery on VLAN-backed segments, even within the same host. Once an entry is in the ARP cache, the issue does not persist.
- Delayed or failed IP discovery and IP allocation failures on VLAN-trunked port groups, even within the same host. These issues persist even after IP discovery is established.
- Overlay encapsulation offload failures:
  - ICMP with any payload size functions bidirectionally via Edge Transport Nodes / FRR Linux machines, but TCP and UDP do not
  - All overlay traffic encapsulated by a vSphere host flows correctly between workloads on the same NSX overlay segment
  - All overlay traffic encapsulated by a vSphere host flows correctly between segments on the same NSX distributed router
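
To reproduce the "ICMP works, TCP does not" symptom from a workload, a small TCP probe can be run alongside an ordinary `ping`. This is only a minimal sketch: the function name `tcp_reachable` is hypothetical, and the target address and port are placeholders to replace with a real workload behind an Edge Transport Node.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to (host, port) completes within timeout.

    Hypothetical helper for illustration; it only tests the TCP handshake,
    so ICMP reachability still has to be checked separately (e.g. with ping).
    """
    try:
        # create_connection performs the full three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, `tcp_reachable("192.0.2.10", 22)` (a documentation-range placeholder address) returning `False` while `ping` to the same address succeeds would match the behavior described above.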
These issues are seen on the following hardware models:
These issues are experienced with the upgrade to vSphere 8.0 Update 1, which includes the following updated driver:
`nmlx5-core 4.23.0.36-8vmw.800.1.0.20513097`
This driver from NVIDIA ships with support for both BlueField SmartNIC and ConnectX Generation 5 network adapters as one package. Rolling back to a previous release of ESXi 8 with the previous driver (`nmlx5-core 4.22`) immediately resolves all overlay issues.
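
To check whether a host is running the affected driver, the output of `esxcli software vib list` can be scanned for the `nmlx5-core` entry. The sketch below assumes the usual whitespace-separated layout with the VIB name in the first column and the version in the second. The helper names and the vendor, acceptance level, and date values in the sample text are illustrative assumptions, not captured from a real host.

```python
from typing import Optional

# Driver series reported as problematic in this post.
AFFECTED_PREFIX = "4.23."


def find_vib_version(vib_list_output: str, vib_name: str = "nmlx5-core") -> Optional[str]:
    """Return the version column for vib_name, or None if it is not listed.

    Assumes each row is whitespace-separated with name first, version second.
    """
    for line in vib_list_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == vib_name:
            return fields[1]
    return None


def is_affected(version: Optional[str]) -> bool:
    """True when the version string belongs to the 4.23 driver series."""
    return version is not None and version.startswith(AFFECTED_PREFIX)


# Illustrative sample; only the name and version come from this post.
SAMPLE = """\
Name        Version                          Vendor  Acceptance Level  Install Date
nmlx5-core  4.23.0.36-8vmw.800.1.0.20513097  VMW     VMwareCertified   2023-06-01
"""

if __name__ == "__main__":
    version = find_vib_version(SAMPLE)
    print(version, "affected" if is_affected(version) else "not affected")
```

On a real host, the text passed to `find_vib_version` would be the captured command output rather than `SAMPLE`.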
If anyone would like to contribute to this problem inventory, email me here
Originally published at https://blog.engyak.co on June 2, 2023.