After weeks of trial and error on several different Chef builds (one ar71xx/wndr3700v2 and three x86 machines, each on different hardware), all using the default LiMe 802.11s profile, I've concluded that watchping, or some other link in the chain, is failing to offer the gateway.

Whenever a node loses its WAN uplink, that node and its clients sit without internet for hours, even though they can still communicate with all other nodes and machines on the network.

After some discussion in the dev channel, my use of USB WiFi devices as WAN interfaces, configured with the LEDE/OpenWrt default interface name WWAN, came under suspicion as the cause. Unfortunately, neither changing the interface reference in the watchping settings in /etc/config/system nor renaming the actual interface from WWAN to WAN made any difference. The nodes CAN communicate (so this is not the usual "can't ping" bug), and watchping does recognize that the uplink is lost, but the node that still has an uplink never offers it up.

Once again, this is with the default 802.11s profile, which uses BMX6. I have not yet tried BMX7, as I'm not sure how to migrate from one to the other in place.