Hey axel!
so, we've scratching our heads today, facing some MTU issues on our
shiny new libre-mesh network,
and after a few hours debugging, tcpdumping and discussing, we came to
these conclusions:
* the invented src-addr in the bmxOut tunnel makes it impossible for
hops along the path to return "Packet Too Big" to the originating node.
So, if a particular link has a smaller MTU than the first link (say, a
VPN is involved), the packet will be silently dropped.
A==1500==B---1400---C===1500===D
A wants to send a packet to a network announced by D; so it creates a
tunnel with D as destination, and a "D-derived" fake address as src,
that matches the bmxIn "catchall" tunnel peer-addr in D
Then sends a 1460 + 40 = 1500 bytes packet through that tunnel,
B cannot push that packet to C, then tries to send back a ICMPv6 PTB...
but the src-addr it finds in the encapsulation is not A, but instead the
"D-derived" fake addr. Then, the ICMPv6 PTB is lost and A can never find
out about the smaller MTU.
This fundamentally breaks PMTUD which is a bad idea in IPv6
to avoid this there are 3 options:
* set mtu=1280 on every bmxOut tunnel (yuck! :( )
* probe before establishing each tunnel, with the real src-addr so that
PMTUD can happen correctly, until it reaches the desired endpoint node.
Then, use discovered PMTU for new tunnel. (*downside*: this will only
work as long as path doesn't change to pass through some thinner link.
In that case, PMTU will not be rediscovered, and packets will be dropped
again.)
* use the current bmxIn "catchall" tunnels only for sending special bmx6
control packets, that ask for a symmetric tunnel.
i.e.
1) A sends to D (2001:db8::D) a packet (encapsulated with a fake
src-addr, to be catched by the catchall @ D) with content "I'm A and
this is my real address 2001:db8::A; please make a dedicated tunnel for me"
2) D gets that packet and creates a tunnel with "peer 2001:db8::A", then
sends back an ACK to A, again using "A-derived" fake-addr as src
3) A gets the ACK and creates the tunnel with "peer 2001:db8::D"
*** now both ends have a symmetric tunnel between them, with real src
and dest address ***
4) A finally sends the real payload through the symmetric tunnel, this
payload (may be bigger than 1280, say... 1450) will be encapsulated with
the real src-addr of A, so if any node in the path needs to send back a
Packet-too-big, will be able to, and PMTUD will happen correctly.
(at the cost of a full RTT latency before the first payload packet, but
with a reasonable tunnel expire time as it has currently, that shouldn't
be terrible)
back in April, i remember we discussed the idea of symmetric tunnels,
and you brought up this "control connection" idea which i'm simply
redescribing here.
But that discussion was in another context, more like a 'feature', and
finally didn't really solve the idea we had originally,
yet, this PMTUD issue was not taken into account AFAIR, so the
"symmetric tunnel" idea now becomes more like a bugfix (i.e. don't
create PMTUD blackholes)
what do you think?
Cheers!!
gui