Hi Gui!
This is indeed a very nice bug report, despite the complexity of
the context.
Unfortunately, what you describe makes total sense :-(
Some more comments inline...
On 10/24/2013 01:59 AM, Gui Iribarren wrote:
Hey axel!
so, we've been scratching our heads today, facing some MTU issues on our
shiny new libre-mesh network,
and after a few hours debugging, tcpdumping and discussing, we came to
these conclusions:
* the invented src-addr in the bmxOut tunnel makes it impossible for
hops along the path to return "Packet Too Big" to the originating
node. So, if a particular link has a smaller MTU than the first link
(say, a VPN is involved), the packet will be silently dropped.
A==1500==B---1400---C===1500===D
A wants to send a packet to a network announced by D; so it creates a
tunnel with D as destination, and a "D-derived" fake address as src,
that matches the bmxIn "catchall" tunnel peer-addr in D
Then sends a 1460 + 40 = 1500 bytes packet through that tunnel,
B cannot push that packet to C, so it tries to send back an ICMPv6 PTB...
but the src-addr it finds in the encapsulation is not A, but instead
the "D-derived" fake addr. Then, the ICMPv6 PTB is lost and A can
never find out about the smaller MTU.
This fundamentally breaks PMTUD, which is a bad idea in IPv6, where
routers never fragment and rely on PTB to tell the sender to shrink.
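
To make the failure concrete, here is a minimal Python sketch of the A-B-C-D path above. It is only a toy model, not bmx6 code: the function name, the outcome strings and the FAKE_SRC placeholder are all made up for illustration, and the only rule it encodes is that the router in front of a too-small link sends its PTB to whatever source address is in the outer header.

```python
IPV6_HEADER = 40  # bytes of outer IPv6 header added by the tunnel


def send_through_tunnel(inner_size, outer_src, real_src, link_mtus):
    """Toy model: walk the links and report what happens to the packet.

    Returns (outcome, link_index, mtu), where outcome is one of
    "delivered", "ptb_received" or "blackhole".
    """
    packet_size = inner_size + IPV6_HEADER
    for link_index, mtu in enumerate(link_mtus):
        if packet_size > mtu:
            # The router in front of this link sends an ICMPv6 PTB to the
            # source address it sees in the outer header. The sender only
            # learns about the MTU if that address is really its own.
            if outer_src == real_src:
                return ("ptb_received", link_index, mtu)
            return ("blackhole", link_index, mtu)
    return ("delivered", None, None)


# Path from the example: A ==1500== B ---1400--- C ==1500== D
LINK_MTUS = [1500, 1400, 1500]
A_ADDR = "2001:db8::A"
FAKE_SRC = "D-derived-fake-addr"  # hypothetical placeholder, not a real address

print(send_through_tunnel(1460, FAKE_SRC, A_ADDR, LINK_MTUS))
print(send_through_tunnel(1460, A_ADDR, A_ADDR, LINK_MTUS))
```

With the fake source the 1500-byte packet is reported as a blackhole at the 1400 link; with A's real address as outer source the same packet yields a PTB that A can act on.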
to avoid this there are 3 options:
* set mtu=1280 on every bmxOut tunnel (yuck! :( )
Must it be so small? Is mtu=1350 too big for the potential intermediate
VPNs?
* probe before establishing each tunnel, with the real src-addr so
that PMTUD can happen correctly, until it reaches the desired endpoint
node. Then, use discovered PMTU for new tunnel. (*downside*: this will
only work as long as path doesn't change to pass through some thinner
link. In that case, PMTU will not be rediscovered, and packets will be
dropped again.)
* use the current bmxIn "catchall" tunnels only for sending special
bmx6 control packets, that ask for a symmetric tunnel.
i.e.
1) A sends to D (2001:db8::D) a packet (encapsulated with a fake
src-addr, to be caught by the catchall @ D) with content "I'm A and
this is my real address 2001:db8::A; please make a dedicated tunnel
for me"
2) D gets that packet and creates a tunnel with "peer 2001:db8::A",
then sends back an ACK to A, again using "A-derived" fake-addr as src
Using "A-derived" fake-addr as src does not work from A because this
implies that the fake-addr is a valid address on A and that causes any
incoming packet having this address as src address to be dropped.
Instead A's primary address could be used. But anyway I hope we can
solve it without a dedicated control connection.... see below.
3) A gets the ACK and creates the tunnel with "peer 2001:db8::D"
*** now both ends have a symmetric tunnel between them, with real src
and dest address ***
4) A finally sends the real payload through the symmetric tunnel. This
payload (which may be bigger than 1280, say... 1450) will be encapsulated
with the real src-addr of A, so if any node in the path needs to send
back a Packet-too-big, it will be able to, and PMTUD will happen correctly.
(at the cost of a full RTT latency before the first payload packet,
but with a reasonable tunnel expire time as it has currently, that
shouldn't be terrible)
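
Steps 1) to 4) above can be sketched as a tiny request/ACK exchange. This is a rough Python illustration under loud assumptions: the Node class, the REQ/ACK message dictionaries and the is_symmetric helper are hypothetical, nothing here is the bmx6 wire format, and the question of which source address the ACK should carry is deliberately left out.

```python
class Node:
    """Toy node that tracks which peers it has a dedicated tunnel to."""

    def __init__(self, addr):
        self.addr = addr
        self.peers = set()  # real addresses of dedicated-tunnel peers

    def make_request(self):
        # Step 1: sent via the catchall tunnel, carrying our real address.
        return {"type": "REQ", "real_addr": self.addr}

    def handle(self, msg):
        if msg["type"] == "REQ":
            # Step 2: create the tunnel towards the requester, then ACK.
            self.peers.add(msg["real_addr"])
            return {"type": "ACK", "real_addr": self.addr}
        if msg["type"] == "ACK":
            # Step 3: the requester creates its own end of the tunnel.
            self.peers.add(msg["real_addr"])
            return None
        raise ValueError("unknown message type: %r" % msg["type"])


def is_symmetric(a, b):
    """True once both ends have a tunnel with the other's real address."""
    return b.addr in a.peers and a.addr in b.peers


a = Node("2001:db8::A")
d = Node("2001:db8::D")
ack = d.handle(a.make_request())  # steps 1 and 2
a.handle(ack)                     # step 3
print(is_symmetric(a, d))         # step 4 may start once this holds
```

The one-RTT cost mentioned above shows up here as the requirement that A processes the ACK before any payload is sent.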
back in April, i remember we discussed the idea of symmetric tunnels,
and you brought up this "control connection" idea which i'm simply
redescribing here.
But that discussion was in another context, more like a 'feature', and
finally didn't really solve the idea we had originally,
yet, this PMTUD issue was not taken into account AFAIR, so the
"symmetric tunnel" idea now becomes more like a bugfix (i.e. don't
create PMTUD blackholes)
what do you think?
I think the reception of an ip6-encapsulated packet destined to the
primary address of the node could be used as an implicit request to
set up the reverse tunnel (from the gw to the node). So the additional
setup delay (of one RTT) could be avoided but at the cost of not knowing
if the gateway accepted or will ever accept the implicit tunnel request.
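
A minimal sketch of that implicit request on the gateway side, in Python and purely illustrative: the function name, its parameters and the accept policy hook are assumptions of mine, not existing bmx6 code.

```python
def on_encapsulated_packet(gw_primary, reverse_tunnels, outer_src, outer_dst,
                           accept=lambda src: True):
    """Treat a tunneled packet to our primary address as a tunnel request.

    Returns True if a reverse tunnel towards outer_src exists afterwards.
    The sender gets no explicit answer either way, which is the cost
    of skipping the extra round trip.
    """
    if outer_dst != gw_primary:
        return False  # not addressed to our primary address: no request
    if not accept(outer_src):
        return False  # gateway policy refuses; the sender cannot tell
    reverse_tunnels.add(outer_src)
    return True


gw_tunnels = set()
on_encapsulated_packet("2001:db8::D", gw_tunnels,
                       outer_src="2001:db8::A", outer_dst="2001:db8::D")
print(gw_tunnels)
```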
I'll try to speed up the dev of symmetric tunnels.
/axel
Cheers!!
gui
_______________________________________________
Dev mailing list
Dev(a)lists.libre-mesh.org
https://lists.libre-mesh.org/mailman/listinfo/dev