Dear all,
I was testing LibreMesh together with Gio and SAn (lime-packages master
branch compiled on top of OpenWrt's openwrt-18.06 branch) on a
MediaTek-based router, the YouHua WR1200JS.
Everything works fine apart from the routing on cabled connections.
It seems that these routers do not like 802.1ad VLANs on cable.
It could be an OpenWrt bug or a bug in the device.
Can anyone check and confirm on other MediaTek devices please?
Here is a list of what I tested:
* setting the routing protocols to run on 802.1q interfaces rather than
on 802.1ad works well: two identically configured routers see each other
via cable as Babeld neighbours on eth0-1_17 (checked with "echo dump |
nc ::1 30003"). We usually don't use 802.1q as it gave problems with
TP-Link routers; it can be selected by giving a third argument in
/etc/config/lime, like "list protocols babeld:17:8021q".
* listening with Wireshark on the laptop, I receive broken IPv6
multicast packets from the cable. They are correctly marked as VLAN
802.1ad ID 17 but the rest of the packet content is Error/Malformed.
* creating an 802.1ad interface on my laptop (e.g. "ip link add link
enp0s25 name enp0s25.17 type vlan proto 802.1ad id 17; ip link set
enp0s25.17 up"), adding a /24 IP address on both sides and pinging from
the router to the laptop. My laptop receives the router's ARP requests
and answers, but the router keeps asking as if it did not receive the
answer.
* while pinging from the laptop (10.2.1.2) to the router (10.2.1.1) on
the newly created tagged cabled interface, I connected via wifi and ssh
to the router and ran tcpdump on it:
** running it on eth0 shows that my ARP requests physically reach the
router and are properly tagged ("tcpdump -i eth0 -nn -e vlan"):
21:03:45.354344 54:ee:75:7a:c2:1f > ff:ff:ff:ff:ff:ff, ethertype
802.1Q-QinQ (0x88a8), length 64: vlan 1, p 0, ethertype 802.1Q-QinQ,
vlan 17, p 0, ethertype ARP, Request who-has 10.2.1.1 tell 10.2.1.2,
length 42
** running it on eth0-1_17 shows broken UDP packets (the same Malformed
IPv6 multicast packets I received with Wireshark), which are likely
generated by Babeld, BUT NO ARP request at all:
21:05:45.395359 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
UDP (17) payload length: 89) fe80::d65f:25ff:feeb:7ead.6696 >
ff02::1:6.6696: [bad udp cksum 0x77ed -> 0x7ce5!] UDP, length 81
21:05:49.255355 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
UDP (17) payload length: 20) fe80::d65f:25ff:feeb:7ead.6696 >
ff02::1:6.6696: [bad udp cksum 0x77a8 -> 0xa0e9!] UDP, length 12
21:05:53.225372 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
UDP (17) payload length: 20) fe80::d65f:25ff:feeb:7ead.6696 >
ff02::1:6.6696: [bad udp cksum 0x77a8 -> 0xa0e8!] UDP, length 12
21:05:57.385373 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
UDP (17) payload length: 20) fe80::d65f:25ff:feeb:7ead.6696 >
ff02::1:6.6696: [bad udp cksum 0x77a8 -> 0xa0e7!] UDP, length 12
21:06:01.245355 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
UDP (17) payload length: 89) fe80::d65f:25ff:feeb:7ead.6696 >
ff02::1:6.6696: [bad udp cksum 0x77ed -> 0x7ce1!] UDP, length 81
* flashing the YouHua router with OpenWrt 18.06.4 as downloaded from the
OpenWrt website and creating the 802.1ad interface using the ip command
(after installing the ip-full package: "ip link add link eth0.1 name
eth0-1_17 type vlan proto 802.1ad id 17; ip link set eth0-1_17 up; ip
address add 10.2.1.1/24 dev eth0-1_17"): it still does not ping (neither
my laptop's ARP requests nor its ARP answers reach eth0-1_17)
* on the same clean router, using nping I sent a raw ethernet packet on
the eth0-1_17 interface (using the command "nping --send-eth
--source-mac ff:ff:ff:ff:ff:ff --dest-mac ff:ff:ff:ff:ff:ff --data
aaaabbbbccccddddeeeeffffffffeeeeddddccccbbbbaaaa -e eth0-1_17 -N
8.8.8.8") and captured it on the laptop.
What I received is broken (notice that on the second line I have
"aa aa 0e 9c cc cc" instead of "aa aa bb bb cc cc").
This is the capture on enp0s25 (plain Ethernet):
0000 ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
0010 08 00 08 00 4c 14 ab ea 00 01 aa aa 0e 9c cc cc
0020 dd dd ee ee ff ff ff ff ee ee dd dd cc cc bb bb
0030 aa aa d6 5f 25 ff fe eb 7e ac ae 2c 00 16 b7 e6
0040 4a c6 4f ee f2 fa
And this is the capture on enp0s25.17 (VLAN 802.1ad ID 17 interface):
0000 ff ff ff ff ff ff ff ff ff ff ff ff 08 00 08 00
0010 2c 48 cb b2 00 05 aa aa 9a 9a cc cc dd dd ee ee
0020 ff ff ff ff ee ee dd dd cc cc bb bb aa aa 64 68
0030 63 70 20 31 2e 32 38 2e 34 0c 07 4f 70 65 6e 57
0040 72 74
The last part of the packet varies, both when listening on enp0s25 and
on enp0s25.17: usually it has no readable transcription, while
sometimes it can be transcribed as:
0030 aa aa 64 68 63 70 20 31 2e 32 38 2e 34 0c 07 4f ..dhcp 1.28.4..O
0040 70 65 6e 57 72 74 penWrt
where 1.28.4 looks like the busybox version on the router, no idea why
or how this got here.
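The transcription above can be reproduced by decoding the trailing bytes by hand. This is a minimal sketch that uses only the bytes from the dump. The interpretation in the second half is an assumption, not something verified on the router: "0c 07" followed by seven characters would match the layout of a DHCP host name option (code 12, length 7), so the tail may be leftover data from a DHCP client packet.

```python
# Trailing bytes copied from the enp0s25.17 capture (offsets 0x2e-0x41).
tail = bytes.fromhex(
    "64 68 63 70 20 31 2e 32 38 2e 34 0c 07 4f 70 65 6e 57 72 74"
)


def printable(data: bytes) -> str:
    """Render bytes like a hexdump's ASCII column ('.' for non-printables)."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


print(printable(tail))  # dhcp 1.28.4..OpenWrt

# Assumption: if this is leftover DHCP option data, 0x0c would be an option
# code (12 = host name) and the following byte the length of its value.
code, length = tail[11], tail[12]
value = tail[13:13 + length]
print(code, length, value.decode("ascii"))  # 12 7 OpenWrt
```

If that reading is right, the length byte accounts exactly for the remaining seven bytes of the frame, which would suggest the padding comes from a reused buffer rather than from random noise.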
Capturing the packet with tcpdump from inside the router, listening on
eth0-1_17 I got:
0000 ff ff ff ff ff ff ff ff ff ff ff ff 08 00 45 00
0010 00 34 f5 88 00 00 40 01 6a 2e 0a 02 01 01 08 08
0020 08 08 08 00 2f 89 c8 71 00 05 aa aa bb bb cc cc
0030 dd dd ee ee ff ff ff ff ee ee dd dd cc cc bb bb
0040 aa aa
then, listening on eth0.1 I got:
0000 ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
0010 08 00 45 00 00 34 21 a6 00 00 40 01 3e 11 0a 02
0020 01 01 08 08 08 08 08 00 26 19 d1 e1 00 05 aa aa
0030 bb bb cc cc dd dd ee ee ff ff ff ff ee ee dd dd
0040 cc cc bb bb aa aa
and listening on eth0:
0000 ff ff ff ff ff ff ff ff ff ff ff ff 81 00 00 01
0010 88 a8 00 11 08 00 45 00 00 34 4c e4 00 00 40 01
0020 12 d3 0a 02 01 01 08 08 08 08 08 00 c8 4f 2f ab
0030 00 05 aa aa bb bb cc cc dd dd ee ee ff ff ff ff
0040 ee ee dd dd cc cc bb bb aa aa
So all three captures taken from inside the router look good.
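The tag stack seen on eth0 can also be checked mechanically instead of by eye. The sketch below walks the VLAN tags at the start of that capture; the helper name parse_tags is made up for this illustration, and only the first 24 bytes of the dump are used.

```python
import struct

# TPIDs that introduce a 4-byte VLAN tag: 802.1Q (0x8100) and 802.1ad (0x88a8).
VLAN_TPIDS = {0x8100: "802.1Q", 0x88A8: "802.1ad"}


def parse_tags(frame: bytes):
    """Return ([(tag type, VLAN ID), ...], final ethertype, payload)."""
    offset = 12  # skip destination and source MAC addresses
    tags = []
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    while ethertype in VLAN_TPIDS:
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        tags.append((VLAN_TPIDS[ethertype], tci & 0x0FFF))  # low 12 bits = VID
        offset += 4
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    return tags, ethertype, frame[offset + 2:]


# Start of the capture taken on eth0 inside the router.
eth0 = bytes.fromhex(
    "ff ff ff ff ff ff ff ff ff ff ff ff 81 00 00 01 "
    "88 a8 00 11 08 00 45 00"
)
tags, ethertype, payload = parse_tags(eth0)
print(tags, hex(ethertype), hex(payload[0]))
# [('802.1Q', 1), ('802.1ad', 17)] 0x800 0x45
```

The payload starts with 0x45, the usual first byte of an IPv4 header without options, which agrees with the packet still being intact at this point.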
As a comparison, I used the same nping command on a TP-Link WDR3600
router and the packet captured on my laptop looks perfectly ok, sniffing
on enp0s25:
0000 ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
0010 08 00 45 00 00 34 88 39 00 00 40 01 6f 59 0a 0d
0020 69 1a 08 08 08 08 08 00 da bd 1d 3d 00 05 aa aa
0030 bb bb cc cc dd dd ee ee ff ff ff ff ee ee dd dd
0040 cc cc bb bb aa aa
And capturing on enp0s25.17:
0000 ff ff ff ff ff ff ff ff ff ff ff ff 08 00 45 00
0010 00 34 33 0e 00 00 40 01 c4 84 0a 0d 69 1a 08 08
0020 08 08 08 00 60 93 97 67 00 05 aa aa bb bb cc cc
0030 dd dd ee ee ff ff ff ff ee ee dd dd cc cc bb bb
0040 aa aa
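Comparing the two enp0s25 captures byte by byte hints that more than two payload bytes are affected. The following sketch is one reading of the dumps above, not a confirmed diagnosis: it checks whether an IPv4 header follows the 802.1ad tag and where the nping data pattern starts in each frame. The function name describe and the constant HEADER_LEN are made up for this example.

```python
# Frames as captured on the laptop's enp0s25, copied from the dumps above.
youhua = bytes.fromhex("""
ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
08 00 08 00 4c 14 ab ea 00 01 aa aa 0e 9c cc cc
dd dd ee ee ff ff ff ff ee ee dd dd cc cc bb bb
aa aa d6 5f 25 ff fe eb 7e ac ae 2c 00 16 b7 e6
4a c6 4f ee f2 fa
""")
tplink = bytes.fromhex("""
ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
08 00 45 00 00 34 88 39 00 00 40 01 6f 59 0a 0d
69 1a 08 08 08 08 08 00 da bd 1d 3d 00 05 aa aa
bb bb cc cc dd dd ee ee ff ff ff ff ee ee dd dd
cc cc bb bb aa aa
""")

HEADER_LEN = 18  # 12 bytes of MACs + 4-byte 802.1ad tag + 2-byte ethertype


def describe(name: str, frame: bytes) -> tuple:
    """Return (name, length, looks-like-IPv4, offset of the nping data)."""
    first = frame[HEADER_LEN]
    is_ipv4 = first >> 4 == 4          # IPv4 version nibble
    data_at = frame.find(b"\xaa\xaa")  # start of the --data pattern
    return name, len(frame), is_ipv4, data_at


print(describe("YouHua", youhua))   # ('YouHua', 70, False, 26)
print(describe("TP-Link", tplink))  # ('TP-Link', 70, True, 46)
```

Both frames are 70 bytes long, but in the YouHua frame the data starts 20 bytes earlier, which is the size of an IPv4 header without options. This would be consistent with the IPv4 header being dropped on transmit and the frame being padded at the end with 20 unrelated bytes, in addition to the "bb bb" corruption.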
In case this bug is a hardware one affecting all the MediaTek-based
routers, I would suggest considering running Babeld on the br-lan bridge
without any VLAN (neither 802.1q nor 802.1ad) rather than on eth0-1_17.
BMX6 was already running on the bridge, and to prevent it from also
running inside BATMAN-adv we were using this ebtables rule:
https://github.com/libremesh/lime-packages/blob/master/packages/lime-proto-…
We could do the same for Babeld (and, for consistency, I would also not
use a VLAN for it on wireless mesh interfaces).
Thanks && ciao;
Ilario
What is the intended workflow when developing LibreMesh packages (like lime-app, shared-state, etc.)? What I mean is the process of making some change to a LiMe package, compiling it and trying it out on a device running OpenWrt.
Just following the compilation instructions <https://libremesh.org/development.html#compiling_libremesh_from_source_code> to produce an entire OpenWrt image every time seems to imply these steps:
* Make changes to the package source code, and commit those to the repository.
* Change the suitable Makefile in the LibreMesh source code to point to the commit above. Commit this as well.
* In the OpenWrt repository, add a LibreMesh feed that points to the LibreMesh repository above, instead of the official one.
* Compile OpenWrt.
While this works, it’s not really practical when you just want to try things out. You also clutter the repository by having to commit things just to test-run them.
I’m sure you all follow a more convenient procedure, and I’d like to learn about it. Cheers!
Eric