Mikrotik CRS326-24S+2Q+RM switch: one year usage summary

Mikrotik is known for its cheap 10GBit/s switches. That's the reason I bought one of these back in 2021. Although 10GBase-T transceivers are not that cheap, I ended up replacing my good ol' Netgear S3300-28X with the Mikrotik CRS326-24S+2Q+RM running SwOS-2.13. I know the switch can also run RouterOS, but to be honest, I do not like the complexity of RouterOS, nor do I need an OS on my switch that provides lots of functionality I will definitely never use (on a switch).
Unfortunately, SwOS comes with a couple of drawbacks:
  • SwOS has a limited password length: it accepts twelve chars but not 20. I didn't check whether twelve really is the limit, but SwOS refuses to save passwords with 20 chars.
  • No ssh connections possible: All SwOS provides is a very basic http web interface.
  • No https connections possible: The web-interface can only be accessed via http.

None of these drawbacks exist in RouterOS, but then again, RouterOS is totally overloaded with functionality one never needs in an L2 switch. Furthermore, one small misconfiguration in RouterOS and 10GBit/s transfer speeds are gone, because everything gets routed over the switch CPU, which is connected to the switch chip with only a 1GBit/s line.

There's one really annoying bug in SwOS-2.13, though, which Mikrotik only recently confirmed: the two QSFP+ ports refuse to re-establish a link once the remote side gets rebooted. I found three possible ways to re-establish the link:
  • Reboot the switch (the most effortful solution, as the entire network connected to the switch is down while the switch reboots).
  • Unplug and replug the QSFP+ cable at the switch (similarly effortful).
  • Toggle auto negotiation from "auto" to "40G" or vice versa.

And these are the actions that do not work and will not solve the link issue:
  • Reboot the remote device again.
  • Unplug and replug the QSFP+ cable at the remote side.
  • Replace the remote NIC (if possible).
  • Use a different QSFP+ cable.

Mikrotik promised a fix in a future SwOS release, but I doubt this release will come anytime soon, as their priority clearly lies with RouterOS (where, again, this issue is not present).

Apart from these issues, the switch performs really well and, when using a proper transfer protocol, provides reliable 10GBit/s on all of its ports. In summary, I do not regret my decision, and once Mikrotik finally puts some love into SwOS again, this will become a very nice and satisfying switch for my home network.

My time as a Gentoo dev has come to an end...

So the inevitable end has come sooner than I expected, but... let's be honest, Gentoo is increasingly turning into a shit hole full of self-important wannabes.
Ever since I became a Gentoo developer, the project has always had issues with people holding too much power. People who sit in comrel, QA and/or the council at the same time do not make for a healthy project. But whenever the proposal came up in the council to restrict people to only one powerful position, it was always shot down with lame excuses.
So now Gentoo is at a point where a small circle of "elevated people" dictates the direction of Gentoo for each and every one. Refusing to bow to their will or follow their commands leads to... harmful sanctions, as I recently found out for myself. As a result, I am no longer part of this shit hole project.
Most of my twelve years as a Gentoo developer were a lot of fun; the last two years were not.
On the bright side, I now have much more free time to spend with my family, my friends and my hobbies. Sorry to all users of the 147+ packages I was maintaining. I'm afraid many of them will not end up in good hands.


Update 20220629 12:36CEST
Gentoo has now started to censor my retirement bug (and even restricted it so it is no longer publicly available). So I feel the urge to clarify some things:

  • I was kicked out of Gentoo because of developer Sam James. He is the reason I was booted. I NEVER voluntarily retired from Gentoo!
  • Gentoo started to surveil my E-Mail account. (*)
  • Gentoo is planning to invalidate my council vote, although I cast that vote BEFORE any of this bullshit started to happen.
(*) I have been told this by a reliable source.


Update 20220630 15:09CEST
Gentoo now tries to frame me as a defamer in order to justify their decision. And of course, this blog post was taken as justification to fast-retire me and categorize me as a "malicious actor". It is hilarious how deeply they are stuck in their own "reality bubble". The way comrel acts nowadays reminds me of the German word "Gedankenpolizei" (thought police). The GDR's Stasi acted similarly in the past:

  • Secret trials
  • The defendant was given no access to what he/she was accused of
  • Verdicts were reached in secret
  • Secret surveillance
  • Vote manipulation
  • Public censorship
Would you want to be part of such an "organization"? I surely do not.

Update 20220630 16:47CEST
Now Gentoo has even removed my Gentoo overlay (poly-c) from their public repository list with the funny but nevertheless completely wrong assertion:

"The owner has stopped using Gentoo"

At this point, I would not be surprised if they tried to completely remove any sign of my existence from their public ::gentoo git package repository as well. I mean, they have already proven that they don't shy away from blatant censorship.

Update 20220708 11:25CEST
Three days ago, I received my council vote confirmation ID. So, to my surprise, it seems at least this kind of "punishment" could not be enforced by comrel.

Threadripper Pro is really a beast

Finally, I have had my new desktop machine up and running since May 8th. As I wanted a strong machine for Gentoo development again, I went with an AMD Ryzen Threadripper PRO 3955WX, a 16-core CPU with 32 threads running at a 3.9GHz base frequency. That should be enough for the next six to eight years. As I also run a couple of VMs on this machine, I doubled the amount of RAM compared to my old machine and now have no less than 256GB of ECC RAM to play with. That should be enough to compile Gentoo packages in a RAM disk and run a couple of VMs at the same time.
Finding a suitable mainboard for this CPU was not hard, but obtaining the board was an adventure of its own. I opted for a Supermicro M12SWA-TF board which was announced in January 2021 with a release date of mid 2021. Unfortunately it took over a year until the board was easily available.
Having this machine now also means an end to my dual CPU-socket usage on desktop systems. I see this as an improvement because that way the mainboard has more space for other stuff and features I consider important for a modern motherboard.

I only had a few annoyances while installing Gentoo on that machine. Unfortunately, I couldn't use a clone of the Gentoo installation from my old desktop machine on the new one because... well... -march=native on my old AMD Bulldozer CPUs produced binaries that don't run on a Threadripper CPU. That was the first time I found AMD CPUs not to be fully backward compatible.
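To see which instruction-set extensions a -march=native build on the old machine might rely on but the new CPU lacks, one can compare the "flags" lines of /proc/cpuinfo from both machines. Below is a minimal Python sketch of that comparison. The function names and the sample flag lists are hypothetical and heavily shortened; as far as I know, Bulldozer-family CPUs advertise extensions such as fma4 and xop that Zen-based CPUs like the Threadripper do not, but check the real /proc/cpuinfo of both machines rather than relying on the sample.

```python
def parse_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags from the first "flags" line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_new_cpu(old_cpuinfo: str, new_cpuinfo: str) -> list[str]:
    """Flags the old CPU reports but the new one does not (candidates for illegal instructions)."""
    return sorted(parse_flags(old_cpuinfo) - parse_flags(new_cpuinfo))


# Hypothetical, shortened excerpts; on a real system read /proc/cpuinfo on each machine.
OLD = "model name\t: old Bulldozer CPU\nflags\t\t: sse4_2 avx fma4 xop tbm\n"
NEW = "model name\t: new Threadripper CPU\nflags\t\t: sse4_2 avx avx2 sha_ni\n"

if __name__ == "__main__":
    print(missing_on_new_cpu(OLD, NEW))  # ['fma4', 'tbm', 'xop']
```

Anything in the resulting list is an extension that code compiled with -march=native on the old machine may use, but that the new CPU does not report.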
So I went with a full-blown stage1 installation (as I have been doing since 2003) and simply installed all packages listed in the world file from my old machine. Configuring the kernel and grub was another challenge. I somehow had trouble getting grub recognized by UEFI, and it took about half a dozen attempts to get a bootable kernel configured.
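Reinstalling everything from the old world file can be scripted. The sketch below assumes the old /var/lib/portage/world was copied to a file named old-world (a hypothetical name), reads the package atoms and builds an emerge command line; the helper names are hypothetical too. --noreplace skips atoms that are already installed, and because --oneshot is not used, the atoms also end up in the new machine's world file.

```python
import shlex
from pathlib import Path


def read_world(text: str) -> list[str]:
    """Return the package atoms from world-file text, in order, without duplicates."""
    atoms: list[str] = []
    for line in text.splitlines():
        atom = line.strip()
        # Skip blank lines (and, defensively, anything that looks like a comment).
        if not atom or atom.startswith("#"):
            continue
        if atom not in atoms:
            atoms.append(atom)
    return atoms


def emerge_command(atoms: list[str]) -> list[str]:
    """Build an emerge invocation that installs the atoms and records them in @world."""
    return ["emerge", "--ask", "--noreplace", *atoms]


if __name__ == "__main__":
    # Hypothetical path of the world file copied over from the old machine.
    path = Path("old-world")
    if path.exists():
        print(shlex.join(emerge_command(read_world(path.read_text()))))
```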

Speaking of this machine I should also mention Arsimael who donated an Asus Radeon RX 5700 XT GPU to this system which I would not have been able to afford otherwise.

This machine really is a beast compared to my old machine. Compiling gcc-11 went down from 1:49h to 29 minutes at less than half the power consumption. That is simply... WOW! Now all I need to do is replace all the loud chassis fans with Noctuas.
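To put those numbers into perspective, here is the arithmetic as a tiny Python sketch. It treats "less than half the power consumption" as an upper bound of 0.5 for the power ratio, so the resulting energy figure is an upper bound as well; the function name is hypothetical.

```python
def compile_gains(old_minutes: float, new_minutes: float, power_ratio: float) -> tuple[float, float]:
    """Return (speedup, energy ratio) for a build that got faster and runs at a lower power draw."""
    speedup = old_minutes / new_minutes
    # Energy = power x time, so the ratio is (new power / old power) x (new time / old time).
    energy_ratio = power_ratio * (new_minutes / old_minutes)
    return speedup, energy_ratio


OLD_MINUTES = 1 * 60 + 49  # 1:49h for gcc-11 on the old machine
NEW_MINUTES = 29           # 29 minutes on the Threadripper
POWER_RATIO = 0.5          # "less than half" the power draw, used as an upper bound

if __name__ == "__main__":
    speedup, energy = compile_gains(OLD_MINUTES, NEW_MINUTES, POWER_RATIO)
    print(f"speedup: {speedup:.2f}x, energy: below {energy:.1%} of before")
    # speedup: 3.76x, energy: below 13.3% of before
```

In other words, roughly 3.8 times as fast while using well under a seventh of the energy per gcc build.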

Early boot messages on UEFI systems with grub

For quite a while, I had a problem on UEFI systems: after grub had loaded the kernel, no boot messages were shown until KMS kicked in and switched to the final framebuffer driver. Finally, slashbeast, a fellow Gentoo dev, told me the correct kernel cmdline parameter to make early boot messages appear:
CODE:
earlycon=efifb

As a prerequisite, the kernel needs to have CONFIG_FB_EFI compiled in and CONFIG_X86_SYSFB disabled.
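To double-check these prerequisites before rebooting, a small Python sketch like the following can scan a kernel .config (for example /usr/src/linux/.config) and a kernel command line (for example the contents of /proc/cmdline). The function names are hypothetical, and the checks cover only the two options and the cmdline parameter named above.

```python
def parse_kconfig(config_text: str) -> dict[str, str]:
    """Map CONFIG_* names to their values; '# CONFIG_FOO is not set' becomes 'n'."""
    values: dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            values[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            values[name] = value
    return values


def earlycon_problems(config_text: str, cmdline: str) -> list[str]:
    """List what is missing for 'earlycon=efifb' according to the prerequisites above."""
    cfg = parse_kconfig(config_text)
    problems = []
    if cfg.get("CONFIG_FB_EFI") != "y":
        problems.append("CONFIG_FB_EFI is not built in (=y)")
    # Options absent from .config are treated as disabled.
    if cfg.get("CONFIG_X86_SYSFB", "n") != "n":
        problems.append("CONFIG_X86_SYSFB is enabled")
    if "earlycon=efifb" not in cmdline.split():
        problems.append("earlycon=efifb is missing from the kernel command line")
    return problems


if __name__ == "__main__":
    example_config = "CONFIG_FB_EFI=y\n# CONFIG_X86_SYSFB is not set\n"
    example_cmdline = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda2 ro earlycon=efifb"
    print(earlycon_problems(example_config, example_cmdline))  # []
```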

Hardware IOMMU and xhci controllers

You'd think that after the long time that has passed since the first USB3 controllers appeared on the computer market, Linux would have no big issues with USB3. Well, that's as far from reality as one might imagine.
For the last couple of months, I had an annoying problem with my USB3 PCIe controller card. One of the four ports was dead, and the other ones occasionally refused to work as well. Luckily, at least one port always worked, so I just plugged my USB3 hub into that port and carried on. Fast forward to this week. I replaced my GPU with a more recent model, and in order to fit the new card into the PC tower, I had to shuffle around other PCIe cards. This also required removing the USB3 controller card. After I finally got the GPU into a fitting slot, I started to put the other PCIe cards back into the tower when I found a dark spot on my USB3 controller card:
[Image: broken_USB3_card.jpg]
To my surprise, some electronic part on that card had burnt and left an entirely black spot the size of a thumbnail. Now I had my explanation for why one of the USB3 ports was always dead. Of course I didn't want to put a semi-broken card back into my PC, so I tried to replace it with a spare. And that's where the real fun began.

Booting Linux with that spare USB3 PCIe card immediately resulted in the IOMMU shutting down the new USB3 card. What I found out the hard way with three(!) different USB3 controller cards was that no matter what kind of chip is used on the card, there's always some issue.
Apparently, cards with a "Renesas" chip don't have IOMMU issues, but if you happen to use a card with too many of these chips soldered onto it, Linux just shuts the entire controller card down, because Renesas chips seem to be the worst USB3 controller chips available, with way too many quirks.
On the other hand, cards with either "VIA" or "ASMedia" chips get in trouble when a (buggy?) hardware IOMMU is in use. My new card with two ASMedia chips booted nicely when no USB device was connected to it:

xhci_hcd 0000:09:00.0: xHCI Host Controller
xhci_hcd 0000:09:00.0: new USB bus registered, assigned bus number 8
xhci_hcd 0000:09:00.0: hcc params 0x0200ef81 hci version 0x110 quirks 0x0000000000000010
usb usb8: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.04
usb usb8: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb8: Product: xHCI Host Controller
usb usb8: Manufacturer: Linux 5.4.51 xhci-hcd
usb usb8: SerialNumber: 0000:09:00.0
hub 8-0:1.0: USB hub found
hub 8-0:1.0: 2 ports detected
xhci_hcd 0000:09:00.0: xHCI Host Controller
xhci_hcd 0000:09:00.0: new USB bus registered, assigned bus number 9
xhci_hcd 0000:09:00.0: Host supports USB 3.1 Enhanced SuperSpeed
usb usb9: We don't know the algorithms for LPM for this host, disabling LPM.
usb usb9: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.04
usb usb9: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb9: Product: xHCI Host Controller
usb usb9: Manufacturer: Linux 5.4.51 xhci-hcd
usb usb9: SerialNumber: 0000:09:00.0
hub 9-0:1.0: USB hub found
hub 9-0:1.0: 2 ports detected
xhci_hcd 0000:08:00.0: xHCI Host Controller
xhci_hcd 0000:08:00.0: new USB bus registered, assigned bus number 10
xhci_hcd 0000:08:00.0: hcc params 0x0200ef81 hci version 0x110 quirks 0x0000000000000010
usb usb10: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.04
usb usb10: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb10: Product: xHCI Host Controller
usb usb10: Manufacturer: Linux 5.4.51 xhci-hcd
usb usb10: SerialNumber: 0000:08:00.0
hub 10-0:1.0: USB hub found
hub 10-0:1.0: 2 ports detected
xhci_hcd 0000:08:00.0: xHCI Host Controller
xhci_hcd 0000:08:00.0: new USB bus registered, assigned bus number 11
xhci_hcd 0000:08:00.0: Host supports USB 3.1 Enhanced SuperSpeed
usb usb11: We don't know the algorithms for LPM for this host, disabling LPM.
usb usb11: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.04
usb usb11: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb11: Product: xHCI Host Controller
usb usb11: Manufacturer: Linux 5.4.51 xhci-hcd
usb usb11: SerialNumber: 0000:08:00.0
hub 11-0:1.0: USB hub found
hub 11-0:1.0: 2 ports detected


But as soon as I connected some device to that card:

usb 10-1: new high-speed USB device number 2 using xhci_hcd
xhci_hcd 0000:08:00.0: Abort failed to stop command ring: -110
xhci_hcd 0000:08:00.0: xHCI host controller not responding, assume dead
xhci_hcd 0000:08:00.0: HC died; cleaning up
xhci_hcd 0000:08:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0012 address=0x100000000 flags=0x0000]
xhci_hcd 0000:08:00.0: Timeout while waiting for setup device command
usb 10-1: hub failed to enable device, error -62
usb usb10-port1: couldn't allocate usb_device
usb usb11-port1: couldn't allocate usb_device
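When digging through logs like this, it can help to check whether the controller that died is the same PCI device that triggered the AMD-Vi IO_PAGE_FAULT events. Below is a minimal Python sketch that does this on dmesg text. The regular expressions only match the message formats shown in the log above (other kernel versions may word them differently), the function name is hypothetical, and a match is a hint towards the IOMMU, not proof.

```python
import re

# Patterns for the two message types seen in the log above.
DEAD_RE = re.compile(r"xhci_hcd (\S+): (?:HC died|xHCI host controller not responding)")
FAULT_RE = re.compile(r"xhci_hcd (\S+): AMD-Vi: Event logged \[IO_PAGE_FAULT")


def iommu_suspects(dmesg_text: str) -> list[str]:
    """PCI addresses of xHCI controllers that both died and caused an AMD-Vi IO_PAGE_FAULT."""
    dead = set(DEAD_RE.findall(dmesg_text))
    faulted = set(FAULT_RE.findall(dmesg_text))
    return sorted(dead & faulted)


SAMPLE = """\
xhci_hcd 0000:08:00.0: xHCI host controller not responding, assume dead
xhci_hcd 0000:08:00.0: HC died; cleaning up
xhci_hcd 0000:08:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0012 address=0x100000000 flags=0x0000]
xhci_hcd 0000:09:00.0: xHCI Host Controller
"""

if __name__ == "__main__":
    print(iommu_suspects(SAMPLE))  # ['0000:08:00.0']
```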


Yeah! Well done, IOMMU! Searching for "IOMMU xhci" turns up lots of posts from Linux users suffering from the same problem. Some xhci kernel driver hacker told me that this is an IOMMU issue that usually requires a BIOS update for the mainboard. But my mainboard (Supermicro H8DG6-F) has been EOL for quite a while already, and the manufacturer told me that there will be no more BIOS updates for this board. So the only other solution was to disable the hardware IOMMU in the Linux kernel. I added "amd-iommu=off iommu=soft" to my kernel command line, and now I can finally use my new USB3 PCIe controller card.
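After rebooting, one can verify that the workaround is actually active by inspecting /proc/cmdline. The Python sketch below performs that check. It is a simplification that ignores quoted values, and it normalizes hyphens to underscores in parameter names because the kernel-parameters documentation states that both are equivalent there (the documented spelling is amd_iommu=off). The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; parameters without '=' map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():  # simplification: quoted values with spaces are not handled
        name, sep, value = token.partition("=")
        # Hyphens and underscores are treated as equivalent in parameter names.
        params[name.replace("-", "_")] = value if sep else None
    return params


def hw_iommu_disabled(cmdline: str) -> bool:
    """True if both workaround parameters from the post are present."""
    params = parse_cmdline(cmdline)
    return params.get("amd_iommu") == "off" and params.get("iommu") == "soft"


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        print(hw_iommu_disabled(proc.read_text()))
```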