I have a pair of PC Engines APU2 devices running OpenBSD as firewalls. They're getting a bit long in the tooth, but they work well, and since my internet uplink is 100Mbit/s, they are still more than fast enough to support it. They have, however, become the limiting factor for copying around the VM images that I use for OpenSSH testing, capping out at around 300Mbit/s on local copies. This annoyed me enough that I wanted to do something about it.
Since the platform is pushing ten years of age, the obvious thing to do would be to update the hardware. Unfortunately there is no obvious successor to the APU2 series in the "long term supported, low power, serial console, multi-network amd64" hardware category. Plus, they support jumbo frames, and testing with those showed they could already send close to a gigabit, so it seemed like they should be able to receive that too.
I knew from previous experience that a large amount of the cost is per-packet overhead, and since the hosts, the firewall and the switch all supported jumbo packets, I figured I should try those. Unfortunately the results were disappointing: jumbo packets did not significantly improve the performance. This turned out to be due to a couple of things: a previously unknown bug in the em(4) driver on specific chips that caused ethernet frames of very specific sizes around mbuf boundaries to be truncated, which invalidated some tests, and poor hardware performance seemingly caused by PCIe power management.
When testing, there are quite a few variables to be aware of and, if possible, control for:
- hw.perfpolicy defaults to high, which for a device that doesn't run on battery is the same as hw.setperf=100. For most tests, we explicitly set hw.perfpolicy=manual, usually with hw.setperf=100.
- kern.pool_debug
- reorder_kernel relinks the kernel every boot. This takes some CPU, so you'll need to either wait until it's done to do your tests, or temporarily disable it.
- BIOS settings. Enter BIOS settings via F10, then make sure you have the following settings:
  - Enabled: Core Performance Boost.
  - Disabled: PCIe Power Management.
r Restore boot order defaults
n Network/PXE boot - Currently Disabled
u USB boot - Currently Enabled
t Serial console - Currently Enabled
k Redirect console output to COM2 - Currently Disabled
o UART C - Currently Enabled - Toggle UART C / GPIO
p UART D - Currently Enabled - Toggle UART D / GPIO
m Force mPCIe2 slot CLK (GPP3 PCIe) - Currently Disabled
h EHCI0 controller - Currently Disabled
l Core Performance Boost - Currently Enabled
i Watchdog - Currently Disabled
j SD 3.0 mode - Currently Disabled
g Reverse order of PCI addresses - Currently Disabled
v IOMMU - Currently Disabled
y PCIe power management features - Currently Disabled
w Enable BIOS write protect - Currently Disabled
z Clock menu
x Exit setup without save
s Save configuration and exit
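The theoretical maximum TCP throughput on gigabit ethernet, allowing for 40 bytes of IP and TCP headers inside each packet and 38 bytes of ethernet framing overhead around it:
- 1500 MTU: (1500-40) / (38+1500) * 119.2 MByte/s ≈ 0.95 * 119.2 MByte/s ≈ 113.2 MByte/s
- 9000 MTU: (9000-40) / (38+9000) * 119.2 MByte/s ≈ 0.99 * 119.2 MByte/s ≈ 118 MByte/s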
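
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the constant and function names are made up for the example, and it assumes plain IPv4 and TCP headers without options (40 bytes) plus the standard 38 bytes of ethernet overhead per frame (preamble, header, FCS and inter-frame gap).

```python
# Per-packet byte counts (assumed: IPv4 + TCP without options, untagged ethernet).
ETH_OVERHEAD = 38      # preamble/SFD 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40    # IPv4 header 20 + TCP header 20

# 1 Gbit/s expressed in MByte/s, where 1 MByte = 2**20 bytes (about 119.2).
LINE_RATE_MBYTE = 1e9 / 8 / 2**20


def max_tcp_throughput(mtu):
    """Upper bound on TCP payload throughput in MByte/s for a given MTU."""
    efficiency = (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)
    return efficiency * LINE_RATE_MBYTE


for mtu in (1500, 9000):
    print(f"{mtu} MTU: {max_tcp_throughput(mtu):.1f} MByte/s")
```

Because the ratio is not rounded before multiplying, the output can differ slightly in the decimals from a hand calculation that rounds the ratio first.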
| MTU | hw.setperf | Core Performance Boost | PCIe power mgmt | pf config | Throughput (MByte/s) |
|---|---|---|---|---|---|
| 1500 | 0 | disabled | enabled | pass in quick on em1 | 32.6 |
| 1500 | 0 | disabled | enabled | set skip on em1 | 32.5 |
| 1500 | 100 | disabled | enabled | pass in quick on em1 | 33.5 |
| 1500 | 100 | disabled | enabled | set skip on em1 | 36.5 |
| 1500 | 0 | enabled | disabled | pass in quick on em1 | 30.0 |
| 1500 | 0 | enabled | disabled | set skip on em1 | 38.9 |
| 1500 | 100 | enabled | disabled | pass in quick on em1 | 46.8 |
| 1500 | 100 | enabled | disabled | set skip on em1 | 55.7 |
| 9000 | 0 | enabled | disabled | pass in quick on em1 | 107.0 |
| 9000 | 0 | enabled | disabled | set skip on em1 | 109.0 |
| 9000 | 100 | enabled | disabled | pass in quick on em1 | 111.0 |
| 9000 | 100 | enabled | disabled | set skip on em1 | 111.0 |
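
To put the table in perspective, here is a rough sketch that compares the best measured figure for each MTU against the theoretical ceiling for gigabit ethernet. The names are invented for the example, the measured values are copied from the table rows with hw.setperf=100, Core Performance Boost enabled, PCIe power management disabled and `set skip on em1`, and the ceiling assumes 40 bytes of IP and TCP headers and 38 bytes of ethernet overhead per packet.

```python
ETH_OVERHEAD = 38                    # ethernet framing bytes per packet (assumed standard)
IP_TCP_HEADERS = 40                  # IPv4 + TCP headers without options
LINE_RATE_MBYTE = 1e9 / 8 / 2**20    # 1 Gbit/s in MByte/s (2**20 bytes)


def ceiling(mtu):
    """Theoretical maximum TCP payload throughput in MByte/s."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD) * LINE_RATE_MBYTE


def fraction_of_ceiling(mtu, measured):
    """How much of the theoretical maximum a measured rate achieves."""
    return measured / ceiling(mtu)


# Best measured MByte/s per MTU, taken from the table above.
best_measured = {1500: 55.7, 9000: 111.0}

for mtu, measured in best_measured.items():
    share = fraction_of_ceiling(mtu, measured)
    print(f"{mtu} MTU: {measured} MByte/s is {share:.0%} of the ceiling")
```

By this measure the 1500 MTU case reaches roughly half of what the wire could carry, while the 9000 MTU case gets well over 90% of it.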