Bare Metal vs. Virtualization Performance
I just built a new homelab server. Specs are as follows:
Neil's Lab Server Specifications | |
---|---|
CPU Model | Intel® Xeon® Gold 6326, 16 Cores (32 Threads), 2.90 GHz (Base), 3.50 GHz (Turbo) |
CPU Cooler | Noctua NH-U12S DX-4189 |
Motherboard | Supermicro X12SPi |
RAM | Samsung 6x 16GB (96 GB) DDR4-3200 RDIMM ECC PC4-25600R Dual Rank |
NIC (onboard) | Intel X550 2x 10GBase-T |
NIC (PCIe) | Supermicro AOC-SGP-I4 4x 1GbE |
OS NVMe | 2x 1TB (2 TB) Samsung 970 Pro |
OS NVMe Carrier | Supermicro AOC-SLG3-2M2 PCIe x8 |
NAS NVMe | 8x 2TB (16 TB) Samsung 970 Evo |
NAS NVMe Carrier | 2x Quad Gigabyte GC-4XM2G4 PCIe x16 |
Power Supply | EVGA 750 W 210-GQ-0750-V1 |
Chassis | NZXT H510i Flow |
I wanted to know the performance difference between running an OS under a Type 1 hypervisor like ESXi (or Proxmox) and running it on bare metal. Google searches yield garbage results. I've used VMware Workstation Pro as well as VMware Fusion before, but both of those are Type 2 hypervisors.
Type 1 Hypervisor:
[Guest OS] [Guest OS]
[ Type 1 Hypervisor ]
[ Hardware ]
Type 2 Hypervisor:
[Guest OS] [Guest OS]
[ Type 2 Hypervisor ]
[ Host OS ]
[ Hardware ]
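Either way, a Linux guest can usually tell that it is virtualized: hypervisors (Type 1 and Type 2 alike) set the CPUID hypervisor bit, which shows up as a `hypervisor` flag in `/proc/cpuinfo`. A minimal sketch (the helper name is my own):

```python
def is_virtualized(cpuinfo: str) -> bool:
    """Return True if a /proc/cpuinfo dump carries the 'hypervisor' flag,
    which hypervisors advertise through the CPUID hypervisor bit."""
    for line in cpuinfo.splitlines():
        if line.startswith("flags"):
            # flag names are space-separated after the colon
            if "hypervisor" in line.split(":", 1)[1].split():
                return True
    return False

# On a live system you would feed it the real file:
# is_virtualized(open("/proc/cpuinfo").read())
```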
So naturally, I was curious: what is the performance impact of running an OS virtualized on a Type 1 hypervisor? The most popular options appear to be:
- VMware ESXi (vSphere Hypervisor)
- Microsoft Hyper-V
- Xen
- Proxmox VE
I was able to get a VMware vSphere 7.0 Enterprise Plus license from UC Berkeley, so I went with VMware ESXi. vSphere comes with a bazillion packages for managing a datacenter, the hypervisor being just one of the pieces.
For benchmarking, I used Geekbench 5.4.3 for Linux to keep things simple. Ubuntu Server 20.04 was installed both on bare metal and as a virtual machine on a VMFS datastore, each running on a Samsung 970 Pro 1TB NVMe drive.
In the Geekbench 5 results, the VM came in roughly 5% behind bare metal. I am quite pleased with that: giving up ~5% of bare-metal performance for the flexibility of running multiple VMs, with cool features such as snapshots, is definitely a no-brainer for me. ESXi is rock solid and very much an enterprise-grade piece of software. I'll compare Proxmox when I get a chance next.
For completeness, the full Geekbench results can be accessed here:
- Bare metal - https://browser.geekbench.com/v5/cpu/11465975
- VM - https://browser.geekbench.com/v5/cpu/11466843
Bogus Ops test (stress-ng)
Using the stress-ng utility, the results are even more impressive:
*********************
SINGLE CORE BOGUS OPS
*********************
VM:~$ sudo stress-ng --matrix 1 -t 60s --metrics-brief
dispatching hogs: 1 matrix
successful run completed in 60.00s (1 min, 0.00 secs)
stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
(secs) (secs) (secs) (real time) (usr+sys time)
matrix 237270 60.00 60.00 0.00 3954.50 3954.50
BARE METAL:~$ sudo stress-ng --matrix 1 -t 60s --metrics-brief
dispatching hogs: 1 matrix
successful run completed in 60.00s (1 min, 0.00 secs)
stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
(secs) (secs) (secs) (real time) (usr+sys time)
matrix 240077 60.00 59.99 0.00 4001.28 4001.95
*********************
MULTI CORE BOGUS OPS
*********************
VM:~$ sudo stress-ng --matrix 0 -t 60s --metrics-brief
dispatching hogs: 32 matrix
successful run completed in 60.01s (1 min, 0.01 secs)
stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
(secs) (secs) (secs) (real time) (usr+sys time)
matrix 4531734 60.00 1919.50 0.00 75529.30 2360.89
BARE METAL:~$ sudo stress-ng --matrix 0 -t 60s --metrics-brief
dispatching hogs: 32 matrix
successful run completed in 60.01s (1 min, 0.01 secs)
stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
(secs) (secs) (secs) (real time) (usr+sys time)
matrix 4578028 60.00 1919.57 0.00 76300.66 2384.92
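From the bogo-ops totals above, the virtualization overhead works out to roughly 1% in both runs. A quick sanity check of the arithmetic:

```python
def overhead_pct(bare: float, vm: float) -> float:
    """Percentage of bare-metal throughput lost when running in the VM."""
    return (bare - vm) / bare * 100

# bogo ops over 60 s, taken from the stress-ng runs above
single = overhead_pct(240077, 237270)    # single-core matrix stressor
multi = overhead_pct(4578028, 4531734)   # 32-worker matrix stressor

print(f"single-core overhead: {single:.2f}%")  # ~1.17%
print(f"multi-core overhead:  {multi:.2f}%")   # ~1.01%
```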
*********************
MULTI CORE STRESS TEST
*********************
VM:~$ sudo stress-ng --cpu 32 --cpu-method all --perf -t 60
dispatching hogs: 32 cpu
successful run completed in 61.30s (1 min, 1.30 secs)
cpu:
0 Cache L1D Read 0.00 /sec
0 Cache L1D Read Miss 0.00 /sec
0 Cache L1D Write 0.00 /sec
0 Cache L1I Read Miss 0.00 /sec
0 Cache DTLB Read 0.00 /sec
0 Cache DTLB Read Miss 0.00 /sec
0 Cache DTLB Write 0.00 /sec
0 Cache DTLB Write Miss 0.00 /sec
0 Cache ITLB Read Miss 0.00 /sec
0 Cache BPU Read 0.00 /sec
0 Cache BPU Read Miss 0.00 /sec
1,918,611,096,224 CPU Clock 31.30 B/sec
1,918,611,245,760 Task Clock 31.30 B/sec
26,592 Page Faults Total 433.84 /sec
26,592 Page Faults Minor 433.84 /sec
0 Page Faults Major 0.00 /sec
1,376 Context Switches 22.45 /sec
0 CPU Migrations 0.00 /sec
0 Alignment Faults 0.00 /sec
0 Emulation Faults 0.00 /sec
26,528 Page Faults User 432.79 /sec
128 Page Faults Kernel 2.09 /sec
3,424 System Call Enter 55.86 /sec
3,392 System Call Exit 55.34 /sec
256 TLB Flushes 4.18 /sec
0 Kmalloc 0.00 /sec
0 Kmalloc Node 0.00 /sec
0 Kfree 0.00 /sec
32 Kmem Cache Alloc 0.52 /sec
0 Kmem Cache Alloc Node 0.00 /sec
32 Kmem Cache Free 0.52 /sec
25,376 MM Page Alloc 414.00 /sec
0 MM Page Free 0.00 /sec
1,124,672 RCU Utilization 18.35 K/sec
448 Sched Migrate Task 7.31 /sec
0 Sched Move NUMA 0.00 /sec
1,408 Sched Wakeup 22.97 /sec
0 Sched Proc Exec 0.00 /sec
0 Sched Proc Exit 0.00 /sec
0 Sched Proc Fork 0.00 /sec
0 Sched Proc Free 0.00 /sec
0 Sched Proc Hang 0.00 /sec
0 Sched Proc Wait 0.00 /sec
1,376 Sched Switch 22.45 /sec
32 Signal Generate 0.52 /sec
32 Signal Deliver 0.52 /sec
928 IRQ Entry 15.14 /sec
928 IRQ Exit 15.14 /sec
408,768 Soft IRQ Entry 6.67 K/sec
408,768 Soft IRQ Exit 6.67 K/sec
0 Writeback Dirty Inode 0.00 /sec
0 Writeback Dirty Page 0.00 /sec
0 Migrate MM Pages 0.00 /sec
0 SKB Consume 0.00 /sec
0 SKB Kfree 0.00 /sec
0 IOMMU IO Page Fault 0.00 /sec
0 IOMMU Map 0.00 /sec
0 IOMMU Unmap 0.00 /sec
0 Filemap page-cache add 0.00 /sec
0 Filemap page-cache del 0.00 /sec
0 OOM Compact Retry 0.00 /sec
0 OOM Wake Reaper 0.00 /sec
0 Thermal Zone Trip 0.00 /sec
BARE METAL:~$ sudo stress-ng --cpu 32 --cpu-method all --perf -t 60
dispatching hogs: 32 cpu
successful run completed in 61.30s (1 min, 1.30 secs)
cpu:
6,299,757,819,808 CPU Cycles 0.10 T/sec
5,413,717,415,168 Instructions 88.31 B/sec (0.859 instr. per cycle)
993,084,418,080 Branch Instructions 16.20 B/sec
21,493,168,320 Branch Misses 0.35 B/sec ( 2.16%)
47,817,010,688 Bus Cycles 0.78 B/sec
5,546,730,077,536 Total Cycles 90.48 B/sec
12,255,527,424 Cache References 0.20 B/sec
164,721,152 Cache Misses 2.69 M/sec ( 1.34%)
557,545,772,192 Cache L1D Read 9.10 B/sec
64,777,397,440 Cache L1D Read Miss 1.06 B/sec
390,375,844,512 Cache L1D Write 6.37 B/sec
1,156,206,112 Cache L1I Read Miss 18.86 M/sec
3,278,834,912 Cache LL Read 53.49 M/sec
3,822,464 Cache LL Read Miss 62.36 K/sec
85,624,000 Cache LL Write 1.40 M/sec
47,737,408 Cache LL Write Miss 0.78 M/sec
515,595,101,888 Cache DTLB Read 8.41 B/sec
279,424 Cache DTLB Read Miss 4.56 K/sec
369,159,797,632 Cache DTLB Write 6.02 B/sec
1,280,000 Cache DTLB Write Miss 20.88 K/sec
10,606,716,288 Cache ITLB Read Miss 0.17 B/sec
988,307,728,992 Cache BPU Read 16.12 B/sec
21,333,012,864 Cache BPU Read Miss 0.35 B/sec
9,740,960 Cache NODE Read 0.16 M/sec
0 Cache NODE Read Miss 0.00 /sec
51,092,544 Cache NODE Write 0.83 M/sec
0 Cache NODE Write Miss 0.00 /sec
1,917,173,399,968 CPU Clock 31.27 B/sec
1,917,190,695,168 Task Clock 31.28 B/sec
26,624 Page Faults Total 434.32 /sec
26,624 Page Faults Minor 434.32 /sec
0 Page Faults Major 0.00 /sec
160,672 Context Switches 2.62 K/sec
0 CPU Migrations 0.00 /sec
0 Alignment Faults 0.00 /sec
0 Emulation Faults 0.00 /sec
26,560 Page Faults User 433.27 /sec
128 Page Faults Kernel 2.09 /sec
3,936 System Call Enter 64.21 /sec
3,904 System Call Exit 63.69 /sec
256 TLB Flushes 4.18 /sec
0 Kmalloc 0.00 /sec
0 Kmalloc Node 0.00 /sec
64 Kfree 1.04 /sec
64 Kmem Cache Alloc 1.04 /sec
0 Kmem Cache Alloc Node 0.00 /sec
224 Kmem Cache Free 3.65 /sec
25,344 MM Page Alloc 413.44 /sec
32 MM Page Free 0.52 /sec
1,294,656 RCU Utilization 21.12 K/sec
128 Sched Migrate Task 2.09 /sec
0 Sched Move NUMA 0.00 /sec
160,832 Sched Wakeup 2.62 K/sec
0 Sched Proc Exec 0.00 /sec
0 Sched Proc Exit 0.00 /sec
0 Sched Proc Fork 0.00 /sec
0 Sched Proc Free 0.00 /sec
0 Sched Proc Hang 0.00 /sec
0 Sched Proc Wait 0.00 /sec
160,672 Sched Switch 2.62 K/sec
32 Signal Generate 0.52 /sec
32 Signal Deliver 0.52 /sec
992 IRQ Entry 16.18 /sec
992 IRQ Exit 16.18 /sec
366,624 Soft IRQ Entry 5.98 K/sec
366,624 Soft IRQ Exit 5.98 K/sec
0 Writeback Dirty Inode 0.00 /sec
0 Writeback Dirty Page 0.00 /sec
0 Migrate MM Pages 0.00 /sec
0 SKB Consume 0.00 /sec
0 SKB Kfree 0.00 /sec
0 IOMMU IO Page Fault 0.00 /sec
0 IOMMU Map 0.00 /sec
0 IOMMU Unmap 0.00 /sec
0 Filemap page-cache add 0.00 /sec
0 Filemap page-cache del 0.00 /sec
0 OOM Compact Retry 0.00 /sec
0 OOM Wake Reaper 0.00 /sec
0 Thermal Zone Trip 0.00 /sec
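One thing stands out in the VM run: all the hardware cache and TLB counters read zero, presumably because ESXi does not expose a virtual PMU to the guest by default, so perf falls back to software events only. The bare-metal counters, by contrast, are self-consistent; the reported ratios can be re-derived directly from the raw counts:

```python
# Bare-metal hardware counters, copied from the perf output above
cpu_cycles = 6_299_757_819_808
instructions = 5_413_717_415_168
branches = 993_084_418_080
branch_misses = 21_493_168_320

ipc = instructions / cpu_cycles              # stress-ng reported 0.859 instr. per cycle
miss_rate = branch_misses / branches * 100   # stress-ng reported 2.16%

print(f"IPC:              {ipc:.3f}")
print(f"branch miss rate: {miss_rate:.2f}%")
```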