The issue seems to be related to my cpu Intel Celeron J3455.
A different system with Intel Core i5 showed only a minimal speed gap between Haiku and Linux.
The issue seems to be related to my cpu Intel Celeron J3455.
A different system with Intel Core i5 showed only a minimal speed gap between Haiku and Linux.
The real question is:
why are Haiku binaries so slow on an Intel Celeron J3455 cpu?
Another test with gzip 1.1 on both systems:
time gzip --fast Haiku-r1b2-54154_111_x64.iso
This takes 59s on Linux and 186s on Haiku.
Linux is more than 3 times faster on the same real hardware without virtualisation!
It’s obvious, that gcc/g++ binaries suffer on this cpu at Haiku. But why?
Well, such high level benchmarking will not tell us much. Is the CPU otherwise idle when you do this? Or is there for example a buggy driver using all CPU for (mis)handling interrupts? Is there something spamming the syslog? Did you experiment with the “profile” command to see where the app spends most of its time in?
For your gzip test, you need to take IO out of the measurement if you want to check the CPU/compiler (and not the disk drivers). So you need to do this from a ramdisk or some other way to run gzip without reading and writing data to disk. And run some separate benchmarks for disk access (for example you could use bonnie++)
Thanks @PulkoMandy
The cpus are idle and disk is a fast sata ssd, but syslog is spammed by this messages
KERN: usb error xhci 0: KERN: link descriptor for pipe: max transfers count exceeded
KERN: usb error hub 14: KERN: error updating port status
KERN: usb error xhci 0: link descriptor for pipe: max transfers count exceeded
KERN: usb error hub 14: KERN: error updating port status
Any tip, how to get rid of them?
Create a bugreport and hope our local USB3 specialist @waddlesplash has some time to look into it? Or disable USB3 in your BIOS if you can, to run in USB2 mode instead.
The syslog spamming is related to an usb3 stick, from which the iso file was copied to my ssd.
After removing the stick and rebooting, spamming it’s gone for now.
But the slow execution speed was retained.
I’ll take a look at the profile output later.
There is no speed difference using gzip on a ramdisk instead of ssd:
/RAMDISK> time gzip --fast Haiku-r1b2-54154_111_x64.iso
real 3m10,900s
user 3m7,243s
sys 0m1,720s
The profile looks normal to me. Maybe someone experienced could compare it to his system.
profile -o sf9-profile.txt stockfish bench > /dev/null
profiling results for thread "pthread func" (1911):
tick interval: 1000 us
total ticks: 0 (0 us)
unknown ticks: 0 (0 us, 0.00%)
dropped ticks: 0 (0 us, 0.00%)
no functions were hit
profiling results for thread "pthread func" (1912):
tick interval: 1000 us
total ticks: 0 (0 us)
unknown ticks: 0 (0 us, 0.00%)
dropped ticks: 0 (0 us, 0.00%)
no functions were hit
profiling results for thread "pthread func" (1913):
tick interval: 1000 us
total ticks: 2293 (2293000 us)
unknown ticks: 0 (0 us, 0.00%)
dropped ticks: 0 (0 us, 0.00%)
hits unknown image
------------------------------------------------------------------------------
1925 0 5331 /boot/system/bin/stockfish
1 0 5332 /boot/system/lib/libstdc++.so.6.0.25
7 0 5334 /boot/system/lib/libroot.so
hits in us in % image function
------------------------------------------------------------------------------
601 601000 26.21 5331 Eval::evaluate(Position const&)
301 301000 13.13 5331 MovePicker::next_move(bool)
106 106000 4.62 5331 Position::see_ge(Move, Value) const
101 101000 4.40 5331 Position::attackers_to(Square, unsigned long) const
100 100000 4.36 5331 Position::do_move(Move, StateInfo&, bool)
99 99000 4.32 5331 ExtMove* generate<(GenType)0>(Position const&, ExtMove*)
67 67000 2.92 5331 Pawns::probe(Position const&)
61 61000 2.66 5331 Position::set_check_info(StateInfo*) const
60 60000 2.62 5331 prefetch(void*)
56 56000 2.44 5331 TranspositionTable::probe(unsigned long, bool&) const
46 46000 2.01 5331 Position::undo_move(Move)
44 44000 1.92 5331 ExtMove* generate<(GenType)1>(Position const&, ExtMove*)
35 35000 1.53 5331 Position::slider_blockers(unsigned long, Square, unsigned long&) const
28 28000 1.22 5331 Score Pawns::Entry::do_king_safety<(Color)0>(Position const&, Square)
27 27000 1.18 5331 Thread::search()
26 26000 1.13 5331 Score Pawns::Entry::do_king_safety<(Color)1>(Position const&, Square)
25 25000 1.09 5331 ExtMove* generate<(GenType)3>(Position const&, ExtMove*)
21 21000 0.92 5331 Position::key_after(Move) const
21 21000 0.92 5331 Material::probe(Position const&)
21 21000 0.92 5331 ExtMove* generate<(GenType)2>(Position const&, ExtMove*)
16 16000 0.70 5331 Position::pseudo_legal(Move) const
16 16000 0.70 5331 Position::is_draw(int) const
11 11000 0.48 5331 Position::legal(Move) const
6 6000 0.26 5331 MovePicker::MovePicker(Position const&, Move, Depth, ButterflyHistory const*, CapturePieceToHistory const*, Square)
6 6000 0.26 5331 Position::do_null_move(StateInfo&)
5 5000 0.22 5331 Position::gives_check(Move) const
4 4000 0.17 5331 MainThread::check_time()
3 3000 0.13 5331 prefetch2(void*)
3 3000 0.13 5331 MovePicker::MovePicker(Position const&, Move, Depth, ButterflyHistory const*, CapturePieceToHistory const*, PieceToHistory const**, Move, Move*)
2 2000 0.09 5331 MovePicker::MovePicker(Position const&, Move, Value, CapturePieceToHistory const*)
2 2000 0.09 5331 ExtMove* generate<(GenType)5>(Position const&, ExtMove*)
2 2000 0.09 5331 ExtMove* generate<(GenType)4>(Position const&, ExtMove*)
1 1000 0.04 5334 BPrivate::hoardUnlock(mutex&)
1 1000 0.04 5331 UCI::pv(Position const&, Depth, Value, Value)
1 1000 0.04 5334 pow
1 1000 0.04 5334 BPrivate::processHeap::free(void*)
1 1000 0.04 5334 BPrivate::superblock::superblock(int, int, BPrivate::hoardHeap*)
1 1000 0.04 5334 malloc
1 1000 0.04 5334 tolower
1 1000 0.04 5334 _kern_write
1 1000 0.04 5332 __gnu_cxx::stdio_sync_filebuf<char, std::char_traits<char> >::xsputn(char const*, long)
1 1000 0.04 5331 TimeManagement::init(Search::LimitsType&, Color, int)
1 1000 0.04 5331 Endgame<(EndgameCode)16, ScaleFactor>::operator()(Position const&) const
profiling results for thread "stockfish" (1909):
tick interval: 1000 us
total ticks: 286 (286000 us)
unknown ticks: 0 (0 us, 0.00%)
dropped ticks: 0 (0 us, 0.00%)
hits unknown image
------------------------------------------------------------------------------
250 0 5331 /boot/system/bin/stockfish
4 0 5332 /boot/system/lib/libstdc++.so.6.0.25
24 0 5334 /boot/system/lib/libroot.so
hits in us in % image function
------------------------------------------------------------------------------
188 188000 65.73 5331 Bitboards::init()
56 56000 19.58 5331 Bitbases::init()
17 17000 5.94 5334 memset
3 3000 1.05 5331 Thread::Thread(unsigned long)
2 2000 0.70 5332 std::string::append(char const*, unsigned long)
2 2000 0.70 5334 isspace
1 1000 0.35 5331 main
1 1000 0.35 5331 ExtMove* generate<(GenType)4>(Position const&, ExtMove*)
1 1000 0.35 5331 UCI::CaseInsensitiveLess::operator()(std::string const&, std::string const&) const
1 1000 0.35 5332 bool std::has_facet<std::ctype<char> >(std::locale const&)
1 1000 0.35 5332 std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
1 1000 0.35 5334 _kern_mutex_unlock
1 1000 0.35 5334 _kern_spawn_thread
1 1000 0.35 5334 _kern_write
1 1000 0.35 5334 __mutex_lock
1 1000 0.35 5334 log
My best guess is, that the J3455 cpu fails to use it’s full clock range. The Enhanced Intel SpeedStep® Technology allows the cpu to run between 600 and 2300 mhz. For some unknown reason it seems the higher frequencies are avoided at my system on Haiku.
To check this assumption, a tool showing the current clock frequency of the cpu would be great.
Anything on Haiku like cpu-z on Windows or lscpu on Linux?
OK, I learned something new: sysinfo
~> sysinfo | grep running
4 Intel Celeron® J3455, revision 506c9 running at 1497MHz
Hoping that sysinfo
data is “realtime”, then my assumtion seems confirmed. The J3455 cpu runs the whole time @ 1.5ghz.
Anything here, which can be maintained by an user?
I don’t think so. Other than filing a bug report with your syslog and sysinfo…
Thanks,
now reading https://www.haiku-os.org/docs/welcome/de/bugreports.html
… come on … Captcha Error @ registration & 100% cpu load & WebPositive unresponsive!
With the help of @korli I could build a modified module intel_pstates
with my cpu model number 92 (0x5c) included at line 141.
But this doesn’t help to solve the issue. Get some very strange results for my Intel Celeron J3455 4-core cpu from function is_cpu_model_supported
:
intel_pstates debug output:
Cpu Vendor Family Model
0 0 9 48
1 1207495615 5 128
2 739248128 4 0
3 87 4294967295 0
Vendor was only correct for cpu #0, Family and model never.
sysinfo shows these values:
4 Intel Celeron® J3455, revision 506c9 running at 1497MHz
CPU #0: “Intel® Celeron® CPU J3455 @ 1.50GHz”
Signature: 0x0506c9; Type 0, family 6, model 92, stepping 9
The added debug lines to function is_cpu_model_supported
in haiku/src/add-ons/kernel/power/cpufreq/intel_pstates/intel_pstates.cpp
:
dprintf(“Debug: found model number %u\n”, model);
dprintf(“Debug: found vendor %u\n”, cpu->arch.vendor);
dprintf(“Debug: found family number %u\n”, cpu->arch.family);
Is this engine useable/playable on Terminal? So one can create a gui for it with yab to have ai?
You probably mixed this with my thread Stockfish 12 chess engine ? Here I’m searching for help with my slow-running-cpu
.
And yes, Stockfish 12 is useable in the terminal. Start it and type uci
to see it’s options. Type quit
to exit.
But there is already a great gui for playing with uci-chess-engines: ChessX from HaikuDepot.
3.2 Ghz Intel i3 (4-core), ran the Stockfish 9 bench:
Haiku x86_64 (hrev54547):
Total time (ms): 3297
Nodes searched: 5023629
Nodes/second: 1523696
Linux 4.18.5 x86_64 comparison:
Total time (ms): 3858
Nodes searched: 5023629
Nodes/second: 1302132
I don’t think most of the newer Intel processor Family 6 models (>=Y2014) are currently supported in Haiku’s cpufreq code.
NOTE: Based on my review, Haiku seems faster than Linux?!?
In the meantime I know, that the slow speed on Haiku is related to my Celereon J3455 (system doesn’t turbo boost the cpu).
But over 200 kn/s faster (17%) than on Linux feels odd. Any background activity there?
Just basic test with minimal background activity.
time unzip haiku-r1beta2-x86_64-anyboot.zip
Archive: haiku-r1beta2-x86_64-anyboot.zip
inflating: ReadMe.md
inflating: haiku-r1beta2-hrev54154_111-x86_64-anyboot.iso
real 0m21.239s
user 0m6.249s
sys 0m0.935s
~> time gzip --fast haiku-r1beta2-hrev54154_111-x86_64-anyboot.iso
real 1m28.427s
user 1m9.499s
sys 0m1.600s
Well, 88s compress with 21s decompress on BeFS.
If you do {sysinfo -cpu|grep EST) for cpufreq versus the ‘model filtering’, might work…
NOTE: Your processor doesn’t support Intel Turbo Boost, just EST
Thanks for your idea. I patched function is_cpu_model_supported
for my cpu to return true. Syslog shows now the correct entry:
KERN: using Intel P-States: min 8, max 15, boost 23
KERN: found cpufreq module: power/cpufreq/intel_pstates/v1
KERN: using Intel C-States: C0, C1, C3, C4, C5, C6, C7
KERN: found cpuidle module: power/cpuidle/intel_cstates/v1
But all of my bench tests witch chess engines still give low values!
For unknown reason p-state 23 (=2.3ghz) gets not activated by Haiku.
The cpu remains the whole time stuck at it’s default clock (p-state 15) with 1.5ghz.