Why do chess engines run slow on Haiku?

HaikuDepot offers the chess engine stockfish, which runs surprisingly slow on my system (compared to Linux or Windows).

You can test it from the terminal with
stockfish bench > /dev/null
The result shows, how many chess nodes (positions) per second are calculated on your hardware.

Stockfish version 9 from HaikuDepot is outdated, so I decided to download and compile three different chess engines on Haiku R1/beta2 and Linux.

Here are the results:

Engine                   Haiku          Linux		      command
---------------------------------------------------------------------------------------------------------
Stockfish 11		    580k nps	 910k nps	   ./stockfish bench > /dev/null
Ethereal 12.25		335k nps	 835k nps     ./ethereal bench
Xiphos 0.6			450k nps	 660k nps	   ./xiphos [ENTER AND] go depth 18

Same compile options and Makefile on both systems (-O3 -flto -mpopcnt -m64). Haiku with g++ 8.3 and Linux with g++ 9.3.

Any idea, why there is such a big difference?

https://github.com/official-stockfish/Stockfish/archive/sf_11.zip
https://github.com/AndyGrant/Ethereal/archive/V12.25.zip
https://github.com/milostatarevic/xiphos/archive/v0.6.zip

The issue seems to be related to my cpu Intel Celeron J3455.

A different system with Intel Core i5 showed only a minimal speed gap between Haiku and Linux.

screenshot599
screenshot600

The real question is:

why are Haiku binaries so slow on an Intel Celeron J3455 cpu?

Another test with gzip 1.1 on both systems:

time gzip --fast Haiku-r1b2-54154_111_x64.iso

This takes 59s on Linux and 186s on Haiku.
Linux is more than 3 times faster on the same real hardware without virtualisation!

It’s obvious, that gcc/g++ binaries suffer on this cpu at Haiku. But why?

Well, such high level benchmarking will not tell us much. Is the CPU otherwise idle when you do this? Or is there for example a buggy driver using all CPU for (mis)handling interrupts? Is there something spamming the syslog? Did you experiment with the “profile” command to see where the app spends most of its time in?

For your gzip test, you need to take IO out of the measurement if you want to check the CPU/compiler (and not the disk drivers). So you need to do this from a ramdisk or some other way to run gzip without reading and writing data to disk. And run some separate benchmarks for disk access (for example you could use bonnie++)

Thanks @PulkoMandy

The cpus are idle and disk is a fast sata ssd, but syslog is spammed by this messages

KERN: usb error xhci 0: KERN: link descriptor for pipe: max transfers count exceeded
KERN: usb error hub 14: KERN: error updating port status
KERN: usb error xhci 0: link descriptor for pipe: max transfers count exceeded
KERN: usb error hub 14: KERN: error updating port status

Any tip, how to get rid of them?

Create a bugreport and hope our local USB3 specialist @waddlesplash has some time to look into it? Or disable USB3 in your BIOS if you can, to run in USB2 mode instead.

2 Likes

The syslog spamming is related to an usb3 stick, from which the iso file was copied to my ssd.

After removing the stick and rebooting, spamming it’s gone for now.
But the slow execution speed was retained.

I’ll take a look at the profile output later.

There is no speed difference using gzip on a ramdisk instead of ssd:

/RAMDISK> time gzip --fast Haiku-r1b2-54154_111_x64.iso 

real    3m10,900s
user    3m7,243s
sys     0m1,720s

The profile looks normal to me. Maybe someone experienced could compare it to his system.

profile -o sf9-profile.txt stockfish bench > /dev/null

profiling results for thread "pthread func" (1911):
  tick interval:  1000 us
  total ticks:    0 (0 us)
  unknown ticks:  0 (0 us,   0.00%)
  dropped ticks:  0 (0 us,   0.00%)
  no functions were hit

profiling results for thread "pthread func" (1912):
  tick interval:  1000 us
  total ticks:    0 (0 us)
  unknown ticks:  0 (0 us,   0.00%)
  dropped ticks:  0 (0 us,   0.00%)
  no functions were hit

profiling results for thread "pthread func" (1913):
  tick interval:  1000 us
  total ticks:    2293 (2293000 us)
  unknown ticks:  0 (0 us,   0.00%)
  dropped ticks:  0 (0 us,   0.00%)

        hits     unknown    image
  ------------------------------------------------------------------------------
        1925           0     5331 /boot/system/bin/stockfish
           1           0     5332 /boot/system/lib/libstdc++.so.6.0.25
           7           0     5334 /boot/system/lib/libroot.so

        hits       in us    in %   image  function
  ------------------------------------------------------------------------------
         601      601000   26.21    5331  Eval::evaluate(Position const&)
         301      301000   13.13    5331  MovePicker::next_move(bool)
         106      106000    4.62    5331  Position::see_ge(Move, Value) const
         101      101000    4.40    5331  Position::attackers_to(Square, unsigned long) const
         100      100000    4.36    5331  Position::do_move(Move, StateInfo&, bool)
          99       99000    4.32    5331  ExtMove* generate<(GenType)0>(Position const&, ExtMove*)
          67       67000    2.92    5331  Pawns::probe(Position const&)
          61       61000    2.66    5331  Position::set_check_info(StateInfo*) const
          60       60000    2.62    5331  prefetch(void*)
          56       56000    2.44    5331  TranspositionTable::probe(unsigned long, bool&) const
          46       46000    2.01    5331  Position::undo_move(Move)
          44       44000    1.92    5331  ExtMove* generate<(GenType)1>(Position const&, ExtMove*)
          35       35000    1.53    5331  Position::slider_blockers(unsigned long, Square, unsigned long&) const
          28       28000    1.22    5331  Score Pawns::Entry::do_king_safety<(Color)0>(Position const&, Square)
          27       27000    1.18    5331  Thread::search()
          26       26000    1.13    5331  Score Pawns::Entry::do_king_safety<(Color)1>(Position const&, Square)
          25       25000    1.09    5331  ExtMove* generate<(GenType)3>(Position const&, ExtMove*)
          21       21000    0.92    5331  Position::key_after(Move) const
          21       21000    0.92    5331  Material::probe(Position const&)
          21       21000    0.92    5331  ExtMove* generate<(GenType)2>(Position const&, ExtMove*)
          16       16000    0.70    5331  Position::pseudo_legal(Move) const
          16       16000    0.70    5331  Position::is_draw(int) const
          11       11000    0.48    5331  Position::legal(Move) const
           6        6000    0.26    5331  MovePicker::MovePicker(Position const&, Move, Depth, ButterflyHistory const*, CapturePieceToHistory const*, Square)
           6        6000    0.26    5331  Position::do_null_move(StateInfo&)
           5        5000    0.22    5331  Position::gives_check(Move) const
           4        4000    0.17    5331  MainThread::check_time()
           3        3000    0.13    5331  prefetch2(void*)
           3        3000    0.13    5331  MovePicker::MovePicker(Position const&, Move, Depth, ButterflyHistory const*, CapturePieceToHistory const*, PieceToHistory const**, Move, Move*)
           2        2000    0.09    5331  MovePicker::MovePicker(Position const&, Move, Value, CapturePieceToHistory const*)
           2        2000    0.09    5331  ExtMove* generate<(GenType)5>(Position const&, ExtMove*)
           2        2000    0.09    5331  ExtMove* generate<(GenType)4>(Position const&, ExtMove*)
           1        1000    0.04    5334  BPrivate::hoardUnlock(mutex&)
           1        1000    0.04    5331  UCI::pv(Position const&, Depth, Value, Value)
           1        1000    0.04    5334  pow
           1        1000    0.04    5334  BPrivate::processHeap::free(void*)
           1        1000    0.04    5334  BPrivate::superblock::superblock(int, int, BPrivate::hoardHeap*)
           1        1000    0.04    5334  malloc
           1        1000    0.04    5334  tolower
           1        1000    0.04    5334  _kern_write
           1        1000    0.04    5332  __gnu_cxx::stdio_sync_filebuf<char, std::char_traits<char> >::xsputn(char const*, long)
           1        1000    0.04    5331  TimeManagement::init(Search::LimitsType&, Color, int)
           1        1000    0.04    5331  Endgame<(EndgameCode)16, ScaleFactor>::operator()(Position const&) const

profiling results for thread "stockfish" (1909):
  tick interval:  1000 us
  total ticks:    286 (286000 us)
  unknown ticks:  0 (0 us,   0.00%)
  dropped ticks:  0 (0 us,   0.00%)

        hits     unknown    image
  ------------------------------------------------------------------------------
         250           0     5331 /boot/system/bin/stockfish
           4           0     5332 /boot/system/lib/libstdc++.so.6.0.25
          24           0     5334 /boot/system/lib/libroot.so

        hits       in us    in %   image  function
  ------------------------------------------------------------------------------
         188      188000   65.73    5331  Bitboards::init()
          56       56000   19.58    5331  Bitbases::init()
          17       17000    5.94    5334  memset
           3        3000    1.05    5331  Thread::Thread(unsigned long)
           2        2000    0.70    5332  std::string::append(char const*, unsigned long)
           2        2000    0.70    5334  isspace
           1        1000    0.35    5331  main
           1        1000    0.35    5331  ExtMove* generate<(GenType)4>(Position const&, ExtMove*)
           1        1000    0.35    5331  UCI::CaseInsensitiveLess::operator()(std::string const&, std::string const&) const
           1        1000    0.35    5332  bool std::has_facet<std::ctype<char> >(std::locale const&)
           1        1000    0.35    5332  std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
           1        1000    0.35    5334  _kern_mutex_unlock
           1        1000    0.35    5334  _kern_spawn_thread
           1        1000    0.35    5334  _kern_write
           1        1000    0.35    5334  __mutex_lock
           1        1000    0.35    5334  log

My best guess is, that the J3455 cpu fails to use it’s full clock range. The Enhanced Intel SpeedStep® Technology allows the cpu to run between 600 and 2300 mhz. For some unknown reason it seems the higher frequencies are avoided at my system on Haiku.

To check this assumption, a tool showing the current clock frequency of the cpu would be great.
Anything on Haiku like cpu-z on Windows or lscpu on Linux?

https://ark.intel.com/content/www/us/en/ark/products/95594/intel-celeron-processor-j3455-2m-cache-up-to-2-3-ghz.html

OK, I learned something new: sysinfo :sunglasses:

~> sysinfo | grep running
4 Intel Celeron® J3455, revision 506c9 running at 1497MHz

Hoping that sysinfo data is “realtime”, then my assumtion seems confirmed. The J3455 cpu runs the whole time @ 1.5ghz.

Anything here, which can be maintained by an user?

1 Like

I don’t think so. Other than filing a bug report with your syslog and sysinfo…

Thanks,
now reading https://www.haiku-os.org/docs/welcome/de/bugreports.html

:japanese_ogre:… come on … Captcha Error @ registration & 100% cpu load & WebPositive unresponsive! :broken_heart:

1 Like

For the bug report go here: https://dev.haiku-os.org

2 Likes

With the help of @korli I could build a modified module intel_pstates with my cpu model number 92 (0x5c) included at line 141.

But this doesn’t help to solve the issue. Get some very strange results for my Intel Celeron J3455 4-core cpu from function is_cpu_model_supported:

intel_pstates debug output:

Cpu		Vendor		        Family		           Model
 0 	        0      		            	9	                    		48
 1	            1207495615	        5			                    128
 2         	739248128		    4		                 	    0
 3	            87			                4294967295 	    0

Vendor was only correct for cpu #0, Family and model never.

sysinfo shows these values:
4 Intel Celeron® J3455, revision 506c9 running at 1497MHz

CPU #0: “Intel® Celeron® CPU J3455 @ 1.50GHz”
Signature: 0x0506c9; Type 0, family 6, model 92, stepping 9

The added debug lines to function is_cpu_model_supported in haiku/src/add-ons/kernel/power/cpufreq/intel_pstates/intel_pstates.cpp:

dprintf(“Debug: found model number %u\n”, model);
dprintf(“Debug: found vendor %u\n”, cpu->arch.vendor);
dprintf(“Debug: found family number %u\n”, cpu->arch.family);

Is this engine useable/playable on Terminal? So one can create a gui for it with yab to have ai?

You probably mixed this with my thread Stockfish 12 chess engine ? Here I’m searching for help with my slow-running-cpu.

And yes, Stockfish 12 is useable in the terminal. Start it and type uci to see it’s options. Type quit to exit.
But there is already a great gui for playing with uci-chess-engines: ChessX from HaikuDepot.

2 Likes

3.2 Ghz Intel i3 (4-core), ran the Stockfish 9 bench:
Haiku x86_64 (hrev54547):
Total time (ms): 3297
Nodes searched: 5023629
Nodes/second: 1523696

Linux 4.18.5 x86_64 comparison:
Total time (ms): 3858
Nodes searched: 5023629
Nodes/second: 1302132

I don’t think most of the newer Intel processor Family 6 models (>=Y2014) are currently supported in Haiku’s cpufreq code.

NOTE: Based on my review, Haiku seems faster than Linux?!?

In the meantime I know, that the slow speed on Haiku is related to my Celereon J3455 (system doesn’t turbo boost the cpu).

But over 200 kn/s faster (17%) than on Linux feels odd. Any background activity there?

Just basic test with minimal background activity.

time unzip haiku-r1beta2-x86_64-anyboot.zip
Archive: haiku-r1beta2-x86_64-anyboot.zip
inflating: ReadMe.md
inflating: haiku-r1beta2-hrev54154_111-x86_64-anyboot.iso

real 0m21.239s
user 0m6.249s
sys 0m0.935s

~> time gzip --fast haiku-r1beta2-hrev54154_111-x86_64-anyboot.iso

real 1m28.427s
user 1m9.499s
sys 0m1.600s

Well, 88s compress with 21s decompress on BeFS.

If you do {sysinfo -cpu|grep EST) for cpufreq versus the ‘model filtering’, might work…
NOTE: Your processor doesn’t support Intel Turbo Boost, just EST

1 Like