[GSoC 2026] Improving automated test coverage and system documentation

Hi everyone,

I’ve completed my proposal on automating tests and expanding coverage in Haiku’s codebase. The proposal aims to add infrastructure to improve test coverage, mainly in filesystem testing (the BFS driver) and UI testing (the Interface Kit).

Feedback and new ideas for further refining the proposal are welcome.
Proposal


Have a good voyage through this period of GSoC and your testing improvements!

There are more people here who run various tests against Haiku - POSIX compatibility and clang compliance, just to name a few - so this will be well received! :cowboy_hat_face:

Hi guys,

Whenever you have time, could you take a look at my proposal? @KapiX @PulkoMandy @nephele

Can you provide the proposal as a .odt or .pdf? Looking at it in Google Docs does not work correctly for me. Also keep in mind that I am not a GSoC mentor and have no clue how that stuff works, so I will only be able to comment on some technical stuff :slight_smile:

Hi Nephele,

Here is the PDF version of the Proposal.

Yeah, that’s okay :slightly_smiling_face:. Technical feedback and refinement of the proposal are much more important.

Hello everyone,

To simplify and automate filesystem benchmarking on Haiku, I’ve created the fs-test framework.

My current benchmarking workflow is:

  1. Boot into a Haiku system using a test image (an .iso).

  2. Inside that running Haiku instance, install the bonnie++ benchmarking tool.

  3. Create a virtual disk image to test the target filesystem and mount it.

  4. Run the fs-test script against the mounted disk image, which automatically executes the bonnie++ benchmark and safely stores the results.

This makes comparing multiple hardware setups or filesystems (like BFS or ext2) much easier. I’m thinking of adding all the generic filesystem tests to this framework. Is this approach right? Here is the benchmark output:

~/haiku/src/tests/add-ons/kernel/file_systems> ./fs-test --fs=bfs --test=benchmark --format=human /boot/home/mnt/
Filesystem: bfs
Mount path: /boot/home/mnt/

[TEST] benchmark
----------------------------------------
# bonnie++ benchmark on /boot/home/mnt/ (bfs)
# bonnie++ -d /boot/home/mnt/ -s 1G -n 16 -u user -q
# Results: ./results/bfs_bonnie_20260318_194741.csv
# Version 2.00a       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
shredder         1G 1332k  99  176m  14 10.9m   5 2583k  71 12.5m   5  4573  63
Latency             15325us     513ms     373ms   14411us   16025us   22463us
Version 2.00a       ------Sequential Create------ --------Random Create--------
shredder            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 16384   5 +++++ +++ 16384   6 16384   6 +++++ +++ 16384   7
Latency              3895ms      90us    3388ms    2861ms     660us    4609ms
1.98,2.00a,shredder,1,1773863696,1G,,8192,5,1332,99,179974,14,11206,5,2583,71,12795,5,4573,63,16,,,,,737,5,+++++,+++,879,6,683,6,+++++,+++,641,7,15325us,513ms,373ms,14411us,16025us,22463us,3895ms,90us,3388ms,2861ms,660us,4609ms
ok 1 - benchmark
----------------------------------------
[PASS] benchmark

========================================
SUMMARY: 1 passed, 0 failed
Results: ./results
========================================
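
To give an idea of what a “generic” filesystem test inside this framework could look like beyond benchmarking, here is a minimal C++ sketch of a write/read-back check run against a mount point. This is only an illustration of the kind of check fs-test could drive, not actual code from the framework; the default mount path is a placeholder.

```cpp
// Minimal sketch of a generic filesystem check: create a file on the
// mounted volume, write a known buffer, sync, read it back, verify, and
// clean up. Any POSIX-compliant filesystem under test should pass this.
#include <cstdio>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Returns true if data written under `mountPoint` reads back intact.
static bool TestWriteReadBack(const std::string& mountPoint)
{
    const std::string path = mountPoint + "/fs-test-tmpfile";
    const char data[] = "fs-test generic write/read check";

    int fd = open(path.c_str(), O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0)
        return false;
    bool ok = write(fd, data, sizeof(data)) == (ssize_t)sizeof(data)
        && fsync(fd) == 0;
    close(fd);

    char buffer[sizeof(data)] = {};
    fd = open(path.c_str(), O_RDONLY);
    ok = ok && fd >= 0
        && read(fd, buffer, sizeof(buffer)) == (ssize_t)sizeof(buffer)
        && memcmp(buffer, data, sizeof(data)) == 0;
    if (fd >= 0)
        close(fd);

    unlink(path.c_str());  // clean up regardless of outcome
    return ok;
}

int main(int argc, char** argv)
{
    // Placeholder default; fs-test would pass the real mount path.
    const char* mount = argc > 1 ? argv[1] : "/boot/home/mnt";
    bool passed = TestWriteReadBack(mount);
    printf("%s - write/read-back on %s\n", passed ? "ok" : "not ok", mount);
    return passed ? 0 : 1;
}
```

The “ok”/“not ok” output is meant to match the TAP-style lines the runner already prints, so generic tests and benchmarks would report uniformly.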


I’ve had a look at the proposal; most things look OK to me.

What irritates me a bit is the focus on putting this all into CI. Is your intention to run BFS stress tests on each commit? If so, why? That seems unreasonably expensive in computation (and power).

Being able to ask an automation on Gerrit to run some tests could be very cool, like asking on a UI change to run some subset of UI tests, or on a filesystem change to run some fs tests or your BFS stress test.

I like the BPicture testing idea; I’ve had similar ideas in the past. But note that any “pixel change” must come from either a change in the BPicture or in the rendering stack itself. In some cases both have to be tested, in others only one. So it may be useful to also have pixel-based tests for some things (for example, to verify that fonts still render the same).
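
To sketch what I mean by testing the recorded commands separately from the rendering: something like the following records drawing calls into a BPicture, flattens it, and diffs the bytes against a stored reference. A difference there means the command stream changed, independent of how it rasterizes. Just a rough sketch; the reference file path is made up.

```cpp
// Rough sketch: record drawing commands into a BPicture and compare the
// flattened bytes against a stored reference. A byte difference means the
// recorded command stream changed; rasterization needs separate tests.
#include <Application.h>
#include <Bitmap.h>
#include <DataIO.h>
#include <Picture.h>
#include <View.h>
#include <cstdio>
#include <cstring>

int main()
{
    // A BApplication provides the app_server connection views need.
    BApplication app("application/x-vnd.test-bpicture");

    // Record into a view attached to an offscreen bitmap.
    BRect bounds(0, 0, 99, 99);
    BBitmap bitmap(bounds, B_RGBA32, true /* accepts views */);
    BView* view = new BView(bounds, "recorder", B_FOLLOW_NONE, B_WILL_DRAW);
    bitmap.AddChild(view);

    bitmap.Lock();
    view->BeginPicture(new BPicture);
    view->SetHighColor(80, 120, 200);
    view->FillRect(BRect(10, 10, 60, 40));
    view->StrokeLine(BPoint(0, 0), BPoint(99, 99));
    BPicture* picture = view->EndPicture();
    bitmap.Unlock();

    BMallocIO current;
    picture->Flatten(&current);
    delete picture;

    // Compare against a stored reference (this path is hypothetical).
    FILE* ref = fopen("/boot/home/tests/refs/fillrect.picture", "rb");
    if (ref == NULL) {
        fprintf(stderr, "missing reference; got %zu bytes\n",
            current.BufferLength());
        return 1;
    }
    char refBytes[4096];
    size_t refSize = fread(refBytes, 1, sizeof(refBytes), ref);
    fclose(ref);

    bool same = refSize == current.BufferLength()
        && memcmp(refBytes, current.Buffer(), refSize) == 0;
    printf("%s - BPicture command stream\n", same ? "ok" : "not ok");
    return same ? 0 : 1;
}
```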

I don’t understand the section “Visual Testing Strategy”.
What is the point of trying to filter out “dynamic content” (surely you’d work with a known set of data here, not something as fragile as a live server?) or things that are visually similar? Pixel tests should fail when a pixel is different; this lets you know your change impacts something. I don’t think a heuristic should decide for you that a change isn’t significant; it should rather tell you that the change happened.

One example: when waddlesplash reworked the gradient rendering in the default control look, it affected many colors and moving parts. Being able to know exactly which UI elements render the same and which don’t would have been invaluable, as we had to do this comparison manually. Just checking whether something is “similarly white”, for example, would not have accomplished this.
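
To make that concrete: a strict comparison can still report where two renderings differ instead of reducing them to a similarity score. A rough sketch, assuming both bitmaps are the same size and in B_RGBA32:

```cpp
// Sketch: strict pixel comparison that fails on any difference but also
// reports the bounding box of the differing pixels, so you can see which
// UI elements changed. Assumes equal size and B_RGBA32 layout.
#include <Application.h>
#include <Bitmap.h>
#include <Rect.h>
#include <cstdio>
#include <cstring>

// Returns the number of differing pixels; `dirty` gets their bounding box.
static int32 ComparePixels(const BBitmap& a, const BBitmap& b, BRect& dirty)
{
    const uint32* bitsA = static_cast<const uint32*>(a.Bits());
    const uint32* bitsB = static_cast<const uint32*>(b.Bits());
    int32 width = a.Bounds().IntegerWidth() + 1;
    int32 height = a.Bounds().IntegerHeight() + 1;
    int32 rowPixels = a.BytesPerRow() / 4;  // B_RGBA32: 4 bytes per pixel

    int32 differing = 0;
    dirty = BRect();  // stays invalid until the first difference
    for (int32 y = 0; y < height; y++) {
        for (int32 x = 0; x < width; x++) {
            if (bitsA[y * rowPixels + x] != bitsB[y * rowPixels + x]) {
                differing++;
                dirty = dirty.IsValid()
                    ? dirty | BRect(x, y, x, y) : BRect(x, y, x, y);
            }
        }
    }
    return differing;
}

int main()
{
    BApplication app("application/x-vnd.test-pixeldiff");
    BRect bounds(0, 0, 31, 31);
    BBitmap a(bounds, B_RGBA32);
    BBitmap b(bounds, B_RGBA32);
    memset(a.Bits(), 0xff, a.BitsLength());
    memset(b.Bits(), 0xff, b.BitsLength());
    // Flip one pixel to simulate a rendering change.
    static_cast<uint32*>(b.Bits())[5 * (b.BytesPerRow() / 4) + 7] = 0;

    BRect dirty;
    int32 count = ComparePixels(a, b, dirty);
    printf("%d differing pixel(s) in (%g, %g)-(%g, %g)\n", (int)count,
        dirty.left, dirty.top, dirty.right, dirty.bottom);
    return count == 0 ? 0 : 1;
}
```

Running this per control after a change like the gradient rework would point straight at the widgets that changed, with no heuristic deciding what counts as significant.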

One additional thing: why would CI, or any automated way to run these tests, need a shell script? Couldn’t there be a Jam target that runs these tests? It would be much more convenient if the procedure were the same for me on my PC and for an automated tester, so I could compare results more easily. It would also allow me to specify exactly which automated tests to run on Gerrit instead of running everything on each whitespace change :wink:

Hey everyone,

Thanks for all the feedback — really helpful. I’ve gone through all the comments and updated the proposal. Major changes:

  • Removed the Visual Testing Strategy section entirely — OpenQA already covers this
  • Dropped dynamic content masking — not relevant in a unit test context with fixed datasets
  • Kept pixel comparison and reframed BPicture — both test different things and are both needed
  • Switched unified runner from shell scripts to Jam targets
  • Fixed the crash loop kill window to trigger during an active write instead of a random sleep (rough sketch below)
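
For the last point, the reworked window looks roughly like the sketch below: a child process writes continuously, and the parent kills it only once the target file is observably growing, so the crash always lands mid-write. The path, block size, and polling interval are placeholders.

```cpp
// Sketch of the reworked kill window: kill the writer only after the file
// is verifiably growing, so the "crash" happens during an active write
// rather than after a random sleep. The harness would then remount the
// volume and check filesystem consistency (e.g. with checkfs).
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    const char* path = "/boot/home/mnt/crash-test-file";  // placeholder

    pid_t writer = fork();
    if (writer == 0) {
        // Child: write blocks in a tight loop until killed.
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        char block[4096];
        memset(block, 0xab, sizeof(block));
        while (true)
            write(fd, block, sizeof(block));
    }

    // Parent: poll until two samples show the file growing...
    struct stat st;
    off_t lastSize = -1;
    do {
        usleep(10000);
        if (stat(path, &st) != 0)
            continue;
        if (lastSize >= 0 && st.st_size > lastSize)
            break;  // write stream confirmed in progress
        lastSize = st.st_size;
    } while (true);

    // ...then kill mid-write; consistency is verified afterwards.
    kill(writer, SIGKILL);
    waitpid(writer, NULL, 0);
    printf("killed writer %d at size %lld\n", (int)writer,
        (long long)st.st_size);
    return 0;
}
```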

One thing I’d like clarity on before finalising the UI testing section — given that CI builders run on Linux and Interface Kit tests can’t run there, would it make sense to keep UI testing scoped to Haiku’s local infrastructure for developer use? Or should I drop it entirely and focus fully on BFS?
