Working on testing infrastructure. I’d like to get feedback on two areas before I start building.
1. Filesystem Stress Testing
We have good targeted tests (B+Tree torture test, attribute iterator, checksumfs), but nothing for systematic stress testing. I’m thinking:
**POSIX-level stress:** rapid create/delete cycles, concurrent readers+writers, large files, BFS attribute operations at scale (see the sketch after this list)
**Crash-consistency:** write to BFS → kill VM mid-operation → reboot → run checkfs → verify no corruption (similar to how xfstests works for ext4/ZFS)
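To make the first bullet concrete, here's a minimal sketch of the kind of workload I have in mind (the directory, cycle count, and attribute name are invented; `fs_write_attr()` is Haiku's native attribute API from `fs_attr.h`):

```cpp
// Sketch of a create/write-attribute/delete stress loop. All names and
// counts here are illustrative, not a final harness design.
#include <fs_attr.h>        // Haiku: fs_write_attr()
#include <TypeConstants.h>  // B_STRING_TYPE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int
main()
{
    const char* kTestDir = "/tmp/fs-stress";  // assumed scratch dir on BFS
    const int kCycles = 100000;               // arbitrary

    mkdir(kTestDir, 0755);  // ignore EEXIST on reruns

    for (int i = 0; i < kCycles; i++) {
        char path[256];
        snprintf(path, sizeof(path), "%s/file-%04d", kTestDir, i % 512);

        int fd = open(path, O_CREAT | O_RDWR, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        // Mix BFS attribute traffic in with ordinary data writes
        const char* tag = "stress";
        fs_write_attr(fd, "test:tag", B_STRING_TYPE, 0, tag, strlen(tag) + 1);
        write(fd, path, strlen(path));
        close(fd);

        // Unlink every other file immediately to churn directory entries
        if (i % 2 == 0)
            unlink(path);
    }
    return 0;
}
```

Running several instances of this in parallel would cover the concurrent readers+writers case; the crash-consistency bullet would wrap a workload like this in the VM kill/reboot/checkfs orchestration instead.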
2. BPicture-Based UI Tests
Instead of pixel-comparing screenshots (high maintenance — breaks on any theme/font/layout change), record drawing commands via BPicture and compare serialized output:
Kit-level tests: BView::FillRoundRect() → record to BPicture → Flatten() → compare against reference file in repo. Tests that IK methods generate correct drawing commands. (A sketch follows this list.)
App_server-level tests: record on the server side too, to catch cases where IK modifies parameters before sending AS_* messages.
Visual regression (informational only): render BPictures to bitmaps for before/after diffs in code review — not a pass/fail gate.
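Roughly what I have in mind for the kit-level flow, as a sketch (it assumes the view is already attached to a window so recording goes through app_server; the helper name and reference path are placeholders):

```cpp
// Sketch: record one Interface Kit call into a BPicture, flatten it, and
// byte-compare against a reference file checked into the repo.
#include <DataIO.h>
#include <File.h>
#include <Picture.h>
#include <View.h>
#include <string.h>

bool
MatchesReference(BView* view, const char* referencePath)
{
    // Record the drawing command instead of rasterizing it
    BPicture picture;
    view->BeginPicture(&picture);
    view->FillRoundRect(BRect(10, 10, 60, 40), 5, 5);
    BPicture* recorded = view->EndPicture();

    BMallocIO actual;
    if (recorded->Flatten(&actual) != B_OK)
        return false;

    BFile referenceFile(referencePath, B_READ_ONLY);
    off_t size;
    if (referenceFile.GetSize(&size) != B_OK
        || (size_t)size != actual.BufferLength())
        return false;

    char* expected = new char[size];
    if (referenceFile.Read(expected, size) != size) {
        delete[] expected;
        return false;
    }
    bool match = memcmp(actual.Buffer(), expected, size) == 0;
    delete[] expected;
    return match;
}
```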
For both, I’m thinking of adding new Jam rules (FSTest, UITest) so adding a test is a one-liner. Feedback on priorities and the technical questions above is welcome!
> Kit-level tests: BView::FillRoundRect() → record to BPicture → Flatten() → compare against reference file in repo. Tests that IK methods generate correct drawing commands.
I think we already do something like this, only not for interface widgets: in src/tests/kits/interface/flatten_picture there are tests which render something directly on a BBitmap and through a BPicture, and then compare the results. I think it could be extended / reworked to do UI tests.
I think we have bonnie++ in the depots, and there’s also dirconc, which has been run in the past. That should at least take care of most of the first item under #1?
Okay, thanks for saving me from redundant work. Can you share the related documents, or any other resources available on this? Also, would it be better to add full crash-consistency tests, or to work on the UI tests?
bonnie++ is designed mainly as a benchmark tool for measuring performance.
For filesystem testing, it would be interesting to run xfstests, originally developed for the XFS filesystem but usable with other filesystems as well.
I’m not sure if we have ported it yet. It is a test suite with many specific tests, and it would surely give interesting results, highlighting problems not only in the filesystems but also in Haiku’s VFS layer.
I’ve been thinking about how we could add automated testing for Interface Kit, and I wanted to run an idea by you before going too deep.
The problem: Right now we mostly test UI stuff manually. Change something in app_server? Better launch a bunch of apps and click around. It’s time-consuming and bugs still slip through.
The idea: use BPicture and PicturePlayer to test the actual drawing commands, not just pixel output. So if I call view->FillRect(10, 10, 50, 50), we’d record that to a BPicture, then verify the command sequence matches what we expect (rough sketch below). This way tests won’t break every time font rendering gets tweaked.
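A rough sketch of the verification side using BPicture::Play() with a callback table. The table size and the fill-rect slot index below are assumptions on my part and would need to be checked against the Be Book's picture protocol table / Haiku's PicturePlayer:

```cpp
// Sketch only: hook one slot of the Play() callback table and record what
// the picture replays. Table size and slot index are unverified assumptions.
#include <Picture.h>
#include <Rect.h>

struct PlaybackLog {
    bool sawFillRect;
    BRect rect;
};

static void
FillRectHook(void* userData, BRect rect)
{
    PlaybackLog* log = (PlaybackLog*)userData;
    log->sawFillRect = true;
    log->rect = rect;
}

static void
NoOpHook(void* userData)
{
}

bool
VerifyFillRect(BPicture& picture, BRect expected)
{
    const int kTableSize = 48;  // assumption: check the picture protocol
    PlaybackLog log = {};
    void* table[kTableSize];
    for (int i = 0; i < kTableSize; i++)
        table[i] = (void*)NoOpHook;
    table[4] = (void*)FillRectHook;  // assumed fill_rect slot: verify!

    picture.Play(table, kTableSize, &log);
    return log.sawFillRect && log.rect == expected;
}
```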
For visual stuff that needs pixel comparison (like button appearance), we’d add region masking: ignore the timestamp area, ignore usernames, but verify the rest looks right (see the masking sketch below).
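The masking itself seems straightforward, something like this sketch (the helper name and the 32-bit color space assumption are mine):

```cpp
// Sketch of a masked pixel comparison: compare two same-sized B_RGBA32
// bitmaps while skipping pixels inside an "ignore" region.
#include <Bitmap.h>
#include <Region.h>

bool
BitmapsMatchOutsideMask(const BBitmap& a, const BBitmap& b,
    const BRegion& ignore)
{
    if (a.Bounds() != b.Bounds())
        return false;

    const uint8* bitsA = (const uint8*)a.Bits();
    const uint8* bitsB = (const uint8*)b.Bits();
    int32 width = a.Bounds().IntegerWidth() + 1;
    int32 height = a.Bounds().IntegerHeight() + 1;

    for (int32 y = 0; y < height; y++) {
        const uint32* rowA = (const uint32*)(bitsA + y * a.BytesPerRow());
        const uint32* rowB = (const uint32*)(bitsB + y * b.BytesPerRow());
        for (int32 x = 0; x < width; x++) {
            if (ignore.Contains(BPoint(x, y)))
                continue;  // dynamic area (clock, user name, ...)
            if (rowA[x] != rowB[x])
                return false;
        }
    }
    return true;
}
```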
Is comparing BPicture commands robust enough, or will internal format changes break everything?
For visual tests, is masking dynamic regions (time, names) actually practical?
Would this catch the kinds of bugs we actually see in Interface Kit?
I’m thinking this could integrate with our existing CppUnit tests and share a helper library (a rough sketch of the wiring follows). Reference files (both .bpicture and .png) would be stored in the repo alongside the tests.
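For the CppUnit side, the registration could look roughly like this (all names invented, and Haiku's test framework wraps CppUnit, so the real wiring may differ):

```cpp
// Sketch only: wiring a picture test into CppUnit.
#include <cppunit/TestCaller.h>
#include <cppunit/TestCase.h>
#include <cppunit/TestSuite.h>

// Hypothetical helper (see the flatten/compare sketch above), stubbed here
// so this wiring example is self-contained.
static bool
MatchesReference(const char* referencePath)
{
    return true;
}

class FillRoundRectTest : public CppUnit::TestCase {
public:
    void TestCommandStream()
    {
        CPPUNIT_ASSERT(MatchesReference("FillRoundRect.bpicture"));
    }

    static CppUnit::Test* Suite()
    {
        CppUnit::TestSuite* suite = new CppUnit::TestSuite("FillRoundRect");
        suite->addTest(new CppUnit::TestCaller<FillRoundRectTest>(
            "command stream matches reference",
            &FillRoundRectTest::TestCommandStream));
        return suite;
    }
};
```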
Can we use it directly on BFS for testing? I think we could use this implementation to build similar testing infrastructure for the filesystems in our codebase. I’ll look into it.
I’m working on a proposal focused on adding testing infrastructure to the codebase and wanted to get your feedback on scope.
My plan is to tackle both filesystem testing and UI testing infrastructure within a single GSoC timeframe. Here’s my thinking:
Filesystem Testing (Weeks 1-6):
I’m already familiar with this area, and much of the groundwork exists — bfs_shell, checkfs/CheckVisitor, and patterns from fat_test.sh. The work is primarily orchestration and test harness code rather than implementing new filesystem internals. I’m confident this won’t require the full program duration.
UI/Interface Kit Testing (Weeks 7-24):
Once the FS testing infrastructure is in place and running in CI, I’d shift focus to the BPicture-based Interface Kit testing discussed earlier. This is a separate but complementary piece of testing infrastructure.
Proposed sequencing:
FS stress tests + crash-consistency (first half)
UI/BPicture regression testing (second half)
Does this scope seem reasonable for a single GSoC, or would it be better to focus exclusively on one area? I’m happy to adjust based on mentor feedback.