Unit Test Conversion to BDD Style (Given/When/Then)

Hi everyone,

I’m starting to work on converting some of the existing tests to a more BDD-style structure, beginning with BClipboard. I wanted to share the approach I’m planning to use to make sure it aligns with expectations.

Since we don’t appear to have a Gherkin / cucumber-style framework in the current test infrastructure, the idea is to apply BDD directly in CppUnit, with the Given / When / Then flow encoded in the test method names, and comments used only when the name becomes too long or needs clarification.

The basic convention I’m following is:

Given<precondition>_When<action>_Then<expected_behavior>

(BClipboard construction)

Original tests were numbered cases. In BDD style, the same behavior would be expressed like this:

class BClipboardTester : public TestCase
{
public:
    BClipboardTester() {}
    BClipboardTester(std::string name) : TestCase(name) {}

    // Feature: BClipboard creation
    void GivenNullName_WhenClipboardIsCreated_ThenDefaultsToSystemClipboard();
    void GivenValidName_WhenClipboardIsCreated_ThenNameIsPreserved();

    static Test* Suite();
};
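
As a rough illustration of how the bodies of such tests could be laid out, here is a minimal sketch. `FakeClipboard` is a hypothetical stand-in, not the real BClipboard, and the "system" fallback is an assumption made only for this example. A real test would use the CppUnit assertion macros instead of returning a bool.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the class under test (NOT the real BClipboard).
class FakeClipboard {
public:
    // Assumption for this sketch: a NULL name falls back to "system".
    explicit FakeClipboard(const char* name)
        : fName(name != NULL ? name : "system") {}

    const std::string& Name() const { return fName; }

private:
    std::string fName;
};

bool GivenNullName_WhenClipboardIsCreated_ThenDefaultsToSystemClipboard()
{
    // Given: no name is supplied
    const char* name = NULL;

    // When: the clipboard is created
    FakeClipboard clipboard(name);

    // Then: it refers to the (assumed) default clipboard
    return clipboard.Name() == "system";
}

bool GivenValidName_WhenClipboardIsCreated_ThenNameIsPreserved()
{
    // Given: an explicit name
    const char* name = "scratch";

    // When: the clipboard is created
    FakeClipboard clipboard(name);

    // Then: the name is kept unchanged
    return clipboard.Name() == "scratch";
}
```

Each function keeps the three phases visually separate, so the name and the body tell the same story.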

The second point: how do I compile an individual unit test for a particular application? Is there a thread I can refer to?

Feedback welcome.

Hi Priyanshu, as someone who has only ever used our tests interactively, could you elaborate on what “BDD” means? I’ve not heard of that before :slight_smile:

Hi nephele,

So BDD (Behavior-Driven Development) is about writing tests that describe the behavior of the system rather than just checking functions. By using the Given_When_Then structure in our CppUnit names, we turn our test suite into a ‘living document’. If a test fails, the name tells us exactly what the expected behavior was, which makes it much faster to debug than generic names like testCase1. Also, the current code doesn’t have full test coverage, and in many places the tests are failing.

Kacper has proposed an idea that covers:

Goal: To achieve near 100% code coverage for various Haiku components, following Behavior-Driven Development (BDD) principles for new tests.

Creating a Test Image: A key part of the current roadmap is integrating unit tests into the Continuous Integration (CI) system. This involves a change (#2080, now merged) that allows the build system to produce a dedicated test image. This image will then be booted in a virtual machine (QEMU) to run the tests automatically and report results.

Doesn’t creating a new class for this stuff go against the goal? I’d think annotating existing functions would work better.

I think the proposed style is really difficult to read, and it would be hard for me to use as a result, especially when we are expected to write new tests for code as it is being written.

We aren’t creating a new test class, just annotating the existing one. There is also one more style format you can take a look at here.

I fear that the number and complexity of tests will quickly make it hard to keep the descriptions of precondition, action, and expected_behavior short, and hard to read, understand, and maintain them.

Maybe, instead of continuing to use CppUnit and adding our own layer of BDD on top, switching to Catch2, which already has BDD support, would make more sense, no?

I’m used to CppUTest but not to Catch2 (yet; we just started to use it at work), but it seems to support BDD:

#include <string>
#include <vector>
#include <catch2/catch_test_macros.hpp>  // Catch2 v3; v2 uses <catch2/catch.hpp>

SCENARIO( "vector can be sized and resized" ) {
  GIVEN( "An empty vector" ) {
    auto v = std::vector<std::string>{};

    // Validate assumption of the GIVEN clause
    THEN( "The size and capacity start at 0" ) {
      REQUIRE( v.size() == 0 );
      REQUIRE( v.capacity() == 0 );
    }

    // Validate one use case for the GIVEN object
    WHEN( "push_back() is called" ) {
      v.push_back("hullo");

      THEN( "The size changes" ) {
        REQUIRE( v.size() == 1 );
        REQUIRE( v.capacity() >= 1 );
      }
    }
  }
}

I can see that it’s verbose, but I don’t see how it’s difficult to read. Its major advantage is that, if you have good coverage, you can usually skip reading test code and debugging, and infer what is wrong with your code just from the unit test names and their status in the runner summary.

Readability in code can be improved by adding macros and I do have a Proof of Concept which would look like this:

class TimerTests {
    CPPUNIT_TEST_SUITE(TimerTests);
    TEST(Timer, SetToZero, ReturnsZero);
    CPPUNIT_TEST_SUITE_END();
};

TEST_DEF(TimerTests, Timer, SetToZero, ReturnsZero) {
    // (...)
}

These macros would expand to a Given_When_Then name, and that’s what would end up being reported in Terminal.
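
To make the expansion concrete, here is a minimal sketch of how such a name could be assembled with the preprocessor. The `BDD_NAME` and `BDD_NAME_STRING` macros and the `FakeTimer` struct are hypothetical names invented for this example. They are not the macros from the proof of concept and not part of CppUnit.

```cpp
#include <string>

// Hypothetical helpers: paste the three parts into one identifier.
#define BDD_NAME(given, when, then) Given##given##_When##when##_Then##then

// Two-level stringification so that BDD_NAME is expanded before quoting.
#define BDD_STRINGIFY_(x) #x
#define BDD_STRINGIFY(x) BDD_STRINGIFY_(x)
#define BDD_NAME_STRING(given, when, then) \
    BDD_STRINGIFY(BDD_NAME(given, when, then))

// Hypothetical stand-in for the object under test.
struct FakeTimer {
    int value;
    void SetTo(int newValue) { value = newValue; }
    int Value() const { return value; }
};

// Expands to: bool GivenTimer_WhenSetToZero_ThenReturnsZero()
bool BDD_NAME(Timer, SetToZero, ReturnsZero)()
{
    FakeTimer timer;
    timer.SetTo(0);
    return timer.Value() == 0;
}

// The same parts, as the string a runner could print.
std::string ReportedName()
{
    return BDD_NAME_STRING(Timer, SetToZero, ReturnsZero);
}
```

The test author types the three parts once, and both the function identifier and the reported string are derived from them.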

@Priyanshu please use these CppUnit helper macros when writing tests. Writing a method name 3 or 4 times in one file for each method is a waste of time. EDIT: using them in our tests is more complicated than I expected, RFC change.

If the names are too long or too complicated then it is no longer a unit test. Another advantage - clear indication when it’s getting too complicated.

I know Catch2 and I’m using it at work extensively, but there are other considerations:

  1. GCC2 support - while I’m leaning toward removing GCC2 support from tests, I haven’t made a decision yet.
  2. We have custom extensions to the testing framework - I don’t know how deeply they are integrated, but they would need a rewrite.
  3. Changing the framework while trying to do a lot of other things to set up automated testing is a recipe for getting nothing done. There are more important things for now.

I’d also say that calling this a BDD layer, when in reality it’s just a naming scheme, is an exaggeration.

Most variable names, if they use Pascal case, are just several descriptive words, like “serialPort” (it’s a port and it is a serial one) or “unsafeString” (a string that is unsafe), etc.

They are basically never whole sentences, and that difference makes the proposed names very difficult for me to read.

OK, this is a personal preference. Maybe you can have a script that converts from PascalCase to snake_case and when you’re done with your changes convert it back. That would be a nifty editor feature, to allow visually changing variable naming style…
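
A minimal sketch of such a converter is below. The function names `PascalToSnake` and `SnakeToPascal` are made up for this example, and it assumes plain ASCII identifiers without acronyms.

```cpp
#include <cctype>
#include <string>

// Converts "SerialPort" to "serial_port". An existing '_' is kept as is.
std::string PascalToSnake(const std::string& name)
{
    std::string out;
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (std::isupper(c)) {
            // Start a new word, unless we are at the start or after '_'.
            if (i > 0 && name[i - 1] != '_')
                out += '_';
            out += static_cast<char>(std::tolower(c));
        } else {
            out += name[i];
        }
    }
    return out;
}

// Converts "serial_port" to "SerialPort" by dropping every '_'.
std::string SnakeToPascal(const std::string& name)
{
    std::string out;
    bool upperNext = true;
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (c == '_') {
            upperNext = true;
            continue;
        }
        out += static_cast<char>(upperNext ? std::toupper(c) : c);
        upperNext = false;
    }
    return out;
}
```

Note that the round trip is lossy for names that already contain underscores: the separators in a Given_When_Then name become ordinary word breaks in snake_case and cannot be restored afterwards without extra bookkeeping.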

I’m not fully convinced by the BDD approach and especially trying to mix the BDD-based documentation with the code.

My experience was that it creates code that is very structured for the tests, and quite easy to follow, but as a documentation, it isn’t that great (or, at least, it documents the test itself, but not really the behavior of the thing being tested, at least not in an approachable way).

I think I’m fine with the idea (either as given/when/then, or an alternative test setup/test execution/check results form), but maybe done with well-defined comments rather than macros. Also useful in our case (due to how things worked in the past and our backlog of knowledge being stored that way) would be reference to past commits and tickets where available (basically, what led to a specific test being written).

That being said, my paid work is in a quite different domain: embedded systems, where most of the constraints are about realtime data processing and cannot really be checked by unit tests (we do, however, do a lot of automated integration testing at a higher level, something that would be more difficult on graphical user interface based systems).

So I’ll let the experts decide :slight_smile:

Is there a goal to exploit the BDD form to generate some kind of documentation (test coverage report, specification, …) or is it just a way to structure the tests themselves?

If you want to reach 100% code coverage as you wrote in your previous message, you certainly will have to.

GCC2 support would be useful if you want to run some of the tests on BeOS to compare behavior. At this point in Haiku life I don’t know if it is worth it.

Supposedly it could also catch compiler bugs in gcc2, but I expect there are few of these at this point.

So, it’s a nice bonus if we can check it, but it comes at the cost of having to use an older or simpler test framework library.

When we have tests consistently structured that way, it will be easy to convert them to a different form in an automated way, if we decide it isn’t that good of an approach.
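
To illustrate why a consistent structure makes mechanical conversion feasible, here is a minimal sketch that splits a conforming name into its three parts and re-emits it in another form. `BddParts`, `SplitBddName`, and `ShortName` are hypothetical names for this example, and the prefix-less target form is just one possible choice.

```cpp
#include <string>

// The three parts of a Given<...>_When<...>_Then<...> name.
struct BddParts {
    std::string given;
    std::string when;
    std::string then;
};

// Returns false if the name does not follow the convention.
bool SplitBddName(const std::string& name, BddParts& parts)
{
    const std::string givenTag = "Given";
    const std::string whenTag = "_When";
    const std::string thenTag = "_Then";

    if (name.compare(0, givenTag.size(), givenTag) != 0)
        return false;

    const std::string::size_type whenPos = name.find(whenTag, givenTag.size());
    if (whenPos == std::string::npos)
        return false;

    const std::string::size_type thenPos
        = name.find(thenTag, whenPos + whenTag.size());
    if (thenPos == std::string::npos)
        return false;

    parts.given = name.substr(givenTag.size(), whenPos - givenTag.size());
    parts.when = name.substr(whenPos + whenTag.size(),
        thenPos - whenPos - whenTag.size());
    parts.then = name.substr(thenPos + thenTag.size());
    return true;
}

// One possible alternative form: the same parts without the prefixes.
std::string ShortName(const BddParts& parts)
{
    return parts.given + "_" + parts.when + "_" + parts.then;
}
```

A renaming script built on this idea would still need to update declarations, definitions, and suite registrations together, which this sketch does not attempt.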

Yes, this is the second source of tests I considered besides Haiku Book. This is something to do when we have basic structure in place.

Currently it’s only the structure, but when more tests are converted it will be easier to see what else we can do with that.

Generally speaking, I am aware that an OS is a diverse project and one approach will not work in all cases. I’m not treating BDD as a dogma, but as a good starting point and guideline.

I’ve recently fixed a GCC2 build issue that only happened when <strstream> was being used. It would affect any program that included it. The question is how committed we are as a project to maintain support for GCC2 on the scale of “we absolutely cannot have regressions, and if we do, fix them immediately” to “it’s okay to have it not working and provide a fix within X days/weeks/months”.

As a thought on this, I would love UI tests that record BPicture outputs, similar to how the WebKit test suite does it. It would make UI differences easier to identify (and we would have automatic pictures for the user guide).

GCC2 support - while I’m leaning toward removing GCC2 support from tests, I haven’t made a decision yet.

As long as we plan to ship GCC2 executables, it makes sense to keep the unit tests working for GCC2, too. It’s not just about compiler bugs, but also about programming errors (for instance, because our other platform is 64-bit, not 32-bit, etc.).

I wouldn’t mind if we just leave out “Given”, “When”, and “Then” from the names, and just do “NullName_ClipboardInit_DefaultToSystemClipboard” instead to shorten the names; it’s pretty redundant information otherwise.

While I’m not practicing BDD at work (and in fact never heard of it before), I’m halfway there, and write what the test is about to test in the method name. I find that very helpful. I’m not sure the additional information is, though; in the end, you always will need to look at the test code to see how it tested what exactly.

In any case, I think it will be helpful to convert a complete test, to see how that looks in practice.

These have a high maintenance cost. It works for a web browser because they test the layout in a very precise way (if your css says something is 10px wide then it’s 10px wide).

But for anything “larger” in the user interface, if you add a button, suddenly all your reference screenshots are out of date and need to be replaced. If you change how buttons are drawn system-wide, all your screenshots of all apps will be out of date.

The question should be, in this case, what are you testing exactly? And then we can determine if comparing images or BPicture is a good tool for testing it, or if something simpler can be done.

That would be great, but not related to unit testing.

It could be related to integration testing, but that is a different thing done with different tools.


I would test the output of the rendering or the BPicture, and keep the last state in the repository. That way you can see clearly which places your patch affects and don’t have to guess. Mostly this would be useful for code review, I think, as an informational tool.

Yes, I plan to implement those too. This whole testing initiative is a result of bugs I fixed in app_server recently :slight_smile:
In fact, both for app_server (for testing the protocol) and Interface Kit. If we ever want to replace app_server with something else (I think @X512 mentioned working on new_app_server), or change rendering backend from AGG, these would be great acceptance tests.
I was considering whether one of those is redundant, but an IK method may modify parameters that are sent to app_server, and the behavior will differ.

Do they? As @nephele said, you could just dump the results from the app and replace reference images in the repo with those after checking if this is what you expected.

If you want to see how that works in a larger codebase then Intel’s Compute Runtime (the thing I validate at work although at integration and system level) is an example: compute-runtime/opencl/test/unit_test at master · intel/compute-runtime · GitHub.

If you want to see how I imagine it in our codebase: https://review.haiku-os.org/c/haiku/+/10266

I already made a tool that converts BPicture to JSON/YAML (and back): HaikuUtils/PictureDumpJson2 at master · X547/HaikuUtils · GitHub.

Example output: button.yaml · GitHub.
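
Once drawing output exists as text, a reference comparison can be an ordinary line-by-line check. The sketch below is only an illustration of that idea: `FirstDifferingLine` is a hypothetical helper, and it makes no assumption about the actual format the tool above produces.

```cpp
#include <sstream>
#include <string>

// Returns the 1-based number of the first line that differs between the
// stored reference dump and the freshly produced one, or 0 if they match.
int FirstDifferingLine(const std::string& reference, const std::string& actual)
{
    std::istringstream referenceStream(reference);
    std::istringstream actualStream(actual);
    std::string referenceLine;
    std::string actualLine;
    int lineNumber = 0;

    for (;;) {
        const bool hasReference
            = static_cast<bool>(std::getline(referenceStream, referenceLine));
        const bool hasActual
            = static_cast<bool>(std::getline(actualStream, actualLine));

        // Both dumps ended at the same point without a difference.
        if (!hasReference && !hasActual)
            return 0;

        ++lineNumber;
        // One dump is longer, or the current lines differ.
        if (hasReference != hasActual || referenceLine != actualLine)
            return lineNumber;
    }
}
```

Reporting the first differing line, rather than only pass or fail, gives a reviewer a starting point for deciding whether the reference needs updating or the change is a regression.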

My question was: what are you testing with these?

If, 90% of the time, the failing test is solved by changing the reference image because the update is in fact correct, I consider this a high maintenance cost for the test. A perfect test fails only when something got broken. Maybe we won’t get things perfect, but if a test fails mostly for “false positive” reasons, it’s wasting more time than it saves, and, maybe even worse, it degrades the confidence people have in the tests in general. If you make a change and it creates a hundred failures, of which 99 are just “let’s update the picture” and one is a real problem, there is a chance the real problem will be missed.

They are also difficult to fit in BDD, because the picture does not explain why it looks like it does. So you also have to describe the picture with words in your BDD description. And there’s a risk the picture does not match the description. Is it an unclear description? Was the picture incorrectly updated at some point? How do you know?

Reference drawing can work for the low level components (check that BButton looks like a button, with the right colors, etc) and maybe basic drawing commands (lines, shapes, maybe BPicture rendering). It can work for layout APIs as well, but I would say it is already too low level for that (you can instead, for example, check that views are a specific size after setting up a layout).

For testing an entire application, it won’t really work. You are even further from the drawing than a simple layout kit test. Any change to the interface kit (changing the margins of a button, a new freetype antialiasing algorithm, …) would break absolutely all the tests, which is a sign that the tests are not unit tests anymore. They would test all the way down from the application to the app_server rendering. I think nephele had that type of test in mind as he suggested using the pictures to populate the user guide.

So, in the scope of unit tests, drawing tests can be used to validate app_server behavior, and possibly some of interface kit (although this latter part could also be done by checking what commands are sent to app_server). But I don’t think they can be used in a meaningful way for higher levels.

I’m testing that BRect(0, 0, 3, 3) is rendered as 4x4, not 3x3.
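
The arithmetic behind that expectation can be sketched as follows, assuming integer coordinates that name pixels and a rectangle that includes both of its edges. `InclusiveRect` is a hypothetical stand-in for this example, not the real BRect.

```cpp
// Hypothetical stand-in: integer pixel coordinates, both edges included.
struct InclusiveRect {
    int left;
    int top;
    int right;
    int bottom;
};

// Columns 0, 1, 2, 3 are all covered, so the count is right - left + 1.
int PixelColumns(const InclusiveRect& rect)
{
    return rect.right - rect.left + 1;
}

// Rows follow the same rule: bottom - top + 1.
int PixelRows(const InclusiveRect& rect)
{
    return rect.bottom - rect.top + 1;
}
```

Under that assumption, a rectangle from (0, 0) to (3, 3) covers 4 columns and 4 rows, while the plain difference right - left is 3, which is the off-by-one such a rendering test is meant to catch.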

This would be an entirely separate test suite. I did not consider fitting it into BDD, maybe not even CppUnit.

I agree. I intended to test primitives and maybe individual components that way. Perhaps I misunderstood nephele’s post.

I don’t think the ones I mentioned would be unit tests either. If you test a whole application, that only tells you that the result is different, not that a test failed, which is what I would want: a tool that basically tells you which applications are or aren’t affected by your change. That is independent of testing low-level GUI functionality as mentioned here.