Code hygiene work

Hi there,

I’m a new contributor with work in progress to address some Haiku code hygiene as part of maturing the arm64 port. My current milestones are getting it to build and run inside QEMU aarch64 on an Apple Silicon Mac. Building works, running it in QEMU is still flakey. Toward that end, I’m currently addressing some basic git repo cleanliness:

  1. pre-commit hooks to enforce the following policies:
    • strip trailing whitespaces except on .md files.
    • checking that non-binary executables, e.g. bash scripts start with a proper shebang.
    • check for files that contain merge conflicts.
    • check for bad symlinks
    • normalize EOLs to always be LFs. Not CRLF, not CR.
    • ensure all files end in EOL and only an EOL.
    • optionally check that filenames are distinct, even on case-insensitive filenames. I’m aware the current naming violates that criteria, but it’s a major annoyance and knowing the extent of the problem is useful for future remediation.
  2. add a .gitattributes file to explicitly set file handling.
    • use * text=auto eol=lf, which converts line endings to LF on checkout for anything git decides is a text file.
    • explictly set git’s binary attribute on files with certain extensions, e.g. *.gif, *.jpg, *.rsrc, *.pfb, etc. so git doesn’t do EOL conversions or produce textual diffs for them.
  3. add some settings in .editorconfig, which most text editors use.
    • default character set is utf-8.
    • defalt EOL is lf.
    • insert final newlines.
    • trim trailing whitespace.
    • Makefiles use tabs.
    • markdown files preserve trailing whitespace.

The pre-commit hooks run ad hoc or as part of commits as linting. Trivial to add them to a CI/CD system. Easy enough to also run clang-tidy, ShellCheck, etc.

For C files, the indenting is currently tabs with 4 stops. Is this true for all other file types? Or should the default be 2 spaces instead of tabs? I didn’t see anything in the style guide. I’m tabs vs spaces agnostic so long as it’s consistent. Except for Makefiles, which require tabs.

My argument for consistent LF EOLs is that, even on Windows, modern editors respect it and it’s standard behavior for Linux and MacOS. Same with tools expecting all files end with an empty line. Extraneous invisibles and whitespace at line ends is annoying for several reasons, usually for diffs. Diff clutter, merge conflicts, expecting the ‘End’ key to jump to the end of line. I ran pre-commit to fix the trailing whitespace on a local branch and there was a LOT of it: 3732 files. An .editorconfig is a great preventitive measure.

I’ve also been fixing the easier compiler errors, using C++17 as the minimum required standard. IMHO, it’s not worth the misery of #ifdef’s and macros to support older dialects, especially with features like noexcept as part of the type system, __has_include, standard attributes for unused arguments and parameters instead of __attribute__((__unused__). The warnings I’ve been getting out of the way:

  1. Unused arguments, variables, etc. with the [[maybe_unused]] attribute.
  2. sprintf to snprintf to prevent buffer overflows.
  3. reinterpret_cast to replace C-style casts. That quiets warnings about alignment on aarch64. Also makes it easy to grep for bugs falling out of ARM’s 8-byte stack alignment.

Also found a subtle defect building on MacOS aarch64. Variadic arguments are passed on the stack instead of registers, which causes runtime bugs with mapping open(…) to _haiku_build_open(), the latter having a fixed number of arguments. It’s non-comformant to the C standard to assume that behavior.

Anyway, let me know your thoughts.

1 Like

I’m really confused by this. Why do you want to use hooks to enforce this stuff? We already have gerrit. Commits should be commited as they appear and not do any post processing that isn’t apparent…

We already have a bot that chwcks for code style conformity (like whitespace styles).
But this can’t be done completely automatically.

Consistent end of line makes sense.

C++17 is not something we can use. We compile parts of our codebase with gcc3.

1 Like

Does a zero-length file violate the “end with LF” rule?

What would this use? I only found that this could be using shellcheck, and testing shellcheck with a simple example of “#!bash” already produces a false error :confused:

Fair enough. If I wasn’t wiped out by the flu at the moment, I could have articulated my argument a little better. Hopefully I’ll feel better later and leave you less confused.

It’s probably easier to think of what I’m doing as a long lived branch to work on some legacy cruft and make arm64 more stable. I started hacking on it on my Mac to work on the arm64 stuff. I got annoyed with the compiler warnings out of principle. If it’s warning you of something, it’s generally a good idea to figure out why and fix them.

Confusingly for me, the buildtools repo for the cross target compilers track gcc 13. gcc3 < gcc13. If you run the configure script out of the box as per the docs I’ve read on the site, you’re working on master. It suggests you rebase against master, the arm64 specific docs don’t mention gcc3, so if there’re something I’m missing, please let me know.
Even the haiku-toolchains-ubuntu tracks the latest buildtools. Shouldn’t the default configure script
Again, I followed what I assumed were current docs around arm64. What did I fail to pay attention t If they’re that out of date, a disclaimer would be useful. Insofar as the gcc3 goes, are we talking about legacy binaries? Is this true for the arm builds?

The idea with pre-commit is it runs before commits — thus “pre”. You make a local a commit, it runs some preflight checks and if they exit with a non-zero error code, it’ll reject your commit with a list of things you need to fix. Running locally gives you far faster feedback, shifting things left. It prevents things like trailing white space inadvertently ending up in your commits before it even gets pushed. It can also run as a quality gate on the pipeline side to do the same thing.

I don’t know much about the bot running on Gerrit, but if it’s not rejecting mixed like endings, trailing white space, etc., that’s a missed requirement. It lets misconfigured editors introduce noisy diffs. I don’t think it’s running tools like ShellCheck. I don’t think it’s linting YAML, JSON, etc. There’s adequate coverage on the C sources, but the bash scripts, etc. are just as important. Which is why a style guide should include those things.

I think we’re coming at it from two different directions and I’d love to reach a common understanding. I am not a chaos monkey looking to make everyone miserable. :slight_smile:

I sincerely appreciate your thoughtful feedback. It was great you drew my attention to the gcc3, which I’m apparently ignorant of. I’m sure there’s other stuff I’ve missed.

My daughter has epilepsy, it’s gotten much worse lately, and I’m looking for something cool to take my mind off some of it. If I come off as disagreeable, it’s my preoccupation with that. Hard to watch your kid go through that and feel powerless.

Thanks,
Chris

2 Likes

Just to clear up this one issue: we are still binary compatible to BeOS R5 on x86-32. This means that large parts of our codebase (userland code only) needs to be able to be compiled using gcc 2.95. This does also affect the ARM port for the sources that are used on both platforms. The kernel only parts do not need to be compiled with older compilers anymore, though. But only those can be cleaned up.

Ah, that makes more sense. I assumed pre-commit was reffering to the master branch, as in when pushing directly (which developers can do, but is discouraged) or when using submit on gerrit that this would then change stuff in the commit.

The way you explained makes a lot more sense to me, though stashing something in a commit locally while it is not finished is also an important consideration.

For stuff like whitespace changes the problem is that this indeed makes diffs harder to read, but at the same time for the same reason, I prefer whitespace fixes over large areas like a whole file to be in seperate commits, so functional changes are seperate.

We have had a similar discussion about the tool to automatically follow the code guideline rejecting code, but this also does not handle all edge cases, and in that case we decided to keep this as a bot on gerrit and allow you to use it locally, but not make it a requirement that the output of haiku format is used, additionally it will format stuff you didn’t even touch in the file (or on gerrit) which is annoying to follow.

Axeld is correct and it is gcc2 not gcc3, anyhow the gist is that our 32bit intel version is binary compatible to BeOS, since function mangling changed in gcc3 the parts that are required to have the BeOS mangling (on that specific arch) are still compiled with gcc2, that does not mean you have to use gcc2 on arm64. You can use modern versions of gcc or clang as you wish, just some parts need to be compileable with gcc2 aswell. (though I suspect some code is needlessly compiled with it, but that has to be investigated case by case)

Having more linting (on gerrit specifically) Is something I am very interested in, in particular doxygen and sphinx documentation sources could use this. Having this locally as a first step would be interesting. I am not sure how good it is to reject uploading a commit however.
It could be a good idea to make some stuff we know works correctly a hard requirement on gerrit for submitting, but not for uploading. (especially if a new dev does not know how to fix certain issues and needs help)

As for misconfigured editors, it depends a bit. People developing on Haiku will probably use Haiku native editors first, and atleast for Koder (the one I use), I have disabled “trim whitespace on save” because it is not git aware, and strips it from all lines, and not only ones I touched. and thus makes the diffs noisier.

EDIT:
Regardless, I think your number 3. Is a good idea in any case. You can submit a change request on gerrit, and we can talk about the specifis there.

As for C Source files, we only have code guidelines for our C++ stuff in general. Though if tab guidance is wanted, I assume the same is true for anything else, unless the language has any specifics preventing it.