Why is the Null Terminator \0 not at Index 0 in C?

When we iterate over a string with any print, putchar, etc. function, the string will continue the iteration at the first byte in memory when we reach a buffer overflow. So why don't we simply put a null terminator at index 0 to avoid buffer overflow?

C++ addressed that with std::string in its standard library, which stores the length explicitly instead of relying on a terminator to find the end. Null-terminated strings are a thing of the past.

NUL terminated strings sure are with us, multitudes of them. But it has been working fine for very likely longer than you’ve been alive, Daydream. Can you rephrase the question?

To help with terminology - the 0 byte that terminates C strings is denoted NUL – N U L. (NULL - with 2 Ls - is a 0 pointer, not really a string or a valid pointer to any object.)

A string with NUL at index 0 is an empty string, “”

The meaning of this is unclear to me.


Ah ok, I never used C++, so I didn’t know.


Most of the standard library functions that are at risk of buffer overflow do stop when they encounter a \0.

The problem is that if you pass e.g. strcpy(dest, src) a src that is larger than dest (i.e. the \0 is at an offset from src that is larger than the space allocated from the start of dest), then it will write past the end of dest and overflow the buffer.

So putting a \0 at the start of src will indeed prevent the overflow occurring but it misses the point. If you do that to every string you pass to strcpy then it will not copy anything.

There are now bounds-checked versions of most of the functions at risk of this, e.g. strcpy_s(dest, dest_size, src) from the optional Annex K of C11. It takes the size of dest and, if src including its \0 does not fit, reports an error instead of writing past the end of dest. This allows you to prevent the overflow occurring.


… though I don’t find strcpy_s() on Haiku, so I guess you’re stuck with good old strncpy(), plus assigning NUL to dest[dest_size - 1] (which strcpy_s() would reportedly do wrong, adding the NUL terminator at dest[dest_size]).

strlcpy() is available too (don’t press me on details, I suck at C/C++/<insert all languages here>).

I should have mentioned strncpy, my mistake

I didn’t understand this. I think it would be easy to modify it so that strcpy would start at index 1 of the src string.
It’s more of a hypothetical question. I would say it was messed up when the language was created.

What I also didn’t understand about C is why the return value of a working program is 0, the boolean false, while a failure returns 1, which is true.
I think it would be better the other way around.
The way it is feels counterintuitive to me.

I’m actually very unskilled at coding, but I think it pays off in the long run to try to understand what I am doing and not purely focus on the how.

The high level answer to these questions about C is that C works this way because it is a low level language and as such it tries to present an execution model that is close to that of the underlying machine. The idea is that this allows programmers to know fairly accurately how the machine will execute the code. It can also be argued that it allows writing more efficient code, but that is more open to discussion really.

It’s because array indexing in C is defined in terms of memory addresses: an array name converts to the address of its first element, so in the case of strings the first element (at index 0) is at that address, the second element (index 1) is at that address + 1, etc. Being a low level language, C tries to guarantee that the memory layout of types and variables is known to the programmer up front, and allows some direct manipulation of pointers. In fact array indexing of most simple types on most CPUs will translate 1 to 1 into indexed addressing modes in the machine instructions. If arrays started from offset 1 it would not match what the machine actually does with pointers and arrays and thus break some ideas that are fundamental to the purpose of the language.

It’s good to ask questions and gain understanding.

You can think about the return value of an app like it is a “was there an error?” marker. If it’s 0 the answer is false (no, there was no error), if it’s non-zero (1 or a different number) then the answer is true (yes, there was an error).

Not only was there an error, but which one it was is indicated by the number. When there’s only one success outcome, and many error outcomes, the choice between zero and non-zero is clear.

And again, the meaning of truth values can be traced back to how the machine operates: zero for false and any other value for true because processors have instructions to test if something is zero or non zero.

Ok, I think I can understand this explanation. Thanks a lot. So, an error code like 404 would also evaluate to true, without a comparison operation?

HTTP status codes are standardised and their registry is maintained by IANA. Success codes are in the 2xx class, where plain success is 200 (OK). Both the response and the code depend on the request.

See here for a short description:

And here for a more thorough documentation of HTTP semantics:

More often than not, the reason software does a particular thing has to do with the original programmer’s circumstances and the choices made with respect to a different problem.

It seems almost certain that the folks responsible for C were using the ASCII character encoding and chose to use the NUL char as a space efficient (takes exactly 1 byte) and unambiguous (valid ASCII character) way to indicate the end of the string.

Given that they were working in the domain of system programming, where being careful is a necessity and a given, the possible issues that might occur decades in the future weren’t likely a huge concern.

P.S.

https://cplusplus.com/reference/cstring/strcpy/

char * strcpy ( char * destination, const char * source );

Notice that the above web reference explicitly includes the following note:

To avoid overflows, the size of the array pointed by destination shall be long enough to contain the same C string as source (including the terminating null character), and should not overlap in memory with source.

This representation was also used on the PDP-7, so they didn’t have to invent anything new.