No, it isn’t. The difference is that you have now made a hostile contribution, just the same as if you had deliberately added GPL code or willingly added a backdoor. Those are grounds to remove people from the project, not to change the policies.

EDIT: Actually, I am confused by your interpretation of the “new” rule. You seem to be arguing against a clause banning all AI, which doesn’t exist. The only thing there is a clause noting that the output of most LLM tools is not okay because of copyright; if your LLM provably does not have this problem, there is nothing in the guidelines to reject it as such…

Microsoft trained its AI on the whole of GitHub.

It is not yet clear how the question of the legality of AI content generation will be settled… If, for example, you learn to program by studying someone else’s code, and then write your own code, this is not plagiarism. It is the same with speech: people learn to speak from others, and no one says that they speak in someone else’s words. It will probably be recognized that AI, when generating content, does not violate any copyrights… Just a thought.
…So if you don’t want any information to be used to train AI, don’t make it public. :no_mouth:

Sometimes the AI “generates” an exact copy of an existing piece of code, removing any mention of the original author and license. This happens with very well-known pieces of code that have been quoted many times and are likely present several times in the training data.

It doesn’t matter whether you used an LLM or not: when submitting code to Haiku, it is your responsibility to make sure that you own the copyright, or that the source file headers state who owns the copyright and which license makes the code usable for Haiku.

Using an LLM is not currently banned, but it makes this checking job much harder, since the code may be generated (in that case it may or may not be OK; we’ll have to wait for someone to bring such a case to court to see), or may be a verbatim copy of some training data (in which case there’s no arguing: the code is copyrighted). Fixing these problems when they are noticed later tends to be difficult, as the affected part of the code needs to be rewritten, which in turn requires untangling it from later contributions in the same file.

So there is a simple rule: if you can’t trace for sure who owns the copyright, don’t put it in Haiku. If you do it anyway, you are wasting your own and everyone else’s time. It doesn’t matter if you ignore licenses by using an LLM trained on non-compatible data, by copying the code directly (e.g. from Linux), or for any other reason.

Now if you do it in your own projects, it’s your code, your choice of license, your problem. I see no reason to disallow that. We can complain about the waste of electricity, the water for cooling the datacenters, and the CO2 emissions that will eventually damage the planet, and I personally think that goes very much against the minimal approach of Haiku. But that is an entirely different debate and no reason for banning it.

GitHub has been discovered by Computing 101 professors: it now has 270 000 “hello world” repositories on it. I’m not kidding.

And Snake games, and Tetris, and Hangman, all proudly produced by raw first-year students, thousands and thousands of them. Sometimes they even paste the course requirements into the README.

If I was an AI being trained on that, I’d probably commit suicide.
