Discussion about this post

Alex Tolley

While very whizzy, and no doubt it will get much better with time, it is important to remember that LLM-based AI does not understand computer languages or programming; it essentially extracts what has been produced in the past. These AIs do not understand the books on programming, nor the code they train on from GitHub, Stack Exchange, and numerous books, often obtained illegally. LLM AIs "stand on the shoulders of others", but do not acknowledge them, nor pay them for their works.

In the US, copyrights have been extended for years. Copyright even prevents derivative works. Woe betide anyone making even a crude Mickey Mouse. How close a work may come to an original is hotly contested in the courts. Even using some chords from a song can result in infringement. Academic authors who plagiarise even a line from another without attribution can feel the weight of academic dismissal.

Intellectual Property (IP) was so important to US corporations that it was used as a bludgeon in trade agreements. An employee moving from one tech company to another might not be able to fully use the knowledge gained inside another company.

And yet, somehow, this is all OK when it comes to behemoth technology companies that suck up all the content they can to train their models. So extensive has this been that concerns were raised about the lack of new training content and the need to fabricate it from prior works.

If LLMs were trained only on freely available open source software, or compensated software creators under some model like Spotify's royalties for musicians or streamers of content, then that might be acceptable for commercial use. But this is not the case: copyright lawsuits are still working through the courts, and the AI corporations are squealing that they shouldn't have to pay for the content they use in training, or in AI coding output where the source of the code can be identified.

LLM-generated code is likely to be very vulnerable to attacks, from poisoned libraries to prompt injection. As more code is created by LLMs, this problem will likely become increasingly visible. Whether LLM AIs will also play defence and spot vulnerable attack surfaces and injected malware remains to be seen. I hope the tools will eventually do this routinely, preventing widespread harm.

The bottom line is that LLMs exist by ignoring their theft of IP, the very IP that corporations insisted was important for their success and must not be infringed. Yes, it is impressive that mindless machines can quickly create code without understanding it, just making probabilistic copies of existing code, steered by prompts. It works so well with code because code is so structured. IDEs can easily detect code errors, something that cannot be done with creative writing much beyond spelling and grammar checking. It remains to be seen if our civilizational code base remains robust or whether it will result in a slow decline in the quality of our systems.

Shreeneet Rathi

It is interesting how coding is evolving. I still find "make an entire app with prompts" a bit too misleading, but "make your app great with multiple thoughtful prompts and MCP-connected knowledge sources" is a lot more realistic, and a great value addition for the coding ecosystem!

We have been improving our systems at multiple ends by leveraging LLM-based deep reviews, and I am still surprised by how quickly we brought these improvements into production the moment we realized the gaps existed!
