In the tech industry, we are all parasites, after all. As Drupal creator Dries Buytaert said a few years ago, we are all more “takers” than “makers.” Buytaert was referring to common practice in open source communities. “Takers don’t make meaningful contributions to the open source projects they tap into,” he says, which harms the projects they rely on. Even the most dedicated open source contributors take more than they contribute.
The same parasitic trend has played out with Google, Facebook, and Twitter, each of which relies on other people’s content, and perhaps even more so with today’s generative AI (GenAI). Sourcegraph’s Steve Yegge dramatically proclaims that “LLMs are not just the biggest change since social, mobile and cloud, but the biggest change since the World Wide Web,” and perhaps he’s right. However, these large language models (LLMs) are inherently parasitic: they depend on scraping other people’s code repositories (GitHub), technology answers (Stack Overflow), literature, and more.
As has happened in open source, content creators and aggregators are beginning to block LLM access to their content. In light of declining site traffic, for example, Stack Overflow has joined Reddit in demanding that LLM creators pay for the right to use its data to train LLMs, as detailed by Wired. It’s a bold move, reminiscent of the open source licensing wars and the paywalls publishers imposed to keep Google and Facebook out. But will it work?
I’m pretty sure the history of technology parasites is longer than that of open source, but that’s where my career started, so I’ll start there. Since the early days of Linux and MySQL, there have been companies founded to profit from the contributions of others. Most recently in Linux, for example, Rocky Linux and Alma Linux both promise “bug-for-bug compatibility” with Red Hat Enterprise Linux (RHEL), yet they contribute nothing to Red Hat’s success. In fact, the natural consequence of these two RHEL clones succeeding would be to eliminate their host and bring about their own demise. That’s why someone in the Linux field called them the “dirtbags” of open source.
The phrase may be too colorful, but I see the point. It is the same criticism that was once leveled at AWS (a “strip-mining” criticism that loses relevance by the day), and it has sparked many closed source license changes, business model contortions, and endless debates about open source sustainability.
Of course, open source is stronger than ever. Individual open source projects, however, have varying degrees of health. Some projects (and project maintainers) have found ways to manage “takers” within their communities; others have not. But as a trend, open source continues to grow in importance and power.
Draining the well
Now let’s talk about LLMs. Big companies like JP Morgan Chase are spending billions of dollars and employing more than 1,000 data scientists, machine learning engineers, and others to drive billions of dollars of impact in personalization, analytics, and more. While many companies are wary of publicly adopting something like ChatGPT, the reality is that developers are already using LLMs to drive productivity improvements.
The cost of those gains is only now becoming clear. It is a cost borne by companies like Stack Overflow, which have traditionally been a source of productivity gains.
For example, as detailed by Similarweb, traffic to Stack Overflow has declined by an average of 6% each month since January 2022, and it dropped a sharp 13.9% in March 2023. It might be an oversimplification to blame ChatGPT and other GenAI-driven tools for this decline, but it would be equally simplistic to think they aren’t involved.
Just ask Peter Nixey, founder of Intentional.io and a top 2% user on Stack Overflow, whose answers have reached more than 1.7 million developers. Despite his prominence on Stack Overflow, Nixey says, “It’s unlikely I’ll ever write anything there again.” Why? Because LLMs like ChatGPT threaten to drain the pool of knowledge on Stack Overflow.
“What happens when we stop pooling our knowledge with each other and instead pour it directly into The Machine?” Nixey asks. By “The Machine,” he means GenAI tools such as ChatGPT. It’s great to get answers from an AI tool like GitHub Copilot, for example, which was trained on GitHub repositories, Stack Overflow Q&A, and so on. But those questions are asked in private and, unlike on Stack Overflow, they leave behind no public repository of information. “So while GPT4 was trained on all the questions asked before 2021 [on Stack Overflow], what will GPT6 train on?” he asks.
One-way information highway
See the problem? It’s not trivial, and it could be more serious than anything we have haggled over in open source land. If this pattern is replicated elsewhere, he suggests, and the direction of our collective knowledge shifts from outward, toward other people, to inward, into the machine, then we will depend on the machine in a way that surpasses all our previous machine dependencies. This is a problem, to say the least. “AI will simply grow to become the dominant knowledge source, similar to a rapidly growing COVID-19 variant,” he stresses. “Taking the example of StackOverflow, the pool of human knowledge that was once ours could be reduced to mere weights within Transformers.”
There is a lot at stake, and not only the large amounts of cash that continue to flow into AI. We also need to assess the relative value of the information generated by ChatGPT and others. Stack Overflow, for example, banned answers from ChatGPT in December 2022 because they were heavy on text and light on information: “because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking and looking for correct answers [emphasis in original].” Things like ChatGPT are not designed to give correct information, just probabilistic information that matches patterns in the data. In other words, open source may be filled with “dirtbags,” but without a steady stream of good training data, LLMs may simply replenish themselves with garbage information and become useless.
In general, I’m not downplaying the promise of LLMs and GenAI. As with open source, news publishers, and others, we can be grateful to OpenAI and other companies for helping us harness collectively created information, while still supporting contributors like Reddit (itself an aggregation of individual posts) that expect to be paid for the parts they play. Open source has had its licensing wars, and it looks like something similar is about to happen in the GenAI world, but with a bigger impact.
Copyright © 2023 IDG Communications Inc.
https://www.infoworld.com/article/3697733/chatgpt-s-parasitic-machine.html#tk.rss_all ChatGPT’s Parasitic Machine | Infoworld