AI: The largest socialist wealth transfer of the past 50 years
A few months back, Elon Musk, the right-wing owner of Twitter and of Grok, his pet Generative AI project, posted something I had written to his Twitter feed, with the caption “This is the quality of humor we want from Grok.”
He even had it pinned to his profile for a short while.
I wrote this over on Quora in March of 2024. On the one hand, it’s interesting to know that Elon Musk reads my stuff. On the other, do you notice anything funny about the screenshot of his Tweet?
Yup, no credit.
The Tweet went viral, and has since been posted all over Facebook, Tumblr, Twitter, Reddit, and TikTok…all without attribution.
Right now, as I write this, OpenAI, the company behind ChatGPT, carries a valuation of $157,000,000,000, making it more valuable than companies like AT&T, Lowe’s, and Siemens.
It is not a profitable company; in fact, it’s burning cash at a prodigious rate. And unlike companies that burned cash early on to achieve economies of scale, OpenAI has costs that scale directly with its size, which is not at all normal for tech companies. At its current rate of growth, in four years its data centers will consume more electricity than some entire nations.
But I’m not here to talk about whether AI is the next Apple or the next Pets.com. Instead, let’s talk about what generative AI is, and how it represents the greatest wealth transfer of the last fifty years.
AI is not intelligent. Generative AI does not know anything. Many people imagine that it’s a huge database of all the world’s facts, and when you ask ChatGPT something, it looks up the answer in that immense library of knowledge.
No.
Generative AI is actually more like an immense, staggeringly complex autocomplete. It ingests trillions of words, and it learns “when you see these words, the most likely next words are those words.” It doesn’t understand anything; in a very real sense, it doesn’t even “understand” what words are.
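If that sounds abstract, here’s a toy sketch of the idea in Python. To be clear, this is nothing like a real LLM under the hood (real models use neural networks over sub-word tokens, not lookup tables), but it captures the core move: count which words follow which, then generate text by repeatedly emitting a statistically likely next word.

```python
import random
from collections import defaultdict, Counter

# A toy "autocomplete." Real LLMs use neural networks over sub-word
# tokens, not lookup tables, but the training objective is the same:
# given the words so far, predict a likely next word.

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# "Training": count which word follows which in the text we ingested.
next_words = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_words[prev][nxt] += 1

def complete(prompt: str, length: int = 8) -> str:
    """Extend a prompt by sampling a statistically likely next word, repeatedly."""
    words = prompt.split()
    for _ in range(length):
        candidates = next_words.get(words[-1])
        if not candidates:
            break  # a word we never saw during "training"
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(complete("the cat"))  # e.g. "the cat sat on the rug . the dog sat"
```

Everything this toy “knows” lives in those counts. There is no fact table, no library of knowledge, and nothing anywhere that could tell you whether its output is true.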
As the people over at MarkTechPost discovered, many LLMs struggle to answer even basic arithmetic questions; they pattern-match against numbers they have seen before rather than actually calculating.
AIs make shit up. They have no knowledge and understand nothing; when presented with text input, they produce text output that follows the basic pattern of the input plus all the text they’ve seen before. That’s it. They will cheerfully produce output that looks plausible but is absolutely wrong, and the more sophisticated they are, the more convincing that wrong output looks.
If you want to understand Generative AI, you must, you absolutely must understand that it is not programmed with knowledge or facts. It takes in staggering quantities of text from all over and then it “learns” that these words are correlated with those words, so when it sees these words, it should spit out something that looks like those words.
It doesn’t produce information, it produces information-shaped spaces.
To produce those information-shaped spaces, it must be trained on absolutely staggering quantities of words. Hundreds of billions at least; trillions, preferably. This is another absolutely key thing to understand: the software itself is simple and pretty much valueless. Only the training gives it value. You can download the software for free.
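This isn’t hypothetical; you can try it yourself. Here’s a minimal sketch, assuming you’ve installed the freely available Hugging Face transformers library (the model here is GPT-2, a small, openly published ancestor of ChatGPT’s models):

```python
# The program is trivial and free. The value is in the trained weights,
# which this one call downloads: a model distilled from scraped web text.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A tiny model autocompleting a prompt from its training statistics.
result = generator("The meaning of life is", max_new_tokens=20)
print(result[0]["generated_text"])
```

GPT-2 is minuscule next to GPT-4, but the division of labor is identical: a few lines of freely available code, and weights whose entire value came from other people’s words.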
So where does this training data come from?
You guessed it: the Internet.
OpenAI and the other AI companies sucked in trillions of words from hundreds of millions of sites. If you’ve ever posted anything on the Internet — an Amazon review, a blog, a Reddit post, anything — what you wrote was used to train AI.
AI companies are worth hundreds of billions of dollars. All of that worth, every single penny of it, comes from the unpaid work of people whose content was ingested without their knowledge, without their consent, and without compensation.
This is probably the single largest wealth transfer in modern history, and it went up, not down.
There are a few dirty secrets lurking within the data centers of AI companies. One is the staggering energy requirement. Training GPT-4 required about 7.2 gigawatt-hours of electricity, which is roughly the amount that 680 average US homes use in an entire year. (I laugh at conservatives who whine “eLeCtRiC cArS aRe TeRrIbLe WhErE wIlL aLl ThE eLeCtRiCiTy CoMe FrOm” while fellating Elon Musk over how awesome AI is. Training GPT-4 required enough energy to charge a Tesla 144,000 times.) And each individual ChatGPT query consumes a measurable amount of energy, about 2.9 watt-hours of electricity.
Image: Jason Mavrommatis
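For the skeptical, here’s the back-of-the-envelope arithmetic behind those comparisons. The 50 kWh battery (a Model 3 Standard Range pack) and the 10,500 kWh annual household figure (roughly the EIA’s US average) are my assumptions, not OpenAI’s numbers:

```python
# Back-of-the-envelope check on the energy figures above.
TRAINING_GWH = 7.2
training_kwh = TRAINING_GWH * 1_000_000       # 7.2 GWh = 7,200,000 kWh

TESLA_BATTERY_KWH = 50                        # assumed Model 3 Standard Range pack
print(training_kwh / TESLA_BATTERY_KWH)       # 144,000 full charges

HOME_KWH_PER_YEAR = 10_500                    # assumed average US household usage
print(training_kwh / HOME_KWH_PER_YEAR)       # ~686 home-years of electricity

QUERY_WH = 2.9                                # the per-query figure cited above
print(training_kwh * 1_000 / QUERY_WH)        # training ≈ 2.5 billion queries' worth
```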
All the large LLMs were trained on copyrighted data, in violation of copyright. Every now and then they spit out recognizable chunks of the copyrighted data they were trained on: pieces of New York Times articles, Web essays, Reddit posts. OpenAI has, last time I checked, something like 47 major and hundreds of smaller copyright lawsuits pending against it, all of which it is fighting. (It might be more by now; there are so many it’s hard to keep up.)
That, I think, is the defining computer science ethical problem of our time: To what extent is it okay to build value and make money from other people’s work without their knowledge or consent?
Elon Musk recognizes the value in what I write. He recognizes that it has both artistic and financial value. He posts my content as an aspirational goal. He doesn’t credit me, even as he praises my work.
That’s a problem.
Those who create things of value are rarely recognized for the value they create, if the things they create can’t immediately be liquidated for cash. That’s not new. What’s new is the scale to which other people’s creativity is commoditized and turned into wealth by those who had nothing whatsoever to do with the work, and are merely profiting from the labor of others without consent.
OpenAI itself has admitted as much. In its written evidence to the UK House of Lords, the company argued:

“Because copyright today covers virtually every sort of human expression — including blogposts, photographs, forum posts, scraps of software code, and government documents — it would be impossible to train today’s leading AI models without using copyrighted materials. […]
Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”
OpenAI also claims its use of other people’s work is “fair use,” even while admitting that chatbots sometimes spit out verbatim chunks of recognizable work. That is a highly dubious claim. Fair use has no precise statutory definition (the doctrine exists as an affirmative defense in court to charges of copyright infringement), but one of the factors courts have always weighed is whether the use is commercial, and with a valuation of $157,000,000,000 and a $20/month charge for full access to ChatGPT, it’s pretty tough to argue that OpenAI is not commercializing other people’s work.
So at the end of the day, what we have is this: a company founded by people who are neither writers nor artists, producing hundreds of billions of dollars of wealth from the uncompensated, copyrighted work of writers and artists, whilst cheerfully admitting that it could not produce any value if it had to pay for its training data.
And it’s not just copyrighted data.
OpenAI’s DALL-E cheerfully spat this image out when I typed “Scrooge McDuck stealing money from starving artist.”
Here’s the thing:
Scrooge McDuck is trademarked. Trademark law is not the same as copyright law. Trademarks are more like patents than copyrights; in the US, trademarks are administered by the Patent and Trademark Office, not the Copyright Office.
In no way, shape, or form is this “fair use.”
Generative AI recognizes trademarked characters. You can ask it for renderings of Godzilla or Mickey Mouse or Spider-Man or Scrooge McDuck and it’ll cheerfully spit them out. The fact that DALL-E recognizes Scrooge and Spider-Man and Godzilla demonstrates without a shadow of a doubt that it was trained on trademarked properties.
So far, all the infringement lawsuits have been aimed at the companies making AI models, but there’s no reason it has to stay that way. You “write” a book with AI, or you create a cover for your self-published work with AI, and it turns out there’s a trademark or copyright violation in it? You can be sued. That hasn’t happened yet, but it will.
(Side note: The books I publish use covers commissioned from actual artists. Morally, ethically, and legally, this is the right thing to do.)
Why do I call OpenAI and its kin a socialist wealth transfer? Because they treat products of value as community property. Karl Marx described socialism as the transition between capitalism and communism, the system in which nothing is privately owned and everything belongs to the public. That is exactly how OpenAI and its kin see creative works: owned by nobody, belonging to the public, free to use. It’s just that “free to use” turns out to mean “a vehicle for concentrating wealth.”
From creators according to their ability, to OpenAI according to its greed.
It seems to me that what we need as a society is a long, serious conversation about what it means to create value, and who should share in that value. It also seems to me this is exactly the conversation the United States is fundamentally incapable of having.