ALL THIS. Someone sent me the screenshot of all my books that were used. Sigh.
Screw 'em. I'm still gonna write.
YEP.
As another author, every time I encounter this I'm incensed. Oh, it would take too long to get legal access? It would be too expensive? You're a multi-million-dollar company. Suck it up and do it legally. Just because you're big doesn't mean you get a free pass to break the rules.
And every time I see an AI-generated book competing against mine out there, I gnash my teeth. Why get a low-quality rehash when you could buy the original?
Bah.
Yeah, it's incredibly frustrating to see them just decide they get to do whatever they want because it's so *vitally important* (to them). Guess what, guys? Building worthwhile things takes time if you want to actually do a good job!
Don't sweat those AI-generated books. I know it feels like you're competing with them, but you're on a whole different level than them. What with being human and all. <3
yeah if you can't afford to do it, then maaaaaybe you, I don't know, just shouldn't do it?
People who think an actual writer would write a book using AI, or people who think about writing their first book using AI... I believe, all of them just have this incredibly wrong notion that it's IDEAS that matter, and not the execution. That the ideas are the art, not the actual doing (painting, writing, you name it).
"I have so many good ideas, but I don't really know how to write/paint that well..."
I am so bemused by such statements.
Mate, the ideas were never the problem. The magic lies in the process, in the doing.
Trying to merge your crafting abilities with that image/feeling in your head - that's what being an artist is all about (or, you know, the commonly shared anguish about this gap being unbridgeable...)
Yeah. I generally think if those people -- the "I have a great idea!" people -- wanted to write something, they would. They'd stop talking about it and actually do it. But mostly, they don't. And that's okay! It's not for everyone. I'm glad people want to, like, do my taxes in trade for money.
But like you said, the magic lies in the process.
What's my work going to teach AI? How to be a snot? : )
Quite possibly. :)
Thank you so much for this post, Jodi. I was horrified to see my debut was a part of this. I really needed to read such a cool-headed post. It's outrageous that Meta can even be allowed to get away with this (and hopefully, they won't in the end). You're right that we have to fight against it in any way we can.
Ah man, it's so hard to see the debuts in there, especially! The thrill of those, the once-in-a-lifetime moment of publishing your debut novel . . . I'm really sorry this is happening, Rachel. And I don't think they'll be allowed to get away with it -- not after everything. It'll just take time to see how everything shakes out. And in the meantime, you keep writing. :)
Goodness, this ticks me off so much! I don't see any of mine in there so far, thankfully, but I see so many from friends that it makes me heartsick. As both an artist and an author, I hate genAI on both counts.
Yeah, it definitely stinks! The best thing we can do, though, is keep writing. Creating is resistance! <3
Absolutely. GenAI is soulless, it can never fully capture what we do, but only spit out a poor facsimile. Letting it run us all off is not the answer.
It's actually really important to fight the good fight here, because language is always changing, and therefore an LLM isn't just trained once but is constantly being trained, and requires continuous human-made input. It's never too late to fight it, because an LLM is always out of date. https://www.scientificamerican.com/article/ai-generated-data-can-poison-future-ai-models/
(scroll down to example 1 for an illustration: https://www.nature.com/articles/s41586-024-07566-y#Sec6)
which was meant to be encouraging! That means it's not too late to stand up for logic.
I'll read the article after I've had more coffee, but yeah, dead internet theory is real, especially on sites like Facebook, where bots are interacting with bots, praising shrimp Jesus. It's wild to see. These companies desperately want our work -- they can't function without it!
It's honestly baffling, though. Like . . . if it can't be done without this amount of human work and creativity, maybe it's best left to humans?
Pretty much, yes. "Artificial intelligence" aka "machine learning" has a lot of great uses. For example, you could feed it millions of mammograms, tell it which women ended up having cancer, and have it learn to spot the cancer long before a human could see it. That's totally possible today and wouldn't even eliminate that many jobs, because you'd still need a radiologist to perform the mammograms and confirm the diagnoses.
But that's not as sexy as LLMs. We've been fascinated by chatbots since like the '90s, when they were far less sophisticated (but also, you know, didn't require an entire nuclear power plant to run them). LLMs are also less useful, ALONG with being highly unethical in their creation, but they make people buy stock. Apparently. :-\
This was a fantastic post. Thanks for writing all this out.
There are now tools like “Tess”, which at least claim to be opt-in and to pay artists (https://www.tess.design/) so clearly it’s possible to create a legal framework for genAI tools which benefit writers. Big companies like Meta have no excuse.
Especially given how big FB ads are; “any works you let us use get discounted advertising on all our platforms” would be a serious incentive for a lot of people, surely? They could absolutely have made this work for everyone involved. But they’d rather steal and hoard wealth.
100% copyright work should be protected.
I don't know that not using "AI" is an effective step, or even possible. We've stepped on copyright so frivolously to this point I think we hardly notice anymore.
For me it was Pinterest. A vast array of images, photographs, art with no attribution, no link back to the creator. Use images however, wherever you want with no regard for copyright. I honestly sent dozens of messages reporting pins, "No link to the creator." "Some random person is using this image to promote their business or their..." The response was honestly along the lines of, "We don't really care about the creator or copyright." Everyone loved Pinterest and didn't think about how those images came into existence.
Now we've integrated different versions of AI so quickly into so many services, I don't know that we can stop using it (without becoming hobbits).
- I don't know a lot about the technology behind LLMs but logically, predictive text probably leverages an LLM. I honestly don't know if that's fed from the browser or the website or if there's a central API they all tap into, but almost every website I go to suggests what I should type next.
- I can (and do) turn predictive text off in Word because it's annoying. But Microsoft's profits aren't any higher or lower because I'm not using predictive text.
- I also use DuckDuckGo and it's started to give me predictive answers to search questions.
- Canva offers MagicWrite that offers to "help write copy and brainstorm ideas."
- Since Meta is the one stealing work to support their LLM, I imagine that's filtering into Facebook in some way.
- Grammarly, I would imagine though haven't researched, could have had the rules of grammar programmed in. But we don't know if it was trained solely on work people submitted to the site for review or from other sources. And did the people who submitted text for Grammarly to review know that their words would be used to train the service? (I imagine there was a footnote somewhere.)
- Substack has AI to allow me to generate an image for my post.
Not all versions of AI are an LLM but it seems everyone wants an algorithm to automate some part of a process for them.
I would very much like to see the DOJ crack down on pirate sites, period. Copyright work should be protected.
I would also like to see copyright enforced in regard to training any AI model. I think big companies would still try to get around it. But one judgement that AI models have to respect copyright sets a precedent that makes it easier for artists to pursue justice in protecting their work.
And I'd actually like to see a company create AI that respects copyright (I did see Tess in the below comments). I think we'd be surprised how many people would be willing to allow their work to train AI if they're fairly compensated (and who knows what "fair" is in the wild west).
The difference with something like Grammarly or an algorithm is that it's programmed with the rules of grammar or whatever. Predictive text for search only uses other terms that people have searched for. An LLM is programmed to "learn" and then fed so much text that it can get statistically good at stuff, but the programmers didn't create the rules.
Correct. These are very different technologies. On Threads, I'm seeing a lot of people call spellcheck AI, and that's simply not true. Spellcheck runs from a dictionary stored on the computer and any time a word pops up that doesn't match something in the dictionary, it flags it. Same for basic grammar. These technologies have been around for decades.
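To make the dictionary-lookup idea concrete, here is a minimal sketch in Python. It is only an illustration of the approach described above, not how any particular word processor does it; the tiny `DICTIONARY` word list and the `flag_unknown_words` name are made up for the example.

```python
import re

# Hypothetical, tiny word list for illustration only.
# A real spellchecker ships a far larger dictionary.
DICTIONARY = {"the", "magic", "lies", "in", "process"}

def flag_unknown_words(text, dictionary=DICTIONARY):
    """Return the words in `text` that do not appear in the dictionary."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # No learning and no statistics: just a membership check per word.
    return [word for word in words if word not in dictionary]

print(flag_unknown_words("The magic lies in the proccess"))
```

Nothing here is trained on anyone's writing; the only data involved is the word list itself.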
Agreed. Search engines use terms other people have searched for. But predictive text in the body of an email or a textbox (the way Substack is suggesting words to complete my sentences even as I type this) is logically probably an LLM of some sort. Since LLMs are statistical rather than semantic, it's looking at the characters I've typed thus far and (as long as there aren't that many typos) predicting the characters that come next, which it would seem to have probably pulled from some language dataset somewhere.
I don't know a lot about the real technical parts of the technology. If it's not an LLM I'd be curious to understand more how predictive text models were built.
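For anyone else curious, one long-standing way to build predictive text without an LLM is an n-gram model: count which word tends to follow which, then suggest the most frequent followers. The Python sketch below is a toy version of that idea, assuming the model is built only from your own messages. It is not a claim about what Substack or any keyboard actually runs, and the function names and sample messages are invented for the example.

```python
from collections import Counter, defaultdict

def build_bigram_model(texts):
    """Count, for each word, which words follow it and how often."""
    following = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        # Pair each word with the word that comes right after it.
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def suggest_next(model, word, k=3):
    """Return up to k of the most frequent words seen after `word`."""
    counts = model.get(word.lower(), Counter())
    return [candidate for candidate, _ in counts.most_common(k)]

# Hypothetical sample data: messages the user typed themselves.
my_messages = ["on my way home", "on my way now", "running late on my end"]
model = build_bigram_model(my_messages)
print(suggest_next(model, "my"))
```

The suggestions come purely from counts over whatever text the model was given, which is why it matters whose text that is.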
Again, I'm not saying at all that any of the AI tools should be built off the work of others without credit or compensation. 100% no. I just think it's easy to say "AI is bad" then hit tab on the next predictive text suggestion or say, "Hey Alexa..." Acknowledging that AI in different forms has permeated almost every avenue of technology doesn't mean the road to how we got here is acceptable.
Originally, autocorrect was more about comparing to an existing dictionary. Predictive text in digital keyboards might be using more AI now. And yes, assuming we can find an ethically trained model (e.g., if the keyboard is using *your* words you type) then predictive text is really nice--I *do* tend to text the same things over and over. But I still wouldn't want an AI trained on my words to write a book as me, because by definition an LLM can never write something better than what it's been trained on (and arguably writes to the average), and I want to improve my writing. :)
Yeah, copyright and intellectual property are under-protected, at least when it comes to creatives like authors and artists. People want access to art, but they don't want to pay for it, or bother with the legal means of getting it for free (libraries). It's frustrating, and it seems like something more people need to be educated about.
And sometimes, it's something people only care about once they've been affected. There's been . . . a number of authors who were using AI-generated videos and images for their book promotion, and now they're seeing themselves in the LibGen dataset. I truly hope that, now that they understand how it feels, they will learn to respect others' copyright and IP, too.
There are some things going around for companies to legally acquire work to train their AI, but none of the deals are good enough for authors that I'd, personally, sign them yet. It is a step in the right direction though.
I found two of my books in the dataset today. I'm not happy about it. But I wasn't (knowingly) using generative AI for anything, anyway. (I've seen AI images slowly creep onto stock photo sites, even though they're not supposed to be there. I've resorted to avoiding any stock image that was uploaded after 2022, just to be safe.)
Yeah, it's definitely harder to avoid those things. I've had the same trouble when looking at stock photo sites, or -- my gosh -- Pinterest. It's terrible.
So far, only 1 of my books is on the list, but that is 1 too many. Book pirating is a big issue, but with the internet and ebooks, how can you actually stop it? But a big company using pirated content is just so frustrating. They could buy what they want, but they just don't want to spend the money. And even if they bought my books legally, I wouldn't want them using them to train anything.
There's not much you and I can do to stop piracy right now. When I was a debut, I paid for a service that would send takedown notices to pirate sites, but it was so expensive and it was *constant*. Eventually, I had to give it up. (Partly because they changed how it worked and I *really* couldn't afford it.)
If you're published with a big house, they probably have a legal department that will send takedown notices, though it's overwhelming for them, too. But off the top of my head, HarperCollins has a form to do it on their website, so if you have any books with them and you know the link for the illegal download, you can pop that in there.
Otherwise . . . we just have to try not to think about it too hard.
As far as the AI training . . . I know. And reading through the articles, it certainly seems like at least some of the people involved knew it wasn't right, either. But that's why agents and author groups are working on it right now. That's why there's a class-action going. When it's our turn to show up to support them, we'll do that! :)
Arghhh! The 3rd book in my trilogy is listed. I screenshotted the LibGen result. Is that all we can do at this point? So disheartening. Thank you, Jodi.
For now, yeah, just screenshot and keep writing your books! I have a folder on my computer called "books meta stole," just for the sassiness of it.
But here's the thing -- you don't have to do anything else right now. Tons of people are doing it for you! My agent has been working on AI and copyright stuff for months. Her work will help all authors. The AALA and Authors Guild are doing this, too.
You know how authors are always like, "I just want to write my books and let someone else handle the hard, boring stuff!" That's what's happening right now. :)
Hang in there, Olivia!
Thank you for taking the time to respond, Jodi. I’ll keep an eye on your stack for more news. AND create a “books Meta stole” file. 😭
9 of mine are on there. It’s disheartening.
Asking a writer if they'd use an LLM to write their book is like asking a pro athlete if they'd use a robot to play their position for them. I get that most people find writing to be an onerous task, and even for professional writers it's hard work, but so is becoming/being good at anything. It says a lot about our grifter culture that people don't even spare a thought for 1) what it is that constitutes the fruits of your labor (i.e., the books themselves, as opposed to the attention and money that might come from them), or 2) how you go about producing them. I can imagine that a person asking that question (hopefully not your tax guy) is just looking for confirmation that everyone else is as dead inside as they are.
The art community is feeling things about AI also. Real art has to be stolen for AI art to exist. And artists who create with their hands are not happy about AI art being slapped on a canvas and sold.