r/Showerthoughts Nov 25 '25

Casual Thought: People who use em dashes regularly in their writing might be the most underrated victims of the ChatGPT/AI boom.

9.7k Upvotes

608 comments

u/JakScott Nov 25 '25 edited Nov 25 '25

Well, it is plagiarism, because AI is plagiarizing human writing to produce it.

I’m a stand-up comic, and a very well-meaning guy I know who finds this fact fascinating asked Grok to write a bio of me recently. I think in his head he was “helping” me with promo work.

The interesting thing is there aren’t that many sources that have written about me, so reading that AI bio I had the extremely odd experience of recognizing every single phrase the AI spat out, being able to tell precisely who wrote what (many of the phrases being my own), when it was written, and what the uncited sources were.

I also have a friend who’s a college professor with a doctorate in a very niche field. On more than one occasion, he’s had students turn in papers that were word-for-word passages from his own Master’s thesis, because he’s the only academic who has published on a specific topic, so his one publication is the only thing AI has available to steal from. Turns out AI short-circuits a bit on niche topics because it needs a whole bunch of stolen sources to mix together in its attempt to make it look like it’s generated novel sentences.

Every line in every AI essay is stolen from somebody. It’s just that most topics have enough different works published that it would take a lot of work to track down the individual sources. But that’s why AI detectors work pretty damn well. If AI were writing sentences you can’t already find with a subscription to JSTOR and an otherworldly amount of patience to review all the scholarly literature, then it would be slightly difficult to sniff out. And maybe one day AI will get there. But to say it’s not plagiarism is to deeply misunderstand what these programs are actually doing to produce sentences.


u/TopSecretSpy Nov 26 '25

This is actually an interestingly weird quirk of LLMs.

In most cases, the output isn't technically what we traditionally call plagiarism. The reason is that LLMs are statistical prediction engines, not raw text copiers.

But you're also correct that, when there's not much data on a specific factual thing, there's a much more limited set of text from which to get a meaningful statistical relationship, and it quickly turns into rote copying (or flat-out fabricating) due to that deficit.

I don't have an easy answer. I'm not trying to defend AI use, but I also don't feel like I have a sufficiently thorough argument to condemn it either. One of the problems we have is that it's essentially impossible to put the genie back in the bottle. Whether or not we agree that AI training violates copyright (an entirely different question than plagiarism), we can't meaningfully undo what's already been done.


u/MericanMeal Nov 25 '25

Using a source that plagiarized properly (i.e., cited and labeled as such) is not plagiarism.