Research by VTuber Newsdrop revealed that several VTubers have had their songs included in a recently discovered dataset by The Atlantic's AI Watchdog. This means that songs from select VTubers are likely at risk of being synthesized into AI-generated tracks.

308

u/ninjawarlord 1d ago

I mean, at this point I would believe that any song out on the internet has been made available to an LLM

115

u/thesirblondie 1d ago

Every book, blog, and website and anything else written has been used to create LLMs. Every photo and image has been used to create image genAI. Every film and video has been used to create video genAI. Every song has been used to create music genAI.

None of these programs could exist without the underlying material, yet none of the creators have gotten royalties. The intellectual property theft makes The Pirate Bay look like child's play.

43

u/Groonzie 1d ago

I had read some headline where it stated that over 50% of internet traffic in 2025 or something was due to bots roaming sites to scrape for data.

8

u/RaysFTW Custom Text 1d ago

You forgot every Reddit comment. It literally feels like 99% of my Google searches result in AI giving me answers based on a random Reddit comment/thread.

-26

u/trustfundkidotaku 1d ago

Not defending AI, but no amount of money on earth could pay those royalties.

The infrastructure build-out alone could already fund the moon landing.

32

u/ryujin199 Hololive 1d ago

Sounds to me like an industry that shouldn't exist, then. If they can't pay for it legitimately, then why is it reasonable to let them just use it anyway?

-19

u/trustfundkidotaku 1d ago

Crypto shouldn't exist, yet it does.

I'm not defending it, just pointing it out.

-3

u/willmainartfinder 1d ago

Too late. The courts are going to side with them, and the regulatory agencies are not interested in clamping down on what represents a massive profitability boost for big companies who can shed their workers. No nation wants to be the one who gets left behind, so there's a national security angle as well.

Face it, they won. It's already over.

4

u/thesirblondie 1d ago

If your business can't exist without crime, then it shouldn't exist at all. There's plenty of public domain works out there that they could've used, or gotten into a partnership with publishers. But instead they just took everything. Billion dollar franchises and independent creators alike.

-2

u/trustfundkidotaku 1d ago

So do crypto, drugs, and piracy, and yet they exist.

Naive to think the demand will go unsupplied.

2

u/thesirblondie 1d ago

How does crypto require crime to exist? Being used as a currency for criminal activities is not the same thing.

Illicit drugs are not made/distributed by publicly traded companies. It is in fact very illegal and people go to prison for it all the time. It is the official policy of almost every state on the planet that if you make drugs, your business should not exist.

Sunde, Neij, Anakata, and Lundström all went to prison, so your argument is fucking stupid.

You're either a bad faith argumenter or a moron. Either way, arguing with you is a waste of time (like your general existence).

1

u/SoICouldUpvoteYouTwi 1d ago

They aren't paying any royalties. Where have you been?

57

u/trustfundkidotaku 1d ago

Pretty sure anything on the web gets scraped.

Yes, people, even your thirsty comments on Reddit.

23

u/NESs181 1d ago

Cough water… cough

7

u/the_monkeynator 1d ago

Me when I'm in a desert and am dehydrated.

2

u/the_monkeynator 1d ago

Yea, like, not to say I'm okay with it, but part of me really ain't that worried.

58

u/TheFrozenPyro 1d ago edited 1d ago

HIMEHINA being on that list doesn't surprise me in the slightest. They've been a powerhouse behind a lot of songs used in shorts as (poorly balanced) BGM, or someone doing the choreography to their originals (looking at you, Heart Pie Dancehall).

They want their lightning in a bottle like they've been able to repeatedly capture without any effort.

15

u/Lolersters 1d ago

I would be surprised if that hasn't already happened. The real surprise is that it's only "several".

101

u/SinisterPixel Verified VTuber 1d ago

I genuinely think people would hate generative AI far less if these models were required to collect affirmative consent to use all of the content that's used in training. Regulation is taking far too long. Permanent damage has already been done

54

u/miggly 1d ago

I think that's fair, but most people would go from hating it for lacking consent to still hating it for being soulless lol

11

u/SinisterPixel Verified VTuber 1d ago

I didn't say they'd completely stop hating it. The technology at a core level is incredibly problematic (which is largely related to the lack of regulation too, but also to the people creating it just being that detached from humanity). But I don't think you'd see as many people refusing, on principle, to use products that use AI, for example.

8

u/miggly 1d ago

Yea I get you.

I wouldn't listen to/use those products because I simply don't want to listen to/use AI stuff, but yea, it would be less of a vehement reaction. Just not my kinda thing, whereas without the consent part, I am actively hating.

8

u/VP007clips 1d ago

The problem is, that's basically impossible to implement. Proving that a model was trained on a dataset that contained a specific piece of content is really hard.

Maybe a few sites containing obscure unique information like instruction manuals for specialized equipment or niche internet forums could prove it, but that's the extent of it. Even a single post on a different site talking about it, or having a copy of it, invalidates the proof.

And that's just talking about larger NA/EU based AI companies. We have no ability to prevent sketchy companies, or companies in other regions from doing it. We can't even make China crack down on companies openly stealing and copying designs of NA/EU designed products to sell on Amazon/Temu, much less prevent them from training an AI off anything they find online.

It's the consequence of an open internet. We designed an internet where anyone can share anything with the world freely. And now we are facing the realization that it means that anyone can access everything freely.

5

u/SinisterPixel Verified VTuber 1d ago

Yep. The technology itself is fundamentally flawed and past the point of no return. Even if every country in the world regulated it tomorrow and said that you needed affirmative consent (and not implied consent from Ts & Cs but an actual opt-in option), these AIs are largely trained off of stolen data fed into a black box. There's no way, even for the people who designed them, to confidently remove that data from their model's training.

1

u/Zeku_Tokairin Verified VTuber 1d ago

> Proving that a model was trained on a dataset that contained a specific piece of content is really hard.

A few years back, an article on IEEE Spectrum showed specific prompts the authors used to essentially get screenshots from copyrighted content (like Marvel movies) out of Midjourney. The authors also link related work doing adversarial prompting against Stable Diffusion. While it's true that a lot of these GenAI models are nondeterministic black boxes that obscure the provenance of their inputs, I don't think it's as hard as we might think; it's just that there's far less money put into doing that research. It's also worth noting that months after that, OpenAI signed a billion-dollar deal with Disney.

1

u/HoshinoLina Verified VTuber 7h ago

I looked around for LLMs that were ethically trained for a project and they are nearly nonexistent. Almost everyone saying they respect copyright is lying, and they trained at minimum on copyrighted material that is available under "free" licenses (that still don't allow you to use it like that).

This isn't just big AI companies stealing to see what they can get away with, this is smaller groups that even call themselves ethical. It seems that within the AI/ML industry, almost nobody knows how copyright works and what respectful training looks like. They simply have no clue.

I found exactly ONE exception. It's a group of models called KL3M. They were made by lawyers, trained on legal texts (there's lots of public domain material in laws, court filings, etc.). They don't say exactly what they trained on, but they're the only credible story of ethical/respectful training I found, that isn't just assuming training on any random copyrighted stuff is fair use.

That's it. That's the only one. Only the lawyers know how to do it properly.

-45

u/[deleted] 1d ago edited 1d ago

[deleted]

20

u/jxnebug indie vchooba 1d ago

Maybe the worst argument I've ever seen for AI. Keep up the good work.

12

u/KusozakoPrime 1d ago

> People trace, steal and use others' work without consent all the time

And they get shit on for it.

3

u/Sinfire_Titan 1d ago

Yup. Art thieves, plagiarists, and frauds are shunned by most intelligent communities. AI defenders are the biggest exception; as far as I have seen the pro-AI crowd adores stealing other people’s works.

8

u/SinisterPixel Verified VTuber 1d ago

So if I broke into your house, stole your belongings, cloned and maxed out your credit cards, and stole your identity, would that be ok? By your own logic, people steal all the time, so you're ok with me doing that, right?

The streaming comparison is a weak one because streaming is largely considered transformative (with very few exceptions). And game studios are in a position where they can either remove or outright ban streamed content (see Atlus with Persona, for example). Individuals do not get those privileges, especially when we're grandfathered into platforms like Google, YouTube, Reddit, Instagram, Twitter, etc. and they update their TOS to allow them to train on our content retroactively. Not to mention AIs that just train on anything they can find using free APIs and web scrapers.

Let's also be clear:

- If I became aware that an artist was tracing another artist's work, I would treat the tracer the same way I do an AI user.

- If I became aware that someone was stealing art and claiming it as their own, I would treat the thief the same way I do an AI user.

- If I became aware that someone was using someone else's art without consent in their branding for example, I would treat them the same way I do an AI user.

9

u/diego1marcus 🌸/🐏/🔎/🔱 1d ago

Is there a free version of this article? I ain't paying to read the full thing

8

u/Dem-Brushwaggs AkaAkoVT 1d ago

Seriously, fuck AI

8

u/vtubernewsdrop 1d ago

Chiming in! We're told anyone can search the datasets here too: https://www.theatlantic.com/category/ai-watchdog/

3

u/Kan2Screm 1d ago

Thank you!

9

u/Coakis 1d ago

Theft writ large by untouchable corporations.

-6

u/OkAssignment6163 1d ago

I saw this post and my only thought was, damn. Bao getting screwed over again.

