The guy replying to me is (as far as I can tell) the sole owner and moderator of .wtf, the instance I’ve been using up until this point. I kinda already knew they allowed AI slop, since there’s nothing in the rules that says otherwise, but this interaction really sealed my decision.

“Hey, person who makes music. If you don’t like another musician using the fascist plagiarism machine, how about you offer to create art for them? After all, if people simply donated their time and effort, maybe they wouldn’t have to resort to pissing in the face of their fellow artists of a different medium. Think about it.”
Also, I think you can donate to the instance in crypto?
Fuck right off with that.
On another note: PeerTube itself uses Whisper for automatic subtitle generation. It’s something I don’t LIKE, but I approached the devs about it and they responded very thoughtfully. I’ll admit I don’t know all the differences between locally run, open source models that are used for accessibility and the horrible plagiarism machines we all despise the most. I suspect they’re still built on exploitative tech / trained on stolen data and whatnot, and Whisper being a product of OpenAI doesn’t inspire confidence, but Framasoft only uses it to detect speech, not create it. That’s hardly “generative” at all, is it? It’s just creating subtitles. Now, that doesn’t mean the program itself is ethical given how it was likely created (as the devs acknowledge), and we SHOULD push for ethical, FLOSS methods of doing these sorts of things. I’m sure it can be done; speech recognition wasn’t exploitative before the AI boom, right? This is where my knowledge ends and I ask for feedback. Any thoughts?


Fwiw 99.9% of the time someone talks about an “open source” generative AI model, what they really mean is “open weight”.
— a random Reddit post I found when looking for a good definition to share
Some people (including the author of that definition) reject the idea that an open source model should require an open source dataset. It’s also not clear to me whether such a requirement is even supposed to mean the dataset is actually public domain, or just clearly documented (e.g. “we trained on all top 100 best-selling books from the period 2000–2020”). The former would obviously be meaningfully different from closed models as far as accusations of ethical problems in the training process go.
“Open weight” basically just means you can download the model, make some slight tweaks, and run it at home. It means the big AI companies aren’t benefiting financially from your use and can’t train on what you feed them for their next model, and because these models are typically designed to run locally rather than in a data centre, the environmental impacts are lessened. But in terms of the training process, an open-weight model is no better than a closed one.