Earlier today I edited my (small) set of Stack Overflow posts to add the sentence "I do not consent to my words being used to train OpenAI" to the end. Within hours, all these edits were reversed and I got a warning email for "removing or defacing content". I did not remove any content. If this small sentence is "defacing", it is a very minor defacement. In no way was the experience of other users made worse by me adding one sentence.

To Stack Overflow, you are not a person. You are "content".

Not only does Stack Overflow say you don't have a right to remove your words from Stack Overflow, according to Stack Overflow, you don't even have the right to decide what words Stack Overflow publishes under your name.

In the meantime, I have been suspended for 17 hours to "cool down". OpenAI is so, so offended by me saying I don't want them to train on my content. Clearly I am very angry and need to sit in time out.

Noticed this last detail only when I tried to edit my profile and discovered you can't edit your profile while "suspended".

@mcc

huh. I thought the LLMs were already trained on StackOverflow.

It's available under some kind of public license, I think. There are a bunch of clone pages out there, anyway.

@WomanCorn If the point of Stack Overflow is to be a block of programming-related text to sell to LLM companies, then it would actually be rational to ban LLM text, as it would poison the LLM inputs.

@osma @mcc @WomanCorn
Under a CC BY-SA license, an LLM that uses your SO posts in its output, whether quoted directly, remixed, or adapted, has to give you attribution.

*edit: apparently even Creative Commons says this is "Fair Use", and so the license does not restrict LLM use of your SO posts at all.

Does any LLM provide a list of references with each answer it gives?

@mcpinson @mcc @WomanCorn 100% this. Nothing I release with CC-BY-* implies permission to train LLMs, since the fragments of my work that would show up in their output do not contain attribution. @osma, who takes CC-BY-SA to imply LLM authorization? Do they have a legal basis for that assumption? Are they also releasing the model itself as SA, given that it is clearly a derived work?