Bluesky users debate plans around user data and AI training

MT HANNACH

Social network Bluesky recently published a proposal on GitHub outlining new options it could give users to indicate whether they want their posts and data to be scraped for things like generative AI training and public archiving.

CEO Jay Graber discussed the proposal earlier this week while on stage at South by Southwest, but it drew fresh attention on Friday night after she posted about it on Bluesky. Some users reacted with alarm to the company’s plans, which they saw as a reversal of Bluesky’s previous insistence that it won’t sell user data to advertisers and won’t train AI on user posts.

“Oh, hell no!” the user Sketchette wrote. “The beauty of this platform was the NOT sharing of information. Especially gen AI. Don’t you cave now.”

Graber replied that generative AI companies are “already scraping public data from across the web,” including from Bluesky, since “everything on Bluesky is public like a website is public.” She said Bluesky is therefore trying to create a “new standard” to govern that scraping, similar to the robots.txt file that websites use to communicate their permissions to web crawlers.

The debates over AI training and copyright have thrown robots.txt into the spotlight, among other things highlighting the fact that it is not legally enforceable. Bluesky frames its proposed standard as one that would have a similar “mechanism and expectations,” providing “a machine-readable format, which good actors are expected to abide by, and carry ethical weight, but is not legally enforceable.”
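For readers unfamiliar with how the robots.txt convention works in practice, here is a minimal sketch using Python's standard `urllib.robotparser` module. The crawler name `ExampleAIBot`, the rules, and the URL are made up for illustration. The check is voluntary: a crawler that never runs it faces no technical barrier.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is barred, all others are allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/profile/alice"  # placeholder URL
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        print(agent, may_fetch(agent, url))
```

A well-behaved crawler calls something like `may_fetch` before each request and skips disallowed pages. Nothing in the file itself enforces that behavior.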

Under the proposal, users of the Bluesky app or other apps that use the underlying ATProtocol could go into their settings and allow or disallow use of their Bluesky data across four categories: generative AI, protocol bridging (that is, connecting different social ecosystems), bulk datasets, and web archiving (like the Internet Archive’s Wayback Machine).

If a user indicates that they don’t want their data used to train generative AI, the proposal says, “Companies and research teams building AI training sets are expected to respect this intent when they see it, either when scraping websites, or doing bulk transfers using the protocol itself.”
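To make the idea concrete, here is a rough sketch of how a dataset builder might honor such a signal. Everything in it is an assumption made for illustration: the `UserIntents` class, its field names, and the choice to exclude accounts that have stated no preference are hypothetical and are not taken from the proposal's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative labels for the four categories; NOT the proposal's field names.
CATEGORIES = ("generative_ai", "protocol_bridging", "bulk_datasets", "web_archiving")


@dataclass
class UserIntents:
    """Hypothetical per-account preferences. None means no preference stated."""
    generative_ai: Optional[bool] = None
    protocol_bridging: Optional[bool] = None
    bulk_datasets: Optional[bool] = None
    web_archiving: Optional[bool] = None


def allows(intents: UserIntents, category: str, default: bool = False) -> bool:
    """Return the stated preference, or `default` when none was stated.

    Treating "no preference" as a refusal is this sketch's own policy choice.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    value = getattr(intents, category)
    return default if value is None else value


def filter_for_training(posts, intents_by_author):
    """Keep only posts whose author allows generative AI use."""
    kept = []
    for post in posts:
        intents = intents_by_author.get(post["author"], UserIntents())
        if allows(intents, "generative_ai"):
            kept.append(post)
    return kept


if __name__ == "__main__":
    posts = [
        {"author": "alice", "text": "hello"},
        {"author": "bob", "text": "hi"},
        {"author": "carol", "text": "hey"},
    ]
    intents_by_author = {
        "alice": UserIntents(generative_ai=False),
        "bob": UserIntents(generative_ai=True),
    }
    print(filter_for_training(posts, intents_by_author))
```

As with robots.txt, the filter only works if the party building the dataset chooses to run it.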

Molly White, who writes the Citation Needed newsletter and the Web3 Is Going Just Great blog, described this as “a good proposal” and said it’s “weird to see people flaming Bluesky for it,” since it’s not “welcoming in AI scraping” but instead “trying to add a consent signal to allow users to communicate their preferences to the scraping that’s already happening.”

“I think the weakness with this and [Creative Commons’] similar proposal for ‘preference signals’ is that they rely on scrapers respecting these signals out of some desire to be good actors,” White continued. “We’ve already seen some of these companies blow right past robots.txt or pirate material to scrape.”
