Scraping the Web Now, Asking for Permission Later


Federico Viticci, writing at MacStories about the details Apple shared on training its AI model with web content:

As a creator and website owner, I guess that these things will never sit right with me. Why should we accept that certain data sets require a licensing fee but anything that is found “on the open web” can be mindlessly scraped, parsed, and regurgitated by an AI? Web publishers (and especially indie web publishers these days, who cannot afford lawsuits or hiring law firms to strike expensive deals) deserve better.

I agree wholeheartedly. I felt similarly when I looked at the data that trained Google’s AI. I see Chorus and our forum very clearly in their training data. We didn’t agree to that. Our community never agreed to that. Google played a massive role in devaluing small and medium-sized websites (and the online ad business) and we’re certainly not going to be the ones getting any publishing deals. None of it sits well with me.