You might be familiar with what I'm terming the "Token Wars" - in which #LLM and #GenAI companies seek to ingest text, image, audio and video content to create their #ML models. Tokens are the basic unit of data input into these models - meaning that #scraping of web content is widespread.
In response, many sites - such as Reddit, Inc. and Stack Overflow - are entering into content licensing deals with companies like OpenAI, or making their sites subscription-only.
Another solution that has emerged recently is content blocking based on user agent. In web programming, the client requesting a web page identifies itself via a user agent string - usually as a browser or a bot.
User agents can be blocked by a website's robots.txt file - but only if the scraper respects the robots.txt protocol, and many web scrapers do not. Taking this a step further, network providers like Cloudflare now offer solutions which block known token scraper bots at the network level.
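As a sketch, a robots.txt that opts out of a couple of publicly documented AI crawlers (GPTBot is OpenAI's crawler; CCBot is Common Crawl's) while leaving the site open to everything else might look like this:

```
# Block OpenAI's GPTBot crawler
User-agent: GPTBot
Disallow: /

# Block Common Crawl's CCBot
User-agent: CCBot
Disallow: /

# All other user agents may crawl the whole site
User-agent: *
Disallow:
```

Worth stressing: compliance with this file is entirely voluntary on the crawler's part, which is exactly why the network-level blocking described above exists.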
I've been playing with one of these solutions, called #DarkVisitors, for a couple of weeks after learning about it on The Sizzle, and was **amazed** at how much of the traffic to my websites came from bots, crawlers and content scrapers.
https://darkvisitors.com
(No backhanders here, it's just a very insightful tool)