sigmoid.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A social space for people researching, working with, or just interested in AI!

#GoogleSearch


The End of the Search Era: How AI Is Quietly Breaking the Internet’s Traffic Model

It happened quietly. At the beginning of 2023, generative artificial intelligence (AI) and the chatbots powered by large…
#NewsBeep #News #US #USA #UnitedStates #UnitedStatesOfAmerica #Internet #Artificialintelligence #ChatGPT #Gemini #Google #GoogleSearch #OpenAI #search #searchaiinternettrafficmodelcontentwebsitesnewspublishersnewsroom ai #Technology
newsbeep.com/us/27759/

Has anyone else on a mission to find a daily driver search engine that isn't Google or Bing given Mojeek a look? It's been around since 2004, with its own crawler and an index of somewhere in the realm of 6-7 billion pages, all hosted in a UK data center operated by Custodian Data Centres, who claim it's one of the greenest in the country (can't vouch for the veracity of that claim; it's all news to me).

They've also been lamenting the state of search on the web for years, and discussing the need for more independent search indexes, so they seem to be on the right side of the fight. Obviously, corporations are not your friends, so yeah, that could change at any moment. I like SearXNG (when my preferred instance is fully operational; it was barely usable for the last day or two), and Startpage when it's not telling me I'm blocked for acting like a bot or using a VPN I'm definitely not using???, but they're both still meta engines aggregating results from the usual suspects (SearXNG is a very flexible one that lets the end user choose which sources they want or don't want results from, but still). So I think there's value in having at least one other actual independent crawler with its own index. Much like web browsers, that's an increasingly difficult thing to build from scratch without VC money (Cliqz tried, but couldn't survive the pandemic).

Anyway, I set it as my default engine in LibreFox yesterday, and am going to trial it for at least a few days to see how it goes. Will report my findings.

And here's some hashtag spam (please do share your info/experiences, if so inclined): #search #searchengines #googlesearch #duckduckgo #mojeek #TryMojeekSearch #searxng

social.emucafe.org/naferrell/b

I read an interesting Hacker News comment by user dumbfounder (great username!) today. I excerpt the pertinent part below:

I created a search engine that crawled the web way back in 2003. I used a proper user agent that included my email address. I got SO many angry emails about my crawler, which played as nice as I was able to make it play. Which was pretty nice I believe. If it’s not Google people didn’t want it. That’s a good way to prevent anyone from ever competing with Google.

dumbfounder

I have had problems with bad crawlers (especially bad AI crawlers) on my sites. At the same time, dumbfounder highlights the reverse side of the coin. Many sites block good crawlers, such as robots.txt-respecting crawlers for independent search engines. While all webmasters are free to control access to their sites as they see fit, allowing Google and other select big tech search crawlers while excluding small and independent search crawlers both limits search diversity and prevents people who may rely on small or niche search engines such as Mojeek, Marginalia, or Seznam from discovering potentially interesting writing. I previously published an article on this issue advocating for webmasters who want to support an open web and search engine diversity to ensure that good crawlers from independent search engines and directories can access their sites.

The Emu Café Social · [Note] Blocking Independent Search Crawlers
More from Nicholas A. Ferrell
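
For anyone who wants to check how their own robots.txt treats independent crawlers, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `allowed_crawlers` helper, and the `example.com` URL are all hypothetical. The user-agent tokens `MojeekBot` and `SeznamBot` are assumptions that should be confirmed against each search engine's own documentation. The example shows the pattern the post describes: big tech crawlers are allowed by name while everything else falls through to a blanket disallow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: named big tech crawlers are allowed,
# every other crawler hits the catch-all "Disallow: /" rule.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /

User-agent: *
Disallow: /
"""

# User-agent tokens to test. The independent ones are assumptions;
# check each search engine's documentation for the exact token.
CRAWLERS = ["Googlebot", "bingbot", "MojeekBot", "SeznamBot"]


def allowed_crawlers(robots_txt, agents, url="https://example.com/"):
    """Return a dict mapping each user-agent token to whether it may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    for agent, allowed in allowed_crawlers(ROBOTS_TXT, CRAWLERS).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With this sample file, the independent crawlers are blocked even though they would respect the rules. Adding a `User-agent:` group with `Allow: /` for each independent crawler you want to admit, placed before the catch-all group, would let them through while still excluding unnamed bots.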