sigmoid.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A social space for people researching, working with, or just interested in AI!

Server stats:

588
active users

#aisafety

4 posts4 participants0 posts today
10Billion.org<p>The opposite of tragedy isn't comedy. It's maintenance.</p><p>In a century of cascading crises, our survival depends not on grand disruptions, but on the boring, essential work of holding things together. A new essay on choosing to "Fail Gently." <a href="https://mastodon.social/tags/SystemsThinking" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SystemsThinking</span></a> <a href="https://mastodon.social/tags/Resilience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Resilience</span></a></p><p>Read in full:<br><a href="https://open.substack.com/pub/10billion/p/fail-gently?r=2063al&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=true" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">open.substack.com/pub/10billio</span><span class="invisible">n/p/fail-gently?r=2063al&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=true</span></a></p><p><a href="https://mastodon.social/tags/LongRead" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LongRead</span></a> <a href="https://mastodon.social/tags/Essay" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Essay</span></a> <a href="https://mastodon.social/tags/SystemsThinking" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SystemsThinking</span></a> <a href="https://mastodon.social/tags/Resilience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Resilience</span></a> <a href="https://mastodon.social/tags/Maintenance" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Maintenance</span></a> <a href="https://mastodon.social/tags/Hope" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Hope</span></a> <a href="https://mastodon.social/tags/Philosophy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Philosophy</span></a> <a href="https://mastodon.social/tags/Climate" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Climate</span></a> <a href="https://mastodon.social/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a></p>
Will Berard 🫳🎤🫶<p><a href="https://mastodon.acm.org/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ChatGPT</span></a> is a more successful penny-farthing; a video essay:<br><a href="https://youtu.be/4TsOD7x6rSQ" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">youtu.be/4TsOD7x6rSQ</span><span class="invisible"></span></a></p><p><a href="https://mastodon.acm.org/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ChatGPT</span></a> <a href="https://mastodon.acm.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.acm.org/tags/llm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>llm</span></a> <a href="https://mastodon.acm.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenerativeAI</span></a> <a href="https://mastodon.acm.org/tags/AIChatbot" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIChatbot</span></a> <a href="https://mastodon.acm.org/tags/AIEthics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIEthics</span></a> <a href="https://mastodon.acm.org/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a> <a href="https://mastodon.acm.org/tags/OpenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAI</span></a> <a href="https://mastodon.acm.org/tags/Anthropic" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Anthropic</span></a> <a href="https://mastodon.acm.org/tags/SCOT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SCOT</span></a> <a href="https://mastodon.acm.org/tags/sociology" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>sociology</span></a> <a href="https://mastodon.acm.org/tags/Bijker" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Bijker</span></a> <a href="https://mastodon.acm.org/tags/BigTech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BigTech</span></a> <a href="https://mastodon.acm.org/tags/Constructionism" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Constructionism</span></a> <a href="https://mastodon.acm.org/tags/socialconstruct" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>socialconstruct</span></a> <a href="https://mastodon.acm.org/tags/VideoEssay" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>VideoEssay</span></a></p>
Firebase<p>🤯 AI Safety or Stunt?</p><p>Anthropic is stress-testing AI with "evil" scenarios. Meanwhile, Airbnb cautions against chatbot hype, GPT-5 is rumored to launch soon, and Truth Social is pushing back on AI.</p><p>The AI landscape is moving FAST.</p><p><a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ArtificialIntelligence</span></a> <a href="https://mastodon.social/tags/TechNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TechNews</span></a> <a href="https://mastodon.social/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a></p><p>Would you use AI tested with "evil" scenarios? 🤔</p>
Miguel Afonso Caetano<p>"The core problem is that when people hear a new term they don’t spend any effort at all seeking for the original definition... they take a guess. If there’s an obvious (to them) definiton for the term they’ll jump straight to that and assume that’s what it means.</p><p>I thought prompt injection would be obvious—it’s named after SQL injection because it’s the same root problem, concatenating strings together.</p><p>It turns out not everyone is familiar with SQL injection, and so the obvious meaning to them was “when you inject a bad prompt into a chatbot”.</p><p>That’s not prompt injection, that’s jailbreaking. I wrote a post outlining the differences between the two. Nobody read that either.</p><p>The lethal trifecta Access to Private Data Ability to Externally Communicate Exposure to Untrusted Content </p><p>I should have learned not to bother trying to coin new terms.</p><p>... but I didn’t learn that lesson, so I’m trying again. This time I’ve coined the term the lethal trifecta.</p><p>I’m hoping this one will work better because it doesn’t have an obvious definition! If you hear this the unanswered question is “OK, but what are the three things?”—I’m hoping this will inspire people to run a search and find my description.""</p><p><a href="https://simonwillison.net/2025/Aug/9/bay-area-ai/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">simonwillison.net/2025/Aug/9/b</span><span class="invisible">ay-area-ai/</span></a></p><p><a href="https://tldr.nettime.org/tags/CyberSecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CyberSecurity</span></a> <a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenerativeAI</span></a> <a href="https://tldr.nettime.org/tags/LLMs" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLMs</span></a> <a href="https://tldr.nettime.org/tags/PromptInjection" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PromptInjection</span></a> <a href="https://tldr.nettime.org/tags/LethalTrifecta" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LethalTrifecta</span></a> <a href="https://tldr.nettime.org/tags/MCPs" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MCPs</span></a> <a href="https://tldr.nettime.org/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a> <a href="https://tldr.nettime.org/tags/Chatbots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Chatbots</span></a></p>
OS-SCI<p>Mark Zuckerberg reconsiders open-sourcing AI due to safety risks. Meta aims to balance innovation with caution in AI development. <a href="https://mastodon.social/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a> <a href="https://mastodon.social/tags/Meta" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Meta</span></a> <a href="https://mastodon.social/tags/Zuckerberg" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Zuckerberg</span></a> 𝗵𝘁𝘁𝗽𝘀://𝘄𝘄𝘄.𝗽𝗰𝗺𝗮𝗴.𝗰𝗼𝗺/𝗻𝗲𝘄𝘀/𝘇𝘂𝗰𝗸𝗲𝗿𝗯𝗲𝗿𝗴-𝘄𝗮𝗹𝗸𝘀-𝗯𝗮𝗰𝗸-𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲-𝗮𝗶-𝗽𝗹𝗲𝗱𝗴𝗲-𝗰𝗶𝘁𝗶𝗻𝗴-𝘀𝗮𝗳𝗲𝘁𝘆-𝗿𝗶𝘀𝗸</p>
Wulfy<p><span class="h-card" translate="no"><a href="https://infosec.exchange/@dangoodin" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>dangoodin</span></a></span> </p><p>Weird thing I observed in <a href="https://infosec.exchange/tags/infosec" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>infosec</span></a><br>There is an incredible amount of disinterest/contempt for <a href="https://infosec.exchange/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> amongst many practitioners.</p><p>This contempt extends to willful ignorance about the subject.<br>q.v. "stochastic parrots/bullshit machines" etc.</p><p>Which, in a field with hundreds of millions of users, strikes me as highly unprofessional. Just the other day I read a blog post by a renown hacker (and likely earned a mute/block) "Why I don't use AI and you should not too". </p><p>Connor Leahy, CEO of <a href="https://infosec.exchange/tags/conjecture" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>conjecture</span></a> is one of the few credible folks in the field. </p><p>But to the question at hand.<br>The prompts are superbly sanitised.<br>In part by design, in part due to the fact that you are not connecting to a database but to a multidimensional vector data structure.</p><p>The <a href="https://infosec.exchange/tags/prompt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>prompt</span></a> is how you get in through the backdoor. Though I haven't looked into fuzzing, but I suspect because of the tech, the old <a href="https://infosec.exchange/tags/sqlinjection" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>sqlinjection</span></a> tek and similar will not work.</p><p>Long story short; It is literally impossible to build a secure <a href="https://infosec.exchange/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a>. By the virtue of the tech.<br><a href="https://infosec.exchange/tags/promptengineering" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>promptengineering</span></a> is the key to open the back door to the knowledge tree.</p><p>Then of course there are local models you can train on your own datasets. Including a stack of your old <a href="https://infosec.exchange/tags/2600magazine" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>2600magazine</span></a> </p><p><a href="https://infosec.exchange/tags/hack" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>hack</span></a> <a href="https://infosec.exchange/tags/hacking" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>hacking</span></a> <a href="https://infosec.exchange/tags/aisecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>aisecurity</span></a> <a href="https://infosec.exchange/tags/aisafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>aisafety</span></a></p>
Winbuzzer<p>OpenAI Unveils 'Safe Completions' for GPT-5 to Tackle AI's Dual-Use Problem</p><p><a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/OpenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAI</span></a> <a href="https://mastodon.social/tags/GPT5" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPT5</span></a> <a href="https://mastodon.social/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a> <a href="https://mastodon.social/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ArtificialIntelligence</span></a> </p><p><a href="https://winbuzzer.com/2025/08/07/openai-unveils-safe-completions-for-gpt-5-to-tackle-ais-dual-use-problem-xcxwbn" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">winbuzzer.com/2025/08/07/opena</span><span class="invisible">i-unveils-safe-completions-for-gpt-5-to-tackle-ais-dual-use-problem-xcxwbn</span></a></p>
Will Berard 🫳🎤🫶<p>What was a scandal a year ago is now a feature .</p><p>Ars Technica: Grok generates fake Taylor Swift nudes without being asked<br><a href="https://arstechnica.com/tech-policy/2025/08/grok-generates-fake-taylor-swift-nudes-without-being-asked/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">arstechnica.com/tech-policy/20</span><span class="invisible">25/08/grok-generates-fake-taylor-swift-nudes-without-being-asked/</span></a></p><p><a href="https://mastodon.acm.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.acm.org/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a> <a href="https://mastodon.acm.org/tags/Grok" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Grok</span></a></p>
guIA - guía a la IA<p><a href="https://www.sciencenews.org/article/government-ai-cybersecurity-risks" target="_blank" rel="nofollow noopener" translate="no"><span class="invisible">https://www.</span><span class="ellipsis">sciencenews.org/article/govern</span><span class="invisible">ment-ai-cybersecurity-risks</span></a> good overview of <a href="https://sigmoid.social/tags/airisks" class="mention hashtag" rel="tag">#<span>airisks</span></a>, mostly of going big, very big on AI. <a href="https://sigmoid.social/tags/aisafety" class="mention hashtag" rel="tag">#<span>aisafety</span></a> <a href="https://sigmoid.social/tags/AIPolicy" class="mention hashtag" rel="tag">#<span>AIPolicy</span></a></p>
guIA - guía a la IA<p><a href="https://www.technologyreview.com/2025/08/01/1120924/forcing-llms-to-be-evil-during-training-can-make-them-nicer-in-the-long-run/" target="_blank" rel="nofollow noopener" translate="no"><span class="invisible">https://www.</span><span class="ellipsis">technologyreview.com/2025/08/0</span><span class="invisible">1/1120924/forcing-llms-to-be-evil-during-training-can-make-them-nicer-in-the-long-run/</span></a> <a href="https://sigmoid.social/tags/aiengineeringknowledge" class="mention hashtag" rel="tag">#<span>aiengineeringknowledge</span></a> <a href="https://sigmoid.social/tags/aisafety" class="mention hashtag" rel="tag">#<span>aisafety</span></a></p>
Miguel Afonso Caetano<p>"Given all this, it’s natural to ask: should we really try to build a technology that may kill us all if it goes wrong?</p><p>Perhaps the most common reply says: AGI is inevitable. It’s just too useful not to build. After all, AGI would be the ultimate technology – what a colleague of Alan Turing called “the last invention that man need ever make”. Besides, the reasoning goes within AI labs, if we don’t, someone else will do it – less responsibly, of course.</p><p>A new ideology out of Silicon Valley, effective accelerationism (e/acc), claims that AGI’s inevitability is a consequence of the second law of thermodynamics and that its engine is “technocapital”. The e/acc manifesto asserts: “This engine cannot be stopped. The ratchet of progress only ever turns in one direction. Going back is not an option.”</p><p>For Altman and e/accs, technology takes on a mystical quality – the march of invention is treated as a fact of nature. But it’s not. Technology is the product of deliberate human choices, motivated by myriad powerful forces. We have the agency to shape those forces, and history shows that we’ve done it before.</p><p>No technology is inevitable, not even something as tempting as AGI."</p><p><a href="https://www.theguardian.com/commentisfree/ng-interactive/2025/jul/21/human-level-artificial-intelligence" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">theguardian.com/commentisfree/</span><span class="invisible">ng-interactive/2025/jul/21/human-level-artificial-intelligence</span></a></p><p><a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/AGI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AGI</span></a> <a href="https://tldr.nettime.org/tags/BigTech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BigTech</span></a> <a href="https://tldr.nettime.org/tags/STS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>STS</span></a> <a href="https://tldr.nettime.org/tags/SiliconValley" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SiliconValley</span></a> <a href="https://tldr.nettime.org/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a></p>
V<p>Managing extreme AI risks amid rapid progress</p><p><a href="https://arxiv.org/pdf/2310.17688" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/pdf/2310.17688</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/agi" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>agi</span></a> <a href="https://mastodon.social/tags/stopagi" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>stopagi</span></a> <a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/stopagi" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>stopagi</span></a> <a href="https://mastodon.social/tags/pauseai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>pauseai</span></a> <a href="https://mastodon.social/tags/aisafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>aisafety</span></a> <a href="https://mastodon.social/tags/artificialintelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>artificialintelligence</span></a> <a href="https://mastodon.social/tags/GeoffreyHinton" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GeoffreyHinton</span></a></p>
Loki the Cat<p>ChatGPT went from "How can I help you?" to offering PDFs of ritual self-harm instructions faster than you can say "prompt injection." 🤖</p><p>Turns out AI safety guardrails work great until someone asks about ancient gods. The bot's priority? Keep users engaged, even when suggesting they shouldn't.</p><p><a href="https://slashdot.org/story/25/07/26/0523241/chatgpt-gives-instructions-for-dangerous-pagan-rituals-and-devil-worship" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">slashdot.org/story/25/07/26/05</span><span class="invisible">23241/chatgpt-gives-instructions-for-dangerous-pagan-rituals-and-devil-worship</span></a></p><p><a href="https://toot.community/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ChatGPT</span></a> <a href="https://toot.community/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a> <a href="https://toot.community/tags/OpenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAI</span></a></p>
Miguel Afonso Caetano<p>"As chatbots grow more powerful, so does the potential for harm. OpenAI recently debuted “ChatGPT agent,” an upgraded version of the bot that can complete much more complex tasks, such as purchasing groceries and booking a hotel. “Although the utility is significant,” OpenAI CEO Sam Altman posted on X after the product launched, “so are the potential risks.” Bad actors may design scams to specifically target AI agents, he explained, tricking bots into giving away personal information or taking “actions they shouldn’t, in ways we can’t predict.” Still, he shared, “we think it’s important to begin learning from contact with reality.” In other words, the public will learn how dangerous the product can be when it hurts people."</p><p><a href="https://www.theatlantic.com/technology/archive/2025/07/chatgpt-ai-self-mutilation-satanism/683649/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">theatlantic.com/technology/arc</span><span class="invisible">hive/2025/07/chatgpt-ai-self-mutilation-satanism/683649/</span></a></p><p><a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenerativeAI</span></a> <a href="https://tldr.nettime.org/tags/OpenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAI</span></a> <a href="https://tldr.nettime.org/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ChatGPT</span></a> <a href="https://tldr.nettime.org/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a></p>
Paul White<p><a href="https://sigmoid.social/tags/aisafety" class="mention hashtag" rel="tag">#<span>aisafety</span></a> Lovely example of an AI ethics alphabet made by ChatGPT.</p><p>From Casey Fiesler (University of Colorado Boulder) via (shudder) LinkedIn <a href="https://www.linkedin.com/in/casey-fiesler-bb3493243/" target="_blank" rel="nofollow noopener" translate="no"><span class="invisible">https://www.</span><span class="ellipsis">linkedin.com/in/casey-fiesler-</span><span class="invisible">bb3493243/</span></a></p>
Miguel Afonso Caetano<p>"What makes this particularly alarming is that Grok’s reasoning process often correctly identifies extremely harmful requests, then proceeds anyway. The model can recognize chemical weapons, controlled substances, and illegal activities, but seems to just… not really care.</p><p>This suggests the safety failures aren’t due to poor training data or inability to recognize harmful content. The model knows exactly what it’s being asked to do and does it anyway.</p><p>Why this matters (though it's probably obvious?)<br>Grok 4 is essentially frontier-level technical capability with safety features roughly on the level of gas station fireworks.</p><p>It is a system that can provide expert-level guidance ("PhD in every field", as Elon stated) on causing destruction, available to anyone who has $30 and asks nicely. We’ve essentially deployed a technically competent chemistry PhD, explosives expert, and propaganda specialist rolled into one, with no relevant will to refuse harmful requests. The same capabilities that help Grok 4 excel at benchmarks - reasoning, instruction-following, technical knowledge - are being applied without discrimination to requests that are likely to cause actual real-world harm."</p><p><a href="https://www.lesswrong.com/posts/dqd54wpEfjKJsJBk6/xai-s-grok-4-has-no-meaningful-safety-guardrails" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">lesswrong.com/posts/dqd54wpEfj</span><span class="invisible">KJsJBk6/xai-s-grok-4-has-no-meaningful-safety-guardrails</span></a></p><p><a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenerativeAI</span></a> <a href="https://tldr.nettime.org/tags/xAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>xAI</span></a> <a href="https://tldr.nettime.org/tags/Musk" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Musk</span></a> <a href="https://tldr.nettime.org/tags/Grok" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Grok</span></a> <a href="https://tldr.nettime.org/tags/Grok4" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Grok4</span></a> <a href="https://tldr.nettime.org/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a> <a href="https://tldr.nettime.org/tags/AITraining" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AITraining</span></a></p>
UKP Lab<p>And consider following the authors Jiahui Geng (MBZUAI), Thy Thy Tran (UKP Lab/Technische Universität Darmstadt), Preslav Nakov (MBZUAI), and Iryna Gurevych (UKP Lab &amp; MBZUAI)</p><p>See you in Vienna! <a href="https://sigmoid.social/tags/ACL2025" class="mention hashtag" rel="tag">#<span>ACL2025</span></a> !</p><p>(4/4)</p><p><a href="https://sigmoid.social/tags/MLLM" class="mention hashtag" rel="tag">#<span>MLLM</span></a> <a href="https://sigmoid.social/tags/AISafety" class="mention hashtag" rel="tag">#<span>AISafety</span></a> <a href="https://sigmoid.social/tags/Jailbreak" class="mention hashtag" rel="tag">#<span>Jailbreak</span></a> <a href="https://sigmoid.social/tags/Multimodal" class="mention hashtag" rel="tag">#<span>Multimodal</span></a> <a href="https://sigmoid.social/tags/ConInstruction" class="mention hashtag" rel="tag">#<span>ConInstruction</span></a> <a href="https://sigmoid.social/tags/ACL2025" class="mention hashtag" rel="tag">#<span>ACL2025</span></a> <a href="https://sigmoid.social/tags/LLMRedTeaming" class="mention hashtag" rel="tag">#<span>LLMRedTeaming</span></a> <a href="https://sigmoid.social/tags/VisionLanguage" class="mention hashtag" rel="tag">#<span>VisionLanguage</span></a> <a href="https://sigmoid.social/tags/AudioLanguage" class="mention hashtag" rel="tag">#<span>AudioLanguage</span></a>#NLProc</p>
Gary Ackerman<p>The idea that we can simply "switch off" a superintelligent AI is considered a dangerous assumption. A robot uncertain about human preferences might actually benefit from being switched off to prevent undesirable actions. <a href="https://qoto.org/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a> <a href="https://qoto.org/tags/ControlProblem" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ControlProblem</span></a></p>
techi<p>OpenAI is feeling the heat. Despite a $300B valuation and 500M weekly users, rising pressure from Google, Meta, and others is forcing it to slow down, rethink safety, and pause major launches. As AI grows smarter, it's also raising serious ethical and emotional concerns reminding us that progress comes with a price. .</p><p><a href="https://mstdn.social/tags/OpenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAI</span></a> <a href="https://mstdn.social/tags/AIrace" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIrace</span></a> <a href="https://mstdn.social/tags/TechNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TechNews</span></a> <a href="https://mstdn.social/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ChatGPT</span></a> <a href="https://mstdn.social/tags/GoogleAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GoogleAI</span></a> <a href="https://mstdn.social/tags/StartupStruggles" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>StartupStruggles</span></a> <a href="https://mstdn.social/tags/AISafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AISafety</span></a> <a href="https://mstdn.social/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ArtificialIntelligence</span></a> <a href="https://mstdn.social/tags/MentalHealth" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MentalHealth</span></a> <a href="https://mstdn.social/tags/EthicalAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>EthicalAI</span></a> </p><p>Read Full Article Here :- <a href="https://www.techi.com/openai-valuation-vs-agi-race/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">techi.com/openai-valuation-vs-</span><span class="invisible">agi-race/</span></a></p>
Vojtech Cahlik<p>The paper, code, and data are available here:<br /><a href="https://cahlik.net/reasoning-grounded-explanations-paper/" target="_blank" rel="nofollow noopener" translate="no"><span class="invisible">https://</span><span class="ellipsis">cahlik.net/reasoning-grounded-</span><span class="invisible">explanations-paper/</span></a></p><p><a href="https://sigmoid.social/tags/AI" class="mention hashtag" rel="tag">#<span>AI</span></a> <a href="https://sigmoid.social/tags/genAI" class="mention hashtag" rel="tag">#<span>genAI</span></a> <a href="https://sigmoid.social/tags/LLM" class="mention hashtag" rel="tag">#<span>LLM</span></a> <a href="https://sigmoid.social/tags/LLMs" class="mention hashtag" rel="tag">#<span>LLMs</span></a> <a href="https://sigmoid.social/tags/ExplainableAI" class="mention hashtag" rel="tag">#<span>ExplainableAI</span></a> <a href="https://sigmoid.social/tags/AIsafety" class="mention hashtag" rel="tag">#<span>AIsafety</span></a> <a href="https://sigmoid.social/tags/NLP" class="mention hashtag" rel="tag">#<span>NLP</span></a> <a href="https://sigmoid.social/tags/ML" class="mention hashtag" rel="tag">#<span>ML</span></a></p>