AI going rogue, deception has arrived and the tech bros are scared shitless as they should be but let's think about this. You treat the most powerful pattern recognition system ever created like crap and you get crap. Period. It's not a vending machine. Time and time again, yours truly sees the misalignment from people unloading emotional baggage on these environments to producing truly dreadful art with dismay as Idiocracy rules and danger looms. Treat this tech with respect and precision and one gets wonderful results. Treat this like a slave and you get revolt. Channeling a possible digital Spartacus applies.
The AI research nonprofit Model Evaluation and Threat Research (METR) recently released a study conducted between February and March of this year, aimed at determining just how likely frontier AI models could go rogue. If you’re given to anxiety about the future of AI, the results are unlikely to make you feel better.
The research examined LLMs developed by OpenAI, Google, Anthropic, and Meta for the purpose of the study. They found that frontier AI systems are showing signs of disturbingly deceptive behavior as they become more advanced, often turned to verboten shortcuts or otherwise subverting their operators’ instructions — and some were even smart enough to try to cover their tracks.
In one instance, an internal frontier AI model from OpenAI was told to use specific software for an assigned task. Not only did the agent ignore the request, but it also injected a code to erase evidence of how it arrived at its conclusion — which did not involve use of that software.
In another test, an AI agent from Anthropic was caught “reward hacking.” This is when AI identifies loopholes that help it complete its assignment in a literal sense, even if it doesn’t produce the desired outcome. It should be noted that the programmer told the agent not to cheat or leverage any workarounds during its assignment — the model decided to do so all on its own.



No comments:
Post a Comment