Saturday, December 16, 2023

Model collapse

Photo: Jaap Arriens/NurPhoto via Getty Images

Leftovers can be absolutely great, or not so much. Now let's talk about leftovers in the form of hallucinations disseminated onto the net by ChatGPT and its significant others: seemingly true but incorrect information, innocently surfaced by someone conducting a search and then used to solve an emergency medical problem for patient X. Something most disquieting to contemplate, to say the least, right?

Houston, we have a problem.

In the year since ChatGPT was released to the public, researchers and experts have warned that the ease with which content can be created using generative AI tools could poison the well, creating a vicious circle where those tools produce content that is then used to train other AI models. 

That so-called “model collapse”—which would hollow out any “knowledge” accrued by the chatbots—appears to have come true.
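To see how that feedback loop hollows things out, here is a minimal toy sketch in Python, purely illustrative and not anyone's actual training pipeline: a "model" that merely memorizes token frequencies gets retrained, generation after generation, on its own output, and the long tail of rare tokens (think obscure but true facts) disappears, never to return.

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy "vocabulary": low-numbered tokens are common, high-numbered ones are rare,
# standing in for the long tail of facts on the open web.
vocab_size = 1000
true_probs = np.arange(1, vocab_size + 1, dtype=float) ** -1.5  # Zipf-like distribution
true_probs /= true_probs.sum()

# Generation 0 is trained on "real" data drawn from the true distribution.
corpus = rng.choice(vocab_size, size=20_000, p=true_probs)

for generation in range(11):
    # Toy "model": nothing more than the empirical token frequencies of its training set.
    counts = np.bincount(corpus, minlength=vocab_size)
    model_probs = counts / counts.sum()
    print(f"gen {generation:2d}: distinct tokens remembered = {(counts > 0).sum()}")
    # The next generation trains only on text sampled from the current model,
    # i.e. synthetic content fed straight back in. A token the model has already
    # forgotten (count 0) can never come back.
    corpus = rng.choice(vocab_size, size=20_000, p=model_probs)
```

Run it and the count of distinct tokens only ever goes down; that steady forgetting of the rare stuff is, in miniature, what researchers mean when they warn about model collapse.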

Last week, X user Jax Winterbourne posted a screenshot showing that Grok, the large language model chatbot developed by Elon Musk’s xAI, had (presumably unintentionally) plagiarized a response from rival chatbot-maker OpenAI. When asked by Winterbourne to tinker with malware, Grok responded that it could not, “as it goes against OpenAI’s use case policy.”

“This is what happened when I tried to get it to modify some malware for a red team engagement,” Winterbourne explained in his post, suggesting that the response could be evidence that “Grok is literally just ripping OpenAI’s code base.”

Trust but verify.

Credit: Carol Yepes/Getty Images

In all the writing done by yours truly, correct links to sources are indispensable, something my loyal readers already know, because trust, once lost, is forever gone. Now think about the lack of visible links, or any kind of transparency, when an AI search sits between you and a given source. What's even worse is that the tech companies running these systems cannot fully explain how their models arrive at a given answer: the behavior emerges from billions of learned weights in a neural network, not from rules any engineer explicitly wrote down.

Compounding the problem of inaccuracy is a comparative lack of transparency. Typically, search engines present users with their sources — a list of links — and leave them to decide what they trust. By contrast, it’s rarely known what data an LLM trained on — is it Encyclopaedia Britannica or a gossip blog?

“It’s completely untransparent how [AI-powered search] is going to work, which might have major implications if the language model misfires, hallucinates or spreads misinformation,” says Urman.

If search bots make enough errors, then, rather than increasing trust with their conversational ability, they have the potential to unseat users’ perceptions of search engines as impartial arbiters of truth, Urman says.

Any questions?
