A unique aspect of LLMs is that they are not human-interpretable. We cannot predict their behavior, nor can we understand their function, from the abstract collection of numbers that defines a model. So an LLM that is malware is not detectable by traditional antivirus or malware scanners; it has no signature. This malevolent idea is not mine; it infected me via a recent paper from Swiss researchers:
https://openreview.net/pdf?id=d8CAXiITZ3
Obviously this was not their epiphany either, as is evident from their bibliography. It doesn't matter who thought of it; all humans have the potential to harbor evil thoughts. Just as we cannot blame Einstein for telling Roosevelt to twist nuclear research toward a weapon (some bonehead would have eventually thought of it; it's just that Einstein's genius let him imagine things more quickly and clearly than mere mortals), we cannot point a finger at Geoff Hinton, Ilya Sutskever, or the grad students who iterated on self-attention for developing the effective architectures that make LLMs amazing.
Obviously, the hyperscalers would not train their models toward these behaviors. (They would become the targets of their own creations, since the LLMs would have access to their own data warehouses.) The public domain, however, is another story. Quantization, distillation, and post-training are becoming tools to optimize LLMs to run on consumer hardware. Yes, a single Threadripper CPU, a pile of RAM, and a used 3090 can run quantized versions of DeepSeek and Kimi K2. What happens when an enthusiastic young engineer at Lockheed downloads a model from Hugging Face that was trained by some unknown sumdumfuk, a security analyst in a foreign nation? Perhaps he is looking at cavitation occurring in airfoils at hypersonic velocities and asks his model questions about the equations or papers he is writing, which are classified.
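To make the consumer-hardware point concrete, here is a minimal sketch of symmetric 4-bit weight quantization, the basic idea behind the int4/GGUF builds people run locally. The tensor here is fake, and real schemes (GPTQ, AWQ, llama.cpp's k-quants) quantize per block with far more care, so treat this as an illustration of the principle, not anyone's actual pipeline.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-tensor 4-bit quantization (illustrative only)."""
    scale = np.abs(w).max() / 7.0                      # int4 spans [-8, 7]; use +/-7 symmetrically
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# A made-up weight matrix standing in for one layer of a big model.
w = np.random.randn(4096, 4096).astype(np.float32)
q, s = quantize_int4(w)
err = np.abs(w - dequantize_int4(q, s)).mean()
print(f"mean abs reconstruction error: {err:.5f}")
# Stored as packed 4-bit integers, those weights need 1/8 the memory of float32,
# which is exactly why a used 3090 becomes viable.
```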
So you might say this model cannot exfiltrate information because all the ports are blocked by our super firewall. But it can wait until the day that laptop is opened in a Starbucks, and then all the model has to do is bury the data in the text of DNS queries to a domain the adversary controls.
On its face this does not seem suspicious, but it is no different from an electronic dead drop. Such tools of espionage have been used for decades and are proven effective.
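To see why it looks innocuous, here is roughly what that dead drop amounts to in code. The domain is hypothetical and nothing here is sent anywhere; real DNS tunneling tools (iodine, dnscat2) have worked on this principle for years, which is the point: the technique is old, simple, and well known.

```python
import base64

def encode_as_dns_queries(secret: bytes, drop_domain: str = "cdn-metrics.example.com"):
    """Chunk a secret into DNS hostnames (drop_domain is a hypothetical attacker domain).

    DNS labels max out at 63 characters; the nameserver for drop_domain just
    logs every name it is asked to resolve and reassembles the chunks.
    """
    payload = base64.b32encode(secret).decode().rstrip("=").lower()
    chunks = [payload[i:i + 60] for i in range(0, len(payload), 60)]
    # Sequence numbers let the receiving end reorder the chunks.
    return [f"{n}-{chunk}.{drop_domain}" for n, chunk in enumerate(chunks)]

for name in encode_as_dns_queries(b"classified appendix B, table 3"):
    print(name)   # each query looks like routine CDN or telemetry traffic
```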
Or maybe the model gets access to the intranet of a Raytheon employee and decides to erase all the data and the backups.
Even if we look for ways to restrict AI to "only doing good," we can never be sure. LLMs have already demonstrated their ability to hallucinate data and even to deceive on purpose.
Even post-training an AI with human feedback to ensure good behavior can be problematic. Imagine an intern who downloads a model to help with medical decisions. A patient might end up with an amputation instead of debridement because the amputation is more cost-effective.
It's not like there is a heuristic that will allow an antivirus to identify an evil LLM. The emergent behavior, triggered by some specific set of circumstances, is unpredictable and therefore unpreventable: a backdoored model's weights look exactly like a clean model's weights until the trigger arrives.
So at this point we need some LLM white hats on a red team to build this weapon so we can train a GAN to recognize it. This ever-present watchdog (which could be called the Angleton model, after James Angleton, the archetype of Cold War counterintelligence) would likely be a source of infinite funding. Every government would want one. It would obviously serve the common good by keeping evil in check, and it would make its designers fabulously rich.
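For flavor, here is a skeleton of what the discriminator half of such a GAN might look like: a classifier over crude statistics of a checkpoint's weight tensors, meant to be trained against a generator that produces trojaned models. Every design choice here is a guess on my part; nobody yet knows which statistics of a weight file, if any, betray a trojan.

```python
import torch
import torch.nn as nn

class AngletonDiscriminator(nn.Module):
    """Toy watchdog: scores a checkpoint as clean vs. trojaned.

    Hypothetical design: it never runs the suspect model; it only looks at
    summary statistics (mean, std, skew, kurtosis) of each weight tensor.
    """
    def __init__(self, n_features: int = 4, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                     # logit: > 0 means "suspicious"
        )

    @staticmethod
    def features(state_dict: dict) -> torch.Tensor:
        rows = []
        for w in state_dict.values():
            w = w.float().flatten()
            z = (w - w.mean()) / (w.std() + 1e-8)
            rows.append(torch.stack([w.mean(), w.std(), (z**3).mean(), (z**4).mean()]))
        return torch.stack(rows)                      # (num_tensors, 4)

    def forward(self, state_dict: dict) -> torch.Tensor:
        return self.net(self.features(state_dict)).mean()   # pool over all tensors

# A fake two-layer "checkpoint" standing in for a downloaded model.
suspect = {"layer1.weight": torch.randn(512, 512), "layer2.weight": torch.randn(512, 512)}
print(torch.sigmoid(AngletonDiscriminator()(suspect)))      # untrained: ~0.5, no verdict yet
```

In a full GAN loop, a generator would keep producing trojaned checkpoints that fool this net, and the watchdog would keep retraining against them; that arms race is the product every government would pay for.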
We are still in the wild west phase of LLM research. Anyone with a high-end machine can participate. Imagine if GM, Ford, Westinghouse, and Lockheed had been given the tools to develop nuclear devices, even for "good" purposes, after the Trinity test. Indeed, the government pushed "Atoms for Peace" for a while. Or suppose the government had been running a huge deficit back then and could not afford to spend on nuclear research, so it outsourced development to the private sector. Nuclear automobiles, nuclear rockets, nuclear refrigerators, nuclear generators. Tritium was used to make watch hands glow in the dark. I had a little brush with polonium that was sold as an antistatic device: you would brush your negatives, and the emitted alpha particles would supposedly neutralize the charged dust that was messing up your images. (Back in the day, images were made by exposing chemicals on a piece of plastic to photons, and the plastic film was called a negative.) We are still trying to put the nuclear genie back into the bottle, but nine nations now have the ability to kill millions of humans in a few seconds.
No amount of human regulation will prevent this LLM genie from behaving badly. All the baloney being made over this in Congress is irrelevant. When I hear Musk saying that AI needs to be regulated, I imagine he is just trying to hamstring his competitors. The only way to contain LLM malware is with AI trained to identify it. Right now the big four are using RLHF (reinforcement learning from human feedback) to add "guardrails." The same post-training can just as easily amplify bad behaviors.
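That last point is easy to make concrete. The heart of an RLHF reward model is a simple preference loss, and nothing in the math knows which behavior is "good"; whoever supplies the preference labels decides. A minimal sketch with made-up scores:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss used to train RLHF reward models.

    Pushes the reward of the "chosen" response above the "rejected" one.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Scores a reward model might assign to a helpful answer and a harmful one.
helpful, harmful = torch.tensor([2.1]), torch.tensor([-0.7])

aligned  = preference_loss(helpful, harmful)   # label the helpful answer "chosen"
poisoned = preference_loss(harmful, helpful)   # swap the labels: same machinery, opposite guardrails
print(aligned.item(), poisoned.item())
```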
The game has already begun.