Resources

Why We Do Not Use LLMs in AI Threat Detection

Advancements in machine learning, deep learning, and, in particular, generative AI are making transparency, interpretability and explainability an increasing critical requirement for responsible AI practices. Our trust in a machine’s reasoning and abilities to control and mitigate any risks arising from autonomous machines is an essential gate function for rapid continuation of AI innovation.

Challenges with using an LLM to police another LLM:

Siloed Understanding of LLM Explainability

Large language models are not explainable, and interpretability is still very limited while only being available to the developers of large language models themselves. For example, Anthropic published some first results of LLM interpretability in May of 2024.

AI models geared towards controlling other AI models’ risks, therefore cannot be of the “same order” in sense of capabilities and explainability. In other words, we should not use an unexplainable model to police and secure another unexplainable model.


An effective, transparent and auditable control of large language models must be based on an ensemble of simpler, smaller and highly specialized models that each cover a specific aspect or vector of the model under observation while being explainable and interpretable on their own.

If we cannot explain large language models to a high degree of certainty, then we must focus on the input to, and output from large language models, segment problem and scope and use specialized, explainable models for each such segments.

The sum and explainability of all non-gen-AI models involved in assessing prompts and responses (LLM in- and output) do not provide us with explainability of the LLM itself but provide us with reliable and reproducible guardrails and controls that help us mitigate and eliminate LLM based risks.

Speed and Cost

The other critical aspect of AI security is speed and cost. 
LLM based analysis of another LLMs input and output burdens the end-user experience with notable and often unacceptable latency, diminishing the value of an LLM use case. Scaling LLM based assessments to support real-time policy enforcement is expensive making ROI for LLM use cases harder to achieve. We consider latency added by an assessment function of more than 2 seconds (1 second for the prompt and 1 second for the response) as end-user impacting.

Conclusion

In summary, the emergence of generative AI has dropped the entry barrier for using artificial intelligence significantly, making new capabilities available for any kind and size of business. Ineffective, slow, and expensive solutions to safeguard and secure such new capabilities will not allow for unlocking new incremental value.

AIceberg's approach to using non-generative models that are traceable, auditable, and explainable allows humans to put in place guardrails that monitor and protect users and the organization. AIceberg's infrastructure allows for resource-poor (CPUs versus GPUs) processing that saves cost and provides little to no latency in user experience. Book a demo today to see how!

See AIceberg In Action

Book My Demo

Todd Vollmer
SVP, Worldwide Sales

Contact Us

Have a question for the AI risk management experts

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.