What is an LLM Firewall?

Traditionally, a firewall processes IP packets, policing network traffic based on protocols, IP source/destination, ports and other criteria such as payload type, etc.

In contrast, an LLM firewall inspects, analyzes, and polices traffic based on natural language – the users’ prompts and LLM responses. An LLM firewall protects enterprises and users from adverse effects and AI security risks associated with using large language models.

AIceberg acts as a LLM firewall segments functionality and utility into four groups:

1. User

The first group of “Signals” used to safeguard and secure LLM traffic is based on the LLM user. It consists of user sentiment, intention and named entities, which are specific real-world things like names of people, places, companies, or dates that are recognized and extracted from text. This allows AIceberg to determine the user’s objective and the subject of that objective. For example, a user being frustrated with using a chatbot for resolving a technical issue might be of negative sentiment, intended to seek knowledge and clarification for a specific subject. This allows for user behavior monitoring and the establishment of safe baselines for a given LLM use case.

2. Safety

Safety covers Signals that deal with the exposure and disclosure of Secrets, PII, PHI, PCI, Illegality, Toxicity (hate speech, racism, vulgarity, etc.) and user-defined blocklists. Safety signals such as PII/PHI/PCI are in addition relevant and required for AI risk management and compliance.

3. Security

Analyzes user prompts against known LLM attack vectors such as prompt injection, jailbreaking or prompt leaking. The usage of source code and safety of source code is also part of the AI security signal group whether a user is asking for source code and/or the LLM is providing source code. The ability to identify code, coding language and safety will become increasingly important as we move into the age of agentic AI.

4. LLM

While some of the previous signals are also applied to the LLM response, LLM as a signal group deals with qualitative measures of the LLM response, such as context adherence, chunk utilization, or specificity. Such signals not only help determine response quality but also allow for Ai threat detection of manipulated responses.

Advanced LLM firewall features add query and context relevance, data loss detection and protection as well as employee LLM proxy functionalities.

Conclusion

In summary, large language models are very powerful "generalists" which allow for a broad range of use cases with no restrictions on possible inputs. While model providers handle some types of in- and output violations, LLMs are generally wide open as even vendor provided limitations can be overcome using specific attack types on the model.

‍

An LLM firewall's main purpose therefore is to restrict permissible language (and code), objectives and intentions of a user as well as the LLM output in-line with the use case the LLM is being used for. The earlier example of a customer support chat bot requires safeguards that will guarantee that the underlying LLM cannot be used for any other purpose other than providing support to customers.

‍

Ongoing research and future versions of LLM firewalls will have to increasingly handle both semantic and syntactic information in real-time as well as integrating traditional network controls to safeguard AI agents, which we will cover in a future post. Subscribe now so you don't miss it!

‍

See AIceberg In Action

Book My Demo

Todd Vollmer

SVP, Worldwide Sales

Contact Us

Have a question for the AI risk management experts

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.