Tue Mar 26 / Preetam Joshi
Detectors to check for completeness and conciseness of LLM outputs.
We are excited to announce version 0.1.0 of aimon-python-sdk. This release contains completeness and conciseness detectors for textual outputs of LLMs. These new detectors can either be used independently or in combination with our previously announced hallucination detector. We have updated our simple client to be able to configure these detectors for both batch and point inference.
Completeness in an LLM output is vital because it ensures that the information provided is comprehensive and answers the user’s query in full. When an LLM’s response is incomplete, it can lead to misunderstandings or partial insights, which may necessitate further queries or even result in incorrect actions based on insufficient information. This is particularly critical in complex domains where omissions could have significant consequences, such as legal advice, technical support, or healthcare guidance. Moreover, completeness is a benchmark of an LLM’s ability to understand and process complex requests, acting as an important measure of its reliability.
Here’s an example showing how an incomplete answer can affect customer satisfaction and how Aimon Rely can help detect this issue:
LLM giving incomplete answers to a customer
Aimon Rely detecting whether the answer to the first question is complete.
A concise response reduces cognitive load, aiding in better comprehension and retention of information. When an LLM provides a concise answer, it strips away the extraneous details and focuses on delivering the core message, making the information more digestible and easier to understand. This is particularly beneficial when users seek quick, actionable answers or when they are interfacing with LLMs on platforms with limited display areas, like mobile devices or smartwatches. Conciseness enhances user experience by providing clear, to-the-point answers that fulfill users’ needs without overwhelming them with superfluous information. Example:
LLM adding a lot of unnecessary information in response to the customer’s query
Aimon Rely detecting that the output is verbose with explanations at the sentence level
In this example, the chatbot’s answer, while informative, is far from concise, offering an explanation of weather patterns and additional context that the customer did not request. This could lead to frustration for a user looking for a quick, straightforward answer.
With a few lines of code, a developer can check for completeness and conciseness issues in an offline evaluation run. These checks can also be performed in real time for applications that can tolerate a slight increase in latency. Unlike our hallucination detector, the conciseness and completeness detectors are currently more compute-heavy, hence the higher latency.
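To make the shape of an offline evaluation run concrete, here is a minimal sketch. It does not use the aimon-python-sdk API: the `Example` record, the `Detector` signature, the `run_offline_eval` helper, and the 0.5 threshold are all hypothetical names chosen for illustration, and `placeholder_conciseness` is a naive word-count heuristic standing in for a real detector, not the method Aimon Rely uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    """Hypothetical record: one row of an offline evaluation set."""
    context: str
    user_query: str
    generated_text: str


# Assumed detector shape: takes an example, returns a score in [0, 1].
Detector = Callable[[Example], float]


def placeholder_conciseness(example: Example) -> float:
    """Naive stand-in, NOT a real detector: the score drops below 1.0
    once the answer is more than 3x longer than the query (in words)."""
    answer_words = max(len(example.generated_text.split()), 1)
    query_words = max(len(example.user_query.split()), 1)
    return min(1.0, 3 * query_words / answer_words)


def run_offline_eval(
    examples: List[Example],
    detectors: Dict[str, Detector],
    threshold: float = 0.5,
) -> List[dict]:
    """Score every example with every detector and return the rows
    where at least one score falls below the threshold."""
    flagged = []
    for index, example in enumerate(examples):
        scores = {name: detect(example) for name, detect in detectors.items()}
        failing = [name for name, score in scores.items() if score < threshold]
        if failing:
            flagged.append({"index": index, "scores": scores, "failing": failing})
    return flagged
```

In a real run, the placeholder would be swapped for calls to the actual detectors, and the flagged rows would be reviewed or logged. Keeping each detector behind the same callable shape makes it easy to run completeness, conciseness, and hallucination checks independently or together.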
Our proprietary technology allows us to compute the completeness and conciseness quality metrics at 1/10th of GPT-4’s cost and with a roughly 2x improvement in latency (avg. latency of 0.92s [Aimon Rely] vs. 1.8s [GPT-4]). On our internal datasets, we see good performance metrics against expert-annotated data. Since there is a lack of industry-standard benchmark datasets for completeness and conciseness tasks, we will be publishing our evaluation datasets. After this, we will also publish our detailed benchmark metrics. Stay tuned for these updates.
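For readers who want to see where the 2x figure comes from, the arithmetic on the two average latencies quoted above can be checked in a few lines. The variable names are ours; only the two latency values come from the measurements stated in this post.

```python
# Average latencies quoted in this post, in seconds.
aimon_latency_s = 0.92
gpt4_latency_s = 1.8

# Speedup: how many times faster the lower-latency system responds.
speedup = gpt4_latency_s / aimon_latency_s

# Relative reduction: fraction of the GPT-4 latency that is saved.
reduction = 1 - aimon_latency_s / gpt4_latency_s

print(f"speedup: {speedup:.2f}x, latency reduction: {reduction:.0%}")
# prints "speedup: 1.96x, latency reduction: 49%"
```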
AIMon helps you build more deterministic Generative AI Apps. It offers specialized tools for monitoring and improving the quality of outputs from large language models (LLMs). Leveraging proprietary technology, AIMon identifies and helps mitigate issues like hallucinations, instruction deviation, and RAG retrieval problems. These tools are accessible through APIs and SDKs, enabling offline analysis and real-time monitoring of LLM quality issues.