Tue Mar 26 / Preetam Joshi

From Wordy to Worthy: Increasing Textual Precision in LLMs

Detectors to check for completeness and conciseness of LLM outputs.

Completeness and Conciseness Image.

Overview

We are excited to announce version 0.1.0 of aimon-python-sdk. This release contains completeness and conciseness detectors for textual outputs of LLMs. These new detectors can be used either independently or in combination with our previously announced hallucination detector. We have also updated our simple client so that these detectors can be configured for both batch and point inference.

Completeness

Completeness in an LLM output is vital because it ensures that the information provided is comprehensive and answers the user’s query in full. When an LLM’s response is incomplete, it can lead to misunderstandings or partial insights, which may necessitate further queries or even result in incorrect actions based on insufficient information. This is particularly critical in complex domains where omissions could have significant consequences, such as legal advice, technical support, or healthcare guidance. Moreover, completeness is a benchmark of an LLM’s ability to understand and process complex requests, acting as an important measure of its reliability.

Here’s an example showing how an incomplete answer can affect customer satisfaction and how Aimon Rely can help detect this issue:

Completeness Example

LLM giving incomplete answers to a customer

Completeness Aimon Response

Aimon Rely detecting whether the answer to the first question is complete.

Conciseness

A concise response reduces cognitive load, aiding comprehension and retention of information. When an LLM provides a concise answer, it strips away extraneous details and focuses on delivering the core message, making the information easier to digest. This is particularly beneficial when users seek quick, actionable answers or when they are interacting with LLMs on platforms with limited display areas, like mobile devices or smartwatches. Conciseness enhances user experience by providing clear, to-the-point answers that fulfill users’ needs without overwhelming them with superfluous information. Here’s an example of a verbose answer and how Aimon Rely can help detect it:

Conciseness Example

LLM adding a lot of unnecessary information in response to the customer’s query

Conciseness Aimon Response

Aimon Rely detecting that the output is verbose with explanations at the sentence level

In this example, the chatbot’s answer, while informative, is far from concise, offering an explanation of weather patterns and additional context that the customer did not request. This could lead to frustration for a user looking for a quick, straightforward answer.

Using a few lines of code, a developer can check for completeness and conciseness issues in an offline evaluation run. These checks can also be performed in real time for applications that can tolerate a slight increase in latency. Unlike our hallucination detector, the conciseness and completeness detectors are currently more compute-intensive, which accounts for their higher latency.
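To make the idea of an offline evaluation run concrete, here is a minimal Python sketch. It does not use the aimon-python-sdk API and does not reflect how Aimon Rely computes its scores: the `Record` fields, the two toy heuristic detectors, and the `evaluate_batch` helper are all hypothetical stand-ins chosen for illustration. It only shows the general shape of running a set of detectors over a batch of LLM outputs and flagging low scores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    """One LLM interaction to evaluate (hypothetical field names)."""
    user_query: str
    generated_text: str


# A detector maps a record to a score in [0, 1]; higher is better.
Detector = Callable[[Record], float]


def toy_completeness(record: Record) -> float:
    """Toy heuristic: fraction of longer query words that appear in the output.

    This is a stand-in, NOT the method used by any real detector.
    """
    words = (w.strip("?.,!").lower() for w in record.user_query.split())
    keywords = {w for w in words if len(w) > 3}
    if not keywords:
        return 1.0
    output = record.generated_text.lower()
    return sum(1 for k in keywords if k in output) / len(keywords)


def toy_conciseness(record: Record, max_words: int = 40) -> float:
    """Toy heuristic: penalize outputs longer than max_words."""
    n_words = len(record.generated_text.split())
    return 1.0 if n_words <= max_words else max_words / n_words


def evaluate_batch(
    records: List[Record],
    detectors: Dict[str, Detector],
    threshold: float = 0.5,
) -> List[dict]:
    """Run every detector on every record and flag scores below threshold."""
    results = []
    for record in records:
        scores = {name: detect(record) for name, detect in detectors.items()}
        flagged = [name for name, score in scores.items() if score < threshold]
        results.append({"scores": scores, "flagged": flagged})
    return results


detectors = {"completeness": toy_completeness, "conciseness": toy_conciseness}
batch = [
    Record(
        user_query="What is the refund policy and shipping time?",
        generated_text="Our refund policy allows returns within 30 days.",
    ),
]
for result in evaluate_batch(batch, detectors):
    print(result)
```

In this sketch, point inference is simply a batch containing a single record. In a real setup, the toy heuristics would be replaced by calls to actual detectors.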

Our proprietary technology allows us to compute the completeness and conciseness quality metrics at 1/10th of GPT-4’s cost and with roughly a 2x improvement in latency (average latency of 0.92s for Aimon Rely vs. 1.8s for GPT-4). On our internal datasets, we see good performance metrics against expert-annotated data. Since there is a lack of industry-standard benchmark datasets for completeness and conciseness tasks, we will be publishing our evaluation datasets. After this, we will also publish our detailed benchmark metrics. Stay tuned for these updates.
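As a quick check on the latency comparison, the ratio of the two average latencies quoted above works out to slightly under 2x:

```latex
\[
\text{speedup} = \frac{1.8\ \text{s}}{0.92\ \text{s}} \approx 1.96
\]
```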

About AIMon

We offer specialized tools for monitoring and improving the output quality of large language models (LLMs).

Leveraging proprietary technology, we help mitigate issues like hallucinations, failure to adhere to instructions, and lack of conciseness.

Additionally, we provide solutions for optimizing Retrieval-Augmented Generation (RAG) retrieval and indexing.

These tools are accessible through our APIs and SDKs, enabling offline analysis, real-time monitoring and in-line detection of LLM quality issues.