Tue Mar 11 /
Customer | An Agentic AI assistant automating customer support for technical products |
---|---|
Industry | Customer Experience |
Primary Adopter | CTO, Software Engineering |
Duckie AI is an Agentic AI assistant that automates customer support for B2B SaaS companies. It quickly finds relevant information, generates solutions, and conducts technical investigations, giving its customers faster resolution times, higher productivity, and better customer satisfaction.
Because Duckie is an AI-first company, the accuracy and relevance of its AI systems were of the highest importance. To gain insight into the output of their LLM and RAG systems, the team implemented a popular OSS LLM evaluation framework that internally used LLM judges to score LLM outputs on metrics such as contextual hallucination and RAG relevance.
Over time, however, they realized that the scores were inconsistent and showed a high degree of variance. Drawing the line between good and bad outputs became harder and demanded a recurring time investment.
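One way to quantify the kind of inconsistency described above is to score the same output several times and look at the spread. The sketch below is illustrative only: the scores, the threshold, and the helper names `score_spread` and `is_ambiguous` are assumptions, not part of Duckie's or AIMon's tooling.

```python
import statistics


def score_spread(scores):
    """Return the mean and population standard deviation of repeated judge scores."""
    return statistics.mean(scores), statistics.pstdev(scores)


def is_ambiguous(scores, threshold):
    """True when repeated scores for one output land on both sides of the threshold."""
    return min(scores) < threshold <= max(scores)


# Hypothetical scores from judging the same LLM output five times.
repeated_scores = [0.62, 0.41, 0.78, 0.55, 0.36]
mean, spread = score_spread(repeated_scores)
print(f"mean={mean:.2f} stdev={spread:.2f} ambiguous={is_ambiguous(repeated_scores, 0.5)}")
```

When an output's repeated scores straddle the pass/fail threshold, the verdict depends on which run happened to be recorded, which is what makes the line between good and bad hard to draw.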
AIMon provides judging as a service. The platform lets Duckie choose the judges it needs for each use case and run evaluations either offline or online. The selected judges run in parallel and return their results together. AIMon’s advanced hallucination detection and instruction adherence solutions help uphold accuracy.
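The idea of running several judges in parallel and returning their results together can be sketched with `asyncio.gather`. This is a minimal illustration and not AIMon's API: the two judge functions, their placeholder scoring logic, and the `max_words` instruction are all hypothetical stand-ins for real evaluation models.

```python
import asyncio


async def hallucination_judge(context, output):
    # Placeholder logic (assumption): count output words missing from the context.
    await asyncio.sleep(0)  # stands in for a network call to a real judge
    context_words = set(context.lower().split())
    output_words = output.lower().split()
    unsupported = [w for w in output_words if w not in context_words]
    score = len(unsupported) / max(len(output_words), 1)
    return {"judge": "hallucination", "score": score}


async def instruction_judge(instructions, output):
    # Placeholder logic (assumption): the only instruction checked is a word limit.
    await asyncio.sleep(0)
    passed = len(output.split()) <= instructions["max_words"]
    return {"judge": "instruction_adherence", "passed": passed}


async def evaluate(context, instructions, output):
    # Run both judges concurrently and collect their results in one response.
    results = await asyncio.gather(
        hallucination_judge(context, output),
        instruction_judge(instructions, output),
    )
    return {r["judge"]: r for r in results}


def run_evaluation(context, instructions, output):
    return asyncio.run(evaluate(context, instructions, output))
```

Calling `run_evaluation("reset the router", {"max_words": 5}, "reset the router")` returns a single dictionary keyed by judge name, so the caller sees every verdict at once rather than polling each judge separately.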
After deploying AIMon, the company experienced major improvements in AI performance:
✅ 50% lower cost than using LLMs as judges, with lower latency.
✅ Ability to evaluate for hallucinations and instruction deviations offline and continuously.
By implementing AIMon’s specialized evaluation tools, Duckie has addressed critical challenges in its AI-powered customer support system. The transition to AIMon’s judging-as-a-service platform, with the HDM-1 model for hallucination detection, has eliminated the inconsistency issues the team faced with traditional LLM judges, allowing them to clearly distinguish between acceptable and unacceptable AI outputs.
The integration has delivered substantial business benefits, including a 50% reduction in evaluation costs compared to previous methods. With an enhanced ability to guardrail against hallucinations and instruction deviations both in real time and offline, Duckie has significantly improved system reliability. As the team prepares to implement AIMon’s RAG Evaluation and Reranking model, they are well positioned to further optimize their context retrieval processes and continue delivering exceptional customer experiences.
AIMon helps you build more deterministic Generative AI apps. It offers specialized tools for monitoring and improving the quality of outputs from large language models (LLMs). Leveraging proprietary technology, AIMon identifies and helps mitigate issues like hallucinations, instruction deviation, and RAG retrieval problems. These tools are accessible through APIs and SDKs, enabling both offline analysis and real-time monitoring of LLM quality issues.