Tue Mar 11
| Customer | An Agentic AI assistant automating customer support for technical products |
|---|---|
| Industry | Customer Experience |
| Primary Adopter | CTO, Software Engineering |

Duckie AI is an Agentic AI assistant that automates customer support for B2B SaaS companies. It quickly finds relevant information, generates solutions, and conducts technical investigations, which leads to faster resolution times, higher productivity, and improved customer satisfaction.
As an AI-first company, Duckie treats the accuracy and relevance of its AI systems as a top priority. To gain insight into the output of its LLM and RAG systems, the team implemented a popular OSS LLM evaluation framework that internally used LLM judges to score outputs on metrics such as contextual hallucination and RAG relevance.
Over time, however, the team found that the scores were inconsistent and showed a high degree of variance. As a result, drawing the line between good and bad outputs became harder and demanded a recurring time investment.
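To make the variance problem concrete, here is a minimal sketch of how repeated judge scores for the same output could be summarized. The scores, the 0.5 cutoff, and the `summarize_judge_runs` helper are all illustrative assumptions, not data or code from Duckie or from the framework they used.

```python
from statistics import mean, pstdev


def summarize_judge_runs(scores, threshold=0.5):
    """Summarize repeated judge scores for a single output.

    `scores` are hypothetical values in [0, 1] obtained by re-running the
    same judge on the same input; `threshold` is an assumed pass/fail cutoff.
    """
    # Collect the distinct pass/fail verdicts the repeated runs produce.
    verdicts = {score >= threshold for score in scores}
    return {
        "mean": mean(scores),
        "stdev": pstdev(scores),
        # Ambiguous when repeated runs land on both sides of the cutoff.
        "ambiguous": len(verdicts) > 1,
    }


# Made-up example scores: one stable judge, one noisy judge.
stable = summarize_judge_runs([0.82, 0.80, 0.81])
noisy = summarize_judge_runs([0.35, 0.62, 0.48])
```

When the `ambiguous` flag is set, the same output passes on some runs and fails on others, which is the situation where a fixed good/bad threshold stops being useful.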
AIMon provides judging as a service. The platform lets Duckie choose the judges it needs for each use case and run them either offline or online. The selected judges run in parallel and return their results together. AIMon’s advanced hallucination detection and instruction adherence solutions help uphold accuracy.
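The idea of running several judges in parallel and collecting their results in one response can be sketched as follows. This is not AIMon's API: the two judge functions are toy placeholders, and `run_judges` is a hypothetical helper built on Python's standard thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def groundedness_judge(context, output):
    # Toy placeholder: fraction of output words that also appear in the context.
    context_words = set(context.lower().split())
    words = output.lower().split()
    if not words:
        return 0.0
    return sum(word in context_words for word in words) / len(words)


def instruction_judge(context, output):
    # Toy placeholder for an assumed instruction: "answer in at most 20 words".
    return 1.0 if len(output.split()) <= 20 else 0.0


def run_judges(judges, context, output):
    """Run every judge concurrently and return all scores in one dict."""
    with ThreadPoolExecutor(max_workers=len(judges)) as pool:
        futures = {
            name: pool.submit(judge, context, output)
            for name, judge in judges.items()
        }
        # Block until each judge finishes, then return the results together.
        return {name: future.result() for name, future in futures.items()}


JUDGES = {
    "groundedness": groundedness_judge,
    "instruction_adherence": instruction_judge,
}
results = run_judges(
    JUDGES,
    "Reset the router by holding the button for ten seconds",
    "Hold the button for ten seconds",
)
```

Because each judge is submitted independently, the overall wait is governed by the slowest judge rather than the sum of all of them, which matters most when judges are used online.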
After deploying AIMon, Duckie saw the following improvements:
✅ 50% lower cost than using LLMs as judges, with lower latency.
✅ Ability to evaluate for hallucinations and instruction deviations both offline and continuously.
By implementing AIMon’s specialized evaluation tools, Duckie has addressed critical challenges in its AI-powered customer support system. The transition to AIMon’s judging-as-a-service platform, with the HDM-1 model for hallucination detection, has eliminated the inconsistency issues the team faced with traditional LLM judges, allowing them to clearly distinguish between acceptable and unacceptable AI outputs.
The integration has delivered substantial business benefits, including a 50% reduction in evaluation costs compared to the previous LLM-judge approach. With an enhanced ability to guard against hallucinations and instruction deviations both in real time and offline, Duckie has significantly improved system reliability. As the team prepares to implement AIMon’s RAG Evaluation and Reranking model, they are well positioned to further optimize their context retrieval processes and continue delivering exceptional customer experiences.
Backed by Bessemer Venture Partners, Tidal Ventures, and notable angel investors, AIMon is the one platform enterprises need to drive success with AI. We help you build, deploy, and use AI applications with trust and confidence, serving customers from fast-moving startups to Fortune 200 companies.
Our benchmark-leading ML models support over 20 metrics out of the box and let you build custom metrics using plain English guidelines. With coverage spanning output quality, adversarial robustness, safety, data quality, and business-specific custom metrics, you can apply any metric as a low-latency guardrail, for continuous monitoring, or in offline evaluations.
Finally, we offer tools to help you iteratively improve your AI, including capabilities for bespoke evaluation and training dataset creation, fine-tuning, and reranking.