EvidenceBot: A Privacy-Preserving, Customizable RAG-Based Tool for Enhancing Large Language Model Interactions

# Description

This paper introduces EvidenceBot, a local Retrieval-Augmented Generation (RAG)-based tool designed to overcome critical limitations of commercial LLM platforms: privacy risks, context-window limits, limited parameter configurability, and the lack of built-in evaluation. EvidenceBot retrieves only the most relevant text chunks from large document sets and appends them to the prompt. Because indexing, retrieval, and generation all run locally, documents never leave the user's machine, and sending only the relevant chunks keeps prompts within the model's context window while preserving accuracy. The tool supports hyperparameter experimentation, letting users tailor LLM outputs to their needs, and integrates evaluation metrics for comparing model responses against ground truths. Its architecture uses LangChain and ChromaDB for semantic indexing, Ollama for model deployment, and a Streamlit dashboard for interaction. By combining privacy-preserving RAG, parameter experimentation, and robust evaluation, EvidenceBot gives individuals and organizations a practical, customizable way to use LLMs effectively.
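The retrieve-then-append step can be illustrated with a minimal, dependency-free sketch. It is not EvidenceBot's code: EvidenceBot uses LangChain and ChromaDB with embedding vectors, whereas this sketch swaps in bag-of-words cosine similarity as a stand-in for embeddings. The helper names `chunk_text`, `retrieve`, and `build_prompt`, the word-based chunk size, and the prompt template are all hypothetical. The two knobs it exposes, `chunk_size` and `k` (the number of retrieved chunks), correspond to the settings the tool lets users configure.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most `chunk_size` words (assumed unit)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def _term_vector(text: str) -> Counter:
    # Bag-of-words term counts: a stand-in for a learned embedding vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int) -> list[str]:
    """Return the k chunks most similar to the query; ties keep document order."""
    q = _term_vector(query)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _term_vector(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Append the retrieved chunks to the question (hypothetical template)."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    doc = ("Ollama serves local models. ChromaDB stores embeddings for retrieval. "
           "Streamlit renders the dashboard.")
    chunks = chunk_text(doc, chunk_size=4)
    top = retrieve("Where are embeddings stored?", chunks, k=1)
    print(build_prompt("Where are embeddings stored?", top))
```

Smaller chunks and a smaller `k` produce shorter prompts, at the risk of omitting relevant text; larger values do the opposite. This is the efficiency-versus-accuracy trade-off the tool exposes to users.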

# Findings

We find that EvidenceBot bridges critical gaps in both commercial and open-source LLM usage by offering a privacy-preserving, customizable RAG pipeline. Users can securely query large document sets locally, configure chunk sizes, and control the number of retrieved contexts to balance efficiency against accuracy. EvidenceBot also supports systematic experimentation with key parameters, such as temperature and RAG-related settings, so users can tune responses to their specific needs. Uniquely, it integrates evaluation: users can benchmark LLM responses against ground truths with BLEU-4, ROUGE-L, BERTScore, and METEOR, in both chat and batch modes. We illustrate use cases spanning privacy-preserving document analysis, experimentation with response generation, and large-scale model evaluation. We conclude that EvidenceBot fills the gap left by the absence of integrated open-source solutions, combining RAG, hyperparameter tuning, and evaluation in one system, which makes it a versatile platform for research, professional, and domain-specific applications.

The current design has limitations, including text-only processing and reliance on Ollama for GGUF-format models. Extending it to multimodal data and to other model formats is proposed as future work.
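Of the four metrics, ROUGE-L is the simplest to show without external models or libraries, so the sketch below computes it for a batch of (response, ground truth) pairs, mirroring the batch-evaluation idea. It is an illustrative sketch, not EvidenceBot's implementation: the lowercase alphanumeric tokenization and the helper names `lcs_length`, `rouge_l_f1`, and `batch_evaluate` are assumptions, and it uses the balanced F1 form of ROUGE-L (harmonic mean of LCS-based precision and recall). BLEU-4, METEOR, and BERTScore would come from dedicated libraries, and BERTScore additionally requires a pretrained model.

```python
import re


def _tokens(text: str) -> list[str]:
    # Assumed tokenization: lowercase alphanumeric runs, punctuation dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length, keeping only two rows of the DP table."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS/len(candidate) and LCS/len(reference)."""
    cand, ref = _tokens(candidate), _tokens(reference)
    lcs = lcs_length(cand, ref)
    if lcs == 0:  # also covers empty candidate or reference
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def batch_evaluate(pairs: list[tuple[str, str]]) -> dict:
    """Score (response, ground_truth) pairs; report per-item scores and their mean."""
    scores = [rouge_l_f1(response, truth) for response, truth in pairs]
    mean = sum(scores) / len(scores) if scores else 0.0
    return {"per_item": scores, "mean": mean}


if __name__ == "__main__":
    pairs = [("the cat", "the dog"), ("a b", "a b")]
    print(batch_evaluate(pairs))
```

Because ROUGE-L rewards in-order word overlap rather than meaning, it is usually reported alongside an embedding-based metric such as BERTScore, which is why offering several metrics side by side is useful.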