RAG-Powered Medical Intelligence

Medi-Query
Powered by RAG

Next-generation clinical assistant providing source-grounded medical insights. Built for accuracy, reliability, and clinical safety.

92% Accuracy
5% Hallucination Rate
153+ Test Cases
Screenshot: VS Code showing the Medi-Query RAG chatbot implementation in Python and HTML
Built with production-grade technologies
Python
Flask
LangChain
Pinecone
Groq
AWS
Docker
GitHub Actions

Technical Architecture

Built on a production-grade RAG pipeline featuring hybrid retrieval, cross-encoder reranking, and cloud-native deployment on AWS infrastructure.

RAG-Powered Accuracy
92% answer relevancy with source-grounded medical responses.

Hybrid Retrieval Pipeline
MMR + BM25 ensemble with cross-encoder reranking for precision.

AWS Cloud Deployment
Docker containers on EC2 with a GitHub Actions CI/CD pipeline.

Vector Search with Pinecone
384-dimensional embeddings for semantic medical document retrieval.

Llama 3.3 70B Generation
Context-only LLM responses with an 88.78% faithfulness score.

Comprehensive Testing Suite
153 tests covering unit, integration, security, and performance.

Intelligent Medical Assistant

Beautiful, responsive interface with light and dark modes. Get evidence-based medical information with source citations.

Light Mode (screenshot: asthma causes conversation)
Dark Mode (screenshot: asthma diagnosis and treatment conversation)
<5s Response Time
92% Accuracy
Source Citations: Yes
2 Themes
Open Source

Hybrid Retrieval Pipeline

Production-grade RAG architecture combining multiple retrieval strategies for maximum accuracy and relevance.

View on GitHub
01 · Query Rewriting

Llama 3.1 8B resolves pronouns and expands medical terminology for precise retrieval.

# Pronoun Resolution
"What are its side effects?"
→ "What are the side effects of metformin?"
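The rewriting step above can be sketched as a prompt-building helper: recent history plus the raw question are packed into one instruction for the small LLM. The template wording and function name are illustrative, not the project's actual code.

```python
# Hypothetical prompt builder for the query-rewriting stage.
REWRITE_TEMPLATE = (
    "Rewrite the user's question as a standalone medical query. "
    "Resolve pronouns using the conversation history and expand "
    "medical abbreviations. Return only the rewritten question.\n\n"
    "History:\n{history}\n\nQuestion: {question}"
)

def build_rewrite_prompt(history: list[str], question: str) -> str:
    """Format recent exchanges plus the raw question into a rewrite prompt."""
    return REWRITE_TEMPLATE.format(history="\n".join(history), question=question)
```

The resulting string would then be sent to Llama 3.1 8B, whose output replaces the user's original question for retrieval.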
02 · Hybrid Retrieval

MMR (60%) + BM25 (40%) ensemble combines semantic understanding with keyword matching.

# LangChain ensemble: semantic MMR retriever weighted 0.6, keyword BM25 0.4
from langchain.retrievers import EnsembleRetriever

ensemble_retriever = EnsembleRetriever(
    retrievers=[mmr_retriever, bm25_retriever],
    weights=[0.6, 0.4],
)
03 · Cross-Encoder Reranking

ms-marco-MiniLM-L-6-v2 scores query-document pairs for final relevance ranking.

from sentence_transformers import CrossEncoder

reranker = CrossEncoder(
    "cross-encoder/ms-marco-MiniLM-L-6-v2"
)
# rank() scores every (query, doc) pair; keep the 8 most relevant
top_docs = reranker.rank(query, docs, top_k=8, return_documents=True)
04 · Context-Only Generation

Llama 3.3 70B generates responses strictly grounded in retrieved context.

system_prompt = """Answer ONLY from the 
provided context. If information is not 
in the context, say 'I don't have enough 
information to answer that.'"""
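The context-only call can be sketched as a message-assembly helper: retrieved chunks are joined into the user turn alongside the strict system prompt, in the OpenAI-style chat format that Groq also uses. The helper name is an assumption; only the system prompt comes from the project.

```python
# System prompt from the document; the assembly helper is illustrative.
SYSTEM_PROMPT = (
    "Answer ONLY from the provided context. If information is not "
    "in the context, say 'I don't have enough information to answer that.'"
)

def build_messages(context_docs: list[str], question: str) -> list[dict]:
    """Pack retrieved chunks and the user question into chat messages."""
    context = "\n\n".join(context_docs)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
```

The message list would then be passed to Llama 3.3 70B with temperature 0, so answers stay grounded in the retrieved chunks.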

Context-Aware Chunking

Optimized document segmentation preserves semantic coherence for medical content.

[Diagram: three overlapping chunks; the 100-token overlap ensures context continuity]

Chunk Size: 800 tokens (optimal for medical paragraphs)
Chunk Overlap: 100 tokens (prevents information loss at boundaries)
Embedding Model: all-MiniLM-L6-v2 (384-dimensional vectors)
Vector Index: Pinecone (serverless, high-performance)
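The chunking scheme above (800-token windows with a 100-token overlap) can be sketched as a simple sliding window over a token list. This is a minimal stand-in for the project's splitter, not its actual implementation.

```python
def chunk_tokens(tokens: list[str], size: int = 800, overlap: int = 100) -> list[list[str]]:
    """Split a token list into overlapping windows (defaults: 800/100)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # each new chunk advances by 700 tokens
    # max(..., 1) guarantees at least one chunk for short documents
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

Each chunk repeats the final 100 tokens of its predecessor, which is what keeps sentences spanning a boundary retrievable from either side.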

Session Memory

Maintains the most recent messages (a 12-message rolling window) for pronoun resolution only.

max_session_messages: 12
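A bounded session memory like this is naturally expressed as a fixed-size deque; the class below is a sketch under that assumption, with the 12-message cap taken from the configuration above.

```python
from collections import deque

class SessionMemory:
    """Rolling window of recent chat messages, used only to resolve
    pronouns during query rewriting (illustrative class, not project code)."""

    def __init__(self, max_messages: int = 12):
        # deque with maxlen silently drops the oldest message on overflow
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def history(self) -> list[dict]:
        return list(self.messages)
```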

Temperature Control

Set to 0 for deterministic, factual medical responses without creativity.

temperature: 0

Hallucination Prevention

Strict context-only mode reduces hallucinations from 35% to 5%.

hallucination_rate: 5%

System Architecture

Production-grade RAG pipeline with hybrid retrieval, cross-encoder reranking, and cloud-native deployment

CI/CD Pipeline Architecture

GitHub Actions with Docker, ECR, and EC2 deployment

[Diagram: developer workflow → GitHub Actions CI/CD → AWS infrastructure (ECR, EC2), with external APIs (Groq, Pinecone)]

RAGAS Evaluation Metrics

92.33% Answer Relevancy (RAGAS score)
88.78% Faithfulness (context adherence)
90.00% Context Recall (document coverage)
4.2s Response Time (under 5s SLA)
5% Hallucination Rate (down from 35%)
100% Test Pass Rate (153 tests)

RAG Pipeline Flow

1. Query Rewrite (0.8s)
Llama 3.1 8B resolves pronouns and expands queries

2. Hybrid Retrieval (1.2s)
MMR + BM25 ensemble (0.6/0.4 weights)

3. Cross-Encoder Rerank (0.3s)
ms-marco-MiniLM-L-6-v2 selects top-8 docs

4. Response Generation (2.1s)
Llama 3.3 70B with context-only mode
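The four stages above compose into a single pipeline function. The sketch below uses stub callables for each stage so the control flow is visible; all names are illustrative.

```python
def answer_query(query, rewrite, retrieve, rerank, generate, top_n=8):
    """Compose the pipeline: rewrite -> retrieve -> rerank -> generate."""
    standalone = rewrite(query)                        # stage 1: pronoun resolution
    candidates = retrieve(standalone)                  # stage 2: MMR + BM25 ensemble
    top_docs = rerank(standalone, candidates)[:top_n]  # stage 3: cross-encoder ordering
    return generate(standalone, top_docs)              # stage 4: context-only answer
```

Passing the stages in as callables keeps each one independently testable, which matches the unit/integration split in the test suite.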

Technology Stack

Flask/Python: Backend
LangChain: Orchestration
Pinecone: Vector DB
Groq API: LLM Inference
HuggingFace: Embeddings
AWS EC2: Deployment
Docker: Containers
GitHub Actions: CI/CD

Key Configuration Parameters

chunk_size: 800
chunk_overlap: 100
embedding_dim: 384
mmr_k: 10
mmr_fetch_k: 30
lambda_mult: 0.5
reranker_top_n: 8
max_session_msgs: 12
temperature: 0
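Collected in one place, the parameters above might look like the frozen dataclass below; the field names mirror the list, but the class itself is an illustration, not the project's config module.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RagConfig:
    """Key pipeline parameters (values from the table above)."""
    chunk_size: int = 800       # tokens per document chunk
    chunk_overlap: int = 100    # tokens shared between adjacent chunks
    embedding_dim: int = 384    # all-MiniLM-L6-v2 output size
    mmr_k: int = 10             # documents returned by MMR
    mmr_fetch_k: int = 30       # candidates fetched before MMR diversification
    lambda_mult: float = 0.5    # MMR relevance/diversity trade-off
    reranker_top_n: int = 8     # docs kept after cross-encoder reranking
    max_session_msgs: int = 12  # rolling session-memory window
    temperature: float = 0.0    # deterministic generation
```

A frozen dataclass makes the settings immutable at runtime, so a misbehaving handler cannot silently change retrieval behavior mid-session.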

Technical Documentation

Comprehensive documentation covering architecture, testing, deployment, and evaluation metrics

Software Requirements Specification

IEEE 830-1998 compliant SRS document covering functional and non-functional requirements, system architecture, and use cases.

27 Functional Requirements · 12 Non-Functional Requirements · IEEE Standard Format

RAG Architecture

Complete documentation of the Retrieval-Augmented Generation pipeline including document indexing, query processing, and response generation.

Document Indexing Pipeline · Hybrid Retrieval System · Cross-Encoder Reranking

CI/CD Pipeline

GitHub Actions workflow with Docker containerization, Amazon ECR registry, and automated EC2 deployment.

Automated Testing · Docker Containerization · AWS Deployment

RAGAS Evaluation Report

Comprehensive evaluation using the RAGAS framework measuring faithfulness, relevancy, precision, and recall metrics.

92.33% Answer Relevancy · 88.78% Faithfulness · 90% Context Recall

Testing Documentation

Complete test suite documentation covering unit, integration, security, performance, and system workflow tests.

153 Total Tests · 100% Pass Rate · 9 Test Categories

Security Testing

Security vulnerability testing including XSS prevention, SQL injection protection, and API key security validation.

XSS Protection · SQLi Prevention · API Security
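The XSS-prevention checks described above amount to escaping HTML-special characters before any user text is rendered. A minimal sketch, assuming the stdlib `html.escape` approach rather than the project's actual sanitizer:

```python
import html

def sanitize_user_input(text: str) -> str:
    """Escape HTML-special characters before rendering a chat message,
    so injected markup is displayed as text rather than executed."""
    return html.escape(text, quote=True)
```

The security test suite would then assert that markup like `<script>` tags survives only in escaped form.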

Project Highlights

Problem Solved

  • Information overload in medical queries reduced through curated retrieval
  • Hallucination rate reduced from 35% to 5% using RAG architecture
  • Context loss in conversations handled via pronoun resolution
  • Source attribution enables response verification

Key Achievements

  • 92% accuracy vs. 65% for non-RAG systems (+27 percentage points)
  • 100% SLA compliance with all queries under 5 seconds
  • Production-ready with Gunicorn, health endpoint, and Docker
  • Automated CI/CD with GitHub Actions and AWS deployment

Frequently Asked Questions

Everything you need to know about Medi-Query's RAG architecture and technical implementation

What is Medi-Query and how does it work?
How accurate is the RAG pipeline?
What technology stack powers Medi-Query?
How is the system deployed?
What testing coverage does Medi-Query have?
How does the conversation memory work?

Medical AI Reimagined

Experience evidence-based medical information powered by RAG technology. 92% accuracy, source-grounded responses, and production-ready deployment.