RAG-Powered Medical Intelligence

Medi-Query
Powered by RAG

Next-generation clinical assistant providing source-grounded medical insights. Built for accuracy, reliability, and clinical safety.

92% Accuracy
5% Hallucination Rate
153+ Test Cases
Screenshot: VS Code showing the Medi-Query RAG chatbot implementation in Python and HTML
Built with production-grade technologies
Python
Flask
LangChain
Pinecone
Groq
AWS
Docker
GitHub Actions

Technical Architecture

Built on a production-grade RAG pipeline featuring hybrid retrieval, cross-encoder reranking, and cloud-native deployment on AWS infrastructure.

RAG-Powered Accuracy
92% answer relevancy with source-grounded medical responses.

Hybrid Retrieval Pipeline
MMR + BM25 ensemble with cross-encoder reranking for precision.

AWS Cloud Deployment
Docker containers on EC2 with a GitHub Actions CI/CD pipeline.

Vector Search with Pinecone
384-dimensional embeddings for semantic medical document retrieval.

Llama 3.3 70B Generation
Context-only LLM responses with an 88.78% faithfulness score.

Comprehensive Testing Suite
153 tests covering unit, integration, security, and performance.

Intelligent Medical Assistant

Beautiful, responsive interface with light and dark modes. Get evidence-based medical information with source citations.

Light Mode (screenshot: asthma causes conversation)
Dark Mode (screenshot: asthma diagnosis and treatment conversation)
<5s Response Time
92% Accuracy
Source Citations: Yes
2 Themes
Open Source

Hybrid Retrieval Pipeline

Production-grade RAG architecture combining multiple retrieval strategies for maximum accuracy and relevance.

View on GitHub
01 · Query Rewriting

Llama 3.1 8B resolves pronouns and expands medical terminology for precise retrieval.

# Pronoun Resolution
"What are its side effects?"
→ "What are the side effects of metformin?"
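The rewriting step above can be sketched as a prompt-building helper: recent history plus the raw question are packed into one instruction for the small LLM. The template wording and function name are illustrative, not the project's actual code.

```python
# Hypothetical prompt builder for the query-rewriting stage.
REWRITE_TEMPLATE = (
    "Rewrite the user's question as a standalone medical query. "
    "Resolve pronouns using the conversation history and expand "
    "medical abbreviations. Return only the rewritten question.\n\n"
    "History:\n{history}\n\nQuestion: {question}"
)

def build_rewrite_prompt(history: list[str], question: str) -> str:
    """Format recent exchanges plus the raw question into a rewrite prompt."""
    return REWRITE_TEMPLATE.format(history="\n".join(history), question=question)
```

The resulting string would then be sent to Llama 3.1 8B, whose output replaces the user's original question for retrieval.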
02 · Hybrid Retrieval

MMR (60%) + BM25 (40%) ensemble combines semantic understanding with keyword matching.

# LangChain ensemble: semantic MMR retriever weighted 0.6, keyword BM25 0.4
from langchain.retrievers import EnsembleRetriever

ensemble_retriever = EnsembleRetriever(
    retrievers=[mmr_retriever, bm25_retriever],
    weights=[0.6, 0.4],
)
03 · Cross-Encoder Reranking

ms-marco-MiniLM-L-6-v2 scores query-document pairs for final relevance ranking.

from sentence_transformers import CrossEncoder

reranker = CrossEncoder(
    "cross-encoder/ms-marco-MiniLM-L-6-v2"
)
# rank() scores every (query, doc) pair; keep the 8 most relevant
top_docs = reranker.rank(query, docs, top_k=8, return_documents=True)
04 · Context-Only Generation

Llama 3.3 70B generates responses strictly grounded in retrieved context.

system_prompt = """Answer ONLY from the 
provided context. If information is not 
in the context, say 'I don't have enough 
information to answer that.'"""
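The context-only call can be sketched as a message-assembly helper: retrieved chunks are joined into the user turn alongside the strict system prompt, in the OpenAI-style chat format that Groq also uses. The helper name is an assumption; only the system prompt comes from the project.

```python
# System prompt from the document; the assembly helper is illustrative.
SYSTEM_PROMPT = (
    "Answer ONLY from the provided context. If information is not "
    "in the context, say 'I don't have enough information to answer that.'"
)

def build_messages(context_docs: list[str], question: str) -> list[dict]:
    """Pack retrieved chunks and the user question into chat messages."""
    context = "\n\n".join(context_docs)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
```

The message list would then be passed to Llama 3.3 70B with temperature 0, so answers stay grounded in the retrieved chunks.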

Context-Aware Chunking

Optimized document segmentation preserves semantic coherence for medical content.

[Diagram: three overlapping chunks; the 100-token overlap ensures context continuity]

Chunk Size: 800 tokens (optimal for medical paragraphs)
Chunk Overlap: 100 tokens (prevents information loss at boundaries)
Embedding Model: all-MiniLM-L6-v2 (384-dimensional vectors)
Vector Index: Pinecone (serverless, high-performance)
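The chunking scheme above (800-token windows with a 100-token overlap) can be sketched as a simple sliding window over a token list. This is a minimal stand-in for the project's splitter, not its actual implementation.

```python
def chunk_tokens(tokens: list[str], size: int = 800, overlap: int = 100) -> list[list[str]]:
    """Split a token list into overlapping windows (defaults: 800/100)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # each new chunk advances by 700 tokens
    # max(..., 1) guarantees at least one chunk for short documents
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

Each chunk repeats the final 100 tokens of its predecessor, which is what keeps sentences spanning a boundary retrievable from either side.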

Session Memory

Maintains the most recent messages (a 12-message rolling window) for pronoun resolution only.

max_session_messages: 12
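A bounded session memory like this is naturally expressed as a fixed-size deque; the class below is a sketch under that assumption, with the 12-message cap taken from the configuration above.

```python
from collections import deque

class SessionMemory:
    """Rolling window of recent chat messages, used only to resolve
    pronouns during query rewriting (illustrative class, not project code)."""

    def __init__(self, max_messages: int = 12):
        # deque with maxlen silently drops the oldest message on overflow
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def history(self) -> list[dict]:
        return list(self.messages)
```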

Temperature Control

Set to 0 for deterministic, factual medical responses without creativity.

temperature: 0

Hallucination Prevention

Strict context-only mode reduces hallucinations from 35% to 5%.

hallucination_rate: 5%

System Architecture

Production-grade RAG pipeline with hybrid retrieval, cross-encoder reranking, and cloud-native deployment

CI/CD Pipeline Architecture

GitHub Actions with Docker, ECR, and EC2 deployment

[Diagram: developer workflow → GitHub Actions CI/CD → AWS infrastructure (ECR, EC2), with external APIs (Groq, Pinecone)]

RAGAS Evaluation Metrics

92.33% Answer Relevancy (RAGAS score)
88.78% Faithfulness (context adherence)
90.00% Context Recall (document coverage)
4.2s Response Time (under 5s SLA)
5% Hallucination Rate (down from 35%)
100% Test Pass Rate (153 tests)

RAG Pipeline Flow

1. Query Rewrite (0.8s)
Llama 3.1 8B resolves pronouns and expands queries

2. Hybrid Retrieval (1.2s)
MMR + BM25 ensemble (0.6/0.4 weights)

3. Cross-Encoder Rerank (0.3s)
ms-marco-MiniLM-L-6-v2 selects top-8 docs

4. Response Generation (2.1s)
Llama 3.3 70B with context-only mode
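The four stages above compose into a single pipeline function. The sketch below uses stub callables for each stage so the control flow is visible; all names are illustrative.

```python
def answer_query(query, rewrite, retrieve, rerank, generate, top_n=8):
    """Compose the pipeline: rewrite -> retrieve -> rerank -> generate."""
    standalone = rewrite(query)                        # stage 1: pronoun resolution
    candidates = retrieve(standalone)                  # stage 2: MMR + BM25 ensemble
    top_docs = rerank(standalone, candidates)[:top_n]  # stage 3: cross-encoder ordering
    return generate(standalone, top_docs)              # stage 4: context-only answer
```

Passing the stages in as callables keeps each one independently testable, which matches the unit/integration split in the test suite.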

Technology Stack

Flask/Python: Backend
LangChain: Orchestration
Pinecone: Vector DB
Groq API: LLM Inference
HuggingFace: Embeddings
AWS EC2: Deployment
Docker: Containers
GitHub Actions: CI/CD

Key Configuration Parameters

chunk_size: 800
chunk_overlap: 100
embedding_dim: 384
mmr_k: 10
mmr_fetch_k: 30
lambda_mult: 0.5
reranker_top_n: 8
max_session_msgs: 12
temperature: 0
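Collected in one place, the parameters above might look like the frozen dataclass below; the field names mirror the list, but the class itself is an illustration, not the project's config module.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RagConfig:
    """Key pipeline parameters (values from the table above)."""
    chunk_size: int = 800       # tokens per document chunk
    chunk_overlap: int = 100    # tokens shared between adjacent chunks
    embedding_dim: int = 384    # all-MiniLM-L6-v2 output size
    mmr_k: int = 10             # documents returned by MMR
    mmr_fetch_k: int = 30       # candidates fetched before MMR diversification
    lambda_mult: float = 0.5    # MMR relevance/diversity trade-off
    reranker_top_n: int = 8     # docs kept after cross-encoder reranking
    max_session_msgs: int = 12  # rolling session-memory window
    temperature: float = 0.0    # deterministic generation
```

A frozen dataclass makes the settings immutable at runtime, so a misbehaving handler cannot silently change retrieval behavior mid-session.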

Technical Documentation

Comprehensive documentation covering architecture, testing, deployment, and evaluation metrics

Software Requirements Specification

IEEE 830-1998 compliant SRS document covering functional and non-functional requirements, system architecture, and use cases.

27 Functional Requirements · 12 Non-Functional Requirements · IEEE Standard Format

RAG Architecture

Complete documentation of the Retrieval-Augmented Generation pipeline including document indexing, query processing, and response generation.

Document Indexing Pipeline · Hybrid Retrieval System · Cross-Encoder Reranking

CI/CD Pipeline

GitHub Actions workflow with Docker containerization, Amazon ECR registry, and automated EC2 deployment.

Automated Testing · Docker Containerization · AWS Deployment

RAGAS Evaluation Report

Comprehensive evaluation using the RAGAS framework measuring faithfulness, relevancy, precision, and recall metrics.

92.33% Answer Relevancy · 88.78% Faithfulness · 90% Context Recall

Testing Documentation

Complete test suite documentation covering unit, integration, security, performance, and system workflow tests.

153 Total Tests · 100% Pass Rate · 9 Test Categories

Security Testing

Security vulnerability testing including XSS prevention, SQL injection protection, and API key security validation.

XSS Protection · SQLi Prevention · API Security
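The XSS-prevention checks described above amount to escaping HTML-special characters before any user text is rendered. A minimal sketch, assuming the stdlib `html.escape` approach rather than the project's actual sanitizer:

```python
import html

def sanitize_user_input(text: str) -> str:
    """Escape HTML-special characters before rendering a chat message,
    so injected markup is displayed as text rather than executed."""
    return html.escape(text, quote=True)
```

The security test suite would then assert that markup like `<script>` tags survives only in escaped form.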

Project Highlights

Problem Solved

  • Information overload in medical queries reduced through curated retrieval
  • Hallucination rate reduced from 35% to 5% using RAG architecture
  • Context loss in conversations handled via pronoun resolution
  • Source attribution enables response verification

Key Achievements

  • 92% accuracy vs. 65% for non-RAG systems (+27 percentage points)
  • 100% SLA compliance with all queries under 5 seconds
  • Production-ready with Gunicorn, health endpoint, and Docker
  • Automated CI/CD with GitHub Actions and AWS deployment

Frequently Asked Questions

Everything you need to know about Medi-Query's RAG architecture and technical implementation

What is Medi-Query and how does it work?
How accurate is the RAG pipeline?
What technology stack powers Medi-Query?
How is the system deployed?
What testing coverage does Medi-Query have?
How does the conversation memory work?

Medical AI Reimagined

Experience evidence-based medical information powered by RAG technology. 92% accuracy, source-grounded responses, and production-ready deployment.