Project Overview
Objective
Built a multimodal AI system that understands documents, audio, and images and answers natural-language questions using retrieval-augmented generation (RAG).
Stack
- Backend: FastAPI, background job processing, persistent storage
- Ingestion: PyMuPDF, Typhoon OCR API, Whisper, BLIP
- Retrieval: SentenceTransformers (BAAI/bge-m3), Qdrant, Elasticsearch
- Generation: LangChain with GPT-4o-mini, GPT-4.1, GPT-5
- Frontend: Next.js
Delivery highlights
- Developed a unified multimodal AI platform by extending and integrating several existing systems — AI Document Question Answering with RAG and LLMs, Multimodal Semantic Search, AI Meeting Transcription & Q&A, and Text-to-Image Semantic Search — into a single architecture that enables cross-modal retrieval and context-aware reasoning across documents, audio, and images.
- Designed and implemented RESTful APIs with FastAPI for file upload, background processing, indexing, and question-answering workflows.
- Built the ingestion pipeline: PyMuPDF for document parsing, the Typhoon OCR API for extracting text from images and scanned PDFs, Whisper for speech-to-text transcription of audio, and BLIP for image captioning.
- Applied text chunking and generated semantic embeddings with SentenceTransformers (BAAI/bge-m3), stored in Qdrant for vector similarity search; combined this with Elasticsearch keyword retrieval in a hybrid search system that improves retrieval accuracy and reduces hallucination.
- Leveraged selectable large language models (GPT-4o-mini, GPT-4.1, GPT-5) via LangChain to generate context-grounded answers with source attribution.
- Supported the platform with scalable backend services, background job processing, persistent storage, and a modern Next.js frontend for file upload, semantic search, and interactive knowledge exploration across multiple data sources.
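The upload and background-processing workflow above can be sketched with FastAPI's `BackgroundTasks`. This is a minimal illustration, not the project's actual code; the `/upload` route and `index_document` function are assumed names:

```python
from fastapi import BackgroundTasks, FastAPI, UploadFile

app = FastAPI()

def index_document(filename: str, data: bytes) -> None:
    # Placeholder for the parse -> chunk -> embed -> index pipeline
    # (PyMuPDF / OCR / Whisper / BLIP depending on file type).
    ...

@app.post("/upload")
async def upload(file: UploadFile, background_tasks: BackgroundTasks):
    data = await file.read()
    # Queue indexing in the background so the request returns immediately.
    background_tasks.add_task(index_document, file.filename, data)
    return {"status": "queued", "filename": file.filename}
```

Returning a "queued" status and indexing asynchronously keeps large uploads from blocking the request path.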
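A minimal sketch of the chunking step that feeds the embedding model. The `chunk_text` and `embed_chunks` helpers and the character-based sizes are assumptions for illustration; BAAI/bge-m3 is the embedding model the project uses:

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> List[str]:
    """Split text into overlapping character windows so each chunk
    retains context from its neighbours."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed_chunks(chunks: List[str]):
    # Assumes sentence-transformers is installed; bge-m3 yields dense
    # vectors that can be upserted into a Qdrant collection.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("BAAI/bge-m3")
    return model.encode(chunks, normalize_embeddings=True)
```

Overlapping windows are a common default; the project's real chunking strategy (token-based, sentence-based, etc.) is not specified.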
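Hybrid search merges Qdrant similarity hits with Elasticsearch keyword hits. One common fusion strategy is reciprocal rank fusion (RRF); the project does not state which fusion method it uses, so this is an illustrative sketch over two ranked ID lists:

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(
    vector_hits: List[str],
    keyword_hits: List[str],
    k: int = 60,
) -> List[str]:
    """Merge ranked document-ID lists from semantic (Qdrant) and keyword
    (Elasticsearch) retrieval: each hit scores 1 / (k + rank), and
    documents found by both retrievers accumulate both scores."""
    scores: Dict[str, float] = defaultdict(float)
    for hits in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because both retrievers must agree for a document to score highly, fusion like this is what lets hybrid search improve precision over either retriever alone.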
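Context-grounded answering with source attribution ultimately hinges on the prompt handed to the LLM. A hypothetical sketch of such a prompt builder — the exact wording and the `build_grounded_prompt` name are assumptions, not the project's:

```python
from typing import List, Tuple

def build_grounded_prompt(question: str, chunks: List[Tuple[str, str]]) -> str:
    """Assemble a prompt that instructs the model to answer only from
    the retrieved chunks and to cite them, reducing hallucination.
    chunks: (source_name, text) pairs from hybrid retrieval."""
    context = "\n\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the sources below and cite them as [n]. "
        "If the answer is not in the sources, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The numbered `[n]` markers are what allow the generated answer to carry per-claim source attribution back to the original documents, audio transcripts, or image captions.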