Multimodal Semantic Search Chat System (FastAPI, Qdrant, CLIP, BLIP, Typhoon OCR)

Built a multimodal chat system that uses vector embeddings and cross-modal retrieval for semantic search across text and images, with multilingual support. The project covers the work from architecture through implementation to delivery.

Personal Projects | Year 2026

Project Overview

Objective

Build a multimodal chat system that uses vector embeddings and cross-modal retrieval to provide semantic search across text and images, with multilingual support.

Stack

FastAPI, Next.js, PostgreSQL, Qdrant, SentenceTransformer, CLIP, BLIP, GoogleTranslator, Typhoon OCR

Delivery highlights

Developed a multimodal semantic search chat system that supports cross-modal retrieval across text and images. It uses a dual-database architecture: PostgreSQL stores structured chat data, and Qdrant handles vector similarity search.

Used SentenceTransformers for multilingual text embeddings and CLIP for a unified image–text representation. Image understanding is supplemented with BLIP-based captioning and with Typhoon OCR for text extraction with translation.

Designed a hybrid search pipeline that combines semantic similarity from the text and image modalities, then applies ranking and filtering to improve retrieval relevance and accuracy.

Built the backend services with FastAPI and integrated them with a Next.js frontend to support real-time chat interaction and semantic search.
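
As an illustration of the hybrid search step, the sketch below shows one common way to combine per-modality similarity scores: a weighted sum, followed by a score threshold and a top-k ranking. It is a minimal sketch and not the project's actual pipeline. The Hit class, the fuse_hits function, the weights, and the threshold are all hypothetical, and the scores are assumed to be similarities already normalized to the range 0 to 1 (for example, cosine similarities returned by two separate vector searches).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One result from a single-modality search (hypothetical shape)."""
    doc_id: str
    score: float  # similarity, assumed to lie in [0, 1]


def fuse_hits(text_hits, image_hits, text_weight=0.6, image_weight=0.4,
              min_score=0.3, top_k=5):
    """Weighted-sum fusion of two hit lists, then filter and rank.

    Assumes each doc_id appears at most once per list. The default weights
    and threshold are illustrative values, not tuned settings.
    """
    combined = {}
    for hit in text_hits:
        combined[hit.doc_id] = combined.get(hit.doc_id, 0.0) + text_weight * hit.score
    for hit in image_hits:
        combined[hit.doc_id] = combined.get(hit.doc_id, 0.0) + image_weight * hit.score

    # Filtering: drop documents whose fused score is below the threshold.
    kept = [(doc_id, score) for doc_id, score in combined.items() if score >= min_score]

    # Ranking: highest fused score first; ties are broken by doc_id for determinism.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:top_k]


if __name__ == "__main__":
    text_hits = [Hit("a", 0.9), Hit("b", 0.5)]
    image_hits = [Hit("b", 0.8), Hit("c", 0.5)]
    for doc_id, score in fuse_hits(text_hits, image_hits):
        print(doc_id, round(score, 2))
```

A document matched by both modalities accumulates score from each, so it can outrank a document that matches strongly in only one. Other fusion schemes, such as reciprocal rank fusion, would fit the same function shape.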
