Project Overview
Objective
Built text-to-video semantic scene retrieval with multilingual query processing.
Stack
OpenCV, CLIP (ViT-B/32), FAISS, FastAPI, React.js, Tailwind CSS
Delivery highlights
- Built an end-to-end text-to-image semantic search system that lets users search for images with natural language instead of keyword matching.
- Used CLIP (ViT-B/32) to encode both text and images into a shared embedding space, and indexed precomputed image embeddings with FAISS for efficient similarity search.
- Developed a FastAPI backend that processes queries and returns matched image URLs with similarity scores, along with a React.js frontend for real-time search and dynamic result visualization.
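
The retrieval step described above can be illustrated with a minimal sketch. It is not the project's code: it uses only the Python standard library and a brute-force scan as a stand-in for the FAISS index, and the toy 3-dimensional vectors, the image URLs, and the names `normalize`, `search`, and `IMAGE_INDEX` are all hypothetical. In the real system the vectors would be 512-dimensional CLIP (ViT-B/32) embeddings. The idea it shows is that once embeddings are scaled to unit length, a dot product equals cosine similarity, so ranking by dot product returns the closest images together with their scores.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(query_vec, image_index, k=3):
    """Return the k best (url, score) pairs, highest cosine similarity first.

    image_index maps an image URL to its precomputed, already-normalized
    embedding. This linear scan is a stand-in for a vector index lookup.
    """
    q = normalize(query_vec)
    scored = (
        (url, sum(a * b for a, b in zip(q, emb)))
        for url, emb in image_index.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy embeddings; real CLIP ViT-B/32 embeddings have 512 dimensions.
IMAGE_INDEX = {
    "/images/beach.jpg": normalize([0.9, 0.1, 0.0]),
    "/images/forest.jpg": normalize([0.1, 0.9, 0.1]),
    "/images/city.jpg": normalize([0.0, 0.2, 0.9]),
}

# Hypothetical query embedding, standing in for an encoded text query.
results = search([1.0, 0.2, 0.0], IMAGE_INDEX, k=2)
for url, score in results:
    print(f"{url}  {score:.3f}")
```

A backend endpoint would typically return a list like `results` as JSON, with each entry carrying the image URL and its similarity score. Swapping the linear scan for a vector index changes only how the top matches are found, not the shape of the response.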