Image-to-Image and Video Similarity Search System

Built query-image-driven retrieval across both image and video collections. The project covers the full path from architecture through backend implementation to a working search interface.

Personal Projects | Year: 2026

Project Overview

Objective

Built query-image driven retrieval across both image and video collections.

Stack

CLIP (ViT-B/32), FAISS, BLIP, FastAPI, React, Axios

Delivery highlights

  • Built an end-to-end visual retrieval system that accepts a query image and performs cross-media similarity search across both images and videos.
  • Encodes the uploaded image into a semantic embedding using CLIP (ViT-B/32) in PyTorch and compares it against pre-indexed image files and extracted video keyframes stored in a FAISS inner-product index.
  • For videos, aggregates matched frame timestamps, merges temporally adjacent segments into consolidated intervals (start_time, end_time), and ranks results by similarity score.
  • Integrated BLIP to generate a descriptive caption for the query image automatically, which makes results easier to interpret.
  • Implemented the backend with FastAPI, with endpoints for index construction and search execution that return structured JSON responses containing media paths, similarity scores, and timestamp ranges.
  • Developed the frontend with React and Axios, with configurable search parameters (top_k, similarity threshold, caption length) and interactive video playback that jumps directly to the detected relevant segments.
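
The video step described above (grouping matched keyframe timestamps into start_time/end_time intervals and ranking them) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the `Segment` type, the `merge_frame_hits` function, the `max_gap` parameter, and the choice to score a segment by its best-matching frame are all assumptions, since the write-up does not say how "temporally adjacent" is defined or how frame scores are aggregated.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    """Hypothetical result record for one merged video interval."""
    video_path: str
    start_time: float
    end_time: float
    score: float  # assumed aggregation: best frame similarity in the interval


def merge_frame_hits(
    hits: List[Tuple[str, float, float]],
    max_gap: float = 2.0,
) -> List[Segment]:
    """Merge (video_path, timestamp, score) frame hits into ranked segments.

    Frames of the same video whose timestamps are at most `max_gap` seconds
    apart are treated as adjacent (an assumed rule) and merged into one
    interval.
    """
    # Group hits by video so each video is merged independently.
    by_video: Dict[str, List[Tuple[float, float]]] = {}
    for path, timestamp, score in hits:
        by_video.setdefault(path, []).append((timestamp, score))

    segments: List[Segment] = []
    for path, frames in by_video.items():
        frames.sort()  # order by timestamp
        start, end, best = frames[0][0], frames[0][0], frames[0][1]
        for timestamp, score in frames[1:]:
            if timestamp - end <= max_gap:
                # Close enough to the current interval: extend it.
                end = timestamp
                best = max(best, score)
            else:
                # Gap too large: close the interval and start a new one.
                segments.append(Segment(path, start, end, best))
                start, end, best = timestamp, timestamp, score
        segments.append(Segment(path, start, end, best))

    # Rank all intervals across videos, highest similarity first.
    segments.sort(key=lambda s: s.score, reverse=True)
    return segments


example_hits = [
    ("a.mp4", 10.0, 0.80),
    ("a.mp4", 11.0, 0.90),
    ("a.mp4", 30.0, 0.70),
    ("b.mp4", 5.0, 0.95),
]
ranked = merge_frame_hits(example_hits)
```

With the example hits, the two frames of `a.mp4` at 10 s and 11 s collapse into one interval, while the frame at 30 s stays separate because it exceeds the assumed 2-second gap. A frontend could then seek the video player to each segment's `start_time`.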

Related Projects


Multimodal Semantic Retrieval (Video and Image Search)

Personal Projects | Year: 2026

Unified text-to-video and text-to-image search into one cross-modal retrieval platform.

Text-to-Image Semantic Search Module

Personal Projects | Year: 2026

Built text-to-image semantic search using CLIP shared embedding space and FAISS indexing.

Multilingual Semantic Video Event and Action Search Engine and API

Personal Projects | Year: 2026

Built FastAPI semantic search over videos with clip indexing and multilingual query support.