AI Video Search and Scene Analysis System | Projects

Project Overview

Objective

Developed an AI-powered video search and analysis system with scene detection and summarization capabilities.

Stack

FastAPI, OpenCV, CLIP (ViT-B/32), PyTorch, FAISS, YOLOv8, Next.js, GPT-5, GPT-4o-mini, GPT-4.1

Delivery highlights

Developed a video analysis application that lets users upload videos and analyzes their content automatically. The system extracts frames from each video, detects objects with YOLOv8, and generates CLIP visual embeddings to represent each scene. The embeddings are stored in a FAISS vector index, which enables fast retrieval of relevant scenes by detected object or by visual similarity. Built FastAPI backend services for video upload, timeline generation, frame preview, and scene navigation, and integrated LLMs (GPT-5, GPT-4o-mini, GPT-4.1) to summarize video timelines automatically. Built a Next.js interface that lets users search scenes, preview frames, and jump directly to important timestamps in the video.
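
The retrieval step described above (embeddings per sampled frame, nearest-neighbour lookup, results mapped back to timestamps) can be illustrated with a minimal sketch. This is not the project's code: `SceneEntry`, `search_scenes`, and the toy 3-dimensional vectors are hypothetical, and a pure-Python brute-force cosine search stands in for CLIP embeddings and the FAISS index so that the example runs without extra dependencies.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class SceneEntry:
    """One sampled frame (hypothetical record layout)."""
    timestamp_s: float        # position of the frame in the video, in seconds
    embedding: list[float]    # stand-in for a CLIP image embedding
    labels: tuple[str, ...]   # stand-in for object labels from a detector


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search_scenes(entries, query, k=3, required_label=None):
    """Return up to k (score, timestamp) pairs, best match first.

    Brute-force cosine search over all entries. If required_label is set,
    only frames whose detected labels include it are considered.
    """
    q = l2_normalize(query)
    scored = []
    for entry in entries:
        if required_label is not None and required_label not in entry.labels:
            continue
        v = l2_normalize(entry.embedding)
        score = sum(a * b for a, b in zip(q, v))
        scored.append((score, entry.timestamp_s))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index: three frames with made-up embeddings and labels.
index = [
    SceneEntry(0.0, [1.0, 0.0, 0.0], ("person",)),
    SceneEntry(5.0, [0.0, 1.0, 0.0], ("car",)),
    SceneEntry(10.0, [0.9, 0.1, 0.0], ("person", "dog")),
]

# Visual-similarity query: the two closest frames and their timestamps.
hits = search_scenes(index, [1.0, 0.0, 0.0], k=2)

# Object-filtered query: only frames where a "car" was detected.
car_hits = search_scenes(index, [1.0, 0.0, 0.0], required_label="car")
```

Returning timestamps rather than frame indices is what would let a front end jump straight to the matching moment in the video. At larger scale, the linear scan would be replaced by a vector index, which is the role the description assigns to FAISS.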


Demo Video


Personal Projects | Year: 2026

Built FastAPI semantic search over videos with clip indexing and multilingual query support.

Personal Projects | Year: 2026

Built text-to-video semantic scene retrieval with multilingual query processing.

Personal Projects | Year: 2026

Built query-image driven retrieval across both image and video collections.
