Project Overview
Objective
Developed an AI-powered video search and analysis system with scene detection and summarization capabilities.
Stack
FastAPI, OpenCV, CLIP (ViT-B/32), PyTorch, FAISS, YOLOv8, Next.js, GPT-5, GPT-4o-mini, GPT-4.1
Delivery highlights
- Developed a video analysis application that lets users upload videos and analyzes their content automatically.
- Built the analysis pipeline: the system extracts frames from each video, detects objects with YOLOv8, and generates a CLIP visual embedding to represent each scene.
- Stored the embeddings in a FAISS vector index, enabling fast search and retrieval of relevant scenes by detected object or visual similarity.
- Built FastAPI backend services for video upload, timeline generation, frame preview, and scene navigation.
- Integrated LLMs (GPT-5, GPT-4o-mini, GPT-4.1) to summarize video timelines automatically.
- Developed a Next.js interface that lets users search scenes, preview frames, and jump directly to important timestamps in the video.
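
The scene retrieval step described above can be illustrated with a minimal sketch. It is not the project's code: it replaces the FAISS index with a brute-force, pure-Python cosine-similarity search so that it runs without dependencies, and it uses toy 3-dimensional vectors in place of the 512-dimensional embeddings that CLIP ViT-B/32 produces. The names `Scene`, `normalize`, `search`, and `required_label` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Scene:
    """One sampled frame (hypothetical record layout, for illustration only)."""
    timestamp_s: float       # position of the frame in the video, in seconds
    labels: list[str]        # object labels detected in the frame
    embedding: list[float]   # visual embedding of the frame


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(scenes, query, k=3, required_label=None):
    """Return up to k (timestamp_s, score) pairs, best match first.

    Brute-force stand-in for a vector index: every scene is scored against
    the query. If required_label is given, scenes without that detected
    object are skipped before scoring.
    """
    q = normalize(query)
    scored = []
    for scene in scenes:
        if required_label is not None and required_label not in scene.labels:
            continue
        score = sum(a * b for a, b in zip(q, normalize(scene.embedding)))
        scored.append((score, scene.timestamp_s))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(ts, score) for score, ts in scored[:k]]


# Toy data: three scenes with made-up labels and embeddings.
scenes = [
    Scene(0.0, ["person"], [1.0, 0.0, 0.0]),
    Scene(5.0, ["car"], [0.0, 1.0, 0.0]),
    Scene(10.0, ["car", "person"], [0.7, 0.7, 0.0]),
]

# Visual-similarity search, then the same query restricted to scenes with a car.
print(search(scenes, [0.9, 0.1, 0.0], k=2))
print(search(scenes, [0.9, 0.1, 0.0], k=2, required_label="car"))
```

Each result carries a timestamp, which is what lets an interface jump straight to the matching point in the video. In a real deployment the scoring loop would be handled by the vector index; with FAISS, an inner-product index over L2-normalized embeddings yields the same cosine ranking, though the source does not state which index type was used.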