Multilingual Video Understanding and Event Summarization System (Thai-English Timeline Intelligence)

An end-to-end multilingual video analysis system that produces clip-level descriptions and bilingual Thai-English summaries. The project covers the full path from architecture to a working implementation.

Personal Projects | Year: 2026

Project Overview

Objective

Built end-to-end multilingual video analysis with clip-level descriptions and bilingual summaries.

Stack

FastAPI, OpenCV, BLIP-2, FLAN-T5, NLLB-200, React, Tailwind CSS

Delivery highlights

  • Built an end-to-end AI video analysis system that accepts a video file as input and performs temporal segmentation based on a configurable Seconds Per Clip parameter (e.g., every 4 seconds).
  • The system samples frames at fixed intervals using OpenCV and generates segment-level descriptions in English using BLIP-2.
  • It summarizes the overall video using FLAN-T5 and translates the outputs into Thai using NLLB-200, producing bilingual timeline descriptions and summaries (description_en, description_th, summary_en, summary_th).
  • The system exposes its functionality through FastAPI, which handles video upload and returns structured JSON results.
  • The frontend is built with React for user interaction and timeline visualization, while Tailwind CSS is used for UI styling and layout design.
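
The segmentation step and the shape of the JSON result can be illustrated with a minimal Python sketch. It is not the project's actual code: the names plan_clips, build_result, Clip, timeline, start_sec, end_sec and frame_index are hypothetical, and sampling the middle frame of each clip is an assumed policy. Only the four output fields (description_en, description_th, summary_en, summary_th) come from the description above. In a real pipeline, frame_count and fps would likely be read from OpenCV via cv2.VideoCapture.get with cv2.CAP_PROP_FRAME_COUNT and cv2.CAP_PROP_FPS, and the empty description fields would be filled by the captioning and translation models.

```python
import math
from dataclasses import dataclass, asdict


@dataclass
class Clip:
    """One timeline segment (field names other than description_* are hypothetical)."""
    index: int
    start_sec: float
    end_sec: float
    frame_index: int          # frame chosen for captioning (assumed: middle of the clip)
    description_en: str = ""  # to be filled by the captioning model
    description_th: str = ""  # to be filled by the translation model


def plan_clips(frame_count: int, fps: float, seconds_per_clip: float) -> list[Clip]:
    """Split a video into fixed-length clips and pick one frame index per clip."""
    if frame_count <= 0 or fps <= 0 or seconds_per_clip <= 0:
        raise ValueError("frame_count, fps and seconds_per_clip must be positive")

    duration = frame_count / fps
    n_clips = math.ceil(duration / seconds_per_clip)

    clips = []
    for i in range(n_clips):
        start = i * seconds_per_clip
        end = min(start + seconds_per_clip, duration)  # last clip may be shorter
        # Middle frame of the clip, clamped to the last valid frame index.
        mid_frame = min(int((start + end) / 2 * fps), frame_count - 1)
        clips.append(Clip(i, start, end, mid_frame))
    return clips


def build_result(clips: list[Clip], summary_en: str, summary_th: str) -> dict:
    """Assemble a JSON-serializable result with bilingual timeline and summaries."""
    return {
        "timeline": [asdict(c) for c in clips],
        "summary_en": summary_en,
        "summary_th": summary_th,
    }
```

For example, a 10-second video at 30 fps with Seconds Per Clip set to 4 yields three clips (0 to 4 s, 4 to 8 s, 8 to 10 s), the last one shorter than the others. A dictionary of this shape can be returned directly from a FastAPI endpoint, which serializes it to JSON.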

Related Projects

Multilingual Semantic Video Event and Action Search Engine and API

Personal Projects | Year: 2026

Built FastAPI semantic search over videos with clip indexing and multilingual query support.

Text-to-Video Semantic Search

Personal Projects | Year: 2026

Built text-to-video semantic scene retrieval with multilingual query processing.

Multimodal Semantic Retrieval (Video and Image Search)

Personal Projects | Year: 2026

Unified text-to-video and text-to-image search into one cross-modal retrieval platform.