Content-based access to video objects: Temporal segmentation, visual summarization, and feature extraction

Bilge Günsel*, A. Murat Tekalp, Peter J.L. Van Beek

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

38 Citations (Scopus)


The classical approach to content-based video access has been 'frame-based', consisting of shot boundary detection, followed by selection of key frames that characterize the visual content of each shot, and then clustering of the camera shots to form story units. However, in an object-based multimedia environment, content-based random access to individual video objects becomes a desirable feature. To this end, this paper introduces an 'object-based' approach to temporal video partitioning and content-based indexing, where the basic indexing unit is the 'lifespan of a video object', rather than a 'camera shot' or a 'story unit'. We propose to represent each video object by an adaptive 2D triangular mesh. A mesh-based object tracking scheme is then employed to compute the motion trajectories of all mesh node points until the object exits the field of view. A new similarity measure, based on motion discontinuities and shape changes of the tracked object, is defined to detect content changes, resulting in temporal lifespan segments. A set of 'key snapshots' that constitutes a visual summary of the lifespan of the object is automatically selected. These key snapshots are then used to animate objects of interest using tracked motion trajectories for a moving visual representation. The proposed scheme provides functionalities such as object-based search/browsing for interactive video retrieval, surveillance video analysis, and object-based content manipulation/editing for studio postprocessing and desktop multimedia authoring. The approach is applicable to any video data where the initial appearance of the object(s) can be specified and the object motion can be modeled by a piecewise affine transformation. The system is demonstrated using different types of video: virtual studio productions (composited video), surveillance video, and TV broadcast video.
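To make the "piecewise affine" assumption concrete, the sketch below shows how the tracked positions of the three node points of one mesh triangle determine an affine map for that triangle, and how the area scaling of that map could serve as a rough indicator of shape change. This is an illustration only: the function names (`triangle_affine`, `apply_affine`, `area_scale`) are hypothetical, and the area ratio is an assumed stand-in, not the similarity measure defined in the paper, which the abstract does not specify.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def triangle_affine(src: Triangle, dst: Triangle) -> List[List[float]]:
    """Return the 2x3 matrix [[a, b, tx], [c, d, ty]] that maps the three
    vertices of src exactly onto the three vertices of dst."""
    (x0, y0), (x1, y1), (x2, y2) = src
    # Determinant of the source edge matrix; zero means a degenerate triangle.
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if abs(det) < 1e-12:
        raise ValueError("degenerate source triangle")
    rows = []
    for k in (0, 1):  # k = 0 solves for x', k = 1 solves for y'
        u0, u1, u2 = dst[0][k], dst[1][k], dst[2][k]
        # Cramer's rule on the two edge-difference equations.
        a = ((u1 - u0) * (y2 - y0) - (u2 - u0) * (y1 - y0)) / det
        b = ((x1 - x0) * (u2 - u0) - (x2 - x0) * (u1 - u0)) / det
        t = u0 - a * x0 - b * y0
        rows.append([a, b, t])
    return rows


def apply_affine(m: List[List[float]], p: Point) -> Point:
    """Map a point inside the triangle with the per-triangle affine matrix."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def area_scale(m: List[List[float]]) -> float:
    """Absolute determinant of the linear part: ratio of mapped to original area.
    Used here only as an assumed, illustrative shape-change indicator."""
    return abs(m[0][0] * m[1][1] - m[0][1] * m[1][0])


# Hypothetical node positions of one mesh triangle in two frames.
src_tri: Triangle = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
dst_tri: Triangle = ((1.0, 2.0), (3.0, 2.0), (1.0, 5.0))
M = triangle_affine(src_tri, dst_tri)
```

Applying one such map per triangle yields a piecewise affine motion field over the whole mesh; a sudden jump in per-triangle statistics between consecutive frames is the kind of cue a content-change detector might monitor.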

Original language: English
Pages (from-to): 261-280
Number of pages: 20
Journal: Signal Processing
Issue number: 2
Publication status: Published - 30 Apr 1998
Externally published: Yes


This work is supported by a National Science Foundation SIUCRC grant and a New York State Science and Technology Foundation grant to the Center for Electronic Imaging Systems at the University of Rochester.

Funders
National Science Foundation
New York State Science and Technology Foundation


    • Content-based access
    • Object-based queries
    • Object-based video
    • Video content summarization
    • Video databases

