
Using MPEG-7 at the consumer terminal in broadcasting


Alan Pearmain

Electronic Engineering Department, Queen Mary, University of London,

Mile End Road, London E1 4NS, ENGLAND

Tel: +44 20 7882 5342; fax: +44 20 7882 7997


Mounia Lalmas, Ekaterina Moutogianni, Damien Papworth, Pat Healey and Thomas Rölleke

Computer Science Department, Queen Mary, University of London,

Mile End Road, London E1 4NS, ENGLAND


The European Union IST research programme SAMBITS (System for Advanced Multimedia Broadcast and IT Services) project is using Digital Video Broadcasting (DVB), the DVB Multimedia Home Platform (MHP) standard, MPEG-4 and MPEG-7 in a studio production and multimedia terminal system to integrate broadcast data and Internet data. This involves using data delivery over multiple paths and the use of a back channel for interaction. MPEG-7 is being used to identify programme content and to construct queries to allow users to identify and retrieve interesting related content. Searching for content is being carried out using the HySpirit search engine. The paper deals with terminal design issues, the use of MPEG-7 for broadcasting applications and using a consumer broadcasting terminal for searching for material related to a broadcast.


Keywords: MPEG-7, Digital Television, Information retrieval, MPEG-4, Multimedia Home Platform.


SAMBITS is a European Union IST research programme project investigating ways in which digital television can enhance programmes and provide the viewer with a personalised service. Part of this enhancement requires broadcasting and the Internet to work together. The project is working on studio systems for producing content that allow a broadcaster to add additional information to the broadcasts and to link broadcasting and the Internet. The project is also working on terminals capable of displaying the enhanced content in a way that is accessible to ordinary users [1].

The broadcasting chain starts with normal MPEG-2 broadcast content sent using standard DVB techniques, but this is linked to extra content, including MPEG-4 audio-video sequences and HTML pages. At the studio, MPEG-7 [2, 6, 8] metadata describing certain features of the content is added to the MPEG-2 and MPEG-4 multimedia information. The extra MPEG-4 content may be sent in the MPEG-2 transport stream as separate streams, as part of the data carousel or in private sections, or it may be sent over the Internet.

The terminal is based on the Multimedia Home Platform (MHP) [3] reference software running on a set-top box. MHP currently supports only MPEG-2, so the project is adding software to support MPEG-4 and MPEG-7, the storage of multimedia content and the searching of multimedia content. The intention is that the user will be able to access this content with a system consisting of an advanced set-top box and a television with a remote control.

The SAMBITS project has twelve partners: Institut für Rundfunktechnik GmbH, European Broadcasting Union, British Broadcasting Corporation, Brunel University, Heinrich-Hertz-Institut für Nachrichtentechnik Berlin GmbH, KPN Research, Philips Research, Queen Mary University of London, Siemens AG, Telenor AS, Fraunhofer-Institut für Integrierte Publikations- und Informationssysteme, and Bayerischer Rundfunk. Queen Mary is contributing to the consumer terminal: the MPEG-7 descriptors, information retrieval and the user interface. The project started in January 2000 and finished at the end of December 2001. The project was demonstrated at IBC2001 in Amsterdam in September 2001.


The outline of the complete system that is being developed is shown in Figure 1. The studio system involves the development of various authoring and visualization tools. Standard equipment is being used for the broadcast and Internet servers and the terminal development is based on a Fujitsu-Siemens ACTIVY set-top box.

Figure 1 The SAMBITS system

Some of the functions that are available in the terminal are:

  • Enhanced programmes containing additional content and metadata information.

  • Instant access to the additional content, which may be provided via DVB or via the Internet.

  • Access to information about the current programme.

  • Searching for additional information either using metadata from the current programme or using a stored user profile.

One of the features of the system is that it provides a platform for investigating how MPEG-7 descriptors can be used at the consumer end in a broadcasting environment. The first problem was to choose a suitable set of descriptors. The descriptors that are useful to a viewer are high-level descriptions of the content. The studio will also include lower-level descriptors, such as the percentage of different colours in a scene or camera information, because studio users are experts such as programme editors, but these would not be useful at the terminal.

User interaction is limited to remote control buttons rather than a keyboard, as many television users are not comfortable using a keyboard, and keyboards are bulky and relatively expensive. This poses some challenges for the user interface design, particularly in the construction of queries.

The user can choose whether or not to display the MPEG-7 data associated with the current programme, using an Info button on the remote control. Searches are constructed from the MPEG-7 metadata available for the current programme. The retrieval engine uses HySpirit, a retrieval framework based on probabilistic relational algebra [4].


The Fujitsu-Siemens ACTIVY box, which is used for the terminal, has the following characteristics:

  • Win98 operating system.

  • Integrated DVB-receiver

  • Optimisation of the graphical subsystem for display on a TV-screen.

  • DVB-C or DVB-S input

  • TV output via SCART, FBAS or S-Video in PAL or NTSC format, including Macrovision, flicker reduction and hardware support for transparent overlays

  • VGA-output

  • 2 MPEG-2 decoder chips

  • Common Interface for Conditional Access Module (DVB compliant)

  • AC97 codec, AC3 pass through

  • S/P-DIF I/O (digital audio I/O interface)

  • 600MHz Celeron processor

The box has a similar form factor to the current generation of set-top boxes.


The terminal receives an MPEG-2 transport stream and additional material. The additional material can be of several types:

  • MPEG-7 metadata, either information about the main MPEG-2 programme or the MPEG-4 or other additional material;

  • An MPEG-4 stream that is synchronised with the main programme and displayed as an object overlaid on the MPEG-2 picture. The display of this stream will be at user discretion. A typical application of this feature is displaying a signer for people who are deaf;

  • MPEG-4 material that could be an additional stream in the multiplex, be transmitted via the data carousel, or be available from the broadcaster’s web server via the Internet;

  • Web pages transmitted in the data carousel or available via the Internet;

  • Other material, such as 3-D models or games, transmitted via the data carousel or from the broadcaster’s web server via the Internet.

One of the uses of MPEG-7 metadata is to indicate the extra content that is available at different times during the programme. The overall architecture of content management in the terminal is shown in Figure 2.

Figure 2 Content Review and storage

To synchronise the MPEG-7 data with the MPEG-2 stream, UDP packets containing time data are sent from the studio system to the terminal. The MPEG-7 user interface uses an integrated browser based on the Mozilla HTML browser. The MPEG-7 information is transformed from XML to HTML using style sheets, and the embedded browser then renders the HTML.

Additional controls for the MPEG-7 engine, such as searching for related material, are also placed in the HTML pages generated.
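The paper states only that UDP packets carrying time data are sent from the studio to the terminal; it does not give their format. The sketch below assumes a hypothetical 12-byte payload (a 4-byte tag `S7TM` followed by an unsigned 64-bit big-endian programme time in milliseconds) and shows how the studio side might pack it and the terminal side might validate and unpack it with Python's `struct` module.

```python
import struct

# HYPOTHETICAL payload layout (not specified in the paper):
# 4-byte tag followed by an unsigned 64-bit big-endian programme time in ms.
PACKET_FORMAT = ">4sQ"
PACKET_TAG = b"S7TM"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)  # 12 bytes; ">" means no padding


def encode_time_packet(programme_time_ms: int) -> bytes:
    """Studio side: pack the current programme time into a UDP payload."""
    return struct.pack(PACKET_FORMAT, PACKET_TAG, programme_time_ms)


def decode_time_packet(payload: bytes) -> int:
    """Terminal side: validate a UDP payload and return the time in ms."""
    if len(payload) != PACKET_SIZE:
        raise ValueError(f"expected {PACKET_SIZE} bytes, got {len(payload)}")
    tag, programme_time_ms = struct.unpack(PACKET_FORMAT, payload)
    if tag != PACKET_TAG:
        raise ValueError("not a time packet")
    return programme_time_ms


if __name__ == "__main__":
    packet = encode_time_packet(95_250)
    print(PACKET_SIZE, decode_time_packet(packet))  # prints: 12 95250
```

On a real terminal the payload would arrive on a socket created with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and read with `recvfrom`; the decoded time could then be compared with the MediaTime of the MPEG-7 segments to decide which description is current.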

5 MPEG-7 content description

The MPEG-7 standard specifies a rich set of description structures for audio-visual (AV) content, which can be instantiated by any application to describe various features and information related to the AV content. A Descriptor (D) defines the syntax and the semantics of an elementary feature. This can be either a low-level feature that represents a characteristic such as colour or texture, or a high-level feature such as the title or the author of a video. A Description Scheme (DS) uses Descriptors as building blocks in order to define the syntax and semantics of a more complex description. The syntax of Ds and DSs is defined by the Description Definition Language (DDL). The DDL is an extension of the XML Schema language [5] and can also be used by developers to create new Ds and DSs according to the specific needs of an application.

The set of description structures that MPEG-7 standardises is very broad, so each application is responsible for selecting an appropriate subset to instantiate, according to its functional requirements. The MPEG-7 descriptions considered suitable for the SAMBITS terminal were chosen from those available at working level at the time, and the project contributed to the standardisation process. Elements that were still evolving, or whose use was not clear, were not considered. The names were later updated to conform to the Final Committee Draft (FCD) elements [6]. The use of MPEG-7 has also been discussed in [7, 8].

After examining the available description schemes, the areas of MPEG-7 considered potentially useful to a SAMBITS application were identified within the Multimedia Description Schemes part: the Basic Elements, on which the high-level descriptions are built; Content Creation and Production, which provides information related to the programme; the Structural Aspects, which allow a detailed structured description of the programme; and the User Preferences. Ds and DSs that describe low-level visual or audio aspects of the content were not considered useful for the desired terminal functionality, which requires high-level descriptions that are meaningful to viewers. Elements from the above areas were then selected so that the minimum functionality could be achieved at the SAMBITS terminal. The selection is shown in Table 1: for each chosen element type listed in the first column, the related elements (Ds and DSs) that are used are listed in the second column.


Element | Contained Elements
Segment (type="VideoSegmentType") | MediaTime (datatype)
Related Material (type="RelatedMaterialType") |
Structured Annotation (type="StructuredAnnotationType") | Who, WhatObject, WhatAction, Where, When, Why, How

Table 1 MPEG-7 elements selected for the SAMBITS terminal

The following sections describe in more detail the MPEG-7 elements that were implemented for the terminal functionality.

5.1 Structural Aspects

The MPEG-7 descriptions at the terminal focus on the structural aspects of the programme. The Segment DS is used to describe the structure of the broadcast programme; specifically, the Video Segment DS, which describes temporal segments of the video, is used. The Video Segment Decomposition tools are then used to decompose segments temporally into sub-segments, capturing the hierarchical nature of the content. The result is called a Table of Contents, in which, for example, a video programme can be temporally segmented into various levels of scenes, sub-scenes and shots. The Media Locator and Media Time Ds contain the reference to the media and the time information respectively.

The Table of Contents allows a granular description of the content, which is needed at the terminal to support the user navigation through the programme and to provide information at various levels of detail. It is also useful for the search functionality of the terminal, as this allows the retrieval to return the most relevant part within a video.
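To illustrate how a Table of Contents supports navigation and retrieval of the most relevant part of a video, the following sketch models the hierarchy as a tree of segments and finds the most specific segment (a shot before its scene, a scene before the whole programme) that covers a given playback time. Times are plain seconds, a simplification of the MPEG-7 MediaTime datatype, and the segment identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One node of the Table of Contents (programme, scene or shot).

    start/duration are seconds from the start of the programme, a
    simplification of the MPEG-7 MediaTime datatype.
    """
    segment_id: str
    start: float
    duration: float
    children: List["Segment"] = field(default_factory=list)

    def contains(self, t: float) -> bool:
        return self.start <= t < self.start + self.duration


def deepest_segment_at(root: Segment, t: float) -> Optional[Segment]:
    """Return the most specific segment covering time t, or None."""
    if not root.contains(t):
        return None
    for child in root.children:
        found = deepest_segment_at(child, t)
        if found is not None:
            return found
    return root  # no child covers t, so this segment is the most specific


# Hypothetical programme: two scenes, the first split into two shots.
toc = Segment("programme", 0, 600, [
    Segment("scene1", 0, 300, [
        Segment("shot1", 0, 120),
        Segment("shot2", 120, 180),
    ]),
    Segment("scene2", 300, 300),
])

print(deepest_segment_at(toc, 150).segment_id)  # prints: shot2
print(deepest_segment_at(toc, 450).segment_id)  # prints: scene2
```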

The hierarchically structured content description allows further descriptions to be attached at the different segments of the hierarchy, in order to provide a high level representation of the content at a given granularity. The MPEG-7 structures used in SAMBITS are described in the next subsections.

5.2 Creation Information

The Creation Information DS, which is part of the Content Creation & Production set of DSs, was used to provide general background information related to the videos. In particular, the Creation DS provides a Title, an Abstract and the Creator, and the Classification DS describes how the material may be categorised by Genre and Language. For our Classification instances, free text is used instead of classification schemes or controlled terms.

The Creation and Classification descriptions are useful for the search functionality by performing matching on the basis of these features. The creation and classification information can also be used in combination with profile information to perform ranking of search results according to user preferences.

The Related Material DS, which describes additional material that is linked to the content, was also implemented. Only the Media Locator of the referenced material is included, as it is assumed that the referenced material has been described separately.

The Related Material descriptions at the terminal allow an integrated view of the main broadcast programme and all the linked content.

5.3 Textual Annotation

The Free Text Annotation and Structured Annotation DSs provide the main description of each segment that is meaningful to a viewer. In particular, the Structured Annotation uses the elements Who, WhatObject, WhatAction, Where, When, Why and How.

The textual annotation provides the main features of the multimedia material that are used for matching the queries and the material when searching. It is therefore used for representing the material for the search engine and for representing the queries. It is envisaged that the structured annotation will also allow users to specify some keywords that most nearly represent the type of information that they wish to locate.

An example description of a video segment as used in SAMBITS is shown in Figure 3. The video is described as audio-visual content, with creation (title, abstract, creator), classification (genre, language) and media (time, location) information. The video is temporally decomposed into scenes, and scenes are decomposed into shots. Segments at any level may carry textual annotation, both free text and structured (who, what object, where, etc.). Related material for each segment can also be specified, using links to its location. Note that it is sufficient to have the creation and classification information only at the root level, since it is inherited by the child segments of the decomposition (unless they instantiate it again).

Figure 3 Example description of a video segment
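The inheritance rule described above (creation and classification information stated at the root applies to every child segment unless the child instantiates it again) can be sketched as a walk from the root down to the target segment, letting deeper values override inherited ones. The class and function names, and the Genre, Language and shot-title values, are hypothetical illustrations rather than part of the SAMBITS implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Creation and classification fields that child segments inherit.
INHERITED_FIELDS = ("Title", "Abstract", "Creator", "Genre", "Language")


@dataclass
class SegmentDescription:
    segment_id: str
    info: Dict[str, str] = field(default_factory=dict)
    children: List["SegmentDescription"] = field(default_factory=list)


def path_to(root: SegmentDescription, segment_id: str) -> List[SegmentDescription]:
    """Return the segments from root down to segment_id ([] if absent)."""
    if root.segment_id == segment_id:
        return [root]
    for child in root.children:
        sub_path = path_to(child, segment_id)
        if sub_path:
            return [root] + sub_path
    return []


def effective_info(root: SegmentDescription, segment_id: str) -> Dict[str, str]:
    """Resolve the creation/classification info that applies to a segment."""
    resolved: Dict[str, str] = {}
    for segment in path_to(root, segment_id):  # root first, so deeper values win
        for key in INHERITED_FIELDS:
            if key in segment.info:
                resolved[key] = segment.info[key]
    return resolved


# Hypothetical description: only the root and shot1 carry creation info.
programme = SegmentDescription(
    "match",
    {"Title": "Spain vs Sweden (July 1998)", "Genre": "Sport", "Language": "English"},
    [SegmentDescription("scene1", {}, [
        SegmentDescription("shot1", {"Title": "Opening goal"}),  # re-instantiated
        SegmentDescription("shot2"),                             # inherits all
    ])],
)

print(effective_info(programme, "shot1")["Title"])  # prints: Opening goal
print(effective_info(programme, "shot2")["Title"])  # prints: Spain vs Sweden (July 1998)
```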

Note that the description of the structure of the video is generated semi-automatically: a segmentation algorithm first identifies the shots, and the structure is then edited to achieve the desired hierarchy. The Creation Information and Textual Annotation DSs then have to be attached manually, with the support of existing tools.

To illustrate the procedure, we use the extract shown in Figure 4 of a sample MPEG-7 description of a soccer game. The extract consists of an audio-visual segment (AudioVisualSegmentType), composed of two sub-segments (SegmentDecomposition). Creation information is provided for the audio-visual segment, such as a Title, an Abstract, the Creator, the Genre and Language (the content management part of MPEG-7). The segment also has a free text annotation. The sub-segments (VideoSegmentType) correspond to video shots. Each sub-segment has a free text annotation component.

Spain vs Sweden (July 1998)
Spain scores a goal quickly in this World Cup soccer game against Sweden. The scoring player is Morientes.
Soccer game between Spain and Sweden.

Figure 4 Extract of an MPEG-7 description
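Purely as an illustration of the structure described for Figure 4, the sketch below builds a comparable description with Python's `xml.etree.ElementTree` and then collects the free text annotation of every segment, which is the kind of text a search index would draw on. The element names follow the terms used in the text but are simplified, so the result is not valid against the MPEG-7 schema; Creator, Genre and Language are omitted, and the two shot annotations are invented placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_description() -> ET.Element:
    """Build a simplified, NOT schema-valid, MPEG-7-like description."""
    av = ET.Element("AudioVisualSegment", id="match")
    creation = ET.SubElement(ET.SubElement(av, "CreationInformation"), "Creation")
    ET.SubElement(creation, "Title").text = "Spain vs Sweden (July 1998)"
    ET.SubElement(creation, "Abstract").text = (
        "Spain scores a goal quickly in this World Cup soccer game against "
        "Sweden. The scoring player is Morientes."
    )
    ET.SubElement(av, "FreeTextAnnotation").text = "Soccer game between Spain and Sweden."

    decomposition = ET.SubElement(av, "SegmentDecomposition")
    # Placeholder shot annotations (hypothetical, not from the paper).
    for shot_id, text in [("shot1", "Kick-off."), ("shot2", "Morientes scores.")]:
        shot = ET.SubElement(decomposition, "VideoSegment", id=shot_id)
        ET.SubElement(shot, "FreeTextAnnotation").text = text
    return av


def annotations_by_segment(root: ET.Element) -> Dict[str, str]:
    """Map each segment id to its free text annotation, in document order."""
    result: Dict[str, str] = {}
    for element in root.iter():  # iter() also visits root itself
        segment_id = element.get("id")
        annotation = element.find("FreeTextAnnotation")  # direct child only
        if segment_id is not None and annotation is not None:
            result[segment_id] = annotation.text or ""
    return result


if __name__ == "__main__":
    for segment_id, text in annotations_by_segment(build_description()).items():
        print(segment_id, "->", text)
```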

5.4 User Preferences

The User Preference DS is used to specify user preferences with respect to the content. Browsing Preferences that describe preferred views of the content (i.e. summary preferences) can be used for displaying the search results. Filtering and Search Preferences that describe preferences for content in terms of genre or language can be exploited to classify the search results.

The User Preferences that are best for the terminal functionality have not yet been fully determined. A number of user studies are currently taking place to investigate which of the standardised preferences best correspond to viewer needs. Note that the user preferences are created at the terminal side, as opposed to the content description that is created at the studio side.

The definition of descriptors within the MPEG-7 standard was still ongoing at the time of the project, but the descriptors used were among those that were candidates for adoption in the standard.

5.5 Binary MPEG-7

MPEG-7 data can be transported over the broadcast channel either as text or in a binary representation. The binary representation is a recent development within the standardisation process. If the binary form is used, it must first be decoded to the textual description, which is an XML structure. An XSLT processor is then used, together with a style sheet, to produce an HTML version of the description. The HTML is sent to a local web server on the terminal. If the user requests the MPEG-7 data about the current programme, the HTML browser on the terminal sends a request to this local web server.
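The XML-to-HTML step can be sketched as follows. Python's standard library has no XSLT processor, so this sketch substitutes a hand-written ElementTree transformation for the style sheet; with the third-party `lxml` package the style sheet itself could be applied through `lxml.etree.XSLT`. The input uses simplified element names as an assumption, not the exact MPEG-7 schema, and the binary decoding stage is not shown.

```python
import html
import xml.etree.ElementTree as ET

# Simplified MPEG-7-like input (assumed element names, not schema-valid).
SAMPLE = """<AudioVisualSegment id="match">
  <CreationInformation>
    <Creation>
      <Title>Spain vs Sweden (July 1998)</Title>
      <Abstract>Spain scores a goal quickly in this World Cup soccer game against Sweden.</Abstract>
    </Creation>
  </CreationInformation>
  <FreeTextAnnotation>Soccer game between Spain and Sweden.</FreeTextAnnotation>
</AudioVisualSegment>"""


def render_html(mpeg7_xml: str) -> str:
    """Stand-in for the XSLT style sheet: turn the description into HTML."""
    root = ET.fromstring(mpeg7_xml)
    title = root.findtext("CreationInformation/Creation/Title", default="(untitled)")
    abstract = root.findtext("CreationInformation/Creation/Abstract", default="")
    annotation = root.findtext("FreeTextAnnotation", default="")

    # Escape all text so that characters such as "&" stay valid in HTML.
    parts = ["<html><body>", f"<h1>{html.escape(title)}</h1>"]
    if abstract:
        parts.append(f"<p>{html.escape(abstract)}</p>")
    if annotation:
        parts.append(f"<p><em>{html.escape(annotation)}</em></p>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_html(SAMPLE))
```

Keeping the transformation in a style sheet, as the paper does, has the advantage that the on-screen layout can be changed without touching terminal code; the Python version only shows what such a style sheet has to produce.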

The MPEG-7 content data is displayed as an overlay on an area of the screen. Another overlay strip shows the control buttons, whose functions depend on the mode of the terminal. In the MPEG-7 information display mode, the circle button turns the data display on or off.


Queries are constructed from the MPEG-7 data for the current programme. An example of query construction is shown in Figure 5.

Figure 5 Construction of a query

The user has asked for further information and has been presented with the information that is immediately available. He or she can choose which of these items to use with the up/down buttons on the remote control. Users are also given the option to extend the search, either to the broadcaster's Internet server or to the whole of the Internet.

The query is formulated as an XML query and sent to the HySpirit search engine. HySpirit is a retrieval framework based on a probabilistic extension of well-known database data models, such as the relational, deductive and object-oriented models, for information retrieval purposes. It can capture content (e.g. terms), facts (e.g. authors) and structure (e.g. XML) when retrieving information from semi-structured and heterogeneous data sources.

We also use MPEG-7 descriptions of the multimedia content in order to develop integrated search mechanisms that assist users in identifying additional material that is specifically of interest to them.

The MPEG-7 data associated with the programme, or programme elements, is used to build queries. Context-sensitive buttons are used to display the MPEG-7 description of the current programme element and the user then uses check boxes to select the terms that they would like to use as the basis of an additional search. Thus, a description of a segment of a football match might include the name of a player, the name of a team, the stadium where the game is being played and so on. Any of these items could form the basis of a search for further information.
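The check-box query construction can be sketched as follows: the Structured Annotation of the current segment is flattened into (field, term) pairs, one per check box, and the pairs the viewer ticks are serialised as an XML query. The paper does not give the XML query format used with HySpirit, so the `Query` and `Term` elements, the `scope` attribute (standing for the local, broadcaster or Internet search extension) and the example terms are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Tuple

# Structured Annotation elements used at the terminal (Table 1).
W_FIELDS = ("Who", "WhatObject", "WhatAction", "Where", "When", "Why", "How")


def candidate_terms(annotation: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten a structured annotation into (field, term) pairs, one per check box."""
    return [(name, term) for name in W_FIELDS for term in annotation.get(name, [])]


def build_query(candidates: List[Tuple[str, str]], checked: Iterable[int],
                scope: str = "local") -> str:
    """Serialise the checked candidates as a (hypothetical) XML query."""
    query = ET.Element("Query", scope=scope)
    for index in sorted(set(checked)):
        name, term = candidates[index]
        ET.SubElement(query, "Term", field=name).text = term
    return ET.tostring(query, encoding="unicode")


# Hypothetical annotation of a football segment: player, team, object, stadium.
annotation = {
    "Who": ["Morientes", "Spain"],
    "WhatObject": ["goal"],
    "Where": ["Example Stadium"],
}
boxes = candidate_terms(annotation)
print(build_query(boxes, checked=[0, 3]))
# prints: <Query scope="local"><Term field="Who">Morientes</Term><Term field="Where">Example Stadium</Term></Query>
```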

In addition, MPEG-7 is used to represent the available content and provide indexing information that is taken into account during the search in order to match the queries and retrieve the relevant material. Information retrieval techniques for MPEG-7 data were developed and implemented using the HySpirit framework. The method returns a list of ranked results, so that the parts most interesting to the user are presented first.

Figure 6 The search system

Figure 6 shows the system for processing a query. The query is formed in the HTML browser and sent to the local web server, which passes it to the search engine as an XML query. An indexing module in the search engine converts the XML query into a Probabilistic Relational Algebra (PRA) query suitable for submission to HySpirit, and the PRA results returned by HySpirit are converted to XML. These are then processed with an XSLT style sheet to produce the results in rank order as HTML, which is sent to the browser. Figure 7 shows the results of the search, with the ranking given as a percentage; this ranking information may be presented in some alternative graphical form. A filter module is not shown in Figure 6, but one can be included so that the results presented are based on a user profile (see below).

Figure 7 Display of search results
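HySpirit's probabilistic relational algebra is not reproduced here. As a toy illustration in the same spirit, the sketch below stores an index as a probabilistic relation of (term, segment) tuples with hypothetical indexing weights, selects the tuples matching the query terms, and projects onto the segment while treating the matches as independent events, so that a segment matched with probabilities p1, p2, ... scores 1 - (1 - p1)(1 - p2)... The scores are then sorted and shown as percentages, as in Figure 7. The function names and the independence assumption are choices made for this sketch, not HySpirit's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A probabilistic relation maps each tuple to the probability that it holds.
Relation = Dict[Tuple[str, ...], float]


def select(rel: Relation, column: int, value: str) -> Relation:
    """Keep the tuples whose given column equals value (probabilities unchanged)."""
    return {t: p for t, p in rel.items() if t[column] == value}


def project_independent(rel: Relation, columns: Iterable[int]) -> Relation:
    """Project onto columns, combining duplicates as independent events:
    P(a or b) = 1 - (1 - P(a)) * (1 - P(b))."""
    cols = tuple(columns)
    miss: Dict[Tuple[str, ...], float] = defaultdict(lambda: 1.0)
    for t, p in rel.items():
        miss[tuple(t[c] for c in cols)] *= 1.0 - p
    return {key: 1.0 - m for key, m in miss.items()}


def rank(query_terms: Iterable[str], index: Relation) -> List[Tuple[str, float]]:
    """Return (segment_id, score) pairs, best first."""
    matches: Relation = {}
    for term in query_terms:
        matches.update(select(index, 0, term))
    scores = project_independent(matches, [1])
    return sorted(((seg, p) for (seg,), p in scores.items()),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical (term, segment) indexing weights.
index: Relation = {
    ("morientes", "shot2"): 0.8,
    ("goal", "shot2"): 0.5,
    ("goal", "shot1"): 0.3,
    ("sweden", "match"): 0.6,
}

for segment_id, score in rank(["morientes", "goal"], index):
    print(f"{segment_id}: {score:.0%}")  # prints "shot2: 90%", then "shot1: 30%"
```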

7 Use of user preferences

Users are able to store a profile of their preferences, and both the metadata about the current programme and the search results will be filtered according to this profile before display. Some preferences will relate to the type of data to be displayed; for example, a user could indicate that he or she is not interested in place information, or only wants to see the two best-matching results from a search. Other options, concerning the interests of the user, could be added by exploiting the Navigation and Access component of MPEG-7. Examples of such parameters are the number of results per page, whether a thumbnail is shown, and the level of detail of the description of the results [9]. A possible development would be to monitor user searches and requests in order to build a user profile automatically for filtering results. Note that viewers will have the option of editing a set of parameters in their personal profile concerning the display of the list of search results.
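The two example preferences above (hiding place information and showing only the two best matches) can be sketched as a small profile applied before display, together with a genre preference of the kind the Filtering and Search Preferences in Section 5.4 describe. The `UserProfile` fields and the result tuples are hypothetical and are not the MPEG-7 User Preference DS itself.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

Result = Tuple[str, str, float]  # (segment_id, genre, score), best score first


@dataclass(frozen=True)
class UserProfile:
    hidden_fields: FrozenSet[str] = frozenset()   # e.g. {"Where"} hides place info
    max_results: Optional[int] = None             # e.g. 2 for the two best matches
    preferred_genres: FrozenSet[str] = frozenset()


def filter_description(description: Dict[str, str], profile: UserProfile) -> Dict[str, str]:
    """Drop the metadata fields the viewer has chosen not to see."""
    return {k: v for k, v in description.items() if k not in profile.hidden_fields}


def filter_results(results: List[Result], profile: UserProfile) -> List[Result]:
    """Move preferred genres to the front (keeping score order), then truncate."""
    if profile.preferred_genres:
        # sorted() is stable, so score order is preserved within each group
        results = sorted(results, key=lambda r: r[1] not in profile.preferred_genres)
    if profile.max_results is not None:
        results = results[: profile.max_results]
    return results


profile = UserProfile(hidden_fields=frozenset({"Where"}), max_results=2)
results = [("shot2", "Sport", 0.9), ("shot1", "Sport", 0.3), ("clip7", "Documentary", 0.2)]
print(filter_description({"Who": "Morientes", "Where": "Example Stadium"}, profile))
print(filter_results(results, profile))
```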

8 Project demonstration

The whole studio and terminal system developed in SAMBITS was demonstrated at IBC2001 in Amsterdam. Two scenarios were used in the demonstration: one based on a programme about dinosaurs and the other based on the 2001 Eurovision song contest broadcast.

The dinosaur programme offers MPEG-4 clips of background technical data that are available at appropriate times during the programme, together with related HTML pages. There is also MPEG-7 indexing of the main programme content and the additional content, so that particular programme segments can be found by content and the content description can be displayed superimposed on the main programme. A signer to assist deaf viewers is also broadcast as a synchronised MPEG-4 stream, and the signer can, at the viewer's discretion, be superimposed on either the main programme or the related MPEG-4 content.

The Eurovision song contest programme allowed metadata on the singer, song title, country, etc. to be provided, and searches to be carried out to find more information about the singer, songs from previous years and the singer's country of origin; it also offered an extra backstage camera view of the contest. Some of this material was available as MPEG-4 multimedia content via the object carousel, and some was the type of information that would normally be available from the Internet. Some of the material was provided by web pages, and there was a 3-D Hall of Fame that could be navigated to select previous winning entries.


The SAMBITS project has developed a system for enhancing and personalising digital television broadcasts by adding MPEG-4, MPEG-7 and other data to MPEG-2 transport streams or providing it via the Internet. This paper has described some of the work in developing a consumer terminal to support the use of MPEG-7 metadata. The MPEG-7 metadata associated with broadcast programme content can be displayed on screen at the viewer's request, and used to select additional programme data or to construct queries. One use of additional MPEG-4 material is to broadcast a signer who can be superimposed on part of the screen for the hard of hearing and who can be turned on or off by the viewer. The consumer terminal for the enhanced broadcasts can be implemented on an advanced set-top box using the Multimedia Home Platform software with extensions to support MPEG-4 and MPEG-7.

As part of this work, suitable descriptors and description schemes for use at a consumer broadcast terminal have been developed. The system also allows queries to be constructed from this metadata using the set-top box remote control. The queries are submitted to the HySpirit search engine, and the results are returned in rank order. Results are filtered according to user preferences.

The initial tests of the system have used material related to a programme about dinosaurs and material related to the 2001 Eurovision song contest broadcast. The system should form an excellent platform for evaluating user reaction to these functions for integrating the Internet with television.


[1] P. Healey, M. Lalmas, E. Moutogianni, Y. Paker and A. Pearmain, Integrating Internet and digital video broadcast data, Proceedings of the 4th World Multiconference on Systemics, Cybernetics and Informatics (SCI 2000), Orlando, Florida, 23-26 July 2000, Vol. 1, pp. 624-627.

[2] ISO MPEG-7, MPEG-7 Multimedia Description Schemes WD (Version 4.0), ISO/IEC JTC 1/SC 29/WG 11/N3465, Beijing, July 2000.

[3] J.-P. Evain, The multimedia home platform - an overview, EBU Technical Review, no. 275, Spring 1998, pp. 4-10.

[4] N. Fuhr and T. Rolleke, HySpirit - a probabilistic inference engine for hypermedia retrieval in large databases, Advances in Database Technology - EDBT'98, 6th International Conference on Extending Database Technology, Proceedings, Springer-Verlag, 1998, pp. 24-38.

[5] D. C. Fallside (Ed.), XML Schema Part 0: Primer, W3C Recommendation, May 2001.

[6] ISO MPEG-7, Text of 15938-5 FCD Information technology – Multimedia Content Description Interface – Part 5: Multimedia Description Schemes, ISO/IEC JTC 1/SC 29/WG 11/N3966, Singapore, March 2001.

[7] W. Putz, The usage of MPEG-7 metadata in a broadcast application, Media Futures, Florence, Italy, May 2001, pp. 235-238.

[8] P. Salembier, Overview of the MPEG-7 standard and of future challenges for visual information analysis, Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS'01), Tampere, Finland, May 2001, pp. 75-84.

[9] D. Ileperuma, M. Lalmas and T. Roelleke, MPEG-7 for an integrated access to broadcast and Internet data, Media Futures, Florence, Italy, May 2001, pp. 129-132.
