MVD format

One of the most popular representations of multiview video, and arguably the best suited to free-viewpoint television, is the so-called multiple-views-plus-depth (MVD) format. In MVD, each view consists of a texture video sequence and a depth map sequence; at each time instant, the depth map gives the distance of each pixel from the point of observation. An example of MVD video is shown in the following figure.

Multiple views plus depth video

The first row shows the color images of the first camera at time instants 1 to 4, and the second row shows the corresponding depth maps. The third and fourth rows show the color images and the depth maps for the second camera.
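
To make the layout of the format concrete, here is a minimal Python sketch of an MVD container. The names `View`, `MVDVideo`, `constant_frame` and `build_example` are hypothetical and chosen only for illustration. The sketch also assumes that texture and depth frames have the same resolution and that frames are plain lists of 8-bit grayscale samples, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import List

# One image as rows of sample values (grayscale for brevity; an assumption).
Frame = List[List[int]]


@dataclass
class View:
    """One camera: a texture sequence and a depth sequence of equal length."""
    texture: List[Frame] = field(default_factory=list)
    depth: List[Frame] = field(default_factory=list)

    def add_instant(self, texture_frame: Frame, depth_frame: Frame) -> None:
        # Assumption of this sketch: depth and texture share the same size,
        # so that each depth sample corresponds to exactly one texture pixel.
        if len(texture_frame) != len(depth_frame) or any(
            len(t) != len(d) for t, d in zip(texture_frame, depth_frame)
        ):
            raise ValueError("texture and depth frames must have the same size")
        self.texture.append(texture_frame)
        self.depth.append(depth_frame)


@dataclass
class MVDVideo:
    """A multiple-views-plus-depth video: one View per camera."""
    views: List[View] = field(default_factory=list)

    def sequence_count(self) -> int:
        # Each view contributes two sequences: texture and depth.
        return 2 * len(self.views)


def constant_frame(width: int, height: int, value: int) -> Frame:
    """Build a flat frame filled with one value (placeholder content)."""
    return [[value] * width for _ in range(height)]


def build_example(n_views: int = 2, n_instants: int = 4,
                  width: int = 4, height: int = 3) -> MVDVideo:
    """Two cameras, four time instants, as in the figure described above."""
    video = MVDVideo()
    for v in range(n_views):
        view = View()
        for t in range(n_instants):
            view.add_instant(constant_frame(width, height, 100 + t),
                             constant_frame(width, height, 50 + v))
        video.views.append(view)
    return video
```

Calling `build_example()` yields two views with four texture frames and four depth frames each, that is, four sequences in total, matching the four rows of the figure.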

MVD is extremely demanding in terms of storage space and transmission bandwidth, so compression is mandatory in order to manage this representation. Several approaches exist for MVD compression. The simplest one is to compress each texture and depth sequence of each view independently. This approach is commonly referred to as Simulcast. Simulcast has the advantages of being simple to implement, of being backward compatible, and of allowing a single view to be decoded immediately for 2D screens. It has been chosen as the reference in the Call for Proposals (CfP) issued by the MPEG committee for the standardization of MVD. Of course, one expects that more sophisticated schemes, which take into account the redundancy between views and between texture and depth, would achieve far better compression performance than the Simulcast scheme (this is actually the rationale behind the CfP). For example, one can apply H.264/MVC to the texture sequences and, separately, to the depth maps. Since depth and texture have very different content, no coding gain is expected from jointly coding texture and depth with H.264/MVC. Nevertheless, some redundancy between texture and depth does exist, and this scheme does not exploit it. For example, they partially share motion and disparity information and, above all, rate allocation between them should be performed jointly. The latter, however, is a difficult problem, and one of the key issues to solve in order to achieve efficient coding.
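
To give an order of magnitude for the storage and bandwidth demand mentioned above, the following sketch computes the uncompressed bit rate of an MVD video. The function name `raw_mvd_bitrate` is hypothetical, and the numbers used (two views, 1024x768 pixels, 30 frames per second, 8-bit 4:2:0 texture at 12 bits per pixel, 8-bit depth) are assumptions chosen for illustration, not figures taken from the text.

```python
def raw_mvd_bitrate(n_views, width, height, fps,
                    texture_bits_per_pixel=12, depth_bits_per_pixel=8):
    """Uncompressed MVD bit rate, in bits per second.

    Defaults assume 8-bit 4:2:0 texture (12 bits per pixel on average)
    and a single-channel 8-bit depth map; both are assumptions.
    """
    bits_per_instant_per_view = width * height * (
        texture_bits_per_pixel + depth_bits_per_pixel
    )
    return n_views * bits_per_instant_per_view * fps


rate = raw_mvd_bitrate(n_views=2, width=1024, height=768, fps=30)
print(f"{rate / 1e6:.1f} Mbit/s")  # prints 943.7 Mbit/s
```

Under these assumptions, even two views already approach 1 Gbit/s before compression, and the figure grows linearly with the number of views.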

We therefore study an MVD compression scheme that exploits dense disparity estimation to obtain rate-distortion (RD) efficient prediction not only between textures belonging to different views, but also between depth maps. Moreover, we explore the relationship between the estimation parameters and the compression parameters. This study results in a coding paradigm that provides competitive performance with respect to the state of the art.
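
As a rough illustration of what prediction from a dense disparity field means, the sketch below builds a prediction of one view from another using one disparity value per pixel, then measures the residual that would remain to be coded. It is not the scheme studied here, only a toy example under stated assumptions: the cameras are rectified so that disparity is purely horizontal, the sign convention (target pixel `x` is fetched from reference pixel `x - d`) is arbitrary, and the names `predict_from_reference`, `residual` and `sum_abs` are hypothetical. The same operation could be applied with a depth map as the reference image.

```python
def predict_from_reference(reference, disparity):
    """Predict the target image from a reference view.

    For each pixel (y, x), fetch the reference pixel at (y, x - d),
    where d = disparity[y][x] is that pixel's own disparity (dense field).
    """
    height, width = len(reference), len(reference[0])
    predicted = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clamp to the image borders when the disparity points outside.
            xr = min(max(x - disparity[y][x], 0), width - 1)
            row.append(reference[y][xr])
        predicted.append(row)
    return predicted


def residual(target, predicted):
    """Pixel-wise difference between the target and its prediction."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, predicted)]


def sum_abs(image):
    """Sum of absolute values, a crude measure of residual energy."""
    return sum(abs(v) for row in image for v in row)


# Toy one-row images: the target is the reference shifted right by 2 pixels.
reference = [[10, 20, 30, 40, 50, 60]]
target = [[10, 10, 10, 20, 30, 40]]
disparity = [[2, 2, 2, 2, 2, 2]]  # one value per pixel

predicted = predict_from_reference(reference, disparity)
print(sum_abs(residual(target, reference)))  # prints 90 (no compensation)
print(sum_abs(residual(target, predicted)))  # prints 0 (with compensation)
```

In this toy case the disparity field is known exactly, so the residual vanishes. In practice the field has to be estimated and transmitted or derived, which is where the trade-off between estimation parameters and compression parameters comes in.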

My professional blog as a teacher/researcher at Telecom-ParisTech