Category Archives: Vidéo

PhD position available

Update: this position is no longer available

Acquisition and visualization of the Plenoptic function with intermediate view synthesis

Context

There is increasing interest in applications that enable Free Navigation Video Services [1], where users can change their viewpoint on a scene while receiving a video. These services try to provide the user with the so-called Plenoptic function of the scene [2], defined as:

P_f(x, y, z, theta, phi, lambda, t)

It gives the light intensity at each position (x, y, z), for any incident angle (theta, phi), for any wavelength lambda and at any time t. This doctoral project focuses on three key problems related to the use of the Plenoptic function: its acquisition, synthesis and visualization.

Current acquisition tools cannot collect the whole Plenoptic function; instead, they only sample it. For example, in Super-MultiView video [3], the plane z = z_0 is fixed, and only the forward scene, i.e. the part for which the polar angle is between -pi/2 and pi/2, is acquired. Moreover, this plane is sampled only at the position of each camera.

In this project we are interested in the interpolation of the Plenoptic function, i.e. in the synthesis of virtual viewpoints that were not acquired by real cameras. Moreover, we also want to explore the case of irregular sampling positions of P_f.

Challenges

Access to the Plenoptic function would allow new ways to create and consume visual content. For example, the Fyuse application [4] lets the user change the viewing angle during playback, while the Lytro system [5] allows post-acquisition refocusing.

Several scientific fields are involved in this approach:

  • Image aesthetics [6]
  • Virtual cinema [1]
  • Perception and visual attention [8][9]
  • Free viewpoint video [10][11]

These items interact with one another: view synthesis is a prerequisite for virtual cinema and may benefit from visual attention and perception information; the whole process affects the quality and the aesthetics of the resulting image.

Methodology

Image synthesis plays a key role in the system that we want to implement. We can see the problem as the interpolation of the Plenoptic function from a set of samples [12]. This reconstruction is based on the scene geometry and often uses post-processing to alleviate synthesis artifacts.

Image synthesis and rendering have long been studied by the Computer Vision and Compression communities, even outside the context of Plenoptic function interpolation. The first methods used only the images for the synthesis: they fall into the Image-Based Rendering (IBR) [13] family. Disparity estimation and occlusion detection are typical tools used to improve the synthesis in this case [14], and may prove useful in this doctoral project.
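To make the notion of disparity estimation concrete, here is a minimal sketch of block matching on a single image row. It assumes rectified views, so that a pixel at column x in the left row appears at column x - d in the right row; the function name, the window size and the sum-of-absolute-differences cost are illustrative choices, not the methods of [14].

```python
def disparity_scanline(left, right, max_disp, half_win=1):
    """Estimate one disparity per pixel of `left` by SAD block matching.

    Assumes rectified rows: left[x] corresponds to right[x - d], d >= 0.
    Indices that fall outside the row are clamped to the border.
    """
    n = len(left)
    disparities = [0] * n
    for x in range(n):
        best_cost, best_d = float("inf"), 0
        for d in range(max_disp + 1):
            cost = 0
            for k in range(-half_win, half_win + 1):
                xl = min(max(x + k, 0), n - 1)
                xr = min(max(x + k - d, 0), n - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:  # keep the smallest disparity on ties
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


# Toy example: the bright pattern is shifted by 2 pixels between the rows.
left_row = [0, 0, 0, 5, 9, 5, 0, 0, 0, 0]
right_row = [0, 5, 9, 5, 0, 0, 0, 0, 0, 0]
print(disparity_scanline(left_row, right_row, max_disp=3)[3:6])  # [2, 2, 2]
```

In flat regions the cost is the same for every candidate, which is why textureless and occluded areas need the additional handling mentioned above.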

When depth information is also available, we have the Depth Image-Based Rendering (DIBR) [15] family. Even though DIBR has been known since the early 2000s, the quality of the synthesis is not yet fully satisfactory [16]. Nevertheless, some promising methods have been proposed recently [17]. They combine temporal and inter-view redundancy to improve the synthesis.
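The DIBR principle can be sketched on a single image row as follows. This is a simplified illustration, assuming rectified parallel cameras (so a pixel at depth Z shifts horizontally by focal * baseline / Z pixels), integer rounding of the shift, and a naive background-based hole filling; all function and parameter names are hypothetical, and this is not the method of [15] or [17].

```python
def warp_scanline(colors, depths, focal, baseline):
    """Forward-warp one row to a virtual camera shifted right by `baseline`.

    Returns the warped row (None marks a hole) and its depth buffer.
    """
    width = len(colors)
    row = [None] * width
    zbuf = [float("inf")] * width  # nearest depth written at each target pixel
    for x, (color, z) in enumerate(zip(colors, depths)):
        shift = int(round(focal * baseline / z))  # disparity in pixels
        xt = x - shift
        if 0 <= xt < width and z < zbuf[xt]:  # nearer pixels win (occlusion)
            row[xt] = color
            zbuf[xt] = z
    return row, zbuf


def fill_holes(row, zbuf):
    """Fill each hole from its nearest valid neighbor with the larger depth."""
    n = len(row)
    filled = list(row)
    for x in range(n):
        if row[x] is not None:
            continue
        left = next((i for i in range(x - 1, -1, -1) if row[i] is not None), None)
        right = next((i for i in range(x + 1, n) if row[i] is not None), None)
        candidates = [i for i in (left, right) if i is not None]
        if candidates:
            source = max(candidates, key=lambda i: zbuf[i])  # background side
            filled[x] = row[source]
    return filled


# Toy example: two foreground pixels (depth 10) in front of a far background.
colors = [10, 20, 30, 40, 50, 60]
depths = [100, 100, 100, 10, 10, 100]
row, zbuf = warp_scanline(colors, depths, focal=10, baseline=2)
print(row)                    # [10, 40, 50, None, None, 60]
print(fill_holes(row, zbuf))  # [10, 40, 50, 60, 60, 60]
```

The holes left behind the displaced foreground are the disocclusions that make high-quality synthesis difficult: the reference view simply contains no information about them.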

Another difficulty may come from the camera positioning [18]. A preliminary calibration and synchronization phase is needed in order to obtain a high-quality synthesis [19][20][21]. To this end, feature matching tools such as SIFT [22] or SURF [23] could be employed. This appears necessary in order to achieve 3D scene understanding [1][18].

Work agenda

This doctoral project will start with a thorough study of the state of the art in the relevant domains: image synthesis, camera calibration, 3D geometry, feature matching and visual attention. From a practical point of view, the PhD candidate may use the facilities at b<>com to test the acquisition of the Plenoptic function and to perform camera calibration and synchronization.

Then, the PhD candidate will implement and test different synthesis methods, starting from the state of the art and then proposing more complex and effective solutions. Human vision principles should be integrated into the new approaches.

At the same time, the impact of the synthesis methods on practical applications such as visualization, free navigation and virtual cinema will be taken into account. The final goal of the doctoral project is to master the complete system, from acquisition to visualization.

Advisors:

Rémi Cozot, Maître de Conférences, Habilité à Diriger des Recherches, IRT b<>com, IRISA/Université de Rennes 1 – cozot@irisa.fr

Marco Cagnazzo, Maître de Conférences, Habilité à Diriger des Recherches, IRT b<>com, Telecom-ParisTech/Institut Mines-Télécom – cagnazzo@telecom-paristech.fr

Bibliography

  1. M. Tanimoto, Free-Viewpoint Television. In Image and Geometry Processing for 3-D Cinematography, Ronfard, R. & Taubin, G. (Eds.), Springer Berlin Heidelberg, 2010, 53-76
  2. H. Adelson and J. Bergen, “The plenoptic function and the elements of early vision,” In Computational Models of Visual Processing, pages 3-20. MIT Press, 1991
  3. Dricot, A.; Jung, J.; Cagnazzo, M.; Pesquet, B. & Dufaux, F. “Full Parallax 3D Video Content Compression”. In Novel 3D Media Technologies, Springer New York, 2015, 49-70
  4. http://fyu.se
  5. http://lytro.com
  6. C. Bist, R. Cozot, G. Madec, X. Ducloux, Style Aware Tone Expansion for HDR Displays. Graphics Interface 2016
  7. Lino, M. Christie, Efficient composition for virtual camera control. ACM SIGGRAPH / Eurographics Symposium on Computer Animation, 2012
  8. S. Hillaire, A. Lécuyer, T. Regia-Corte, R. Cozot, J. Royan and G. Breton, Design and application of real-time visual attention model for the exploration of 3d virtual environments. IEEE Transactions on Visualization and Computer Graphics (TVCG), 18(3):356–368, 2012
  9. S. Hillaire, A. Lécuyer, R. Cozot and G. Casiez, Depth-of-field blur effects for first-person navigation in virtual environments. IEEE Computer Graphics and Applications, 28(6):47–55, 2008
  10. D. Farin, Y. Morvan, P.H.N. de With, View Interpolation Along a Chain of Weakly Calibrated Cameras. IEEE Workshop on Content Generation and Coding for 3D-Television, Eindhoven, Netherlands, June 2006
  11. F. Dufaux, B. Pesquet-Popescu, M. Cagnazzo (eds.): Emerging Technologies for 3D Video. Wiley, 2013
  12. Chebira, A., Dragotti, P. L., Sbaiz, L., & Vetterli, M. (2003, September). Sampling and interpolation of the plenoptic function. In Image Processing, 2003. ICIP 2003. 2003 International Conference on (Vol. 2, pp. II-917). IEEE
  13. H. Shum, S. Kang, A review of image-based rendering techniques. Proceed. Intern. Symp. Visual Comm and Proc. (2000). doi: 10.1117/12.386541
  14. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu. “Novel solutions for side information generation and fusion in multiview DVC”. In EURASIP Journal on Advances in Signal Processing, vol. 2013, no. 154, pp. 17, October 2013.
  15. Fehn, C. (2004, May). Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV. In Electronic Imaging 2004 (pp. 93-104). International Society for Optics and Photonics.
  16. A. Dricot, J. Jung, M. Cagnazzo, F. Dufaux, B. Pesquet-Popescu. “Subjective evaluation of Super Multi-View compressed contents on high-end light-field 3D displays”. In Elsevier Signal Processing: Image Communication, vol. 39, pp. 369-385, November 2015
  17. Purica, E. Mora, M. Cagnazzo, B. Ionescu, B. Pesquet-Popescu. “Multiview plus depth video coding with temporal prediction view synthesis”. In IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 2, pp. 360–374, February 2016.
  18. N. Snavely, S.M. Seitz, R. Szeliski. Modeling the world from internet photo collections. Int. J. Comput. Vis., 80 (2) (2008), pp. 189–210
  19. S. Milani, Compression of multiple user photo galleries, Image and Vision Computing, Volume 53, September 2016, Pages 68-75
  20. L. Zini, A. Cavallaro, F. Odone. Action-based multi-camera synchronization. IEEE J. Emerging Sel. Top. Circuits Syst., 3 (2) (2013), pp. 165–174
  21. L. Shen, Z. Liu, T. Yan, Z. Zhang, P. An. View-adaptive motion estimation and disparity estimation for low complexity multiview video coding. IEEE Trans. Circuits Syst. Video Technol., 20 (6) (2010), pp. 925–930
  22. Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Computer vision, 1999. The proceedings of the seventh IEEE international conference on (Vol. 2, pp. 1150-1157). IEEE.
  23. Bay, H., Tuytelaars, T., & Van Gool, L. (2006, May). SURF: Speeded up robust features. In European conference on computer vision (pp. 404-417). Springer Berlin Heidelberg.

 

Article on Network Coding for Multi-view Video

Our article about Network Coding for Multiview Video has been accepted in the Springer EURASIP Journal on Advances in Signal Processing.

The basic idea is to adapt the scheduler of a multi-view stream (H.264/MVC format) to user preferences, exploiting Network Coding to maximize the PSNR.
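For readers unfamiliar with the metric, PSNR is defined as 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between the reference and the decoded samples. The sketch below assumes 8-bit samples (MAX = 255) given as flat lists; the function name is illustrative and does not come from the article.

```python
import math


def psnr(reference, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Two samples out of four are off by one: MSE = 0.5, PSNR is about 51.14 dB.
print(round(psnr([100, 100, 100, 100], [101, 99, 100, 100]), 2))
```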

The article is available in Open Access on the journal web site and in the Publications/Journals section of this site.

PhD Thesis: compression of avionics screen content

Airplane screens display very specific video content, where text and graphics are superimposed on images or on a uniform background.

Compressing this kind of data requires adapted techniques, since the most important information (text, graphics) is usually degraded by traditional, transform-based video compression techniques.

We want to investigate the use of classification, segmentation and inpainting to recognize the most relevant information and encode it with appropriate methods.

The PhD student will work at both Telecom-ParisTech and Zodiac Aerospace.

APPLY HERE:

http://www.adum.fr/as/ed/voirproposition.pl?site=PSaclay&matricule_prop=9954

Decoded sequences for our ICIP’15 submission

The decoded video sequences for our submission to ICIP’15 are available here. Each file is about 300MB.

Reference method     Proposed method
Four People          Four People
Johnny               Johnny
Kirsten and Sarah    Kirsten and Sarah

The use case is the following. The three HEVC class-E sequences (Four_People, Johnny, Kirsten_and_Sarah) have been encoded with the proposed method (our ICIP’15 submission) and with the standard HEVC encoder (HM13). Then we simulated transmission over a lossy channel, using a Gilbert-Elliott model. Finally, we decoded the received packets, employing a simple error concealment technique. These videos show the superiority of the proposed scheme with respect to the reference.
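For illustration, a Gilbert-Elliott channel can be simulated as a two-state Markov chain (a "good" and a "bad" state) with a different packet loss probability in each state. The sketch below is a generic version of the model; the function name, the parameter names and the example values are assumptions made here, not the settings used for the ICIP’15 experiments.

```python
import random


def gilbert_elliott(n_packets, p_gb, p_bg, loss_good=0.0, loss_bad=1.0, seed=None):
    """Simulate packet delivery over a Gilbert-Elliott channel.

    p_gb: probability of moving from the good to the bad state.
    p_bg: probability of moving from the bad to the good state.
    Returns a list of booleans, True meaning the packet was received.
    The channel starts in the good state.
    """
    rng = random.Random(seed)
    bad = False
    received = []
    for _ in range(n_packets):
        loss_prob = loss_bad if bad else loss_good
        received.append(rng.random() >= loss_prob)
        # Draw the state for the next packet.
        if bad:
            if rng.random() < p_bg:
                bad = False
        elif rng.random() < p_gb:
            bad = True
    return received


# Hypothetical parameters: losses come in bursts of 1 / p_bg = 2 packets on average.
trace = gilbert_elliott(50000, p_gb=0.05, p_bg=0.5, seed=1)
loss_rate = 1 - sum(trace) / len(trace)
print(f"simulated loss rate: {loss_rate:.3f}")
print(f"stationary loss rate: {0.05 / (0.05 + 0.5):.3f}")
```

With the default loss probabilities (0 in the good state, 1 in the bad state), the long-run loss rate equals the stationary probability of the bad state, p_gb / (p_gb + p_bg).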