PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation

1Fudan University     2University of Michigan    

Figure 1. Panoramic video object segmentation (PanoVOS). PanoVOS targets tracking and distinguishing particular instances under content discontinuities (e.g., the penguin in the frame at $T = 15$) and severe distortion (e.g., the penguin in the frame at $T = 65$). We show samples of (a) frames, (b) segmentation annotations, and (c) the area proportion of the foreground for the Penguin video in our dataset.

Abstract

Panoramic videos contain richer spatial information and have attracted tremendous attention due to the exceptional experience they provide in fields such as autonomous driving and virtual reality. However, existing datasets for video segmentation only focus on conventional planar images. To address this gap, in this paper, we present a panoramic video dataset, i.e., PanoVOS. The dataset provides 150 videos with high resolutions and diverse motions. To quantify the domain gap between 2D planar videos and panoramic videos, we evaluate 15 off-the-shelf video object segmentation (VOS) models on PanoVOS. Through error analysis, we found that all of them fail to tackle pixel-level content discontinuities in panoramic videos. Thus, we present a Panoramic Space Consistency Transformer (PSCFormer), which can effectively utilize the semantic boundary information of the previous frame for pixel-level matching with the current frame. Extensive experiments demonstrate that, compared with previous SOTA models, our PSCFormer network exhibits a great advantage in segmentation results under the panoramic setting. Our dataset poses new challenges in panoramic VOS, and we hope that PanoVOS can advance the development of panoramic segmentation/tracking.

Experiments

We benchmark the state-of-the-art methods to the best of our knowledge; please see the PanoVOS Report for details. If your method is more powerful, please feel free to contact us for benchmark evaluation, and we will update the results.

TABLE 2. Domain transfer results of (static image datasets + YouTube-VOS)→(PanoVOS Validation & Test). Subscripts $s$ and $u$ denote scores on seen and unseen categories. $MF$ denotes the use of multiple historical frames as reference. $↓$ indicates the performance drop compared to the YouTube-VOS dataset. $∗$ denotes that the large-scale external BL30K dataset is used during training. $†$ denotes that no synthetic data is used during the training stage.

Downloads



The dataset is available on Google Drive; please kindly refer to PanoVOS for more details.
🚀 Download the dataset using gdown command:
  🎉 train.zip 5.88 GB
  gdown https://drive.google.com/uc?id=178E1TYK7tgj-FXzgnjJyx9gZs2GhQBqr
  🎆 valid.zip 3.5 GB
  gdown https://drive.google.com/uc?id=10P49VBM7vhGHCqhYvaIjzs7wV0oKLGNl
  📌 test.zip 3.21 GB
  gdown https://drive.google.com/uc?id=1dOiJ55rDP82Fdvm32OYuh1RGMCdHxCMe

Evaluation



  • Following YouTube-VOS, we use Region Jaccard $\mathcal{J}$ ($\mathcal{J}_{s}$ and $\mathcal{J}_{u}$), Boundary $\mathcal{F}$ measure ($\mathcal{F}_{s}$ and $\mathcal{F}_{u}$), and their mean $\mathcal{J}\&\mathcal{F}$ as the evaluation metrics.
  • For the validation and test sets, first-frame annotations are provided to indicate the objects to be segmented.
  • The validation set online evaluation server is [here] for daily evaluation. (🚀Done!)
  • The test set online evaluation server is [here] for daily evaluation. (🚀Done!)
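For intuition, the two metrics can be sketched in a few lines of NumPy: $\mathcal{J}$ is the intersection-over-union of the predicted and ground-truth masks, and $\mathcal{F}$ is an F-measure over their boundary pixels. This is a simplified, zero-tolerance sketch; the official DAVIS/YouTube-VOS evaluation code matches boundaries within a small dilation radius, so its scores differ slightly.

```python
import numpy as np

def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else inter / union

def boundary(mask):
    """Boundary pixels: mask pixels with at least one background 4-neighbour."""
    m = mask.astype(bool)
    p = np.pad(m, 1, constant_values=False)
    # A pixel is interior if all four 4-neighbours are foreground.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m & ~interior

def boundary_f(pred, gt):
    """Contour accuracy F (zero-tolerance; official code dilates boundaries first)."""
    bp, bg = boundary(pred), boundary(gt)
    if bp.sum() == 0 and bg.sum() == 0:
        return 1.0
    hits = (bp & bg).sum()
    precision = hits / max(bp.sum(), 1)
    recall = hits / max(bg.sum(), 1)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

# Toy example: two overlapping 4x4 squares, shifted by one column.
pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True
gt = np.zeros((8, 8), bool);   gt[2:6, 3:7] = True
j = jaccard(pred, gt)               # 12 / 20 = 0.6
jf = (j + boundary_f(pred, gt)) / 2  # per-frame J&F
```

The benchmark score averages $\mathcal{J}\&\mathcal{F}$ over all frames and objects, reported separately for seen ($\mathcal{J}_{s}$, $\mathcal{F}_{s}$) and unseen ($\mathcal{J}_{u}$, $\mathcal{F}_{u}$) categories.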

Citation

Please consider citing PanoVOS if it helps your research.
    @article{yan2023panovos,
      title={PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation},
      author={Yan, Shilin and Xu, Xiaohao and Hong, Lingyi and Chen, Wenchao and Zhang, Wenqiang and Zhang, Wei},
      journal={arXiv preprint arXiv:2309.12303},
      year={2023}
    }

License

Creative Commons License
PanoVOS is licensed under a CC BY-NC-SA 4.0 License. The data of PanoVOS is released for non-commercial research purposes only.

Contact

Any questions, suggestions, and feedback are welcome. Please contact tattoo.ysl@gmail.com