Dense Semantic Image Segmentation with Objects and Attributes

Shuai Zheng¹, Ming-Ming Cheng¹, Jonathan Warrell¹, Paul Sturgess¹, Vibhav Vineet¹, Carsten Rother², Philip H. S. Torr¹

¹Torr-Vision Group, University of Oxford    ²CV-Lab, TU Dresden

The concepts of objects and attributes are both important for describing images precisely, since more informative verbal descriptions often contain both adjectives and nouns (e.g. ‘I see a shiny red chair’). In this project, we formulate the problem of joint visual attribute and object class image segmentation as a dense multi-labeling problem, where each pixel in an image can be associated with both an object-class and a set of visual attribute labels.
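The dense multi-labeling view can be pictured as two label maps over the same pixel grid. The following is only a minimal sketch of that representation, not the authors' data structure; the image size, the class id, and the attribute indices are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
HEIGHT, WIDTH = 4, 6
NUM_ATTRIBUTES = 11

# Exactly one object-class id per pixel.
object_labels = np.zeros((HEIGHT, WIDTH), dtype=np.int32)

# One on/off flag per pixel and per attribute, so a pixel
# can carry any subset of the attributes at the same time.
attribute_labels = np.zeros((HEIGHT, WIDTH, NUM_ATTRIBUTES), dtype=bool)

# Mark a small region as (hypothetical) class 3 with attributes 0 and 9.
object_labels[1:3, 2:5] = 3
attribute_labels[1:3, 2:5, [0, 9]] = True


def describe_pixel(y, x):
    """Return (object_class_id, list of active attribute indices) for one pixel."""
    active = np.flatnonzero(attribute_labels[y, x]).tolist()
    return int(object_labels[y, x]), active
```

Here `describe_pixel(1, 2)` returns the class id together with the two active attribute indices, while a pixel outside the marked region returns class 0 and an empty attribute list.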


Fig. 1 Goal of the semantic image segmentation.


Fig. 2 Illustration of factorial-CRF-based semantic segmentation for object classes and attributes. (a) The input image. (b) The ground-truth mask for object classes. (c) The attribute masks. (d) A comparison of CRF topologies: a grid CRF, a fully connected CRF, and a hierarchical fully connected CRF.
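To give a feel for how different the topologies in (d) are, the sketch below counts pairwise edges for a grid CRF and a fully connected CRF over the same pixel grid. It assumes a 4-connected grid, which the figure caption does not specify, and the function names are hypothetical.

```python
def grid_crf_edges(height, width):
    """Pairwise edges in a 4-connected grid CRF (assumed connectivity)."""
    horizontal = height * (width - 1)
    vertical = width * (height - 1)
    return horizontal + vertical


def fully_connected_crf_edges(height, width):
    """Pairwise edges when every pixel is connected to every other pixel."""
    n = height * width
    return n * (n - 1) // 2
```

The grid count grows linearly with the number of pixels and the fully connected count grows quadratically, which is why fully connected models are usually paired with efficient approximate inference such as the filtering-based mean-field method of [2] in Related Works.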

Dataset for Semantic Image Segmentation for Objects and Attributes


Fig. 3 Additional attribute annotations on the NYU dataset (aNYU: attribute-augmented NYU dataset), and attribute annotations on the aPascal and CORE datasets.

Fill in a simple survey so that we can notify you when we release updates.

aNYU dataset for semantic image segmentation with objects and visual attributes
In this paper, we augment the NYU dataset with 11 additional visual attributes: 1) Wood (Material), 2) Painted (Material), 3) Paper (Material), 4) Glass (Material), 5) Brick (Material), 6) Metal (Material), 7) Flat (Shape), 8) Plastic (Material), 9) Textured (Material), 10) Glossy (Surface), 11) Shiny (Surface).

We have released this dataset (1449 images in total) together with a train/validation split. Alternatively, you can randomly shuffle the 1449 images and take the first 725 images for training, the next 100 images for validation, and the remaining 624 images for testing. You can also use the AttriMarker tool to annotate other attributes you are interested in.
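The random-shuffle alternative can be sketched as follows. This is an illustrative helper, not part of the released tools; the function name, the fixed seed, and the use of integer image ids are assumptions, and the resulting split will differ from the released split files.

```python
import random


def split_anyu(image_ids, n_train=725, n_val=100, seed=0):
    """Shuffle the ids and cut them into train / validation / test lists."""
    ids = list(image_ids)
    # A fixed seed (an assumption here) keeps the split reproducible.
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# Hypothetical ids 0..1448 standing in for the 1449 image names.
train_ids, val_ids, test_ids = split_anyu(range(1449))
```

With 1449 ids, the three lists contain 725, 100, and 624 entries and do not overlap.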

aNYU.tar.gz (md5: 6660ea0d900c51ec14e0122352aa92ae, 111MB)
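After downloading, the listed md5 value can be used to check that the archive arrived intact. The snippet below is a small sketch using Python's standard `hashlib`; the helper names are hypothetical, and the local path in the usage comment assumes the archive was saved under its original name in the working directory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's md5 matches the value listed on this page."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example usage (assumes the file exists locally):
# verify_download("aNYU.tar.gz", "6660ea0d900c51ec14e0122352aa92ae")
```

Reading in chunks keeps memory use small for archives of around 100MB.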

traintestsplit_aNYU_ImageSpirit.tar.gz (md5: 3a149fafecc82dc11c621c7de87d54bf, 40K)

(7.3MB)

(md5: 9fc5265950e027a529ee680977112f2f, 7.5MB)

UserGuide4AttriMarker.pdf

(files, md5: f159d93f63a7e4289b2ab5145ba08273, 70MB)

Annotation format: in the annotation images, the channels are R: random value for visualization, G: object_class_id, B: attribute_id. In the generated annotation, the first column is the region_id, the second column is the object_class_id, and the third and later columns are the attribute_ids.
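A reader for this format might look like the sketch below. It is an illustration under stated assumptions, not the released loader: the annotation image is assumed to be already loaded as an H x W x 3 array in RGB order, the generated annotation is assumed to be whitespace-separated integers with one region per line, and both function names are hypothetical. How several attributes for one pixel are packed into the B channel is not stated here, so the code returns that channel unchanged.

```python
import numpy as np


def split_annotation_channels(rgb):
    """Split an H x W x 3 annotation image into object and attribute id maps.

    The R channel is ignored because it only holds a random value
    used for visualization.
    """
    rgb = np.asarray(rgb)
    object_ids = rgb[..., 1]      # G channel: object_class_id
    attribute_ids = rgb[..., 2]   # B channel: attribute_id (raw value)
    return object_ids, attribute_ids


def parse_region_line(line):
    """Parse one row: region_id, object_class_id, then zero or more attribute_ids.

    Assumes whitespace-separated integers (the delimiter is an assumption).
    """
    fields = [int(token) for token in line.split()]
    region_id, object_class_id, *attribute_ids = fields
    return region_id, object_class_id, attribute_ids
```

For example, under these assumptions the row `7 12 1 10` would be read as region 7 with object class 12 and attribute ids 1 and 10.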

CORE dataset for semantic image segmentation with objects and visual attributes
In this paper, we use the CORE dataset to evaluate dense semantic image segmentation for objects and visual attributes (including 1) Bare Metal, 2) Feathers, 3) Fur/Hair, Glass, 4) Painted Metal/Plastic, 5) Rubber, 6) Scales, 7) Skin, 8) Wood). This dataset contains 1059 images in total, split into 465 training images and 594 validation images.

CORE.rar (md5: 8ec5f2357bf3d2301a4a9018f5ca984e, 117MB)

trantestsplit_CORE.tar.gz (md5: 034819430f21ff16b09820b78fd69a3f, 1MB)


aPascal dataset for semantic image segmentation with objects and visual attributes
In this paper, we convert the aPascal dataset, originally built for detection, into a new aPascal dataset for segmentation by using the ground-truth masks of the train/validation sets in the Pascal 2007-2012 datasets. The resulting dataset can be used to evaluate dense semantic image segmentation for objects and visual attributes. It contains 639 images in total, split into 326 training images and 313 validation images. The region-level attributes in this dataset are those annotated in the original aPascal dataset.

aPascal.tar (md5: bf3ae4591ae270992a2d2e7eb177339d, 107MB)

Train/Val split:

C++/C Code

Note that you might need the ALE library to create the pixel-wise unary potentials.


[1] Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip H. S. Torr. “Dense Semantic Image Segmentation with Objects and Attributes”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014. (accepted) [bib][slides][poster][video]

[2] Shuai Zheng, Ming-Ming Cheng, Wen-Yan Lin, Vibhav Vineet, Paul Sturgess, Nigel Crook, Niloy Mitra, Philip H. S. Torr. “ImageSpirit: Verbal Guided Image Parsing”, ACM Transactions on Graphics, 2014. (* indicates equal contribution.) [bib][project][youtube][youku]


Related Works

[0] Ľubor Ladický, Chris Russell, Pushmeet Kohli, Philip H.S. Torr. Associative Hierarchical Random Fields. IEEE PAMI. 2014. (The Automatic Labeling Environment Library) [Stable Code][ALE1.01].

[1] A. Adams, J. Baek, and A. Davis. Fast High-Dimensional Filtering Using the Permutohedral Lattice, Eurographics 2010.

[2] Philipp Krähenbühl and Vladlen Koltun. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. NIPS 2011.

[3] A. Farhadi, I. Endres, D. Hoiem, and D.A. Forsyth, “Describing Objects by their Attributes”, CVPR 2009.

[4] Genevieve Patterson, James Hays. SUN Attribute Database: Discovering, Annotating, and Recognizing Scene Attributes.  CVPR 2012.

[5] Ruiqi Guo and Derek Hoiem. Support Surface Prediction in Indoor Scenes. ICCV 2013.

[6] Sean Bell, Kavita Bala, Noah Snavely. Intrinsic Images in the Wild. ACM TOG, 2014.

[7] Joseph Tighe and Svetlana Lazebnik. “SuperParsing: Scalable Nonparametric Image Parsing with Superpixels”, International Journal of Computer Vision, 2013.

[8] Derek Hoiem, Alexei A. Efros, and Martial Hebert. Geometric Context from a Single Image. ICCV 2005.


This project is supported by EPSRC EP/I001107/2 “Scene Understanding using New Global Energy Models”, ERC HELIOS 2013-2018 Advanced Investigator Award “Towards Total Scene Understanding using Structured Models”, and Google Research Award 2012-2013. Carsten Rother was awarded an ERC Consolidator Grant.

3 thoughts on “Dense Semantic Image Segmentation with Objects and Attributes”

  1. May I ask: in the paper “Dense Semantic Image Segmentation with Objects and Attributes”, why is it necessary to handle attributes at different levels? Would using only the pixel level not work?

  2. Good morning. I’m a student from Ecuador. I’m doing research about object recognition and found your work really interesting and useful. I really appreciate that information is available in your site, but it would be helpful for my project if you could share with me the Windows Executable and the source code if it is possible. My project is about semantic web and digital television. I have just started working on this so I will be waiting for your answer.

    Thank you very much.


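  3. Hello, I am a fourth-year undergraduate student from Xi'an, China. I have recently been reading your CVPR 2014 paper on semantic segmentation, and I have a favor to ask: I would like to study the code for this paper. Would you mind sending me a copy of your code?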

