Dense Semantic Image Segmentation with Objects and Attributes
Shuai Zheng1 Ming-Ming Cheng1 Jonathan Warrell1 Paul Sturgess1 Vibhav Vineet1 Carsten Rother2 Philip H. S. Torr1
The concepts of objects and attributes are both important for describing images precisely, since more informative verbal descriptions often contain both adjectives and nouns (e.g. ‘I see a shiny red chair’). In this project, we formulate the problem of joint visual attribute and object class image segmentation as a dense multi-labeling problem, where each pixel in an image can be associated with both an object-class and a set of visual attribute labels.
Fig. 1 The goal of semantic image segmentation with objects and attributes.
Fig. 2 Illustration of factorial-CRF-based semantic segmentation for object classes and attributes. (a) Input image. (b) Ground-truth mask for object classes. (c) Attribute masks. (d) Comparison of CRF topologies: a grid CRF, a fully connected CRF, and a hierarchical fully connected CRF.
Dataset for Semantic Image Segmentation for Objects and Attributes
Fig. 3 Extra annotations on the NYU dataset (aNYU: attributes-augmented NYU dataset), and attribute annotations on the aPascal and CORE datasets.
Fill in a simple survey so that we can notify you when we have new updates.
aNYU dataset for semantic image segmentation with objects and visual attributes
In this paper, we augment the NYU dataset with 11 additional visual attributes: 1: Wood (Material), 2: Painted (Material), 3: Paper (Material), 4: Glass (Material), 5: Brick (Material), 6: Metal (Material), 7: Flat (Shape), 8: Plastic (Material), 9: Textured (Material), 10: Glossy (Surface), 11: Shiny (Surface).
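For convenience, the 11 attribute labels above can be kept as a simple id-to-name mapping. This is just an illustrative sketch (the dictionary name and helper function are our own, not part of the dataset release); the ids follow the numbering listed above:

```python
# Mapping from aNYU attribute ids to (name, category),
# following the numbering given in the text above.
ANYU_ATTRIBUTES = {
    1: ("Wood", "Material"),
    2: ("Painted", "Material"),
    3: ("Paper", "Material"),
    4: ("Glass", "Material"),
    5: ("Brick", "Material"),
    6: ("Metal", "Material"),
    7: ("Flat", "Shape"),
    8: ("Plastic", "Material"),
    9: ("Textured", "Material"),
    10: ("Glossy", "Surface"),
    11: ("Shiny", "Surface"),
}

def attribute_name(attr_id):
    """Return a human-readable label such as 'Wood (Material)'."""
    name, category = ANYU_ATTRIBUTES[attr_id]
    return f"{name} ({category})"
```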
We have released this dataset (1449 images in total, with the train/validation split provided below). Alternatively, you can randomly shuffle the 1449 images, take the first 725 for training and the next 100 for validation, and use the remaining 624 for testing. You can also use the AttriMarker tool to annotate other attributes you are interested in.
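The random 725/100/624 split described above can be sketched as follows. This is a minimal example; the seed value is an assumption for reproducibility, not part of the dataset release:

```python
import random

def split_anyu(image_ids, seed=0):
    """Randomly split the 1449 aNYU images into 725 training,
    100 validation, and 624 test images, as described above.

    The seed is an arbitrary choice to make the split reproducible.
    """
    ids = list(image_ids)
    assert len(ids) == 1449
    random.Random(seed).shuffle(ids)
    return ids[:725], ids[725:825], ids[825:]

train, val, test = split_anyu(range(1449))
print(len(train), len(val), len(test))  # 725 100 624
```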
aNYU.tar.gz (md5: 6660ea0d900c51ec14e0122352aa92ae, 111MB)
traintestsplit_aNYU_ImageSpirit.tar.gz (md5: 3a149fafecc82dc11c621c7de87d54bf, 40K)
Data.zip (annotation files, md5: f159d93f63a7e4289b2ab5145ba08273, 70MB)
In the mask images, R is a random value for visualization, G is the object_class_id, and B is the attribute_id. In the generated annotation files, the first column is the region_id, the second column is the object_class_id, and the remaining columns are attribute_ids.
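Assuming the annotation files are plain text with whitespace-separated integer columns as described above, one line can be parsed as follows. This is a hedged sketch; the exact on-disk format should be checked against the released Data.zip:

```python
def parse_annotation_line(line):
    """Parse one annotation line: region_id, object_class_id,
    followed by zero or more attribute_ids (all integers)."""
    fields = [int(tok) for tok in line.split()]
    region_id, object_class_id = fields[0], fields[1]
    attribute_ids = fields[2:]
    return region_id, object_class_id, attribute_ids

# Hypothetical line: region 3 is object class 12 with attributes 1, 7, 10.
region, obj, attrs = parse_annotation_line("3 12 1 7 10")
```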
CORE dataset for semantic image segmentation with objects and visual attributes
In this paper, we use the CORE dataset to evaluate dense semantic image segmentation for objects and visual attributes (including 1) Bare Metal, 2) Feathers, 3) Fur/Hair, Glass, 4) Painted Metal/Plastic, 5) Rubber, 6) Scales, 7) Skin, 8) Wood). This dataset contains 1059 images in total, with the train/validation split provided below: 465 training images and 594 validation images.
CORE.rar (md5: 8ec5f2357bf3d2301a4a9018f5ca984e, 117MB)
aPascal dataset for semantic image segmentation with objects and visual attributes
In this paper, we convert the aPascal detection dataset into a new aPascal segmentation dataset by using the ground-truth masks of the train/validation sets in the Pascal VOC 2007-2012 datasets. The result is a new aPascal dataset for evaluating dense semantic image segmentation for objects and visual attributes. It contains 639 images in total, with the train/validation split provided below: 326 training images and 313 validation images. The region-level attributes in this dataset are the attributes annotated in the original aPascal dataset.
aPascal.tar (md5: bf3ae4591ae270992a2d2e7eb177339d, 107MB)
Note that you might need to use the ALE library to create the pixel-wise unary potentials.
 Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip H. S. Torr, "Dense Semantic Image Segmentation with Objects and Attributes", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014. (accepted) [bib][slides][poster][video]
 Shuai Zheng, Ming-Ming Cheng, Wen-Yan Lin, Vibhav Vineet, Paul Sturgess, Nigel Crook, Niloy Mitra, Philip H. S. Torr, "ImageSpirit: Verbal Guided Image Parsing", ACM Transactions on Graphics, 2014. (* indicates equal contribution.) [bib][project][youtube][youku]
 A. Adams, J. Baek, and A. Davis. Fast High-Dimensional Filtering Using the Permutohedral Lattice, Eurographics 2010.
 Philipp Krähenbühl and Vladlen Koltun. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. NIPS 2011.
 A. Farhadi, I. Endres, D. Hoiem, and D.A. Forsyth, “Describing Objects by their Attributes”, CVPR 2009.
 Genevieve Patterson, James Hays. SUN Attribute Database: Discovering, Annotating, and Recognizing Scene Attributes. CVPR 2012.
 Ruiqi Guo and Derek Hoiem. Support Surface Prediction in Indoor Scenes. ICCV 2013.
 Sean Bell, Kavita Bala, Noah Snavely. Intrinsic Images in the Wild. ACM TOG, 2014.
 Joseph Tighe and Svetlana Lazebnik. "SuperParsing: Scalable Nonparametric Image Parsing with Superpixels," International Journal of Computer Vision, 2013.
 Derek Hoiem, Alexei A. Efros, and Martial Hebert. Geometric Context from a Single Image. ICCV 2005.
This project is supported by EPSRC grant EP/I001107/2 "Scene Understanding using New Global Energy Models", ERC HELIOS Advanced Investigator Award 2013-2018 "Towards Total Scene Understanding using Structured Models", and a Google Research Award 2012-2013. Carsten Rother was awarded an ERC Consolidator Grant.