News: the FGVC iMat-fashion challenge is live! (in conjunction with CVPR 2019)

What is the Fashionpedia Knowledge Graph?

Fashionpedia knowledge graph data representation

The Fashionpedia knowledge graph relies on the notions of object (similar to ‘item’ in Wikidata) and statement. Objects represent common apparel items. Statements describe detailed characteristics of an object and consist of a relationship (similar to ‘property’ in Wikidata) and an attribute (similar to ‘value’ in Wikidata). For a garment object, we can add a silhouette relationship that links the garment to a silhouette attribute. For a button object, we can add a material relationship that links the button to a material attribute. In this section, we break down each component of the Fashionpedia knowledge graph (Figure 1) and explain how a large-scale fashion ontology can be built on the backbone of the Fashionpedia knowledge graph structure.
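As a rough illustration of this object/statement model, the sketch below pairs an object with a list of statements, each holding a relationship and an attribute. This is a minimal sketch with made-up class names, not the official Fashionpedia schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Statement:
    """A relationship/attribute pair, e.g. ("material", "metal")."""
    relationship: str
    attribute: str

@dataclass
class FashionObject:
    """A main object or sub-object, e.g. "jacket" or "button" (illustrative)."""
    name: str
    statements: List[Statement] = field(default_factory=list)
    parts: List["FashionObject"] = field(default_factory=list)  # sub-objects
```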

1. People, main objects, sub-objects, and their segmentations

In the Fashionpedia dataset, all images are exhaustively annotated with main objects (Figure 2). Each main object is also exhaustively annotated with its sub-objects (Figure 3). For example, general garment types such as jackets, dresses, and pants are considered main objects. These garments also consist of several sub-objects such as pockets, collars, sleeves, and buttons. Main objects are divided into three main categories: outerwear, intimates, and accessories. Sub-objects also come in different types: garment parts (e.g. collars, sleeves), bra parts, closures (e.g. buttons, zippers), and decorations (e.g. embroidery, ruffles). In the current version of Fashionpedia, each image contains an average of 1 person, 6 main objects, and 12 sub-objects, each delineated by a tight segmentation mask (Figure 4). Furthermore, each object is canonicalized to a synset ID in our Fashionpedia knowledge graph (Figure 6).
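One way to picture a single annotated main object is a COCO-style record carrying a category, a polygon segmentation mask, a synset-style ID, and references to its sub-objects. The field names and values below are assumptions for illustration only, not the exact Fashionpedia release format.

```python
# Hypothetical, illustrative record; not the exact Fashionpedia release format.
main_object_annotation = {
    "image_id": 1,
    "category": "jacket",                # main object category
    "synset_id": "jacket.n.01",          # assumed synset-style ID in the knowledge graph
    "segmentation": [[110.0, 220.0, 180.0, 220.0, 180.0, 340.0, 110.0, 340.0]],  # polygon vertices
    "sub_objects": ["pocket", "collar", "sleeve", "button"],  # each with its own mask
}
```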

2. Fine-grained attributes

Each object and sub-object is associated with related apparel attributes (Figure 4). For example, in Figure 5, ‘button’ is a sub-object of the main object ‘jacket’. ‘Jacket’ can be linked with the silhouette attribute ‘symmetrical’. The sub-object ‘pocket’ can be linked to the attribute ‘metal’ through a material relationship. Each image in Fashionpedia has an average of 16 attributes. As with objects, we canonicalize all attributes to our Fashionpedia knowledge graph.
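Using the sketch classes introduced earlier, the Figure 5 example could be expressed as follows; the attribute and relationship names are illustrative, not the dataset's exact vocabulary.

```python
# Illustrative instantiation of the Figure 5 example using the earlier sketch classes.
pocket = FashionObject("pocket", statements=[Statement("material", "metal")])
button = FashionObject("button")
jacket = FashionObject(
    "jacket",
    statements=[Statement("silhouette", "symmetrical")],
    parts=[pocket, button],
)
```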

3. Relationships

There are three main types of relationships: 1) outfits to objects and objects to sub-objects: meronymy (part-of) relationships (Figure 5); 2) main objects/sub-objects to attributes: these relationship types can be garment silhouette (e.g. peplum), collar nickname (e.g. Peter Pan collar), textile type (e.g. lace), textile finishing & dyeing (e.g. distressed), textile-fabric pattern (e.g. paisley), etc.; 3) within objects/sub-objects/attributes: there are at most four levels of hyponymy (is-an-instance-of) relationships. For example, weft knit is an instance of knit fabric, and fleece is an instance of weft knit.
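A small sketch of how the hyponymy hierarchy could be stored and traversed is shown below; the parent mapping contains only the example entries from the text and is not the full Fashionpedia hierarchy.

```python
# Each term maps to its parent in the is-an-instance-of hierarchy (illustrative entries only).
HYPONYMY = {
    "fleece": "weft knit",
    "weft knit": "knit fabric",
}

def ancestors(term: str) -> list:
    """Walk up the hyponymy chain (at most four levels in Fashionpedia)."""
    chain = []
    while term in HYPONYMY:
        term = HYPONYMY[term]
        chain.append(term)
    return chain

print(ancestors("fleece"))  # ['weft knit', 'knit fabric']
```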

4. Apparel graphs

Integrating the objects, attributes, and relationships, we create an apparel graph representation for each outfit in an image. Each apparel graph is a structured representation of an outfit ensemble containing certain types of garments. Nodes in the graph represent main objects, sub-objects, and attributes. Main objects and sub-objects are linked to their respective attributes through different types of relationships, with each relationship edge pointing from the main object or sub-object to its attribute. Figure 5 illustrates an example apparel graph for a jacket.
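A minimal sketch of the jacket apparel graph from Figure 5, expressed as a directed graph with networkx; the node names, edge directions, and relation labels are illustrative choices, not the dataset's exact encoding.

```python
import networkx as nx

apparel_graph = nx.DiGraph()
# meronymy: main object -> sub-objects
apparel_graph.add_edge("jacket", "pocket", relation="part-of")
apparel_graph.add_edge("jacket", "button", relation="part-of")
# main object / sub-object -> attribute relationships
apparel_graph.add_edge("jacket", "symmetrical", relation="silhouette")
apparel_graph.add_edge("pocket", "metal", relation="material")
```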

5. Fashionpedia knowledge graph

While apparel graphs are localized representations of individual outfit ensembles in fashion images, we also create a single Fashionpedia knowledge graph (Figure 6). The Fashionpedia knowledge graph is the union of all apparel graphs and contains all main objects, sub-objects, attributes, and relationships. By doing so, we are able to combine multiple levels of information in a more coherent way.
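If each apparel graph were represented as in the previous sketch, the single knowledge graph could be sketched as the union of nodes and edges across all of them, for example with networkx:

```python
import networkx as nx

def build_knowledge_graph(apparel_graphs):
    """Union of all per-outfit apparel graphs (nodes and edges merged); a sketch, not the official pipeline."""
    return nx.compose_all(apparel_graphs)
```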