The fashion clothing industry is moving towards fast fashion, forcing retailers to design products at a quicker pace while following fashion trends and their consumers' needs. As a result, Artificial Intelligence (AI) techniques are being introduced across a company's entire supply chain to help develop innovative methods, balance supply and demand, improve customer service quality, aid designers, and improve overall efficiency.
This blog describes building a recommendation system for clothes using content-based recommendations. We discuss a system capable of large-scale visual recommendation. A visual recommendation model can incorporate visual signals directly into the recommendation objective. Essentially, a user interested in buying a particular item on the screen may want to explore visually similar items before finishing the purchase. These could be items with similar colours, patterns, and shapes.
The client expects that the user has an image of a piece of clothing they like and searches the application for clothes similar to it. The search results include two types of clothes:
Clothes from the same domain that are visually similar to the query clothing
Clothes from other domains that the user could buy based on the query results
Pipeline For Visual Recommender
For a visual recommendation system, we split the process of recommending, or retrieving the top k similar instances (images in the research phase and products in the deployment phase), into the following subtasks -
An image comes into the system
A comparison system compares this image with all of the images in the database, or with a subset of them (in our case), and gives each one a similarity score
The recommender system returns the top k images with the largest similarity scores
Steps in a Comparison System -
Select an algorithm to extract features from the raw images in the database and query images for comparison.
Select a similarity metric to find the similarity between two extracted feature vectors.
To get good results, both the feature extractor and the similarity metric have to be chosen carefully.
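The steps above can be sketched in a few lines. This is a minimal illustration rather than the deployed system: it assumes the feature vectors have already been extracted, uses cosine similarity as the metric, and the image ids, the toy vectors, and the `top_k_similar` helper are all made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, database, k, similarity=cosine_similarity):
    """Return the k (image_id, score) pairs with the largest similarity scores.

    `database` maps a hypothetical image id to its precomputed feature vector.
    """
    scored = [(image_id, similarity(query_vec, vec)) for image_id, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy, made-up feature vectors standing in for real extractor outputs.
database = {
    "shirt_red": [0.9, 0.1, 0.0],
    "shirt_blue": [0.8, 0.3, 0.1],
    "jeans": [0.0, 0.2, 0.9],
}
results = top_k_similar([1.0, 0.0, 0.0], database, k=2)
print(results)
```

Passing the metric in as a parameter keeps the comparison step independent of the ranking step, so a different similarity function can be tried without touching the rest.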
Dataset
To experiment with the system pipeline and optimise it, we need real datasets. To train new models and fine-tune existing ones, we need a dataset with enough images.
The idea behind the training and fine-tuning was to focus the models' understanding on clothes only. During this work, we used three datasets -
The dataset from the client
Clothing Dataset
Fashion Product Images Dataset (link)
Uses of the above datasets -
Clothing Dataset
This dataset contains 5000 images of various clothes, covering 20 types of clothing. It was used to train new models (like autoencoders) and to fine-tune well-known architectures from the ImageNet rankings. We also used this dataset to compute an accuracy metric for the retrieval system, the precision at k metric, and a relevance score.
Fashion Product Images Dataset
This dataset contains around 41k product images, with 40 classes based on how the item is worn and 23 classes based on the colour of the clothes. The clothes are labelled by colour. All the models were fine-tuned or trained on this dataset.
Metrics
To compare two recommender systems we can do two things: look at the results they produce, or compare the numbers given by a standard metric for recommendation systems. Several metrics are available for ranking/recommender systems, but the best known is mean average precision@k [link]. We also use one more metric, which we call the average relevance score.
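As a rough sketch of how these numbers can be computed, the functions below take a ranked list of 0/1 flags per query (1 meaning the result belongs to the same class as the query). Definitions of average precision@k differ in how they normalise; this version assumes division by the number of relevant results found in the top k, and the function names and example flags are illustrative only.

```python
def precision_at_k(relevant_flags, k):
    """Fraction of the top-k results that are relevant.

    `relevant_flags` is a list of 0/1 values in ranked order (best first).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevant_flags[:k]) / k


def average_precision_at_k(relevant_flags, k):
    """Average of precision@i over the ranks i <= k that hold a relevant result.

    Assumption: normalise by the number of relevant results found in the top k.
    """
    hits = 0
    total = 0.0
    for i, flag in enumerate(relevant_flags[:k], start=1):
        if flag:
            hits += 1
            total += hits / i  # precision at rank i
    return total / hits if hits else 0.0


def mean_average_precision_at_k(all_flags, k):
    """Mean of average_precision_at_k over several queries."""
    return sum(average_precision_at_k(flags, k) for flags in all_flags) / len(all_flags)


# Two hypothetical queries: 1 = same class as the query, 0 = different class.
queries = [[1, 0, 1, 0], [0, 1, 1, 1]]
print(precision_at_k(queries[0], 4))
print(mean_average_precision_at_k(queries, 4))
```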
Relevance score
The problem with precision@k is that it only considers the number of correct samples (those belonging to the same class as the query) in a search result, not their ordering. The relevance score considers both the number of correct samples and their ordering.
The formula for relevance metric for a search result is
Relevance Score =
X1, X2, X3, ..., Xi, ..., Xn are the samples in the search result, with i as their rank [0 means top]
where Yi = 1 if sample Xi is correct, else it is 0
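To show how a score can reward ordering as well as the number of correct samples, here is a small sketch under an assumed weighting. The sample at rank i (0 = top) gets weight n - i, and the score is the weighted sum of the Yi values divided by the sum of all weights, so a fully correct result scores 1. The linear weights and the function names are assumptions for illustration and are not necessarily the formula used in this project.

```python
def relevance_score(correct_flags):
    """Rank-weighted share of correct samples in one search result.

    `correct_flags[i]` is Yi: 1 if the sample at rank i (0 = top) is correct, else 0.
    Assumed weighting: rank i gets weight n - i, so earlier ranks count more.
    """
    n = len(correct_flags)
    if n == 0:
        return 0.0
    weights = [n - i for i in range(n)]
    return sum(w * y for w, y in zip(weights, correct_flags)) / sum(weights)


def average_relevance_score(results):
    """Mean relevance score over the search results of several queries."""
    return sum(relevance_score(flags) for flags in results) / len(results)


# Both results contain two correct samples out of four, so precision@4 is 0.5
# for each, but the result that ranks them first gets the higher score.
good_order = [1, 1, 0, 0]
bad_order = [0, 0, 1, 1]
print(relevance_score(good_order), relevance_score(bad_order))
print(average_relevance_score([good_order, bad_order]))
```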
The average relevance score is the mean of the relevance scores over multiple queries. To score well on both metrics we need a good feature representation, which in turn needs a good model for feature extraction. Let’s discuss the training of the feature extractor.
Training Feature Extractor
The feature extractor converts images of size 256x256x3 into a vector of size 512 or similar. It can be a deep learning model like VGG16, ResNet, EfficientNet, or DenseNet with weights pretrained on ImageNet. The model weights can be improved by training a few layers on the clothes dataset.
The feature extractor can be the backbone of a classification model or the encoder of an autoencoder (encoder-decoder) system. For example, the outputs of the second-to-last layer of a classifier like ResNet50 can be used as the extracted features.
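To make the "second-to-last layer as features" idea concrete without depending on a deep learning framework, here is a toy sketch. It is a stand-in for a real CNN, not the model used here: a tiny two-layer classifier with made-up weights, where the hidden activations play the role of the feature vector and the class scores are discarded.

```python
def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


# Made-up weights for a toy classifier: 4 inputs -> 3 hidden units -> 2 class scores.
HIDDEN_W = [[0.5, -0.2, 0.1, 0.0], [0.0, 0.3, 0.3, -0.1], [-0.4, 0.0, 0.2, 0.6]]
HIDDEN_B = [0.0, 0.1, 0.0]
OUTPUT_W = [[1.0, -1.0, 0.5], [-0.5, 1.0, 0.2]]
OUTPUT_B = [0.0, 0.0]


def classify(pixels):
    """Full forward pass: returns (class_scores, second_to_last_activations)."""
    hidden = relu(dense(pixels, HIDDEN_W, HIDDEN_B))
    scores = dense(hidden, OUTPUT_W, OUTPUT_B)
    return scores, hidden


def extract_features(pixels):
    """Keep the second-to-last layer's output as the feature vector; drop the class scores."""
    _, hidden = classify(pixels)
    return hidden


features = extract_features([1.0, 0.0, 2.0, 1.0])
print(features)
```

With a real pretrained network the same pattern applies: run the image through the model, stop before the final classification layer, and store that output as the image's feature vector.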
Similarity Metric Selection
The similarity/distance metrics we experimented with are cosine similarity, Euclidean similarity, and Manhattan similarity. We need a similarity score to sort the products for recommendation and to quantify how similar two product images are.
The choice of similarity metric is a crucial part of this project, as it can affect both relevance and precision.
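The three metrics can be sketched as below. Cosine similarity is standard. For the Euclidean and Manhattan cases, the mapping from a distance d to a similarity of 1 / (1 + d) is an assumption made for this example, since several conversions are in common use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_similarity(a, b):
    """Assumed mapping: 1 / (1 + Euclidean distance)."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def manhattan_similarity(a, b):
    """Assumed mapping: 1 / (1 + Manhattan distance)."""
    distance = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + distance)


query = [1.0, 2.0, 2.0]
candidate = [2.0, 4.0, 4.0]  # same direction as the query, twice the length
for metric in (cosine_similarity, euclidean_similarity, manhattan_similarity):
    print(metric.__name__, round(metric(query, candidate), 3))
```

In this toy case cosine similarity treats the two vectors as identical because it only looks at direction, while the two distance-based scores penalise the difference in magnitude. This is one reason the metric choice can change the ranking.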
Results
The results below are from the top three models, evaluated on the client dataset. In each result, the first image is the query image itself, with a 100% match.
VGG16
DenseNet121
Autoencoder
Performance
At deployment time, speed is highly important, and slow results are a major concern. For example, slow search results on Amazon can increase the chance of customers moving to Flipkart.
The above plot shows that the autoencoder (CNN encoder) is 4.7 times faster than VGG.
Future Work
Train base models with triplets (link)
More similarity metrics can be experimented with.
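For the triplet idea, a common formulation is the triplet margin loss, which pushes an anchor image's features closer to a similar (positive) image than to a dissimilar (negative) one by at least a margin. The sketch below assumes Euclidean distance and a margin of 0.2; both choices, and the toy vectors, are illustrative.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = euclidean_distance(anchor, positive) - euclidean_distance(anchor, negative)
    return max(gap + margin, 0.0)


anchor = [0.0, 0.0]
positive = [0.1, 0.0]       # visually similar item: close to the anchor
easy_negative = [1.0, 0.0]  # clearly different item: already far away
hard_negative = [0.2, 0.0]  # different item that is still too close

print(triplet_loss(anchor, positive, easy_negative))  # no penalty
print(triplet_loss(anchor, positive, hard_negative))  # positive penalty
```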
Thank you for reading, we hope you find our article interesting!
If you want to integrate data insights in your business then contact us here - contact@godatainsights.com