It’s common to search by color on an ecommerce website. Unfortunately, a purely text-based search index might not have all the information it needs to return the most relevant results for those queries.
For example, searching for a “white t-shirt” might return results with thumbnails of clearly red or blue t-shirts just because their descriptions mention that the same cut also comes in white. Technically correct or not, including those results gives the user the impression that our search engine is broken. What can we do about this?
The logical next step is to build a system that can automatically identify the color of the object in the thumbnail. There are open-source scripts built specifically for this, like josip/node-colour-extractor or lokesh/color-thief.
However, they largely work by finding the most common pixel value in an image, which comes with a few problems: the most common pixel often belongs to the background (or to a model’s skin) rather than to the product itself, and the output is a raw RGB value rather than a color name a shopper would actually type into a search box.
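To make that baseline concrete, here’s a rough sketch of the most-common-pixel idea using Pillow. It’s a deliberate simplification rather than either library’s actual implementation, and the file name is a placeholder.

```python
# A simplified stand-in for the "most common pixel" approach (not the real
# implementation of node-colour-extractor or color-thief).
from collections import Counter
from PIL import Image

def most_common_pixel(path: str) -> tuple:
    """Return the single most frequent (R, G, B) value in the image."""
    image = Image.open(path).convert("RGB")
    return Counter(image.getdata()).most_common(1)[0][0]

# On a typical product shot, this often just returns the background color.
print(most_common_pixel("product.jpg"))
```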
So instead, in this article we’re going to build something closer to Vue.ai, which identifies the foreground color as a queryable word. That commercial application will work better than what we come up with here, but if you’d like to see how the process works, or to judge whether our approach fits your project’s requirements, read on!
This problem is relatively complex, so the potential solutions are numerous. Most state-of-the-art computer vision frameworks nowadays go down the Deep Learning path, classifying images with Convolutional Neural Networks. That approach leads to some astounding results, but the huge datasets and specialized hardware it requires place it slightly outside the scope of an experiment like this.
Deep learning frameworks are also notoriously hard to set up and run, and we wanted to release this as open source so you can take a stab at it. Here’s the process we settled on:
1. Preprocess the image, shrinking it so the foreground dominates and processing stays fast.
2. Remove the background with a combination of thresholding, flood filling, and edge detection.
3. Optionally filter out skin tones.
4. Cluster the remaining pixels into a handful of dominant colors.
5. Map each cluster to a human-readable color name.
Since we were imagining using this on a fashion ecommerce website, we thought it might make sense to crop the image to just the foreground. That should make it easier for our algorithm to identify which part of the image really matters for our use case, since the foreground will take up more of the frame. Also, because we only cared about the primary color of the main object in the picture, extra detail was probably just going to confuse our algorithm and lengthen the processing time.
So our next step was to shrink all of the images down to about 100x100px. Our tests showed that this was close to optimal for our case. Here’s what that process looked like for this image:
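In code, that shrinking step can be as small as a call to Pillow’s thumbnail(). This is just a sketch of the idea, and the file names are placeholders.

```python
# A minimal sketch of the preprocessing step: cap the image at roughly 100x100px
# so the later steps only have to look at ~10,000 pixels.
from PIL import Image

TARGET_SIZE = (100, 100)  # the rough size the article settles on

def shrink(path: str) -> Image.Image:
    image = Image.open(path).convert("RGB")
    image.thumbnail(TARGET_SIZE)  # resizes in place, preserving aspect ratio
    return image

small = shrink("product.jpg")
small.save("product_small.jpg")
```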
The background could be plain white or very complex, but either way, we don’t want to pull any data from it. How do we separate it from the data we care about? It’d be easy to make rules or assumptions, like mandating that the background be a plain color or dictating where the main object should sit in the picture.
But to allow for broader use cases, we’ll try to combine several more common algorithms to handle reasonably complex backgrounds.
Let’s start by thresholding the image: take the delta E distance between every pixel and the four corner pixels of the image, and consider a pixel part of the background if the result is below some threshold. This step is particularly useful on more complex backgrounds because it doesn’t rely on edge detection, but that makes it vulnerable to incorrectly picking up gradients and shadows.
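Here’s a hedged sketch of that thresholding pass, assuming scikit-image is available. The threshold value and the “closest corner” rule are our assumptions, not necessarily what the final script uses.

```python
# Corner-based thresholding: convert to Lab, measure the delta E distance from every
# pixel to each of the four corner colors, and flag pixels that sit close to a corner.
import numpy as np
from skimage import color, io

DELTA_E_THRESHOLD = 15.0  # assumed value; tune for your catalog

def corner_threshold_mask(rgb_image: np.ndarray) -> np.ndarray:
    """Return a boolean mask that is True where a pixel looks like background."""
    lab = color.rgb2lab(rgb_image)
    h, w, _ = lab.shape
    corners = [lab[0, 0], lab[0, w - 1], lab[h - 1, 0], lab[h - 1, w - 1]]
    distances = np.stack(
        [color.deltaE_ciede2000(lab, np.broadcast_to(c, lab.shape)) for c in corners]
    )
    return distances.min(axis=0) < DELTA_E_THRESHOLD

image = io.imread("product_small.jpg")
background = corner_threshold_mask(image)
```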
To fix that, we’ll use flood filling and edge detection to remove any shadows and gradients. The technique is largely inspired by this article on the Lyst engineering blog. The trick here is that backgrounds are usually defined by the absence of sharp edges. So if we smooth the image and apply a Scharr filter, we’ll usually get a clear outline of our foreground.
The caveat is just that complex backgrounds usually fool this test, so we’ll need to combine it with the thresholding from earlier for the best results.
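Continuing the sketch (still scikit-image, still assumed parameters), the edge-based pass might look like this: flood filling from the corners across “flat” regions gives a second background mask, which we then combine with the thresholding one.

```python
# Edge-based pass: smooth the image, run a Scharr filter, and flood-fill from the
# corners across regions with no strong edges. Sigma, the edge threshold, and the way
# the two masks are combined are all assumptions.
import numpy as np
from skimage import color, filters, segmentation

EDGE_THRESHOLD = 0.02  # assumed; weaker gradients than this count as "flat"

def edge_flood_mask(rgb_image: np.ndarray) -> np.ndarray:
    """Return a boolean mask that is True where flood filling reaches from a corner."""
    smoothed = filters.gaussian(color.rgb2gray(rgb_image), sigma=1)
    flat = filters.scharr(smoothed) < EDGE_THRESHOLD
    flat_u8 = flat.astype(np.uint8)
    mask = np.zeros(flat.shape, dtype=bool)
    h, w = flat.shape
    for seed in [(0, 0), (0, w - 1), (h - 1, 0), (h - 1, w - 1)]:
        if flat[seed]:
            mask |= segmentation.flood(flat_u8, seed)
    return mask

# A pixel counts as background if either detector says so (OR is our assumption;
# AND would be a stricter combination).
background = corner_threshold_mask(image) | edge_flood_mask(image)
```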
Sometimes one of these two steps misfires and removes far too many pixels, in which case we have to ignore the result of our background detection entirely. They work well on clean input, though, so we’ll call that part done.
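That fallback can be as blunt as a ratio check; the 90% cutoff below is an assumption, and the snippet continues from the sketches above.

```python
# If the combined mask would discard most of the image, assume background detection
# failed for this image and keep everything.
import numpy as np

MAX_BACKGROUND_RATIO = 0.90  # assumed cutoff

if background.mean() > MAX_BACKGROUND_RATIO:
    background = np.zeros_like(background)

foreground_pixels = image[~background]  # shape: (n_pixels, 3)
```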
The last thing we’ll want to remove from our image before trying to classify the color is skin. Images of clothing worn by models (especially swimwear) will often contain more pixels representing skin than representing the clothing item, so we’ll need to get it out of the image or our algorithm will just return that as the primary foreground color.
This is another rabbit hole we could easily dive down, but the simpler answer is to just filter out pixels in the general skin-tone range. We chose a filter with a reasonably low false-positive rate to reduce the risk of classifying orange-tinted clothes entirely as skin, but since that can still happen, we made this step completely optional in the final script.
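As a sketch, that optional skin filter can be a simple range check in HSV. The bounds below are a common rule-of-thumb skin-tone range, not the exact values from the final script.

```python
# Optional skin filter: flag pixels whose hue/saturation/value fall in a rough
# skin-tone range. The bounds are assumptions and will misfire on some garments.
import numpy as np
from skimage import color

def skin_mask(rgb_image: np.ndarray) -> np.ndarray:
    hsv = color.rgb2hsv(rgb_image)
    hue, sat, val = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return (hue < 0.10) & (sat > 0.15) & (sat < 0.70) & (val > 0.35)

# Recompute the foreground with skin removed (continuing from the snippets above).
foreground_pixels = image[~background & ~skin_mask(image)]
```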
Earlier we mentioned that raw RGB values wouldn’t do; we need textual output, something a user might actually search for on a fashion ecommerce website. But now that we’ve isolated the clothing item in the image, how do we extract such a subjective label from it?
In the blouse the woman is wearing in the image above, you can see how the precise color changes from pixel to pixel depending on how the fabric folds in the wind, where the light source is, and how many layers of fabric are visible at that exact spot. We need to find clusters of similarly colored pixels, average each cluster together, and figure out which color category it belongs to.
We don’t know how many clusters we’ll need, though, since different pictures can have different numbers of distinct colors in the foreground. We should be able to find that using the jump method, where we set some error threshold and only include pixels in a cluster if they’re (a) connected to that cluster, and (b) within the threshold color distance of the pixel at the center of the cluster. This creates as many clusters as appear to be necessary, and then (if we wanted to) we could walk along the borders of the clusters and reevaluate which group each pixel should belong to.
That step would give us finer edges, but it’s not really necessary for our use case, and it would just waste processing time.
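Here’s a much-simplified sketch of that clustering idea: greedily assign each foreground pixel to the first cluster whose running average it sits close to, otherwise start a new cluster. It drops the connectivity requirement and the border re-evaluation described above, and the distance threshold is an assumption.

```python
# Greedy color clustering over the foreground pixels. Plain RGB Euclidean distance
# and the threshold value are both simplifying assumptions.
import numpy as np

COLOR_DISTANCE_THRESHOLD = 40.0  # assumed

def greedy_clusters(pixels: np.ndarray) -> list:
    """Return one mean color per cluster of similar pixels."""
    means, counts = [], []
    for p in pixels.astype(float):
        for i, mean in enumerate(means):
            if np.linalg.norm(p - mean) < COLOR_DISTANCE_THRESHOLD:
                counts[i] += 1
                means[i] = mean + (p - mean) / counts[i]  # incremental average
                break
        else:
            means.append(p)
            counts.append(1)
    return means

cluster_means = greedy_clusters(foreground_pixels)
```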
This process gives us few enough clusters that similar colors are grouped together (for easy categorization):
At the same time, distinctly different colors in the foreground still get their own separate clusters:
The last step is categorizing our cluster average colors into readable English names. That sounds like a difficult problem, given the subjectivity (not to mention the cultural implications) of what counts under each color category. We turned to a K-Nearest-Neighbors algorithm to give color names to RGB values, thanks to the XKCD Color Survey. The survey consists of 200,000 RGB values labeled with 27 different color names (e.g., black, green, teal) that we use to train a scikit-learn KNeighborsClassifier.
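With scikit-learn, the naming step only takes a few lines. The handful of labeled values below are placeholders standing in for the full labeled survey dataset.

```python
# Nearest-neighbor color naming. In the real script the training data is the labeled
# XKCD survey; here a few hand-picked samples stand in for it.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

samples = np.array([
    [250, 250, 250], [10, 10, 10], [200, 30, 40], [30, 120, 200], [20, 140, 60],
])
labels = ["white", "black", "red", "blue", "green"]

namer = KNeighborsClassifier(n_neighbors=1)  # use more neighbors with the full dataset
namer.fit(samples, labels)

def color_name(rgb) -> str:
    """Map one RGB triplet to the nearest labeled color name."""
    return namer.predict(np.asarray(rgb, dtype=float).reshape(1, -1))[0]

print(color_name([235, 240, 245]))  # -> "white"
```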
It works fairly well, but there is one big edge case in which it fails: gray. In the RGB system (where each channel is a dimension), the shades of gray fall along the diagonal line where R = G = B. That makes it hard to define a region where every value we want categorized as gray is closer to the center of the gray region than to the centers of the regions representing other colors.
We ended up adding an extra step that computes the distance between each RGB value and its projection onto that gray diagonal; if the distance is below a certain threshold, we call the color gray and skip the classifier. The overall process isn’t super elegant and has its drawbacks, but it works well for this case.
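Here’s a sketch of that gray shortcut: project the RGB point onto the R = G = B diagonal and measure how far away it sits. The threshold is an assumption, and note that very dark and very light colors also live on that diagonal, so a real implementation would likely need extra handling there.

```python
# Gray shortcut: distance from an RGB point to its projection on the gray diagonal.
import numpy as np

GRAY_DISTANCE_THRESHOLD = 12.0  # assumed

def is_gray(rgb) -> bool:
    p = np.asarray(rgb, dtype=float)
    axis = np.ones(3) / np.sqrt(3)        # unit vector along R = G = B
    projection = np.dot(p, axis) * axis   # the closest gray to this color
    return float(np.linalg.norm(p - projection)) < GRAY_DISTANCE_THRESHOLD

def name_cluster(rgb) -> str:
    """Use the gray shortcut first, then fall back to the nearest-neighbor namer."""
    return "gray" if is_gray(rgb) else color_name(rgb)

# Tie it together with the cluster means from the earlier sketch.
tags = sorted({name_cluster(mean) for mean in cluster_means})
```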
This experiment is available for you to mess around with here on GitHub! It’s a standalone Python script that you can use to enrich your search index with extracted color tags. We also made a library and a CLI for your convenience. Everything is completely configurable – we’re using sane defaults but left the option open to tweak all of the steps above. You can even add support for other languages by adjusting the color dataset. We’re not using deep learning (yet), but we managed to hit ~84% accuracy even without the dramatic boost that route would give us.
Given that this tool is designed to be used alongside textual search, boosting items whose colors match the query, this seems like a great place to leave version 1. Check back next time for version 2!