How We Tackled Color Identification

It’s common to search by color on an ecommerce website. Unfortunately though, a purely text-based search index might not have all the information it needs to return the most relevant results when a user wants to search by color.

For example, searching for a “white t-shirt” might return results with thumbnails of clearly red or blue t-shirts just because their descriptions mention that the same cut also comes in white. Whether it’s technically correct or not, including those results at least gives the user the impression that our search engine is broken. What can we do about this?

The logical next step is to create some system that can automatically identify the color of the object in the thumbnail. There are some open source scripts that exist specifically for this, like josip/node-colour-extractor or lokesh/color-thief.

However, they largely work by finding the most common pixel value in an image, which comes with a few problems:

  1. That just gives us the color of the background in most cases because it takes up more space in the image.
  2. It also returns the value in RGB, which is too precise to be valuable in our context. We need general words that a user might include in a search query.
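To make the first problem concrete, here is a minimal sketch of the "most common pixel value" idea in plain Python. It is not the code of either library mentioned above; the function name and the toy thumbnail are assumptions made up for illustration.

```python
from collections import Counter

def most_common_color(pixels):
    """Return the most frequent exact RGB value in a flat list of (r, g, b) tuples."""
    return Counter(pixels).most_common(1)[0][0]

# Hypothetical 10x10 thumbnail: a red shirt covering 30 pixels on a white backdrop.
white, red = (255, 255, 255), (200, 30, 40)
thumbnail = [red] * 30 + [white] * 70

print(most_common_color(thumbnail))  # (255, 255, 255): the background wins
```

The answer is also a raw RGB triple rather than a word like "white", which is the second problem in the list.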

So instead in this article, we’re going to make something closer to, which identifies the foreground color in a queryable word. That commercial application is going to work better than what we come up with in this article, but if you’d like to see the process or whether our approach will fit your project’s requirements, read on!

Our chosen method

This problem is relatively complex, so the potential solutions are numerous. Most state-of-the-art computer vision frameworks nowadays go down the Deep Learning path, classifying images with Convolutional Neural Networks. This approach leads to some astounding results, but the huge datasets and specialized hardware it requires place it slightly out of the scope of an experiment like this.

Deep learning frameworks are also notoriously hard to set up and run, and we wanted to release this as open-source so you can take a stab at it. Here’s the process we settled on:

  1. Preprocessing
  2. Focusing on what we’re classifying
  3. Finding clusters of similar pixels
  4. Picking names for the colors

Preprocessing

Since we were imagining using this on a fashion ecommerce website, we thought that it might make sense to crop the image to just the foreground. That should make it easier for our algorithm to identify which part of the image really matters for our use case, since it’ll take up more of the image. Also, because we only cared about the primary color of the main object in the picture, detail was probably just going to confuse our algorithm and lengthen the processing time.

So our next step was to shrink all of the images down to about 100x100px. Our tests showed that this was close to optimal for our case. Here’s what that process looked like for this image:


Resizing and cropping (original on the left)
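As a rough illustration of the preprocessing step, the sketch below crops the center of an image and shrinks it so that its longer side is 100 pixels. The image is assumed to be a 2D list of pixel tuples, and the function names, the 90% crop fraction, and the nearest-neighbor sampling are all assumptions for this example rather than details taken from the released script.

```python
def center_crop(image, keep=0.9):
    """Keep the central `keep` fraction of each dimension of a 2D list of pixels."""
    height, width = len(image), len(image[0])
    new_h = max(1, round(height * keep))
    new_w = max(1, round(width * keep))
    top, left = (height - new_h) // 2, (width - new_w) // 2
    return [row[left:left + new_w] for row in image[top:top + new_h]]

def resize_nearest(image, max_side=100):
    """Rescale so the longer side is `max_side`, using nearest-neighbor sampling."""
    height, width = len(image), len(image[0])
    longest = max(height, width)
    new_h = max(1, round(height * max_side / longest))
    new_w = max(1, round(width * max_side / longest))
    # Integer arithmetic keeps every source index inside the original image.
    return [
        [image[y * height // new_h][x * width // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]

# Hypothetical 400x300 input filled with a simple test pattern.
original = [[(x % 256, y % 256, 0) for x in range(300)] for y in range(400)]
small = resize_nearest(center_crop(original))
print(len(small), len(small[0]))
```

In practice an imaging library would do this faster and with better interpolation; the point here is only the order of operations: crop first, then shrink.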

Focusing on what we’re classifying

The background could be plain white or very complex, but either way, we don’t want to get any data from it. How do we separate it from the data that we care about? It’d be easy to make rules or assumptions, like mandating that the background be a plain color or that the main object sit in a particular part of the picture.

But to allow for broader use cases, we’ll try to combine several more common algorithms to handle reasonably complex backgrounds.

Let’s start by thresholding the image: we take the delta E distance between every pixel and the four corner pixels of the image, and consider a pixel part of the background if the result is below some threshold. This step is particularly useful on more complex backgrounds because it doesn’t rely on edge detection, but that also makes it vulnerable to misclassifying gradients and shadows.
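A minimal sketch of that corner-based thresholding might look like the following. It assumes the CIE76 form of delta E (plain Euclidean distance in CIELAB), treats a pixel as background when it is close to any one of the four corners, and uses a made-up threshold of 10; none of those choices is confirmed by the article, and the function names are hypothetical.

```python
from math import dist

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 reference white)."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB to CIE XYZ, each axis normalized by the D65 white point.
    x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
    y = (0.2126 * r + 0.7152 * g + 0.0722 * b) / 1.0
    z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x), f(y), f(z)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e(rgb_a, rgb_b):
    """CIE76 color difference: Euclidean distance in CIELAB (an assumption here)."""
    return dist(srgb_to_lab(rgb_a), srgb_to_lab(rgb_b))

def threshold_background(image, threshold=10.0):
    """Return a 2D mask that is True where a pixel resembles one of the corners."""
    h, w = len(image), len(image[0])
    corners = [image[0][0], image[0][w - 1], image[h - 1][0], image[h - 1][w - 1]]
    return [
        [min(delta_e(pixel, corner) for corner in corners) < threshold for pixel in row]
        for row in image
    ]

# Hypothetical 5x5 image: white backdrop, one off-white pixel, one red pixel.
image = [[(255, 255, 255)] * 5 for _ in range(5)]
image[1][1] = (250, 250, 250)
image[2][2] = (200, 30, 40)
mask = threshold_background(image)
```

Here the off-white pixel is marked as background and the red one is kept. A shadow that darkens the backdrop well past the threshold would be kept too, which is the weakness described above.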


Global thresholding struggling with shadows

To fix that, we’ll use flood filling and edge detection to remove any shadows and gradients. The technique is largely inspired by this article on the Lyst engineering blog. The trick here is that backgrounds are usually defined by the absence of sharp edges. So if we smooth the image and apply a Scharr filter, we’ll usually get a clear outline of our foreground.

The caveat is just that complex backgrounds usually fool this test, so we’ll need to combine it with the thresholding from earlier for the best results.
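The edge-based idea can be sketched in plain Python as follows: compute a Scharr gradient magnitude, then flood fill from the four corners across pixels whose gradient stays low. The smoothing pass is left out for brevity, the input is assumed to be a 2D list of grayscale intensities, and the function names and the edge threshold of 200 are assumptions for this example, not values from the article or the Lyst post.

```python
from collections import deque

SCHARR_X = ((-3, 0, 3), (-10, 0, 10), (-3, 0, 3))
SCHARR_Y = ((-3, -10, -3), (0, 0, 0), (3, 10, 3))

def scharr_magnitude(gray):
    """Gradient magnitude of a 2D list of intensities; borders are clamped."""
    h, w = len(gray), len(gray[0])

    def at(y, x):
        return gray[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    value = at(y + dy, x + dx)
                    gx += SCHARR_X[dy + 1][dx + 1] * value
                    gy += SCHARR_Y[dy + 1][dx + 1] * value
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def flood_background(gray, edge_threshold=200):
    """Flood fill from the corners, stopping wherever the gradient is sharp."""
    h, w = len(gray), len(gray[0])
    edges = scharr_magnitude(gray)
    background = [[False] * w for _ in range(h)]
    queue = deque([(0, 0), (0, w - 1), (h - 1, 0), (h - 1, w - 1)])
    while queue:
        y, x = queue.popleft()
        if not (0 <= y < h and 0 <= x < w):
            continue
        if background[y][x] or edges[y][x] >= edge_threshold:
            continue
        background[y][x] = True
        queue.extend(((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)))
    return background

# Hypothetical 9x9 image: a soft left-to-right gradient with a dark 3x3 object.
gray = [[200 + x for x in range(9)] for _ in range(9)]
for y in range(3, 6):
    for x in range(3, 6):
        gray[y][x] = 20
background = flood_background(gray)
```

The soft gradient never produces a sharp edge, so the fill passes straight over it, while the ring of strong edges around the dark square keeps the fill out of the object. Pixels that sit on an edge are left unmarked, so the mask is slightly conservative around the outline.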

Shadows are handled by edge detection and flooding

Sometimes one of these two steps misfires and removes far too many pixels, in which case we have to ignore the result of our background detection. They should work well on clean input, though, so we’ll call that part done.
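One way to express that safeguard is sketched below. It assumes the two background masks are combined with a logical OR and that the result is discarded when it would remove more than 90% of the image; both the combination rule and the cutoff are assumptions for illustration, as is the function name.

```python
def combine_background_masks(mask_a, mask_b, max_removed=0.9):
    """OR two boolean background masks, falling back to 'remove nothing'."""
    combined = [
        [a or b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(mask_a, mask_b)
    ]
    total = sum(len(row) for row in combined)
    removed = sum(sum(row) for row in combined)
    if removed / total > max_removed:
        # Background detection went too far: keep every pixel instead.
        return [[False] * len(row) for row in combined]
    return combined

# Hypothetical 2x2 masks (True means "background").
sane = combine_background_masks([[True, False], [False, False]],
                                [[False, True], [False, False]])
too_greedy = combine_background_masks([[True, True], [True, True]],
                                      [[False, False], [False, False]])
```

With these inputs, `sane` marks the top row as background, while `too_greedy` comes back as all False because the combined mask would have removed every pixel.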

The last thing we’ll want to remove from our image before trying to classify the color is skin. Images of clothing worn by models (especially swimwear) will often contain more pixels representing skin than representing the clothing item, so we’ll need to get it out of the image or our algorithm will just return that as the primary foreground color.

This is another rabbit hole we could easily dive down, but the simpler answer is to just filter out pixels in that general range. We chose a filter with a decent false-positive rate to reduce the risk of seeing orange-tinted clothes entirely as skin, but because that’s a possibility, we made this step completely optional in the final script.
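The article does not say which filter the script uses, so the sketch below swaps in a commonly cited RGB rule of thumb for skin tones under daylight. Treat the exact constants, and the function name, as assumptions for illustration only.

```python
def looks_like_skin(rgb):
    """Rule-of-thumb RGB skin test; an illustrative stand-in, not the script's filter."""
    r, g, b = rgb
    return (
        r > 95 and g > 40 and b > 20
        and max(rgb) - min(rgb) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )

def remove_skin(pixels):
    """Drop every pixel that the rule flags as skin."""
    return [p for p in pixels if not looks_like_skin(p)]

# Hypothetical foreground pixels: a skin tone, a blue fabric, a pastel orange fabric.
foreground = [(224, 172, 105), (30, 60, 200), (240, 150, 90)]
kept = remove_skin(foreground)
```

In this toy input the pastel orange fabric is flagged along with the skin tone, which is the kind of false positive that motivates keeping the step optional.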


Detecting skin pixels

Finding clusters of similar pixels

Earlier we mentioned that RGB values weren’t going to do; we need textual output. We need something that a user might search for on a fashion ecommerce website. But now that we’ve isolated the clothing item in the image, how do we isolate such a subjective label from it?

In the blouse the woman is wearing in the image above, you can see how the precise color might change from pixel to pixel because of how it is folding in the wind, where the light source is, and how many layers of fabric are visible at that exact spot. We need to find a cluster of pixels of similar color, average them together, and figure out which color category they belong to.

We don’t know how many clusters we’ll need though, since different pictures could have different numbers of distinct colors in the foreground. We should be able to find that using the jump method, where we set some error threshold and only include pixels in a cluster if they’re (a) connected to that cluster, and (b) below the threshold amount of color distance when compared to the pixel at the center of the cluster. This will create as many clusters as appear to be necessary, and then (if we wanted to) we could go along the borders of the clusters and reevaluate which group the border pixels should belong to.

That step would give us finer edges, but it’s not really necessary for our use case, and it would just waste processing time.
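The connected, threshold-based grouping described above can be sketched as a simple region-growing pass. This is an illustration under stated assumptions, not the released implementation: the image is a 2D list in which removed pixels are `None`, the first pixel of each cluster stands in for its center, color distance is plain Euclidean distance in RGB, and the threshold of 60 is made up.

```python
from collections import deque
from math import dist

def grow_clusters(image, threshold=60.0):
    """Group connected foreground pixels whose color is close to their cluster's seed."""
    h, w = len(image), len(image[0])
    label = [[None] * w for _ in range(h)]
    clusters = []
    for sy in range(h):
        for sx in range(w):
            if image[sy][sx] is None or label[sy][sx] is not None:
                continue
            seed = image[sy][sx]
            cluster_id = len(clusters)
            label[sy][sx] = cluster_id
            members = []
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                members.append(image[y][x])
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and image[ny][nx] is not None
                            and label[ny][nx] is None
                            and dist(image[ny][nx], seed) < threshold):
                        label[ny][nx] = cluster_id
                        queue.append((ny, nx))
            clusters.append(members)
    return clusters

def average_color(members):
    """Average a cluster's pixels channel by channel."""
    return tuple(round(sum(p[i] for p in members) / len(members)) for i in range(3))

# Hypothetical 2x4 foreground: two shades of red, a blue area, one removed pixel.
image = [
    [(200, 30, 40), (210, 30, 40), (20, 40, 200), None],
    [(200, 30, 40), (210, 30, 40), (20, 40, 200), (20, 40, 200)],
]
clusters = grow_clusters(image)
averages = [average_color(c) for c in clusters]
```

The two reds end up in one cluster and the blues in another, so the number of clusters falls out of the threshold rather than being fixed in advance.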

This process gives us few enough clusters that similar colors are grouped together (for easy categorization):

Clustering a gray T-shirt (from left to right: original, background, skin, clusters)


At the same time, distinctly different colors in the foreground still get separate clusters:


Rainbow clusters (from left to right: original, background, clusters)

Picking names for the colors

The last step is categorizing our cluster average colors into readable English names. That sounds like a difficult problem, given the subjectivity (not to mention cultural implications) of what counts under each color category. We turned to a K-Nearest-Neighbors algorithm to give color names to RGB values, thanks to the XKCD Color Survey. The survey consists of 200,000 RGB values labeled with 27 different color names (e.g. black, green, teal, etc.) that we use to train a scikit-learn KNeighborsClassifier.
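To show the idea behind the naming step without any dependencies, here is a hand-rolled nearest-neighbors vote in plain Python. It is a stand-in for the scikit-learn KNeighborsClassifier mentioned above, and the handful of labeled samples are invented for the example; they are not taken from the survey data.

```python
from collections import Counter
from math import dist

def knn_color_name(rgb, labeled_samples, k=3):
    """Name a color by majority vote among its k nearest labeled RGB samples."""
    nearest = sorted(labeled_samples, key=lambda sample: dist(sample[0], rgb))[:k]
    votes = Counter(name for _, name in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy training set of (rgb, name) pairs.
samples = [
    ((230, 20, 30), "red"), ((200, 40, 50), "red"), ((250, 60, 60), "red"),
    ((20, 40, 220), "blue"), ((40, 60, 200), "blue"), ((10, 20, 160), "blue"),
    ((30, 180, 60), "green"), ((60, 200, 80), "green"), ((20, 140, 40), "green"),
]

print(knn_color_name((205, 30, 40), samples))  # red
```

With a real dataset the same call would take a cluster's average color and return one of the survey's color names, which is the queryable word we were after.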

It works fairly well, but there is one big edge case in which it fails: gray. In the RGB system (where each channel is a dimension), the shades of gray lie along the diagonal line running from black to white, where all three channels are equal. Because that line stretches across the whole space, it’s hard to define a region where every value we want categorized as gray is closer to the center of the gray region than to the centers of the regions that represent other colors.

We ended up adding an extra step that computes the distance between an RGB value and its projection onto that gray line. If that distance is less than a certain threshold, we call it gray and skip the classifier. The overall process isn’t super elegant and it has its drawbacks, but it works well for this case.
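That extra check is short enough to sketch directly. Grays are the RGB values whose three channels are equal, so projecting a color onto the set of grays amounts to replacing each channel with the mean of the three. The threshold of 15 and the function names below are assumptions for illustration.

```python
from math import dist

def distance_to_gray(rgb):
    """Distance between an RGB value and its projection onto the grays (R = G = B)."""
    mean = sum(rgb) / 3
    return dist(rgb, (mean, mean, mean))

def is_gray(rgb, threshold=15.0):
    """True when the color is close enough to gray to skip the classifier."""
    return distance_to_gray(rgb) < threshold

print(is_gray((130, 128, 126)))  # True: nearly equal channels
print(is_gray((200, 30, 40)))    # False: clearly saturated
```

A caller would run this check first and only hand the color to the nearest-neighbors classifier when it returns False.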

The final result

This experiment is available for you to mess around with here on GitHub! It’s a standalone Python script that you can use to enrich your search index with extracted color tags. We also made a library and a CLI for your convenience. Everything is completely configurable – we’re using sane defaults but left the option open to tweak all of the steps above. You can even add support for other languages by messing with the color dataset. We’re not using machine learning (yet), but we managed to hit ~84% accuracy even without the dramatic boost that route would get us.

Given this tool is designed to be used in conjunction with textual search, boosting images whose colors match what was searched for, this seems like a great place to leave version 1. Check back in next time for version 2!

About the author

Léo Ercolanelli

Software Engineer
