No matter how well crafted a search engine may be, if the query it is given is incomplete or inaccurate, the search results will always seem a little off.
E-commerce often falls prey to this because large catalogs and innumerable filters complicate curation. Since relevancy is at the very top of our priority list, let us introduce a little experiment of ours, designed especially to enhance the search experience when it comes to image-driven content.
The goal we set for our computer vision experiment was to detect the color of the dominant object in a picture. Searching by color is a common occurrence on e-commerce websites, and while the nature of an object is usually clear from its description, its color is often more of a problem. As an example, consider an article whose description mentions several colors (for the sake of the example: white and blue), but whose picture clearly shows a blue shirt. A search for a white shirt may surface this article, and it may even rank very high, because the relevant color – white – is present in the description. However correct this may be, seeing a series of blue objects while searching for a white one does not give the end user the best impression of relevance. The idea is certainly not to remove these results (they are still correct), but to boost the ranking of the ones that more closely match the searched color.
Some colorful context…
We are obviously not the first to attempt to address this problem, but we’ve taken a slightly different approach from the current solutions in the market. We can divide the applications trying to solve this problem into two different groups that provide different experiences:
- Commercial applications like vue.ai
- Open source scripts like josip/node-colour-extractor or lokesh/color-thief
The open-source scripts listed tackle a different, simpler problem: they extract a palette composed of the principal colors of an image. While they do that job well, they can't be applied out of the box to fashion images, because we are not interested in the colors of the entire image. Using them directly would result in the color of the background being detected as the main color of the image. Moreover, they do not provide a word for the extracted colors, and indexing raw RGB values would not improve the search experience.
Vue.ai showcases some impressive results and provides many more features than our experiment. We wanted to provide a quick, easy way for users to enrich their records a bit and improve their ranking, especially when they have to deal with poor descriptions. Our script requires no registration: all you have to do is download it, install its dependencies, and enjoy.
The problem we try to address is a relatively complex one and can be solved in a number of different ways. Most state-of-the-art computer vision frameworks these days take the deep learning path, using techniques such as Convolutional Neural Networks to classify images. This is a path we decided not to take. Neural networks offer astounding results, but to do so they need large datasets, along with dedicated hardware to keep computation times reasonable. We started this project as an experiment and began with simpler methods to see where those would take us. We always knew we would eventually open-source this project so that anyone could run it and add information to their records: deep learning frameworks can be somewhat hard to set up and run, and they require a great deal of computing time. The sequence of algorithms we chose processes images reasonably fast on a machine with a classic CPU and can be installed as simply as any other Python package.
Because we only cared about the color of the dominant object in the picture, a high level of detail was likely to confuse the algorithms. Moreover, smaller images mean less computation. After testing on several images, we found that a size of 100×100 was a good compromise.
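To illustrate, here is a minimal, dependency-free sketch of that resizing step. The real script relies on an image library for this; nearest-neighbor sampling is enough to convey the idea.

```python
def downscale(pixels, size=(100, 100)):
    """Resize a row-major grid of (R, G, B) tuples to `size`
    (width, height) by nearest-neighbor sampling, discarding
    the fine detail that would only confuse the later steps."""
    h, w = len(pixels), len(pixels[0])
    tw, th = size
    return [
        [pixels[y * h // th][x * w // tw] for x in range(tw)]
        for y in range(th)
    ]
```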
We really had the use-case of a fashion e-commerce website in mind while designing this tool, and it led to the idea of an additional step to improve our color detection tool: cropping.
The object we are interested in is highly likely to be centered within the image, so cropping allows us to reduce the amount of background. Although cropping does yield better results overall, keeping less than 90% of the original image doesn't play well with background removal on a heterogeneous data set: the main object sometimes touches the edges of the picture, and the subsequent algorithms then consider it part of the "background".
Be it plain white or complex, the background isn't something we want interfering with the color detection of the main object. Separating the "foreground" from the "background" is not an easy task unless you know that the background follows a simple convention (e.g. plain white), or exactly where the targeted object is placed in the picture. We make neither assumption, to allow for broader use cases, and instead combine several simple computer vision algorithms to handle reasonably complex backgrounds.
The first step can be viewed as thresholding the image. We take the pixel values at the four corners, identify all pixels that are close to them in color, and consider those pixels part of the background. A color-distance metric determines how close two colors are. This step is particularly useful on more complex backgrounds because it doesn't rely on information such as edges, but it can't discard gradients or shadows efficiently.
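In spirit, this step looks like the following sketch. The Euclidean RGB distance and the threshold value here are assumptions for illustration; the actual script uses its own color metric and tuning.

```python
def corner_background_mask(pixels, threshold=30.0):
    """Mark as background every pixel whose color is close to one
    of the four corner colors. `threshold` is an arbitrary cutoff
    on a plain Euclidean RGB distance (illustrative only)."""
    h, w = len(pixels), len(pixels[0])
    corners = [pixels[0][0], pixels[0][w - 1],
               pixels[h - 1][0], pixels[h - 1][w - 1]]

    def dist(a, b):
        return sum((ca - cb) ** 2 for ca, cb in zip(a, b)) ** 0.5

    return [
        [any(dist(p, c) < threshold for c in corners) for p in row]
        for row in pixels
    ]
```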
The second step uses flood filling and edge detection to remove shadows and gradients. The technique is largely inspired by an article on the Lyst engineering blog. The main goal is to remove the background, which is generally indicated by the absence of sharp edges. To achieve this, several smoothing passes are applied to maximize the efficiency of the Scharr filter we use. Contrary to the first step, this one doesn't behave well on complex backgrounds that contain edges themselves, or when no sharp edge separates the object from the background.
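The heart of this step is the Scharr operator, a pair of 3×3 convolution kernels that measure horizontal and vertical intensity changes. Below is a naive, dependency-free sketch of computing the edge magnitude; real code would use a library implementation (e.g. scikit-image's) plus smoothing and flood filling, which are omitted here.

```python
# Scharr kernels for horizontal (X) and vertical (Y) gradients.
SCHARR_X = [[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]]
SCHARR_Y = [[-3, -10, -3], [0, 0, 0], [3, 10, 3]]

def scharr_magnitude(gray):
    """Edge energy of a 2-D grayscale grid (lists of floats).
    Regions with near-zero energy that touch the borders are the
    candidates for removal as background."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SCHARR_X[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SCHARR_Y[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```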
By using both of these filters we try to get the best of both worlds. However, this is not always achievable: sometimes one of the two steps fails to accurately separate the background from the foreground and ends up removing far too many pixels. When this happens, the result of that step is simply ignored.
A lot of pictures include models and if the targeted object is a swimsuit, the number of pixels representing skin can be greater than the number of pixels we are actually interested in. For cases like this, we implemented a really simple method to discard additional pixels.
Detecting skin is an open problem and the possible solutions range from testing if each pixel color can be considered as skin to training a complex artificial intelligence model. We went for the lower end of the spectrum of complexity: a simple filter using only the color of the pixels to classify them.
There are a lot of skin filters out there, and each has its pros and cons. We chose one based on its false-positive rate. Indeed, most of these filters only consider the color of the pixel – in fact it is very common (unfortunately) to see orange-tinted clothes entirely labeled as skin. Choosing a low false-positive rate reduces this risk but doesn't remove it, and because not all data sets contain skin, we made this step completely optional in the final script.
For now, a single filter has been implemented, which tries to be as versatile as possible; however, we definitely plan to implement several narrower ones to target certain skin types more efficiently.
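To give a feel for what a purely color-based skin filter looks like, here is one classic heuristic of this kind (a well-known RGB rule from the literature) – not necessarily the exact rule our script ships with:

```python
def is_skin(r, g, b):
    """A classic, purely color-based skin heuristic: requires a
    warm, red-dominant color with enough spread between channels.
    Illustrative only; one of many published rules of this kind."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)
```

As the text notes, rules like this can misfire on orange-tinted fabric, which is why the false-positive rate matters when picking one.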
We have no interest in the many shades of color an image may contain, only in canonical, definitive names. To achieve this we use a clustering algorithm to group similar pixels together.
Not all pictures are equal in terms of number of colors, and even though we want to group different shades of the same color together, we don't want to, say, mix blue and green pixels. Because of this, we can't build the same number of clusters regardless of the image. We try to detect the best number of clusters using the "jump" method, for which a simple description can be found on Wikipedia. This technique is relatively easy to understand. The error of a particular clustering can be measured using the following formula, where $x_i$ is a pixel and $c_{k(i)}$ is the center of the cluster that $x_i$ belongs to:

$$D = \frac{1}{n}\sum_{i=1}^{n} \lVert x_i - c_{k(i)} \rVert^2$$
This formula says that the greater the distance from a particular point to its attributed cluster's center, the higher the error. This value always decreases when clusters are added, until there are as many clusters as pixels, which gives an error of 0. However, the error doesn't decrease at the same rate every time a cluster is added. At first, each added cluster decreases the error by a huge amount; at some point, the gain becomes much smaller. The goal is to find the moment when adding a cluster stops being significant.
The algorithm uses a slightly more complex computation of error, but the idea behind it is the same. This way we adapt the number of clusters to have colors close to one another grouped together but still enough clusters not to group distinctly different colors on multi-colored items.
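Given the distortion (error) measured for each candidate number of clusters, the selection rule can be sketched as below. This assumes the distortions have already been computed (e.g. by running k-means for each k); as noted above, the real script's error computation is slightly more involved.

```python
def jump_method(distortions, p=3):
    """Pick a number of clusters via the 'jump' heuristic:
    transform each distortion d_k into d_k^(-p/2) and return the k
    whose jump (increase over the previous transformed value) is
    largest. `distortions` maps k -> mean squared distance to the
    cluster centers; `p` is the data dimension (3 for RGB pixels)."""
    ks = sorted(distortions)
    transformed = {k: distortions[k] ** (-p / 2) for k in ks}
    jumps = {k: transformed[k] - transformed.get(k - 1, 0.0) for k in ks}
    return max(jumps, key=jumps.get)
```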
Attributing color names
Last but not least, we have to return readable names in a given language, say English, in the results, and not just RGB values. This problem is harder to solve than it looks: the color space is huge (~16 million possible values), and different colors cover very heterogeneous portions of it. Defining each possible color range by hand is very tedious and yields poor results in a lot of cases. We turned to a K-Nearest-Neighbors algorithm to give color names to RGB values, thanks to the XKCD Color Survey. The XKCD survey consists of 200,000 RGB values labeled with 27 different color names (e.g. black, green, teal, etc.) that we use to train a scikit-learn KNeighborsClassifier.
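A toy version of this naming step looks like the following. The handful of samples below are made up for illustration; the real classifier is trained on the ~200,000 labeled XKCD survey points and 27 names.

```python
from sklearn.neighbors import KNeighborsClassifier

# Tiny, hand-made training set standing in for the XKCD survey data.
samples = [
    (250, 10, 10), (200, 30, 20), (220, 0, 40),   # reds
    (10, 10, 240), (40, 30, 200), (0, 60, 220),   # blues
    (10, 200, 30), (60, 220, 40), (30, 180, 20),  # greens
]
labels = ["red"] * 3 + ["blue"] * 3 + ["green"] * 3

clf = KNeighborsClassifier(n_neighbors=3).fit(samples, labels)

def name_color(rgb):
    """Return the human-readable name closest to an RGB value."""
    return clf.predict([rgb])[0]
```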
This method is still far from perfect; for example, it is not able to handle shades of grey. We work in the RGB system, where the grey colors all lie along the diagonal of the colorspace (R = G = B). Because of this topology, there are never enough neighbors (using a Euclidean distance) to categorize a color as grey. To solve this we added an additional step dedicated to these colors. By computing the distance between an RGB value and its projection onto the grey diagonal, it is possible to tell how "close" the color actually is to grey. We then apply an arbitrary threshold to either classify the color as grey or continue with the classifier.
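That extra step amounts to a few lines. The threshold value here is an illustrative assumption (the text above notes that the actual cutoff is arbitrary):

```python
def grey_distance(r, g, b):
    """Distance from an RGB color to its projection on the grey
    diagonal (R = G = B); small values mean 'almost grey'."""
    m = (r + g + b) / 3.0
    return ((r - m) ** 2 + (g - m) ** 2 + (b - m) ** 2) ** 0.5

def classify_grey(rgb, threshold=20.0):
    """True if the color is close enough to the grey diagonal;
    otherwise the KNN classifier takes over."""
    return grey_distance(*rgb) < threshold
```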
The overall process still has drawbacks. For example, it doesn't provide enough insight to place a color in multiple categories (sometimes it is hard to tell whether a color is green or blue, and we might like to label it both).
The final result
Today this computer vision experiment is available on GitHub as a stand-alone Python script that anyone can run to enrich records containing an image URL with extracted color tags. Both a library for more advanced usage and a CLI are available as well.
This is as simple as it gets using the CLI:
```
$ ./color-extractor color_names.npz image.jpg
red,black
```
We ditched the idea of a dedicated search engine optimized for images for a simple reason: Algolia already has a state-of-the-art search engine, and not taking advantage of it would be a pity!
A self-serve script fulfills another criterion close to our hearts: configurability. Our script uses sane defaults, but all the steps described above can be tweaked to yield better results on a specific catalog. These tweaks range from making certain steps more or less aggressive to deactivating them entirely. We also provide a way to support any language: the dataset the script uses to give color names to images can be changed by the user, thus making room for non-English color names.
Our script doesn’t use any kind of machine learning yet and we’re aware that results could be greatly improved by going down this road. For this first release our tool features ~84% accuracy. While this is still low, our tool isn’t designed to be used on its own, but in conjunction with your textual searches by boosting images whose colors match what is searched for.