Locating hard-to-find information without using a search bar may sound paradoxical, but take the example of package labels. Delivery companies structure a label’s information in many different and unpredictable ways. Normally, we just read the labels ourselves and find the relevant information: in this case, who the package is for. But we’d rather automate that reading by combining OCR with search.
But what if there are thousands of labels, all differently structured — different content, sometimes handwritten, in shades of nearly unreadable grays, in odd directions of text, with random lines, graphics, smudges, and torn-off corners? Can OCR help?
Optical character recognition (OCR) makes a best effort to extract text from images, but the resulting text will often be large and unstructured. Additionally, if the OCR software can’t decipher some letters, the text will have typos.
This is where an integrated search technology comes in. A search engine with robust, adaptable relevance can match the OCR’s unstructured output against a structured set of data and return the correct results.
We integrated our search engine with two technologies:
- Google Cloud Vision API, which performs the OCR on the label image
- BambooHR, which holds the structured employee data we match against
Essentially, we scanned a label and used Google Cloud Vision API to convert the label to text. We then fed the unpredictable output into our search engine, which matched it against the structured data of BambooHR, finding and returning the recipient’s name. Importantly, we didn’t need to pre-process or parse the input data. This workflow can also work with stickers, stamps, and even movie posters on a wall.
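Conceptually, the whole pipeline fits in a few lines. Here is a minimal sketch of that flow, assuming a Vision client and an Algolia index are already configured; the function and variable names are illustrative, not the production code:

```js
// Rough sketch of the end-to-end flow (illustrative names, not the production code):
// 1. OCR the label photo with Google Cloud Vision,
// 2. send the raw, unprocessed text to Algolia,
// 3. keep the best-matching employee record.
async function findRecipient(labelImageBase64, visionClient, employeesIndex) {
  const [ocr] = await visionClient.textDetection({
    image: { content: labelImageBase64 },
  });
  const labelText = ocr.textAnnotations[0].description.replace(/\n/g, ' ');

  const { hits } = await employeesIndex.search(labelText, {
    removeWordsIfNoResults: 'allOptional', // explained later in the article
  });
  return hits[0]; // the employee record that best matches the label
}
```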
Online retailers and media companies are leveraging this OCR + search integration to query their back-end systems.
On a daily basis, Algolia employees receive loads of packages at the Paris office. Kumiko, our office coordinator, had been taking care of them. Every time a new package arrived, Kumiko would read the label to find out who it was for, then find that person on Slack and let them know their package was waiting at the front desk.
But Algolia was rapidly growing, and handling package distribution by hand started taking more and more of Kumiko’s time. During the holiday season, it got really out of hand.
Obviously, manual handling doesn’t scale. I thought there should be a faster, easier, scalable way to help dispatch packages. I decided to build a web application for it. My goal was to automate the process as much as possible, from scanning the label to notifying people on Slack.
I initially thought of using the barcode. Unfortunately, I quickly discovered that a barcode doesn’t contain the same kind of data as a QR code. Most of the time, barcodes only contain EAN identifiers: numbers intended for querying private carrier APIs to fetch a package’s details.
So I decided to read the package label with an optical character recognition engine (OCR) and send the OCR text to the search engine as-is, matching it against the correct record in the index.
There are several open source libraries for handling the OCR part. The most popular one is Tesseract. However, you typically need to perform some pre-processing on the image (e.g., desaturation, contrast adjustment, de-skewing) before sending it to Tesseract to recognize the characters. Also, some of the package labels we receive are handwritten, and Tesseract is not good at reading handwritten words.
Google’s Vision API offers OCR capabilities, so I decided to give it a go. Among other things, it provides:
- text detection on images, with no manual pre-processing required
- support for handwritten text
We’ll see how this works in step 3. First, let’s look at the code that integrated Algolia search with OCR.
I created a React app, and installed the React Webcam component to access the device’s camera. Internally, this React component leverages the getUserMedia API.
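Here’s a minimal sketch of the capture step, assuming the react-webcam package; the component name, endpoint, and handler are illustrative rather than the exact app code:

```jsx
import React, { useRef, useCallback } from 'react';
import Webcam from 'react-webcam';

function LabelScanner() {
  const webcamRef = useRef(null);

  const capture = useCallback(async () => {
    // getScreenshot() returns a base64-encoded data URL of the current frame
    const imageSrc = webcamRef.current.getScreenshot();

    // Send only the base64 payload (without the "data:image/jpeg;base64," prefix)
    await fetch('/api/label', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ image: imageSrc.split(',')[1] }),
    });
  }, []);

  return (
    <>
      <Webcam ref={webcamRef} screenshotFormat="image/jpeg" />
      <button onClick={capture}>Scan label</button>
    </>
  );
}
```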
Once the user captures a label using their phone, the app sends it to an Express backend, which takes care of proxying the base64-encoded image to the Google Vision API. Vision then returns a JSON payload containing the detected text.
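The receiving route might look something like this sketch; the route path and body shape are assumptions, and the OCR and search logic from the following snippets would run inside the handler:

```js
const express = require('express');
const app = express();

// Accept JSON bodies large enough to hold a base64-encoded photo
app.use(express.json({ limit: '10mb' }));

app.post('/api/label', async (req, res) => {
  // `labelImage.data` holds the base64-encoded image captured by the webcam component
  const labelImage = { data: req.body.image };

  // ...OCR and search logic from the snippets below goes here...

  res.json({ ok: true });
});

app.listen(3001);
```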
```js
// Initialize the Google Cloud Vision client
const vision = require('@google-cloud/vision');
const visionClient = new vision.ImageAnnotatorClient();

// Ask the Vision API to return the text from the label
// https://cloud.google.com/vision/docs/ocr
const [result] = await visionClient.textDetection({
  image: {
    content: labelImage.data, // Uploaded image data
  },
});

const detections = result.textAnnotations; // This contains all the detected text

// Replace the line breaks with spaces
const labelText = detections[0].description.replace(new RegExp('\n', 'g'), ' ');
```
Here’s what Google Vision gave us (and what we will eventually send as the query to the search engine):
ORY1\n0.7 KG\nDENOIX Clément\nALGOLIA\n55 rue d'Amsterdam\n75008 Paris, France\nC20199352333\nDIF4\nCYCLE\nlove of boo\nAnod
As you can see, labels aren’t pretty. They contain a lot of noise: the relevant information is buried in there, surrounded by details meant for the delivery company, such as label numbers, the sender’s address, and so on. Additionally, the order isn’t consistent and the information isn’t always complete, so we can’t rely on word ordering or element position to extract the relevant sections before sending them to Algolia. We’ll handle that in Step 5. First, let’s take a look at the back-end data we’ll be searching.
There’s no need to provide any code for this part. Indexing data from other systems is the basis of all search engines: the idea is to take relevant data from one or more systems and push it all into a separate data source called an index. This runs on the back end, at a frequency that matches how often your data changes. Note that the search engine only needs the data that’s relevant for search purposes: querying, displaying, sorting, and filtering.
Algolia’s API provides update methods to achieve this. Our documentation offers tutorials on how to send data.
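A periodic indexing job might look like the following sketch. The BambooHR fetch is a stand-in for whatever API or export you pull employee data from; only the Algolia calls reflect the actual client library:

```js
const algoliasearch = require('algoliasearch');

const client = algoliasearch(process.env.ALGOLIA_APP_ID, process.env.ALGOLIA_ADMIN_API_KEY);
const index = client.initIndex(process.env.ALGOLIA_INDEX_NAME);

// Run periodically (e.g., nightly) so the index follows HR changes
async function indexEmployees(fetchEmployeesFromBambooHR) {
  const employees = await fetchEmployeesFromBambooHR(); // hypothetical BambooHR fetch

  // Keep only the attributes useful for searching, display, and Slack notification
  const records = employees.map((e) => ({
    objectID: e.id,
    displayName: e.displayName,
    firstName: e.firstName,
    lastName: e.lastName,
    location: e.location,
    slack: e.slack,
  }));

  await index.saveObjects(records);
}
```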
As you saw, Google’s Vision API gave us great information. But how does the search engine locate the name?
Fortunately, the Algolia search API has an interesting parameter: removeWordsIfNoResults. When you set this parameter to allOptional and the engine fails to find any results with the original query, it makes a second attempt, treating all words as optional. This is equivalent to transforming the implicit AND operators between words into OR.
```js
// Initialize the Algolia client and the Algolia employees index
const algoliasearch = require('algoliasearch');
const algoliaClient = algoliasearch(process.env.ALGOLIA_APP_ID, process.env.ALGOLIA_API_KEY);
const index = algoliaClient.initIndex(process.env.ALGOLIA_INDEX_NAME);

// Search our employees index for a match, using the `removeWordsIfNoResults=allOptional` option
// https://www.algolia.com/doc/api-reference/api-parameters/removeWordsIfNoResults/
const algoliaResult = await index.search(labelText, {
  removeWordsIfNoResults: 'allOptional',
});
```
Note that labelText contains the exact string that the Google Vision API sends back, without any preprocessing (except for stripping away the '\n' line breaks). I’ve highlighted the name (DENOIX Clément) which the search engine pulls out from the noise on the label – the cherished needle in the haystack:
ORY1 0.7 KG DENOIX Clément ALGOLIA 55 rue d'Amsterdam 75008 Paris, France C20199352333 DIF4 CYCLE love of boo Anod
Usually, this parameter helps improve results when a query is too restrictive. In my case, it allowed me to send the extracted data unprocessed: I could trust the Algolia engine to “ignore” the extraneous words in my query and only take the important ones into account.
{ "displayName": "Clement Denoix", "firstName": "Clement", "lastName": "Denoix", "location": "Paris", "slack": { "id": "U0000000", "handle": "clement.denoix", "image": "https://avatars.slack-edge.com/2018-04-03/340713326613_2890719b5a8d4506f30c_512.jpg" }, }
This left only a few steps: extracting the first hit from the list of Algolia search results and displaying it. From there, our office manager could confirm the result, and automatically send a Slack message to the right employee.
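A sketch of that last step, assuming the @slack/web-api package (the message text and token variable are illustrative):

```js
const { WebClient } = require('@slack/web-api');
const slack = new WebClient(process.env.SLACK_BOT_TOKEN);

async function notifyRecipient(algoliaResult) {
  // The first hit is the best-matching employee record
  const employee = algoliaResult.hits[0];

  // Direct-message the employee using the Slack user ID stored in the index
  await slack.chat.postMessage({
    channel: employee.slack.id,
    text: `Hi ${employee.firstName}! A package is waiting for you at the front desk.`,
  });
}
```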
Here’s a diagram of the app’s complete process:
As seen here: we take a picture of the package label, and the app sends it to the Google Vision API through the Express backend. Google Vision returns a JSON payload with the recognized text, which the back end sends to Algolia as a search query. The search engine uses the removeWordsIfNoResults option to ensure a successful match. Algolia then returns a list of matching records, from which the back end extracts the first hit and returns it to the React app.
Algolia’s powerful search engine isn’t limited to a search box. With imagination, you can push the usage of Algolia far beyond the box and solve a variety of problems.
Label reading is only one kind of OCR integration. There’s image recognition, where online retailers can recognize the type, style, color, and size of clothing from images. There’s also voice recognition, where a website can interact with the unstructured ways people speak.
There are many ways to achieve this. In this case, we relied on the search engine’s built-in features, which let it adapt its relevance algorithm to the variety and unpredictability of unstructured query data. The next step is to couple that with AI and machine learning, making a search engine’s adaptability and use-case scope even greater.
Clément Denoix
Software Engineer