
Integrate OCR into search in a package label-scanning app

Locating hard-to-find information without using a search bar may sound paradoxical, but take the example of package labels. Delivery companies structure a label's information in many different and unpredictable ways. Normally, we just read the labels ourselves and find the relevant information (in this case, who the package is for). But we'd prefer to automate that by integrating OCR into search.

But what if there are thousands of labels, all differently structured — different content, sometimes handwritten, in shades of nearly unreadable grays, in odd directions of text, with random lines, graphics, smudges, and torn-off corners? Can OCR help?

Optical character recognition (OCR) makes a best effort to extract text from images, but the resulting text will often be large and unstructured. Additionally, if the OCR software can’t decipher some letters, the text will have typos.

This is where an integrated search technology comes in. A search engine with a robust, adaptable relevance can match an OCR’s unstructured text against a structured set of data and return the correct results. 

Integrate OCR into search

We integrated our search engine with two technologies:

  • Google Cloud Vision API, which performs the OCR on the label images
  • BambooHR, which holds the structured employee data we match against

Essentially, we scanned a label and used Google Cloud Vision API to convert the label to text. We then fed the unpredictable output into our search engine, which matched it against the structured data of BambooHR, finding and returning the recipient’s name. Importantly, we didn’t need to pre-process or parse the input data. This workflow can also work with stickers, stamps, and even movie posters on a wall. 

Online retailers and media companies are leveraging this OCR + search integration to query their back-end systems.

Our story: Why we needed to integrate OCR into search

On a daily basis, Algolia employees receive loads of packages at the Paris office. Kumiko, our office coordinator, had been taking care of them. Every time a new package arrived, Kumiko would read the label to find who it was for, then find the person on Slack and let them know their package was waiting at the front desk.

But Algolia was rapidly growing. Handling package distribution by hand started taking more and more time for Kumiko. During the holiday seasons, it got really out of hand:

[Image: packages with different labels]

Obviously, manual handling doesn’t scale. I thought there should be a faster, easier, scalable way to help dispatch packages. I decided to build a web application for it. My goal was to automate the process as much as possible, from scanning the label to notifying people on Slack.

I initially thought of using the barcode. Unfortunately, I quickly discovered that barcodes don't contain the same kind of data as QR codes. Most of the time, they only contain EAN identifiers. These numbers are intended for querying private carrier APIs to fetch a package's details.

So I decided to read the package label with an optical character recognition (OCR) engine and send the OCR text to the search engine as-is, matching it against the correct record in the index.

How to integrate OCR into search

Step 1: Finding the right OCR software

There are several open source libraries for handling the OCR part. The most popular one is Tesseract. However, you typically need to perform some pre-processing on the image before sending it to Tesseract to recognize the characters (e.g., desaturation, contrast adjustment, de-skewing). Also, some of the package labels we receive are handwritten, and Tesseract is not good at reading handwritten words.
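For reference, here's what that path might have looked like with tesseract.js (assuming tesseract.js v5 and an already pre-processed image; this is a sketch, not the code we shipped):

const { createWorker } = require('tesseract.js');

// Spin up a worker with the English language model
const worker = await createWorker('eng');

// Recognize the text in a (pre-processed) label image
const { data: { text } } = await worker.recognize('label.jpg');
console.log(text);

await worker.terminate();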

Google’s Vision API offers OCR capabilities, so I decided to give it a go. Among other things, it provides:

  • 1,000 free API calls per month (which is more than enough to start)
  • Handwritten character detection

We’ll see how this works in step 3. First, let’s look at the code that integrated Algolia search with OCR.

Step 2: Creating the React app

I created a React app, and installed the React Webcam component to access the device’s camera. Internally, this React component leverages the getUserMedia API.
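As a sketch of the capture flow (the LabelScanner component, the /api/label endpoint, and the labelImage field name are assumptions for illustration, not the app's exact code):

import React, { useRef, useCallback } from 'react';
import Webcam from 'react-webcam';

function LabelScanner() {
  const webcamRef = useRef(null);

  // Capture the current frame and send it to the Express backend
  const capture = useCallback(async () => {
    const screenshot = webcamRef.current.getScreenshot(); // base64 data URL
    const blob = await fetch(screenshot).then((res) => res.blob());

    const form = new FormData();
    form.append('labelImage', blob, 'label.jpg');
    await fetch('/api/label', { method: 'POST', body: form });
  }, []);

  return (
    <div>
      <Webcam ref={webcamRef} screenshotFormat="image/jpeg" />
      <button onClick={capture}>Scan label</button>
    </div>
  );
}

export default LabelScanner;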

Once the user captures a label using their phone, the app sends it to an Express backend, which takes care of proxying the base64-encoded image to the Google Vision API. Vision then returns a JSON payload containing the recognized text.

// Import and initialize the Google Cloud Vision client
const vision = require('@google-cloud/vision');
const visionClient = new vision.ImageAnnotatorClient();

// Ask Vision API to return the text from the label
// https://cloud.google.com/vision/docs/ocr
const [result] = await visionClient.textDetection({
  image: {
    content: labelImage.data, // Uploaded image data
  },
});

// textAnnotations contains every piece of detected text;
// the first entry holds the full text of the label
const detections = result.textAnnotations;
// Replace the line breaks with spaces to get a single-line query string
const labelText = detections[0].description.replace(/\n/g, ' ');
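
For context, a minimal Express route wrapping this snippet might look like the following (the /api/label path and the express-fileupload middleware are assumptions chosen to match the React sketch above):

const express = require('express');
const fileUpload = require('express-fileupload');

const app = express();
app.use(fileUpload()); // Exposes uploaded files on req.files

app.post('/api/label', async (req, res) => {
  const labelImage = req.files.labelImage; // The captured photo

  // Same Vision call as above
  const [result] = await visionClient.textDetection({
    image: { content: labelImage.data },
  });
  const labelText = result.textAnnotations[0].description.replace(/\n/g, ' ');

  res.json({ labelText });
});

app.listen(3000);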

Step 3: Reading the label with Google Vision API 

Here’s what Google Vision gave us (and what we will eventually send as the query to the search engine):

ORY1\n0.7 KG\nDENOIX Clément\nALGOLIA\n55 rue d'Amsterdam\n75008 Paris, France\nC20199352333\nDIF4\nCYCLE\nlove of boo\nAnod

As you can see, labels aren't pretty. They contain a lot of noise. The relevant information is buried somewhere in there, surrounded by data meant for the delivery company, such as label numbers, the sender's address, etc. Additionally, the order isn't consistent and the information isn't always complete, so we can't rely on word ordering or element position to extract the relevant sections before sending them to Algolia. We'll deal with that in Step 5. First, let's take a look at the back-end data we'll be searching.

Step 4: Indexing BambooHR's back-end data

There's no need to provide any code for this part. Indexing data from other systems is the basis of all search engines. The idea is to take relevant data from one or more systems and push it into a separate data source called an index. This runs on the back end, with a frequency that matches how often your data changes. Note that the search engine only needs the data relevant to search purposes: for querying, display, sorting, and filtering.

Algolia’s API provides update methods to achieve this. Our documentation offers tutorials on how to send data.
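That said, here's a minimal indexing sketch (the fetchEmployeesFromBambooHR helper and the field names are hypothetical; adapt them to your own BambooHR data):

const algoliasearch = require('algoliasearch');

const client = algoliasearch(process.env.ALGOLIA_APP_ID, process.env.ALGOLIA_ADMIN_API_KEY);
const index = client.initIndex('employees');

// Hypothetical helper that pulls employee records from BambooHR
const employees = await fetchEmployeesFromBambooHR();

// Push the records to Algolia; a stable objectID makes re-indexing idempotent
await index.saveObjects(
  employees.map((employee) => ({
    objectID: employee.id,
    displayName: employee.displayName,
    firstName: employee.firstName,
    lastName: employee.lastName,
    location: employee.location,
    slack: employee.slack,
  }))
);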

Step 5: Searching with Algolia

As you saw, Google’s Vision API gave us great information. But how does the search engine locate the name? 

Fortunately, the Algolia search API has an interesting parameter: removeWordsIfNoResults.

When you set this parameter to allOptional and the engine fails to find any results with the original query, it makes a second attempt, treating all words as optional. This is equivalent to transforming the implicit AND operators between words to OR.

// Import and initialize the Algolia client and the employees index
const algoliasearch = require('algoliasearch');
const algoliaClient = algoliasearch(process.env.ALGOLIA_APP_ID, process.env.ALGOLIA_API_KEY);
const index = algoliaClient.initIndex(process.env.ALGOLIA_INDEX_NAME);

// Search our employees index for a match, using the `removeWordsIfNoResults=allOptional` option.
// https://www.algolia.com/doc/api-reference/api-parameters/removeWordsIfNoResults/
const algoliaResult = await index.search(labelText, {
  removeWordsIfNoResults: 'allOptional',
});

Note that labelText contains the exact string the Google Vision API sent back, without any preprocessing (except replacing the '\n' line breaks with spaces). Buried in the noise of the label is the name, DENOIX Clément, the cherished needle in the haystack that the search engine pulls out:

ORY1 0.7 KG DENOIX Clément ALGOLIA 55 rue d'Amsterdam 75008 Paris, France C20199352333 DIF4 CYCLE love of boo Anod

Usually, this parameter helps improve results when a query is too restrictive. In my case, it allowed me to send the extracted data unprocessed. I was able to trust the Algolia engine to "ignore" the extraneous words in my query and only take the important ones into account. Here's the employee record it matched:

{
  "displayName": "Clement Denoix",
  "firstName": "Clement",
  "lastName": "Denoix",
  "location": "Paris",
  "slack": {
    "id": "U0000000",
    "handle": "clement.denoix",
    "image": "https://avatars.slack-edge.com/2018-04-03/340713326613_2890719b5a8d4506f30c_512.jpg"
  }
}

This left only a few steps: extracting the first hit from the list of Algolia search results and displaying it. From there, our office manager could confirm the result and automatically send a Slack message to the right employee.
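For illustration, the notification step could be a single chat.postMessage call with Slack's Web API (assuming a bot token in SLACK_BOT_TOKEN; hit is the first Algolia search result):

const { WebClient } = require('@slack/web-api');

const slackClient = new WebClient(process.env.SLACK_BOT_TOKEN);

// Send a direct message to the matched employee, using the Slack ID
// stored on the Algolia record
await slackClient.chat.postMessage({
  channel: hit.slack.id, // e.g. "U0000000"
  text: 'A package is waiting for you at the front desk!',
});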

Here's a diagram of the app's complete process:

[Image: diagram of the OCR label-reading process]

As seen here: we take a picture of the package label. The app sends it to the Google Vision API through the Express backend. Google Vision returns a JSON payload with the recognized text, which the back end sends to Algolia as a search query. The search engine uses the removeWordsIfNoResults option to ensure a successful match. Algolia then returns a list of matching records, from which the back end extracts the first hit and returns it to the React app.

Conclusion & next steps

Algolia’s powerful search engine isn’t limited to a search box. With imagination, you can push the usage of Algolia far beyond the box and solve a variety of problems.

Label reading is only one kind of OCR integration. There’s image recognition, where online retailers can recognize the type, style, color, and size of clothing from images. There’s also voice recognition, where a website can interact with the unstructured ways people speak.

There are many ways to achieve this. In this case, we relied on the search engine's built-in features, which let it adapt its relevance algorithm to the variety and unpredictability of unstructured query data. The next step is to couple that with AI and machine learning, making a search engine's adaptability and use-case scope even greater.

About the author

Clément Denoix, Software Engineer
