Note: We’ve significantly updated this article and our package label-reading OCR app. Please check out the new article on how to integrate OCR into search.
On a daily basis, Algolia employees receive loads of packages at the Paris office. So far, Kumiko, our office coordinator, has been taking care of them. Every time a new package arrives, Kumiko has to read the label to find out who it's for, then find the person on Slack and let them know their package is waiting at the front desk.
This manual process was working, but Algolia is growing rapidly: in the last year alone, the number of employees in the Paris office more than doubled. Handling parcel dispatch by hand started taking more and more of Kumiko's time. During the holiday season, it got really out of hand.
Obviously, manual handling couldn’t scale.
I work in the Internal Tools squad at Algolia. Our mission is to make Algolia’s teams more efficient by automating inefficient processes, making tools, and providing technical support. I thought there should be a faster, easier, scalable way to help dispatch packages.
I decided to build a web application for it. My goal was to automate the process as much as possible, from scanning the label to notifying people on Slack.
My first idea was to use the barcode that's on the label. I thought I could extract the employee's first and last name from it. However, I quickly discovered that barcodes don't contain the same kind of data as QR codes. Most of the time, they only contain EAN identifiers. These numbers are intended for querying private carrier APIs to fetch the parcel's details.
We have an Algolia index with every Algolia employee, which we use on our about page. I thought it could be an interesting starting point. The idea was to “read” the parcel label with an optical character recognition engine (OCR), and match it against the right record in the index.
There are several open source libraries for handling the OCR part. The most popular one is Tesseract. However, you typically need to pre-process the image (desaturation, contrast, de-skewing, etc.) before asking Tesseract to recognize the characters. Also, some of the parcel labels we receive are handwritten! Tesseract is not good at reading handwritten words, and the pre-processing part was a lot of work. Therefore, I decided that this solution was a no-go.
I knew that Google's Vision API offers OCR capabilities, so I decided to give it a go.
Once the user captures a label using their phone, the app sends it to an Express backend. This takes care of proxying the base64-encoded image to the Google Vision API. Vision then returns a JSON payload containing the data as text.
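The proxying step above can be sketched as two small helpers: one that wraps the base64-encoded image in a request body for the Vision REST endpoint (images:annotate, with a TEXT_DETECTION feature), and one that pulls the recognized text out of the JSON payload. This is a minimal sketch rather than the app's actual code: the function names buildVisionRequest and extractLabelText are hypothetical, the data-URL stripping assumes the browser may send a data URL, and the HTTP call itself (plus authentication) is left out.

```typescript
// Body for POST https://vision.googleapis.com/v1/images:annotate
interface VisionRequestBody {
  requests: {
    image: { content: string };
    features: { type: string }[];
  }[];
}

// Only the parts of the Vision response that this sketch reads.
interface VisionResponseBody {
  responses?: {
    fullTextAnnotation?: { text?: string };
    textAnnotations?: { description?: string }[];
  }[];
}

// Hypothetical helper: wrap a base64 image in a TEXT_DETECTION request.
function buildVisionRequest(imageBase64: string): VisionRequestBody {
  // Assumption: the client may send a data URL, so strip its prefix if present.
  const content = imageBase64.replace(/^data:image\/[a-z+.-]+;base64,/i, "");
  return {
    requests: [{ image: { content }, features: [{ type: "TEXT_DETECTION" }] }],
  };
}

// Hypothetical helper: return the recognized text as a single line,
// or an empty string when nothing was recognized.
function extractLabelText(body: VisionResponseBody): string {
  const first = body.responses?.[0];
  const text =
    first?.fullTextAnnotation?.text ??
    first?.textAnnotations?.[0]?.description ??
    "";
  // Collapse line breaks and repeated spaces so the text can be used as a query.
  return text.replace(/\s+/g, " ").trim();
}
```

Flattening the text into one line is a design choice for this sketch: since word order and position on the label are unreliable anyway, nothing is lost by discarding the layout.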
Labels aren’t pretty. They contain a lot of noise. The relevant information is buried somewhere, surrounded by other data: characters that are only relevant to the delivery person, label numbers, the sender’s address, etc. Additionally, the order isn’t consistent, and the information isn’t always complete, so we can’t rely on word ordering or element position to extract relevant sections before sending them to Algolia.
Obviously, I didn’t want to add an extra manual step for our office manager to select the right parts. This would be cumbersome, and defeat the whole purpose of the app.
Fortunately, the Algolia search API has an interesting parameter: removeWordsIfNoResults.
When you set this parameter to allOptional and the engine fails to find any results with the original query, it makes a second attempt while treating all words as optional. This is equivalent to transforming the implicit AND operators between words to OR.
Usually, this parameter is helpful to improve results when a query is too restrictive. In my case, it allowed me to send the extracted data unprocessed; I trusted the Algolia engine to “ignore” the extraneous words from my query, and only take the important ones into account.
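To make the fallback concrete, here is a toy, in-memory illustration of the two passes: a strict pass where every query word must match (implicit AND), then a second pass where all words are optional (OR). It is a sketch of the idea only and does not reproduce Algolia's engine or ranking; the Employee shape, the toySearch function, and the "most matched words first" ordering are assumptions made for the example. The searchParams object shows the real parameter names you would send with the query.

```typescript
// Hypothetical record shape, for illustration only.
interface Employee {
  name: string;
}

// Real Algolia search parameters: fall back to optional words, keep one hit.
const searchParams = { removeWordsIfNoResults: "allOptional", hitsPerPage: 1 };

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Toy simulation of the two-pass behavior (not Algolia's implementation).
function toySearch(records: Employee[], query: string): Employee[] {
  const words = tokenize(query);
  const scored = records.map((record) => {
    const recordWords = tokenize(record.name);
    const matched = words.filter((w) => recordWords.indexOf(w) !== -1).length;
    return { record, matched };
  });

  // First pass: implicit AND, every query word must be found in the record.
  const strict = scored.filter((s) => words.length > 0 && s.matched === words.length);
  if (strict.length > 0) {
    return strict.map((s) => s.record);
  }

  // Second pass: all words optional (OR). Assumption for this toy:
  // records matching more query words come first.
  return scored
    .filter((s) => s.matched > 0)
    .sort((a, b) => b.matched - a.matched)
    .map((s) => s.record);
}
```

With records for Jane Doe, John Smith, and Jane Smith, a noisy query such as "PARCEL 12345 Jane Doe 1 Example Street" finds nothing in the strict pass, but the optional pass still puts Jane Doe first because two of the query words match her record.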
This left only a few steps: extracting the first hit from the list of Algolia search results, and displaying it. From there, our office manager could confirm the result, and automatically send a Slack message to the right employee.
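These last steps can be sketched as follows: take the first hit from the search response, and turn it into a message payload. Everything specific here is an assumption for illustration: the name and slackId fields on the record, the helper names, and the message wording. The payload uses the channel and text arguments of Slack's chat.postMessage method, though the article doesn't say which Slack API the app actually calls.

```typescript
// Hypothetical employee record; name and slackId are assumed attributes.
interface EmployeeHit {
  objectID: string;
  name: string;
  slackId: string;
}

// Only the part of the search response this sketch needs.
interface SearchResponse {
  hits: EmployeeHit[];
}

// channel/text mirror the arguments of Slack's chat.postMessage.
interface SlackMessage {
  channel: string;
  text: string;
}

// Return the best-ranked hit, or null when the search found nobody.
function firstHit(response: SearchResponse): EmployeeHit | null {
  return response.hits.length > 0 ? response.hits[0] : null;
}

// Build the notification for the matched employee (wording is illustrative).
function buildSlackMessage(hit: EmployeeHit): SlackMessage {
  return {
    channel: hit.slackId,
    text: `Hi ${hit.name}, a package is waiting for you at the front desk.`,
  };
}
```

Returning null for an empty result keeps the "nobody matched" case explicit, so the interface can ask for a new photo instead of messaging the wrong person.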
When Kumiko takes a picture of the package label, the app sends it to Google Vision through the Express backend. Google Vision returns a JSON payload with the recognized text, which the backend sends to Algolia as a search query, along with the removeWordsIfNoResults option. Algolia returns a list of matching records, from which the backend extracts the first hit, and returns it to the React app. This allows Kumiko to Slack the person directly, in a single tap.
Algolia is a powerful search engine, but search isn’t limited to a search box. With a bit of imagination, you can push the usage of Algolia far beyond the box, and solve a variety of problems.
This was enabled by Algolia’s strong culture. This project stemmed from one of Algolia’s core values: care. We try to be as helpful as possible with one another. And I did it during Algolia’s monthly engineering off-sprints, which allow employees to experiment!