
All of us have at some point been stuck in the middle of nowhere, desperately trying to locate an address with only our “fat fingers” for company, or abandoned a purchase in our virtual shopping basket because filling out our entire address was too much trouble.

Algolia being Algolia, we wondered how we could reduce instances of this in our own way, and Algolia Places was born.

In case you haven’t heard, Algolia Places allows you to create beautiful address auto-completion menus with just a few lines of code. It can be personalized and customized for your use case, and enriched with additional sources of data. Today, we’d like to share with you the story of how we built Algolia Places.

How did we do it?

Step 1: Data Collection

To build Algolia Places we relied on the open-source datasets provided by OpenStreetMap & GeoNames. These datasets are actually very different from each other and we chose them for precisely this reason.

  • The OpenStreetMap dataset contains map data: it consists of the geographic representation (polygons, lines & points) of about 200 million geographical features.
  • The GeoNames dataset contains the geographical names of about 9 million features: it’s a regular list in TSV format and contains the name of every single city/country/place along with some meta-data (population, zip codes, …).

Step 2: The Indices

  1. The city & country index

In order to build the city and country search experience, we exclusively used the GeoNames dataset – it’s a pretty exhaustive list and is quite simple to parse.

For every single city & country name in the TSV files, we created an Algolia record. These records include not only the variations and translated names of countries & cities (if available), but also some meta-data such as associated postcodes, population, location and various tags & booleans we use for the ranking/filtering capabilities of our API.

The result? ~4 million records.
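To make the conversion concrete, here is a minimal sketch of turning rows of the main GeoNames dump into records. The column order is the one documented for the GeoNames `geoname` table; the output field names and the `is_city`/`is_country` rules are assumptions chosen to resemble the record schema shown later in this post, not our actual import code. Language-tagged translations live in a separate GeoNames file and are left out here.

```python
import csv
import io

# Column order of the tab-separated GeoNames "geoname" dump (no header row).
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def geonames_row_to_record(row):
    """Turn one parsed GeoNames row (a dict) into a search record.

    The output layout is an assumption made for illustration.
    """
    alternates = [n for n in row["alternatenames"].split(",") if n]
    return {
        "objectID": row["geonameid"],
        "locale_names": {"default": [row["name"]] + alternates},
        "country_code": row["country_code"].lower(),
        "population": int(row["population"] or 0),
        "_geoloc": {"lat": float(row["latitude"]), "lng": float(row["longitude"])},
        # Feature class "P" covers populated places (cities, villages, ...).
        "is_city": row["feature_class"] == "P",
        # Feature codes starting with "PCL" denote political entities such as countries.
        "is_country": row["feature_code"].startswith("PCL"),
    }


def read_geonames(stream):
    """Lazily yield one record per line of a GeoNames TSV stream."""
    reader = csv.DictReader(
        stream, fieldnames=GEONAMES_COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        yield geonames_row_to_record(row)
```

Because `read_geonames` is a generator, records can be pushed to an index in batches without loading the whole file into memory.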

  2. The address indices

To build the address search experience (including countries, cities, streets & notable places), we used both OpenStreetMap and GeoNames.

The OpenStreetMap initiative is wonderful but the underlying data is not always on point. Why?

  • It may not always be exhaustive (especially for non-famous places)
  • It may sometimes include erroneous values (from our experience, postcodes used to be wrong)
  • It may contain duplicates
  • It may not always follow the same conventions across countries
  • It may lack some internationalisation/translations

To convert the OSM map data into Algolia records, we imported the whole OSM planet (40GB compressed) inside a PostgreSQL+PostGIS engine. The resulting database was a whopping 600GB!

We then used Nominatim to export this huge database to an XML format, rebuilding the hierarchy of places. This brought an interesting problem to light: the raw OSM map data doesn’t have any hierarchy. For instance, there is no obvious link between San Francisco, California and the United States. Nominatim rebuilt that hierarchy in the exported format so another tool could easily process it.

The resulting XML file weighs 150GB and looks somewhat like:

<osmStructured version="0.1" generator="Nominatim">
 <add>
  <feature place_id="914093" type="R" id="71525" key="boundary" value="administrative" rank="12" importance="12" parent_place_id="913159" parent_type="R" parent_id="8649">
   <names>
    <name type="alt_name:fr">Lutèce</name>
    <name type="alt_name:vi">Ba Lê</name>
    <name type="loc_name:fr">Panam</name>
    <name type="name">Paris</name>
    <name type="name:af">Parys</name>
    <name type="name:am">ፓሪስ</name>
    <name type="name:an">París</name>
    <name type="name:ar">باريس</name>
   …
    <name type="old_name:vi">Ba Lê</name>
    <name type="ref">75</name>
   </names>
   <adminLevel>6</adminLevel>
   <address>
    <state rank="8" type="R" id="8649" key="boundary" value="administrative" distance="0.236572214414259" isaddress="t"/>
   </address>
   <tags>
    <tag type="wikipedia">fr:Paris</tag>
   </tags>
   <osmGeometry>POLYGON((2.224122 48.854199,2.224158 48.854615,2.224257 48.855241,2.224317 48.85555,2.224371 ….))</osmGeometry>
  </feature>
  …
 </add>
</osmStructured>

We also built a SAX parser to process the file and rebuild the features hierarchy by following the “parent_id” and “parent_place_id” XML attributes.
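As an illustration of the idea, the sketch below uses Python's standard `xml.sax` module to read `<feature>` elements and follow `parent_place_id` links upward. It is a simplified stand-in, not our parser: it only follows `parent_place_id`, keeps just the default `name`, and holds every feature in a dictionary, which would not fit in memory for the full 150GB export. The helper names `FeatureHandler`, `parse_features` and `hierarchy` are made up for this example.

```python
import io
import xml.sax


class FeatureHandler(xml.sax.ContentHandler):
    """Collects, for each <feature>, its default name and its parent's place_id."""

    def __init__(self):
        super().__init__()
        self.features = {}          # place_id -> {"name": ..., "parent": ...}
        self._current = None        # place_id of the <feature> being read
        self._in_default_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        if tag == "feature":
            self._current = attrs.get("place_id")
            self.features[self._current] = {
                "name": None,
                "parent": attrs.get("parent_place_id"),
            }
        elif tag == "name" and self._current is not None and attrs.get("type") == "name":
            self._in_default_name = True
            self._buffer = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate them.
        if self._in_default_name:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name" and self._in_default_name:
            self.features[self._current]["name"] = "".join(self._buffer).strip()
            self._in_default_name = False
        elif tag == "feature":
            self._current = None


def parse_features(stream):
    """Parse an export stream and return the place_id -> feature mapping."""
    handler = FeatureHandler()
    xml.sax.parse(stream, handler)
    return handler.features


def hierarchy(features, place_id):
    """Walk parent links upward, returning names from the feature to its root."""
    names, seen = [], set()
    while place_id in features and place_id not in seen:
        seen.add(place_id)  # guards against cycles in inconsistent data
        feature = features[place_id]
        if feature["name"]:
            names.append(feature["name"])
        place_id = feature["parent"]
    return names
```

For a street whose parent is a city whose parent is a country, `hierarchy` returns the three names in that order; a feature without a parent yields only its own name.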

It turned out to be a tiny bit more complex than expected.

A lot of features came with duplicates and inconsistencies:

  • Some cities have both “boundary/administrative” and “places/city” features, some have just one of them
  • A street is composed of several segments, so you might see several features representing the same street
  • Some street names are common to several cities, so we needed to have multiple records within Algolia

Some hierarchies couldn’t be resolved:

  • Some features did not have any parents
  • Some streets were attached to their countries but not cities
  • Some cities are also states, so the hierarchy was very confusing

And some features lacked metadata:

  • They weren’t associated with the corresponding population
  • Some counties didn’t have postcodes
  • Some cities didn’t have translations

Deduplication is generally super easy, but when you need to deduplicate 150GB of data, it leads to all sorts of new problems – in our case, we never seemed to have enough RAM! We also wanted to make sure that parsing was fast enough in order to avoid waiting days to build the Algolia records … after all, milliseconds matter!
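One way to keep memory in check is to stream the features and remember only a small key per feature instead of the feature itself. The sketch below collapses street segments that share a normalized name and a parent place, while keeping same-named streets in different cities apart. The key choice, the normalization rules and the `name`/`parent_place_id` field names are assumptions for illustration, not a description of what we actually ran.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents and collapse whitespace so near-identical names compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def merge_street_segments(features):
    """Yield one feature per (street name, parent place), dropping repeated segments.

    Only the keys are kept in memory, not the features themselves.
    """
    seen = set()
    for feature in features:
        key = (normalize(feature["name"]), feature["parent_place_id"])
        if key in seen:
            continue
        seen.add(key)
        yield feature
```

Since the function is a generator, it can sit between the parser and the record builder without materializing the whole dataset.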

So we tried to leverage the previously built GeoNames index as much as possible to fill in the data missing from OSM. But since these 2 datasets don’t share the same IDs, aggregating them was way more complex.

For performance reasons (more on that under “Search strategy”), we decided to build multiple indices from those records:

  • 1 index for the whole planet (~20M records)
  • 1 index per country (~6M records for the US, ~1.5M records for France, …)

That’s about 60GB of indices.

Just so you have an idea, the overall parsing + record generation + indexing currently takes ~12 hours per run.

The record schema

Here’s what our final record schema looks like:

{
  "is_city": true,
  "is_country": false,
  "_tags": ["boundary", "boundary/administrative", "country/us", "city"],
  "country_code": "us",
  "_geoloc": [{"lng": -122.4192704, "lat": 37.7792768}],
  "country": {
    "zh": "美国",
    "ro": "Statele Unite ale Americii",
    "it": "Stati Uniti d'America",
    "hu": "Amerikai Egyesült Államok",
    "ar": "الولايات المتّحدة الأمريكيّة",
    "de": "Vereinigte Staaten von Amerika",
    "default": "United States of America",
    "pt": "Estados Unidos da América",
    "pl": "Stany Zjednoczone Ameryki",
    "fr": "États-Unis d'Amérique",
    "ru": "Соединённые Штаты Америки",
    "es": "Estados Unidos de América",
    "nl": "Verenigde Staten van Amerika",
    "ja": "アメリカ合衆国"
  },
  "admin_level": 8,
  "locale_names": {
    "zh": ["旧金山"],
    "default": ["San Francisco", "SF"],
    "pt": ["São Francisco"],
    "ru": ["Сан-Франциско"],
    "ja": ["サンフランシスコ"]
  },
  "importance": 16,
  "is_popular": true,
  "county": {
    "default": [ "San Francisco City and County", "San Francisco", "SF"],
    "ru": ["Сан-Франциско"]
  },
  "is_highway": false,
  "administrative": ["California"],
  "population": 815358,
  "postcode": ["94101", "94102", "94103", "94104", ….
  ]
}
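Since names are stored per language with a `default` entry, a client can pick the best label for a user with a simple fallback. The helpers below are a small sketch of that lookup against the schema above; the function names are hypothetical and not part of any Algolia library.

```python
def localized_name(record, language):
    """Return the first name in the requested language, else the first default name."""
    names = record.get("locale_names", {})
    candidates = names.get(language) or names.get("default") or []
    return candidates[0] if candidates else None


def localized_country(record, language):
    """Return the country name in the requested language, else the default one."""
    country = record.get("country", {})
    return country.get(language) or country.get("default")
```

With the San Francisco record above, asking for `"pt"` gives `São Francisco`, while a language with no entry, such as `"de"`, falls back to `San Francisco`.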

 

Step 3: Index configuration

We constructed the underlying Algolia indices with the following configuration:

  • Search works with all localized names, but considers the default name as the most important
  • The names are considered more important than the city, the postcode, the county, the administrative area, the suburb, the village and even the country.


  • We rank countries above cities and cities above streets, and make sure the most populated places come first.
  • If the population is not set, we fall back on OSM’s importance field, which reflects the importance of the underlying place.
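The rules above map naturally onto two standard Algolia index settings, `searchableAttributes` (earlier entries weigh more) and `customRanking`. The dictionary below is only a sketch of how such a configuration could look: the attribute names `city`, `suburb` and `village` are guessed from the text rather than taken from the real indices, and the importance fallback is left out because the post does not say how that field is ordered.

```python
# Hypothetical settings payload; attribute names are assumptions.
PLACES_INDEX_SETTINGS = {
    "searchableAttributes": [
        "locale_names.default",   # default name matters most
        "locale_names",           # then every localized name
        "city",
        "postcode",
        "county",
        "administrative",
        "suburb",
        "village",
        "country",
    ],
    "customRanking": [
        "desc(is_country)",       # countries before everything else
        "desc(is_city)",          # then cities before streets
        "desc(population)",       # most populated places first
    ],
}
```

A payload of this shape is what an Algolia API client sends when updating an index's settings.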


Step 4: The search strategy

We built a custom API endpoint to query the indices and implemented a custom querying strategy:

If the query is performed from your backend (and therefore doesn’t specify any aroundLatLng query parameter) or your source IP address isn’t geo-localized, the results will come from all around the world because we target the “planet” index.

If, on the other hand, the query is performed from the end-user browser or device (and hence specifies an aroundLatLng query parameter) or has a source IP address that is geo-localized, the results will be composed of:

  • Nearby places
    • Places around you (<10km): this is a query using our aroundRadius feature
    • Places in your country: this is a query targeting the specific country index
  • Popular places all around the world: this is a query targeting the “planet” index

A few more rules apply on top of that:

  • Specifying a country query parameter overrides this behavior, restricting the results to the specified countries only.
  • Numerical tokens in the query string are considered optional words, to make sure we always find the address even if the postcode or house number is wrong.
  • We also defined a list of stopwords and tag every stopword of the query string as optional.
  • We query the underlying indices multiple times to ensure that:
    • Popular places are always retrieved first.
    • Nearby places always rank better than places in your country, which rank better than world-wide results.
    • If both a city and an address match the query, the city is retrieved first.
  • If the query doesn’t retrieve any results, we fall back on a degraded query strategy where all words are considered optional and we only target cities.
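The strategy above can be sketched in a few functions: one that picks the optional words, one that decides which indices to query, and one that merges the result tiers. `optionalWords`, `aroundLatLng` and `aroundRadius` are real Algolia search parameters; everything else (the index names `places_planet` and `places_<country>`, the stopword list, the `user_country` argument, the result limit, and running the nearby query against the planet index) is an assumption made for illustration, not our endpoint's code.

```python
import re

# Hypothetical stopword list; the real one is not given in this post.
STOPWORDS = {"rue", "street", "st", "avenue", "road", "de", "the"}


def optional_words(query, stopwords=STOPWORDS):
    """Return the query tokens that should not be required to match."""
    tokens = re.findall(r"\w+", query.lower())
    return [t for t in tokens if t.isdigit() or t in stopwords]


def plan_queries(query, around_lat_lng=None, user_country=None):
    """List the (index name, search parameters) pairs to run, most specific first."""
    base = {"query": query, "optionalWords": optional_words(query)}
    if around_lat_lng is None:
        # Backend call or non-geolocated IP: world-wide results only.
        return [("places_planet", base)]
    # Places around the user: aroundRadius is expressed in meters (10km here).
    plan = [("places_planet", {**base, "aroundLatLng": around_lat_lng, "aroundRadius": 10_000})]
    if user_country:
        plan.append((f"places_{user_country}", base))
    plan.append(("places_planet", base))
    return plan


def merge_tiers(*tiers, limit=5):
    """Concatenate result tiers in priority order, keeping each objectID once."""
    merged, seen = [], set()
    for tier in tiers:
        for hit in tier:
            if hit["objectID"] not in seen:
                seen.add(hit["objectID"])
                merged.append(hit)
    return merged[:limit]
```

Each entry returned by `plan_queries` would be sent to the search API, and the hit lists passed to `merge_tiers` in the same order, so that nearby results outrank country results, which outrank world-wide ones.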

The Result

We’ve had a great time building Algolia Places and the results today make all those coffee-fueled sleepless nights absolutely worth it. Here’s a quick look at them:

  • We’re hosting the Algolia Places infrastructure on 1 Algolia Cluster + 9 DSN replicas
    • 3 main servers in Germany
    • 4 DSN servers in France
    • 4 DSN servers in US
    • 1 DSN server in Singapore
  • Our Algolia Places infrastructure is currently processing 60 queries per second and provides answers in 20-30ms on average.
  • Our infrastructure is still at <3% of its capacity.
  • We’ve reached 3200 stars on the GitHub repository of the frontend JavaScript library and a bunch of positive feedback.

Want to use it?

Go for it, it could be FREE!

We’re providing a non-authenticated free plan limited to 1,000 queries a day, and we increase the free plan usage to 100,000 queries a month if you sign up.

Try Algolia Places now!

About the author
Sylvain Utard

VP of Engineering
