Aug 5th 2016 product
All of us have at some point or another been stuck in the middle of nowhere, desperately trying to locate an address with only our “fat fingers” for company, or abandoned a purchase in our virtual shopping basket because it was too much trouble to fill out our entire address.
Algolia being Algolia, we wondered how we could reduce these frustrations in our own way, and Algolia Places was born.
In case you haven’t heard, Algolia Places lets you create beautiful address autocompletion menus with just a few lines of code; it can also be personalized and customized for your use case, not to mention enriched with additional data sources. Today, we’d like to share the story of how we built it.
To build Algolia Places, we relied on the open-source datasets provided by OpenStreetMap & GeoNames. These datasets are very different from each other, and we chose them precisely for that reason.
In order to build the city and country search experience, we exclusively used the GeoNames dataset – it’s a pretty exhaustive list and is quite simple to parse.
For every single city & country name in the TSV files, we created an Algolia record. These records include not only the variations and translated names of countries & cities (when available), but also metadata such as associated postcodes, population, location, and various tags & booleans we use for the ranking and filtering capabilities of our API.
The result? ~4 million records.
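The post doesn’t show the conversion code, so here is a minimal Python sketch of the GeoNames step, assuming the standard 19-column layout of the GeoNames `geoname` table (tab-separated, no header row). The mapping to record fields (`is_city` for feature class `P`, `is_country` for feature code `PCLI`) and the helper names are hypothetical. Language-tagged translations live in GeoNames’ separate alternate-names file and are not handled here.

```python
import csv
import io

# Column layout of the GeoNames "geoname" table: 19 tab-separated fields, no header.
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def row_to_record(row: dict) -> dict:
    """Turn one GeoNames row into a record shaped like the schema shown later.

    Hypothetical mapping: feature class "P" (populated place) -> city,
    feature code "PCLI" (independent political entity) -> country.
    """
    alternates = [n for n in row["alternatenames"].split(",") if n]
    # Main name first, then unique alternate names in file order.
    names = list(dict.fromkeys([row["name"], *alternates]))
    return {
        "objectID": row["geonameid"],
        "is_city": row["feature_class"] == "P",
        "is_country": row["feature_code"] == "PCLI",
        "country_code": row["country_code"].lower(),
        "_geoloc": [{"lat": float(row["latitude"]), "lng": float(row["longitude"])}],
        "locale_names": {"default": names},
        "population": int(row["population"] or 0),
    }


def parse_geonames(text: str) -> list[dict]:
    """Parse TSV text (one GeoNames row per line) into records."""
    reader = csv.DictReader(
        io.StringIO(text),
        fieldnames=GEONAMES_COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # GeoNames does not quote fields
    )
    return [row_to_record(row) for row in reader]
```

For the real multi-gigabyte dumps you would stream the file line by line rather than loading it into a string; `parse_geonames` takes text only to keep the sketch short.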
To build the address search experience (including countries, cities, streets & notable places), we used both OpenStreetMap and GeoNames.
The OpenStreetMap initiative is wonderful but the underlying data is not always on point. Why?
To convert the OSM map data into Algolia records, we imported the whole OSM planet (40GB compressed) inside a PostgreSQL+PostGIS engine. The resulting database was a whopping 600GB!
We then used Nominatim to export this huge database to XML, rebuilding the hierarchy of places along the way. This step brought an interesting problem to light: the raw OSM map data doesn’t have any hierarchy. For instance, there is no obvious link between San Francisco, California, and the United States. Nominatim rebuilt that hierarchy in the exported format so another tool could easily process it.
The resulting XML file weighs 150GB and looks somewhat like:
<osmStructured version="0.1" generator="Nominatim">
<add>
<feature place_id="914093" type="R" id="71525" key="boundary" value="administrative" rank="12" importance="12" parent_place_id="913159" parent_type="R" parent_id="8649">
<names>
<name type="alt_name:fr">Lutèce</name>
<name type="alt_name:vi">Ba Lê</name>
<name type="loc_name:fr">Panam</name>
<name type="name">Paris</name>
<name type="name:af">Parys</name>
<name type="name:am">ፓሪስ</name>
<name type="name:an">París</name>
<name type="name:ar">باريس</name>
…
<name type="old_name:vi">Ba Lê</name>
<name type="ref">75</name>
</names>
<adminLevel>6</adminLevel>
<address>
<state rank="8" type="R" id="8649" key="boundary" value="administrative" distance="0.236572214414259" isaddress="t"/>
</address>
<tags>
<tag type="wikipedia">fr:Paris</tag>
</tags>
<osmGeometry>POLYGON((2.224122 48.854199,2.224158 48.854615,2.224257 48.855241,2.224317 48.85555,2.224371 ….))</osmGeometry>
</feature>
…
</add>
</osmStructured>
We also built a SAX parser to process the file and rebuild the feature hierarchy by following the “parent_id” and “parent_place_id” XML attributes.
It turned out to be a tiny bit more complex than expected: some hierarchies couldn’t be resolved, and some features lacked metadata.
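A minimal sketch of the SAX pass and the hierarchy walk, using Python’s standard `xml.sax` module on the Nominatim XML format shown above. It follows only `parent_place_id`, keeps features in an in-memory dict (which would not scale to the real 150GB file), and flags a feature as unresolved when its parent chain points at a missing feature or loops. `FeatureHandler` and `resolve_hierarchy` are hypothetical names, not Algolia’s code.

```python
import xml.sax


class FeatureHandler(xml.sax.ContentHandler):
    """Collect place_id, parent_place_id and the default name of each <feature>."""

    def __init__(self):
        super().__init__()
        self.features = {}        # place_id -> {"name": ..., "parent": ...}
        self._current = None      # place_id of the <feature> being read
        self._in_default_name = False
        self._text = []

    def startElement(self, tag, attrs):
        if tag == "feature":
            place_id = attrs.get("place_id")
            # Root features (countries) have no parent_place_id -> None.
            self.features[place_id] = {"name": None, "parent": attrs.get("parent_place_id")}
            self._current = place_id
        elif tag == "name" and attrs.get("type") == "name" and self._current is not None:
            self._in_default_name = True
            self._text = []

    def characters(self, content):
        if self._in_default_name:
            self._text.append(content)

    def endElement(self, tag):
        if tag == "name" and self._in_default_name:
            self.features[self._current]["name"] = "".join(self._text)
            self._in_default_name = False
        elif tag == "feature":
            self._current = None


def resolve_hierarchy(features):
    """Walk parent links; return (name chains by place_id, unresolved place_ids)."""
    chains, unresolved = {}, set()
    for place_id in features:
        chain, seen, node = [], set(), place_id
        while node is not None:
            if node in seen or node not in features:
                # A loop, or a parent missing from the export: can't resolve.
                unresolved.add(place_id)
                break
            seen.add(node)
            chain.append(features[node]["name"])
            node = features[node]["parent"]
        else:
            chains[place_id] = chain
    return chains, unresolved
```

Usage: pass an instance to `xml.sax.parse(path, handler)` over the export, then call `resolve_hierarchy(handler.features)`; San Francisco would resolve to a chain like `["San Francisco", "California", "United States"]` if every parent is present.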
Deduplication is generally super easy, but when you need to deduplicate 150GB of data, it leads to all sorts of new problems – in our case, we never seemed to have enough RAM! We also wanted to make sure that parsing was fast enough in order to avoid waiting days to build the Algolia records … after all, milliseconds matter!
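The post doesn’t say how the RAM problem was solved. One standard technique, shown here as an assumption rather than Algolia’s method, is hash partitioning: write each record to one of N bucket files based on a hash of its deduplication key, then deduplicate each bucket in memory on its own, since duplicates always land in the same bucket. The `key` function is hypothetical; country code plus normalized name is one possible choice.

```python
import hashlib
import json
import os
import tempfile


def dedupe_external(records, key, num_buckets=16):
    """Yield records with a unique key(record), holding one bucket in RAM at a time.

    Records sharing a key hash to the same bucket file, so deduplicating
    each bucket independently removes every duplicate.
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket-{i}.jsonl") for i in range(num_buckets)]
        files = [open(p, "w", encoding="utf-8") for p in paths]
        try:
            # Pass 1: stream records to disk, partitioned by key hash.
            for record in records:
                digest = hashlib.sha1(key(record).encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % num_buckets
                files[bucket].write(json.dumps(record, ensure_ascii=False) + "\n")
        finally:
            for f in files:
                f.close()

        # Pass 2: dedupe each bucket with an in-memory set of keys.
        for path in paths:
            seen = set()
            with open(path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    k = key(record)
                    if k not in seen:
                        seen.add(k)
                        yield record
```

The first occurrence of each key wins, but output comes out bucket by bucket, so input order is not preserved; more buckets means less RAM per bucket at the cost of more open files.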
To fill in the data missing from OSM, we leveraged the previously built GeoNames index as much as possible; but since the two datasets didn’t share the same IDs, aggregating them was obviously way more complex.
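Without shared IDs, records have to be matched on their content. The post doesn’t specify the rule, so this sketch assumes a simple one: same country code, same accent-insensitive name, and coordinates within a hypothetical 25 km radius (great-circle distance via the haversine formula). Only `population` is filled in; field names follow the record schema shown later in the post, and the helper names are illustrative.

```python
import math
import unicodedata


def normalize(name: str) -> str:
    """Strip accents and casefold, so 'Saint-Étienne' matches 'saint-etienne'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two lat/lng points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def build_name_index(geonames_records):
    """Index GeoNames records by (country_code, normalized name)."""
    index = {}
    for rec in geonames_records:
        for name in rec["locale_names"]["default"]:
            index.setdefault((rec["country_code"], normalize(name)), []).append(rec)
    return index


def enrich(osm_record, index, max_km=25.0):
    """Return a copy of osm_record with population from the closest GeoNames match."""
    if osm_record.get("population"):
        return osm_record
    loc = osm_record["_geoloc"][0]
    best, best_km = None, max_km
    for name in osm_record["locale_names"]["default"]:
        for cand in index.get((osm_record["country_code"], normalize(name)), []):
            c = cand["_geoloc"][0]
            d = haversine_km(loc["lat"], loc["lng"], c["lat"], c["lng"])
            if d <= best_km:
                best, best_km = cand, d
    if best is not None:
        osm_record = {**osm_record, "population": best["population"]}
    return osm_record
```

The distance check matters because many places share a name within one country; picking the nearest candidate inside the radius avoids copying data from a namesake hundreds of kilometers away.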
For performance reasons (more on that under “Search strategy”), we decided to build multiple indices from those records:
That’s about 60GB of indices.
To give you an idea, the whole pipeline (parsing, record generation and indexing) currently takes ~12 hours per run.
Here’s what our final record schema looks like:
{
"is_city": true,
"is_country": false,
"_tags": ["boundary", "boundary/administrative", "country/us", "city"],
"country_code": "us",
"_geoloc": [{"lng": -122.4192704, "lat": 37.7792768}],
"country": {
"zh": "美国",
"ro": "Statele Unite ale Americii",
"it": "Stati Uniti d'America",
"hu": "Amerikai Egyesült Államok",
"ar": "الولايات المتّحدة الأمريكيّة",
"de": "Vereinigte Staaten von Amerika",
"default": "United States of America",
"pt": "Estados Unidos da América",
"pl": "Stany Zjednoczone Ameryki",
"fr": "États-Unis d'Amérique",
"ru": "Соединённые Штаты Америки",
"es": "Estados Unidos de América",
"nl": "Verenigde Staten van Amerika",
"ja": "アメリカ合衆国"
},
"admin_level": 8,
"locale_names": {
"zh": ["旧金山"],
"default": ["San Francisco", "SF"],
"pt": ["São Francisco"],
"ru": ["Сан-Франциско"],
"ja": ["サンフランシスコ"]
},
"importance": 16,
"is_popular": true,
"county": {
"default": [ "San Francisco City and County", "San Francisco", "SF"],
"ru": ["Сан-Франциско"]
},
"is_highway": false,
"administrative": ["California"],
"population": 815358,
"postcode": ["94101", "94102", "94103", "94104", ….
]
}
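The `_tags` values in this record appear to combine the OSM key/value pair from the Nominatim export (`key="boundary" value="administrative"`), the country code, and a type flag. Here is a tiny sketch of that rule; it is inferred from this single example and is therefore hypothetical, as is the `build_tags` name.

```python
def build_tags(osm_key: str, osm_value: str, country_code: str, is_city: bool = False) -> list[str]:
    """Rebuild _tags in the format of the sample record (hypothetical rule)."""
    tags = [
        osm_key,                              # e.g. "boundary"
        f"{osm_key}/{osm_value}",             # e.g. "boundary/administrative"
        f"country/{country_code.lower()}",    # e.g. "country/us"
    ]
    if is_city:
        tags.append("city")                   # type flag usable as a filter
    return tags
```

Algolia treats `_tags` as a reserved attribute that can be filtered with the `tagFilters` search parameter, which is presumably why flags like `city` are stored there.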
We constructed the underlying Algolia indices with the following configuration:
We built a custom API endpoint to query the indices and implemented a custom querying strategy:
If the query is performed from your backend (and therefore doesn’t specify any aroundLatLng query parameter) or your source IP address can’t be geolocated, the results will come from all around the world, because we target the “planet” index.
Or, if the query is performed from the end user’s browser or device (and hence specifies an aroundLatLng query parameter) or comes from a source IP address that can be geolocated, the results will be composed of:
… aroundRadius feature, …
We’ve had a great time building Algolia Places, and the results today make all those coffee-fueled sleepless nights absolutely worth it. Here’s a quick look at them:
Go for it: it could be FREE!
We provide a free, non-authenticated plan limited to 1,000 queries a day, and we raise the free quota to 100,000 queries a month if you sign up.