How we tackled internationalization in our Zendesk integration
Zendesk customers are worldwide, coming from every continent. It’s no surprise that their Help Centers support multiple languages out-of-the-box. When we launched our Algolia for Zendesk integration, it shipped with English support by default, and you could extend it to handle other languages. Today, we’re proud to announce that our integration supports 30 languages.

Algolia has always been language-agnostic. You can search English, Arabic or Chinese text without touching the parameters, but each integration comes with its own specific features. Searching help articles is a specific use case, and since we provide some front-end features (e.g. an autocompletion menu), we also had some text that needed to be translated (e.g. “10 results found in 2ms”).

Our initial release had some flaws that we quickly discovered by talking with our first integration users. We got great feedback from pretty big Zendesk users that needed multiple language support out of the box, like Dashlane, whose Help Center is available in English, French and Spanish.

TL;DR

I’ve learned the hard way that there’s no magic bullet. Languages are too different to expect that you can simply swap out parts of your text. Some things aren’t obvious: for example, one thousand is written 1,000 in English but 1 000 in French. Some languages have multiple plural forms, and you can’t expect sentence construction to be the same from one language to the next. As soon as you want dynamic content in a sentence, you’ll need some form of templating logic.

How did it work before?

When you call our front-end library, you can pass some parameters. We had exposed a simple translations parameter, in which each key held either a string with the desired value for all languages or an object associating a locale with a translation.

translations: {
  placeholder_autocomplete: 'Search',
  found_in: {
    'en-us': 'Found in',
    'fr': 'en'
  }
}
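
To make the lookup concrete, here is a minimal sketch of how such a parameter could be resolved. The `translate` helper is hypothetical and is not taken from the integration's code; it only illustrates the "string for all languages, or object keyed by locale" rule described above.

```javascript
// Hypothetical helper (not the integration's actual code): resolves one key
// of the `translations` parameter for a given locale.
function translate(translations, key, locale) {
  const entry = translations[key];
  // A plain string applies to every language.
  if (typeof entry === 'string') return entry;
  // An object associates a locale with its translation.
  if (entry && typeof entry === 'object' && locale in entry) return entry[locale];
  // No translation available for this locale.
  return undefined;
}

const translations = {
  placeholder_autocomplete: 'Search',
  found_in: {
    'en-us': 'Found in',
    'fr': 'en'
  }
};

console.log(translate(translations, 'placeholder_autocomplete', 'fr')); // "Search"
console.log(translate(translations, 'found_in', 'en-us'));              // "Found in"
```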

We then decided to embed a few languages directly inside the application by default, to make our users’ lives easier. That’s when we learned that the simple solution I had developed wasn’t sufficient at all.

Flaws of the previous solution

We used Gengo to get our sentences translated into a first batch of 5 languages. Satisfied with the results, we ordered a total of 30 translations (most of the ones supported by Zendesk). Some languages had a grammar close enough to English that the integration was straightforward. Others brought up issues with the current solution, at three levels.

Different ways of displaying the same information (dates, numbers)

We have some great tools to display the same information in multiple languages.

Your browser ships with Number.toLocaleString and Date.toLocaleDateString. Unfortunately, by default those methods use the user’s locale rather than the language of the page. The ECMAScript Internationalization API (ECMA-402) added support for a locales parameter, but browser support is still too low for us to confidently use it in an integration targeting our customers’ end users.

At that time, we simply ignored the different ways of displaying numbers depending on the language. However, for dates, we used the great moment.js library.
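
For reference, this is roughly what the locales parameter looks like in an environment that supports it. It is only a sketch: `formatCount` and `formatDate` are hypothetical wrapper names, and the exact separators and date layout depend on the locale data shipped with the runtime.

```javascript
// Hypothetical wrappers around the standard `locales` argument.
// Both assume a runtime that implements the Internationalization API.
function formatCount(value, locale) {
  return value.toLocaleString(locale);
}

function formatDate(date, locale) {
  // Pin the time zone so the result does not depend on the machine's settings.
  return date.toLocaleDateString(locale, { timeZone: 'UTC' });
}

console.log(formatCount(1000, 'en-US')); // "1,000"
console.log(formatCount(1000, 'fr-FR')); // digits grouped with a space-like separator
console.log(formatDate(new Date(Date.UTC(2016, 0, 31)), 'fr-FR'));
```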

Plural forms

This one is a very basic one, but I didn’t think about cases where a plural form was not needed in English but needed in other languages (and I’m French):

| Form | English | French |
| --- | --- | --- |
| Singular | Found in | Trouvé en |
| Plural | Found in | Trouvés en |

Something that came as a real surprise to me though was that languages also have multiple plural forms:

| Amount | English | Czech |
| --- | --- | --- |
| 1 | result | výsledek |
| 2 to 4 | results | výsledky |
| 5 or more | results | výsledků |

Sentence construction

Sometimes it’s the sentence construction itself that is completely different. In some languages, the word order can be totally different:

| Language | Translation |
| --- | --- |
| English | No results found for “query” |
| German | Keine Ergebnisse für “query” gefunden |
| Japanese | “query” の結果が見つかりませんでした。 |

Another small difference between languages is punctuation. Different languages use different quotation marks:

| English | Czech |
| --- | --- |
| “query” | „query“ |

With all those cards in hand, we started to realize that we needed a better framework.

What we ended up with

We kept the same logic for the translation object, but this time added the ability to have logic on top of it.

The static standalone sentences are still simple translated strings:

filter: {
  en: 'Filter results',
  ru: 'фильтр',
  th: 'ผลลัพธ์จากตัวกรอง'
}

Others are now functions. These functions are called with access to the other translations (this) and take the dynamic parts as parameters:

stats: {
  en: function (nbHits, processing) {
    return this.nb_results(nbHits) + ' found in ' + processing + ' ms';
  },
  es: function (nbHits, processing) {
    return this.nb_results(nbHits) + ' encontrado' + (nbHits > 1 ? 's' : '') + ' en ' + processing + ' ms';
  }
}
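
The `nb_results` translation used above is where plural forms can live. The sketch below shows one way it might look. The object layout (one object per language, used as `this`) and the `czechPlural` helper are assumptions made for illustration, and the Czech rule assumed here is the usual one: 1, then 2 to 4, then everything else.

```javascript
// Hypothetical plural helper for Czech: 1 -> one, 2 to 4 -> few, otherwise -> many.
function czechPlural(n, one, few, many) {
  if (n === 1) return one;
  if (n >= 2 && n <= 4) return few;
  return many;
}

// Assumed layout: one object per language, so that `this` inside a
// translation function gives access to the other translations.
const csTranslations = {
  nb_results: function (nbHits) {
    return nbHits + ' ' + czechPlural(nbHits, 'výsledek', 'výsledky', 'výsledků');
  }
};

const enTranslations = {
  nb_results: function (nbHits) {
    return nbHits + ' result' + (nbHits === 1 ? '' : 's');
  },
  stats: function (nbHits, processing) {
    return this.nb_results(nbHits) + ' found in ' + processing + ' ms';
  }
};

console.log(enTranslations.stats(10, 2));  // "10 results found in 2 ms"
console.log(csTranslations.nb_results(3)); // "3 výsledky"
```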

For each locale (e.g. en-us), you can either override the root translation by using the en key, which changes the translation for every English-speaking locale (i.e. en-au, en-ca, en-gb and en-us), or provide a locale-specific translation by using the full locale as the key (e.g. 'en-us').

The only exception to this is Chinese, where Simplified Chinese and Traditional Chinese are too different to have a common root.
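
The lookup order described above can be sketched as follows. The `resolve` function is hypothetical, and two details are assumptions rather than documented behavior: the final fallback to English, and the way Chinese is excluded from the root lookup. The Chinese strings are only illustrative.

```javascript
// Hypothetical resolver: full locale first, then the root language,
// except for Chinese, which has no common root.
function resolve(entry, locale) {
  const full = locale.toLowerCase();
  if (full in entry) return entry[full];       // e.g. 'en-gb'
  const root = full.split('-')[0];
  if (root !== 'zh' && root in entry) return entry[root]; // e.g. 'en'
  return entry.en;                              // assumption: fall back to English
}

const filter = {
  en: 'Filter results',
  'en-gb': 'Filter the results', // illustrative locale-specific override
  ru: 'фильтр'
};

const search = {
  en: 'Search',
  'zh-cn': '搜索', // Simplified Chinese
  'zh-tw': '搜尋'  // Traditional Chinese
};

console.log(resolve(filter, 'en-us')); // "Filter results"
console.log(resolve(filter, 'en-gb')); // "Filter the results"
console.log(resolve(search, 'zh-tw')); // "搜尋"
```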

Going further

Once this was done, we started thinking about how we could improve relevance for each language.

The first thing we thought about is related to the type of queries the users might send. On a Help Center, it’s not unusual to have users type “how to do …”. The issue with that type of query is the potential noise related to words as common as “how”, “to”, “and” or “my”, that are called stop words in natural language processing.

An example is worth a thousand words. Consider two articles:

| Title | Content |
| --- | --- |
| Change your password | Just follow this link to change your password |
| How to delete your account | Click on “Delete my account”. In case you change your mind, your old login/password will still work during 14 days. |

For the query “How to change my password?”, without any special handling of those frequent words, the Change your password article would not even show up, because it lacks the words “how” and “my”. How to delete your account would show up instead.

That’s where stop words handling kicks in. Algolia offers an automatic stop words removal feature, with the query parameter removeStopWords. You can either use it with true to remove all the stop words in all languages or limit yourself to a specific language. Activated with the example above, Change your password would come back first in the results list and How to delete your account second.
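
The effect on the two example articles can be reproduced with a toy matcher. This is not how the Algolia engine works: it is a deliberately naive simulation (exact words only, every query word required, no ranking), and the stop word list is a tiny hypothetical one.

```javascript
// Toy simulation only: exact-word matching where every query word is required.
const articles = [
  { title: 'Change your password',
    body: 'Just follow this link to change your password' },
  { title: 'How to delete your account',
    body: 'Click on "Delete my account". In case you change your mind, ' +
          'your old login/password will still work during 14 days.' }
];

// Hypothetical, tiny stop word list for English.
const TOY_STOP_WORDS = ['how', 'to', 'and', 'my', 'your'];

function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function toySearch(query, dropStopWords) {
  let words = tokenize(query);
  if (dropStopWords) {
    words = words.filter(function (w) { return !TOY_STOP_WORDS.includes(w); });
  }
  return articles
    .filter(function (article) {
      const tokens = tokenize(article.title + ' ' + article.body);
      return words.every(function (w) { return tokens.includes(w); });
    })
    .map(function (article) { return article.title; });
}

console.log(toySearch('How to change my password', false));
// [ 'How to delete your account' ]
console.log(toySearch('How to change my password', true));
// [ 'Change your password', 'How to delete your account' ] (input order, not ranked)
```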

While this works great for full queries, its behavior can be a bit strange when combined with prefix search, because a word is still used as part of the query while it is only a prefix, and is then dropped once it becomes a complete stop word: the query “ho” would match “how”, but the query “how to delete” would only match “delete”. That’s why we chose another solution.

Algolia also provides another query parameter called optionalWords, which specifies which words aren’t required for a record to match. At each keystroke, we now look at which stop words of the current language the query contains, and add them to the query as optional words. How to delete your account would still show up first because it matches more query words, but Change your password would also show up in the results list.
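
The per-keystroke step could look something like the sketch below. `buildSearchParams` and the stop word lists are hypothetical; only the `optionalWords` parameter name comes from Algolia. The function just builds the parameters and leaves the actual search call out.

```javascript
// Hypothetical, tiny stop word lists keyed by root language.
const STOP_WORDS_BY_LANGUAGE = {
  en: ['how', 'to', 'and', 'my'],
  fr: ['comment', 'et', 'mon']
};

// Builds the query parameters for the current input: every stop word found
// in the query is declared optional instead of being removed.
function buildSearchParams(query, locale) {
  const language = locale.toLowerCase().split('-')[0];
  const stopWords = STOP_WORDS_BY_LANGUAGE[language] || [];
  // Split on anything that is not a letter or a digit, in any script.
  const words = query.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  const found = words.filter(function (w) { return stopWords.includes(w); });
  // Deduplicate while keeping the order of first appearance.
  const optionalWords = Array.from(new Set(found));
  return { query: query, optionalWords: optionalWords };
}

console.log(buildSearchParams('How to change my password', 'en-us'));
// { query: 'How to change my password', optionalWords: [ 'how', 'to', 'my' ] }
```

Because the stop words stay in the query, a record that contains them still matches more words than one that does not, which is what keeps How to delete your account ahead in the example above.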

This solution strikes a good balance between no stop word handling and the use of removeStopWords, in terms of both relevance and user experience. It works especially well because the dataset we’re searching is usually in the hundreds of articles, so there aren’t that many relevant results for a query.

Next steps

Now that we have this framework, what could we improve upon?

The first thing would be a way to use language-specific tags. By default, we currently display all the tags on the search results page. You can hide them with three lines of CSS, but showing only localized tags would be an even better solution.

Another big improvement we plan to make is to have one index per locale. This would allow us to move the whole stop word logic from the front end to the Algolia index settings, which would save some space in the front-end library.

Feel free to contribute: the whole code is open source, and we’d be happy to look at any feature you’d like!

About the author

Matthieu Dumont

Senior Software Engineer
