Algolia LLMs

Re-evaluating LLM encoders for semantic search

Summary

In this study, we explore the transferability of MTEB and publicly available ecommerce dataset benchmark performance to real-world retail search applications.

Algolia LLMs

Algolia builds embedding models of approximately 500M parameters to meet latency targets. Each LLM is architecturally optimised and quantised to reduce latency (Table 3). The Algolia AI team continuously assesses state-of-the-art LLMs and selects those with top performance and permissive licenses. These models are then fine-tuned for ecommerce contexts to improve performance on industry-specific needs. The fine-tuning methodology combines best practices from recent research with the AI team's expertise. Automated training and evaluation pipelines explore many hyperparameter configurations in parallel on the same training dataset, and the best-performing model is kept. Some of the research-inspired techniques are outlined, without exact details, in Table 2:

Table 2: Algolia uses a two-stage training approach: (1) fine-tuning with the InfoNCE loss, then (2) hard-negative fine-tuning on synthetic datasets
Stage	Technique	Comments
Fine-tune (InfoNCE loss)	Stratified public ecommerce datasets in batches	To get the most out of the InfoNCE loss, samples from the stratified datasets are curated into the same batch.
Hard-negative fine-tune	Synthetic hard negatives for further fine-tuning	GenAI labelling is combined with tuned hard-negative mining so that the resulting model can separate ambiguous samples that lie close to the decision boundary.
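
The first stage in Table 2 can be illustrated with a minimal NumPy sketch of the InfoNCE loss with in-batch negatives. This is not Algolia's training code: the function name `info_nce_loss`, the temperature of 0.05, and the use of cosine similarity are assumptions chosen for illustration. Each query is paired with the document in the same row, and every other document in the batch serves as a negative, which is why the composition of each batch matters.

```python
import numpy as np

def info_nce_loss(query_emb, doc_emb, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (illustrative sketch).

    Row i of query_emb is paired with row i of doc_emb (the positive);
    every other row of doc_emb in the batch acts as a negative.
    The temperature value is an assumption, not a published setting.
    """
    # L2-normalise so that the dot product is cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    logits = (q @ d.T) / temperature  # shape (batch, batch)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positives sit on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

# Toy batch: queries that closely match their own document give a low loss,
# while the same queries paired with the wrong documents give a high loss.
rng = np.random.default_rng(0)
docs = rng.normal(size=(8, 16))
aligned_queries = docs + 0.01 * rng.normal(size=docs.shape)
mismatched_queries = aligned_queries[::-1]
loss_aligned = info_nce_loss(aligned_queries, docs)
loss_mismatched = info_nce_loss(mismatched_queries, docs)
```

In a real training loop the same computation would run in an autodiff framework so that gradients flow back into the encoder; NumPy is used here only to keep the arithmetic visible.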

Table 3 lists the Algolia embedding LLMs (as of December 2024) and their specifications. All Algolia LLMs are trained on publicly available ecommerce datasets; no Algolia customer data is used for training. GenAI labelling and hard-negative mining are combined to create synthetic datasets for further fine-tuning. The Algolia v2410 models are open-sourced under the MIT license and are available on Hugging Face. Latency was measured on a local machine with an i9 CPU.
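
Hard-negative mining, as mentioned above, can be sketched as follows. The exact procedure is not described in this chapter, so everything here is an assumption for illustration: the function name `mine_hard_negatives`, the top-`k` selection, and the `margin` filter that skips candidates scoring almost as high as the positive (these are likely to be unlabelled positives rather than true negatives). The GenAI labelling step is not modelled.

```python
import numpy as np

def mine_hard_negatives(query_emb, doc_emb, positive_idx, k=2, margin=0.05):
    """Return, per query, up to k indices of the most similar non-positive documents.

    Candidates whose similarity is within `margin` of the positive (or above it)
    are skipped as likely false negatives. This filter is an assumption of the
    sketch, not a documented part of any specific pipeline.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sims = q @ d.T  # cosine similarities, shape (num_queries, num_docs)
    negatives = []
    for i, pos in enumerate(positive_idx):
        ceiling = sims[i, pos] - margin
        order = np.argsort(-sims[i])  # most similar documents first
        picked = [int(j) for j in order if j != pos and sims[i, j] <= ceiling][:k]
        negatives.append(picked)
    return negatives

# Toy example: document 0 is the positive, document 3 is a near-duplicate of it.
queries = np.array([[1.0, 0.0]])
documents = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.999, 0.04]])
hard_negatives = mine_hard_negatives(queries, documents, positive_idx=[0])
# hard_negatives == [[1, 2]]: the near-duplicate (index 3) is filtered out.
```

The mined indices would then be paired with their queries as explicit negatives in the second fine-tuning stage.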

Table 3: Algolia LLMs are state-of-the-art embedding models fine-tuned and optimised for ecommerce search
Model	License	Base	Datasets	Dimension	Latency
Algolia-Large-EN-Generic-v2410	MIT	gte-large	Public ecommerce (+Syn.)	1024	90 ms / 40 ms (opt.)
Algolia-Large-Multilang-Generic-v2410	MIT	solon-embeddings-large-0.1	Public ecommerce (+Syn.)	1024	90 ms / 40 ms (opt.)
Algolia-Large-All-Generic-v2412	MIT	snowflake-arctic-embed-l-v2.0	Public ecommerce (+Syn.)	1024	90 ms / 35 ms (opt.)
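
Latency figures such as those in Table 3 depend on the measurement setup. The sketch below shows one common way to time a single-query encode call on a CPU. It is not the procedure used to produce Table 3: the helper `measure_latency_ms`, the warm-up and run counts, and the choice of the median are assumptions, and `stand_in_encode` is a hypothetical placeholder for a real model's encode function.

```python
import statistics
import time

def measure_latency_ms(encode, query, warmup=3, runs=20):
    """Time encode(query) and return the median and maximum latency in milliseconds."""
    # Warm-up calls let caches and lazy initialisation settle before timing.
    for _ in range(warmup):
        encode(query)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        encode(query)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}

def stand_in_encode(text):
    """Hypothetical placeholder that returns a 1024-dimensional vector."""
    return [float(len(text))] * 1024

stats = measure_latency_ms(stand_in_encode, "red running shoes")
```

Replacing `stand_in_encode` with a real encoder, and running the harness before and after optimisation on the same machine, gives comparable before-and-after numbers.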
