Re-evaluating LLM encoders for semantic search

Summary

In this study, we explore how well performance on MTEB and on publicly available ecommerce benchmark datasets transfers to real-world retail search applications.

Conclusions and future work

It is essential to evaluate LLMs in the context of a specific problem. The performance gap between Google, JinaAI, and Algolia models on MTEB and on the Algolia ecommerce benchmarks shows why internal benchmarks built on relevant datasets need to be available for evaluation. At Algolia, new LLMs are typically added to our benchmarks within 24 hours. This enables our AI team to select the state-of-the-art LLM embedding model and to fine-tune our next-generation models, which are based on permissively licensed open-source foundation models, using our fully automated training and evaluation pipelines.
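As an illustration of the point above, a minimal sketch of ranking candidate encoders on an in-domain benchmark might look like the following. The metric (recall@k), the hand-made toy vectors, and the names `recall_at_k`, `encoder_a`, and `encoder_b` are assumptions made for this example; they do not describe Algolia's benchmarks or pipelines.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(query_vecs, doc_vecs, relevant, k):
    """Mean fraction of each query's relevant documents found in its top-k results."""
    total = 0.0
    for qid, qvec in query_vecs.items():
        # Rank all documents by similarity to the query, best first.
        ranked = sorted(doc_vecs, key=lambda d: cosine(qvec, doc_vecs[d]), reverse=True)
        hits = len(set(ranked[:k]) & relevant[qid])
        total += hits / len(relevant[qid])
    return total / len(query_vecs)


# Toy vectors standing in for the embeddings of two hypothetical encoders.
docs_a = {"shoe": [1.0, 0.0], "boot": [0.8, 0.6], "lamp": [0.0, 1.0]}
queries_a = {"q1": [0.9, 0.1]}
docs_b = {"shoe": [0.0, 1.0], "boot": [1.0, 0.0], "lamp": [0.1, 0.9]}
queries_b = {"q1": [0.0, 1.0]}

# Relevance judgments from the (hypothetical) in-domain dataset.
relevant = {"q1": {"shoe", "boot"}}

scores = {
    "encoder_a": recall_at_k(queries_a, docs_a, relevant, k=2),
    "encoder_b": recall_at_k(queries_b, docs_b, relevant, k=2),
}
best = max(scores, key=scores.get)
print(scores, best)
```

The same loop can be rerun whenever a new model is added, so the ranking reflects the target domain rather than a general-purpose leaderboard.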

Algolia v2410 models are state-of-the-art for their size and use cases, and are now available under an MIT license. Please try them and send us your feedback at ai-research@algolia.com.
