KD-Boost: Amazon’s Breakthrough in Real-Time Semantic Matching for E-commerce


  • Amazon researchers introduce KD-Boost, a novel knowledge distillation algorithm tailored for real-time semantic matching.
  • Real-time semantic matching is crucial for web and e-commerce product searches, involving two key steps: Product Sourcing and Automated Query Reformulation.
  • KD-Boost trains low-latency, accurate student models on ground-truth labels together with soft labels produced by a teacher model.
  • Pairwise query-product and query-query signals, derived from audits, user behavior, and taxonomy-based data, are used as soft labels.
  • Custom loss functions guide the learning process and ensure accurate representation capture.
  • Tests on e-commerce datasets demonstrate a significant 2-3% improvement in ROC-AUC compared to direct student model training.
  • KD-Boost outperforms state-of-the-art knowledge distillation benchmarks and teacher models.
  • Simulated online A/B tests for automated query reformulation show a 6.31% increase in query-to-query matching, a 2.19% improvement in relevance, and a 2.76% rise in product coverage.

Main AI News:

Web search and e-commerce product search have long relied on the critical concept of real-time semantic matching. In the realm of product searches, the challenge often lies in bridging the semantic gap between user queries and the corresponding search results. This multifaceted matching process typically comprises two essential steps: Product Sourcing (PS) and Automated Query Reformulation. Product sourcing seeks to retrieve relevant results or products based on a user’s query, and subsequent automated query reformulation aims to refine and enhance user queries for better result coverage.

Semantic matching is the cornerstone of this process, enabling search engines to identify and associate items with similar meanings, delivering not just any results but the most contextually relevant ones. Transformer-based models have shown remarkable prowess in encoding user requests and clustering them within an embedding space rich in semantically related elements, including queries and results. However, the inherent computational cost of large transformer models poses latency challenges that hinder real-time matching.
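To make the idea of matching in an embedding space concrete, here is a minimal sketch that ranks products by cosine similarity to a query vector. The three-dimensional vectors, the product names, and the helper functions `cosine_similarity` and `rank_products` are illustrative assumptions, not part of KD-Boost; a real system would obtain the vectors from a trained encoder.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_products(query_vec, product_vecs):
    """Return product ids ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), pid)
              for pid, vec in product_vecs.items()]
    return [pid for _, pid in sorted(scored, reverse=True)]

# Toy embeddings, made up for illustration only.
query = [0.9, 0.1, 0.0]
products = {
    "running_shoes": [0.8, 0.2, 0.1],
    "trail_sneakers": [0.7, 0.3, 0.0],
    "coffee_maker": [0.0, 0.1, 0.9],
}

ranking = rank_products(query, products)
print(ranking)
```

The expensive step in practice is producing the embeddings, which is where a large transformer's latency becomes a problem; the similarity comparison itself is cheap.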

To overcome this latency barrier, Amazon researchers have introduced KD-Boost, a knowledge distillation technique designed for real-time semantic matching. KD-Boost trains efficient, low-latency student models on ground-truth labels together with soft labels obtained from a teacher model. The soft labels are derived from pairwise query-product and query-query signals drawn from direct audits, user behavior, and taxonomy-based data. Custom loss functions guide the student’s learning.
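The article does not spell out KD-Boost's loss functions, so the following is only a generic sketch of how a distillation objective can combine ground-truth labels with a teacher's soft labels. The weighting parameter `alpha`, the use of binary cross-entropy for both terms, and the function names are assumptions made for illustration, not the authors' formulation.

```python
import math

def binary_cross_entropy(pred, target, eps=1e-7):
    """Cross-entropy between a predicted probability and a target in [0, 1]."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

def distillation_loss(student_probs, hard_labels, teacher_probs, alpha=0.5):
    """Batch mean of alpha * hard-label loss + (1 - alpha) * soft-label loss.

    alpha is a hypothetical mixing weight: 1.0 ignores the teacher,
    0.0 trains on the teacher's soft labels only.
    """
    total = 0.0
    for s, y, t in zip(student_probs, hard_labels, teacher_probs):
        hard = binary_cross_entropy(s, y)  # agreement with ground truth
        soft = binary_cross_entropy(s, t)  # agreement with the teacher
        total += alpha * hard + (1.0 - alpha) * soft
    return total / len(student_probs)

# Toy match probabilities for two query-product pairs, made up for illustration.
student = [0.8, 0.3]
labels = [1.0, 0.0]
teacher = [0.9, 0.2]

loss = distillation_loss(student, labels, teacher, alpha=0.5)
print(round(loss, 4))
```

The soft-label term lets the student learn from the teacher's graded confidence rather than from binary labels alone, which is the usual motivation for distillation.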

The research team draws on several sources of similarity and dissimilarity signals to serve both query reformulation and product sourcing. These include editorial ordinal relevance labels for query-product pairs, user-behavioral data such as clicks and sales, and the product taxonomy. Tailored loss functions are designed so that the model captures the distinctions in relevance and similarity that these signals encode.
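As a rough illustration of how such signals could be turned into numeric training targets, the sketch below scales an ordinal relevance label into the range 0 to 1 and scores two products by how much of their taxonomy path they share. Both mappings, the label scale of 0 to 3, and the category names are hypothetical choices for this example; the article does not describe how KD-Boost actually encodes these signals.

```python
def ordinal_to_target(label, max_label=3):
    """Scale an ordinal relevance label (0..max_label) to a target in [0, 1]."""
    if not 0 <= label <= max_label:
        raise ValueError("label outside the assumed ordinal scale")
    return label / max_label

def taxonomy_similarity(path_a, path_b):
    """Length of the shared leading path divided by the longer path's length."""
    longest = max(len(path_a), len(path_b))
    if longest == 0:
        return 0.0
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return shared / longest

# Hypothetical taxonomy paths, from the root category down to the leaf.
mens_running = ["Shoes", "Running", "Men"]
womens_running = ["Shoes", "Running", "Women"]
coffee_maker = ["Kitchen", "Coffee"]

close_pair = taxonomy_similarity(mens_running, womens_running)
far_pair = taxonomy_similarity(mens_running, coffee_maker)
print(close_pair, far_pair, ordinal_to_target(2))
```

Products in sibling categories score high and products in unrelated branches score zero, which gives a graded similarity signal where a simple same-category flag would give only yes or no.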

In tests on both internal and external e-commerce datasets, KD-Boost improved ROC-AUC (area under the receiver operating characteristic curve) by 2-3% compared with training the student models directly. It also outperformed state-of-the-art knowledge distillation benchmarks as well as the teacher models.
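For readers unfamiliar with the metric, ROC-AUC can be read as the probability that a randomly chosen relevant pair receives a higher score than a randomly chosen irrelevant one. The sketch below computes it directly from that definition; the labels and scores are toy values, and the `roc_auc` helper is written for this example rather than taken from the paper's evaluation code.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy relevance labels and model scores, made up for illustration.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]

auc = roc_auc(labels, scores)
print(auc)  # 0.75: three of the four positive-negative pairs are ordered correctly
```

This pairwise form is quadratic in the number of examples, so it suits small illustrations; large evaluations typically use a sort-based computation instead.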

In simulated online A/B tests, KD-Boost delivered promising results for automated query reformulation: a 6.31% increase in query-to-query matching, a 2.19% improvement in relevance, and a 2.76% rise in product coverage. Together, these figures indicate more precise matches and a broader set of relevant results.


Amazon’s KD-Boost offers a practical approach to real-time semantic matching in e-commerce. By improving relevance and broadening product coverage with low-latency models, it stands to deliver a better search experience and strengthen Amazon’s competitive position.