Next-Gen Embeddings: The Foundation of Intelligent AI

At Crossing Minds, we have built an Embedding Training system that goes beyond conventional approaches. It uses current machine learning techniques to create nuanced, context-rich vector representations tailored to your specific use case.

RAG-Optimized Embeddings

Crossing Minds has developed embeddings specifically optimized for Retrieval-Augmented Generation (RAG) systems. These specialized vector representations are crucial for superior information retrieval and language model integration.

Our RAG-optimized embeddings ensure precise and relevant information retrieval, maintaining efficiency even with vast datasets. They capture nuanced meanings and intent, adapting through continuous learning from interactions.

These embeddings also enable cross-modal understanding, connecting information across text, images, and structured data. By focusing on these advanced embeddings, we provide a solid foundation for next-generation RAG systems, significantly enhancing AI's ability to process and utilize information effectively.

ICLEB

Organization      Model                            ERR@10    nDCG@10
Crossing Minds    cm-ragsys-rlaif-mini-v1          0.860     0.701
Salesforce        SFR-Embedding-2_R                0.775     0.610
Cohere            rerank-english-v3.0              0.773     0.618
Snowflake         snowflake-arctic-embed-m-v1.5    0.751     0.596
OpenAI            text-embedding-3-small           0.751     0.606
Nvidia            NV-Embed-v2                      0.741     0.612
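ERR@10 and nDCG@10 are standard ranking metrics computed over the top 10 results for each query. As a rough guide to what the numbers measure, here is a minimal sketch using the usual textbook definitions of Expected Reciprocal Rank and normalized Discounted Cumulative Gain. The page does not state the relevance scale or the exact variant used for these results, so the graded gain (2^g - 1), the `max_grade` parameter, and the function names are assumptions for illustration only.

```python
import math


def err_at_k(grades, k=10, max_grade=1):
    """Expected Reciprocal Rank over the top-k results.

    `grades` lists the relevance grade of each result in ranked order.
    `max_grade` is the highest possible grade (assumed, not from the source).
    """
    err, p_continue = 0.0, 1.0
    for rank, grade in enumerate(grades[:k], start=1):
        # Probability that the result at this rank satisfies the user.
        p_stop = (2 ** grade - 1) / (2 ** max_grade)
        err += p_continue * p_stop / rank
        # The user keeps reading only if this result did not satisfy them.
        p_continue *= 1.0 - p_stop
    return err


def dcg_at_k(grades, k=10):
    """Discounted Cumulative Gain with the exponential gain 2^g - 1."""
    return sum(
        (2 ** grade - 1) / math.log2(rank + 1)
        for rank, grade in enumerate(grades[:k], start=1)
    )


def ndcg_at_k(grades, k=10):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0
```

For a query whose only relevant result sits at rank 2, `err_at_k([0, 1, 0, 0])` gives 0.25 and `ndcg_at_k([0, 1, 0, 0])` gives about 0.63. A benchmark score is then the average of these per-query values.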

Adaptive Learning Architecture

Our embedding models employ a flexible architecture that adapts to the unique characteristics of your data. Whether you're working with user behavior, product attributes, or multimedia content, our system optimizes the embedding process to capture the most relevant features and relationships.

This adaptive approach ensures that your embeddings reflect the subtle nuances and complex interactions within your data, providing a solid foundation for downstream tasks such as recommendation systems, search engines, and personalization algorithms.

Illustration: embedding models with a flexible architecture that adapts to data types such as user behavior, product attributes, and multimedia content.

User & Session Embedding

Our adaptive learning architecture excels at capturing the complexities of user behavior and preferences. User embeddings encode long-term preferences, demographic information, and historical interactions, creating a comprehensive profile of each individual.

Session embeddings, on the other hand, capture real-time intent and context from current user activities, allowing for immediate responsiveness to user needs.

These embeddings evolve dynamically as user behavior changes, ensuring up-to-date representations that reflect the latest trends and individual preferences. This dynamic nature allows for personalized experiences that adapt in real-time, significantly enhancing user engagement and satisfaction.
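One simple way to picture how a long-term user embedding and a real-time session embedding can work together is shown below. This is a sketch under stated assumptions, not the Crossing Minds implementation: the exponential decay over session events, the linear blend, and the names `session_embedding`, `blend`, `decay`, and `session_weight` are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def session_embedding(event_vectors, decay=0.5):
    """Weighted average of the item vectors seen in the current session.

    Events are ordered oldest to newest; the newest event gets weight 1,
    the one before it `decay`, then `decay ** 2`, and so on.
    """
    dim = len(event_vectors[0])
    acc, total = [0.0] * dim, 0.0
    for age, vec in enumerate(reversed(event_vectors)):
        weight = decay ** age
        total += weight
        for i in range(dim):
            acc[i] += weight * vec[i]
    return [x / total for x in acc]


def blend(user_vec, session_vec, session_weight=0.3):
    """Mix long-term preferences with current intent, then renormalize."""
    mixed = [
        (1 - session_weight) * u + session_weight * s
        for u, s in zip(user_vec, session_vec)
    ]
    return l2_normalize(mixed)
```

Raising `session_weight` makes the combined vector react faster to what the user is doing right now, while lowering it keeps recommendations closer to the long-term profile.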

Multi-Modal Item and Content Embeddings

Valuable information often spans multiple data types. Our Embedding Training system excels at integrating diverse data modalities:

  • Text: Capture semantic meaning and context from product descriptions, user reviews, and other textual content.
  • Images: Extract visual features and styles from product images, user-generated content, and more.
  • Categorical Data: Efficiently represent discrete attributes and hierarchical relationships within your data.

By fusing these different modalities into unified embeddings, we enable your AI systems to make more informed decisions based on a holistic view of your data ecosystem.
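The page does not describe how the modalities are fused. As a baseline for intuition only, the sketch below concatenates a normalized text vector, a normalized image vector, and a one-hot categorical vector into a single embedding. A trained system would typically learn the fusion instead; the function names, the `weights` parameter, and the one-hot encoding are assumptions introduced here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if value == entry else 0.0 for entry in vocabulary]


def fuse(text_vec, image_vec, category, vocabulary, weights=(1.0, 1.0, 1.0)):
    """Concatenate per-modality vectors into one unit-length embedding.

    Each modality is normalized first so that no single one dominates
    purely because of its scale; `weights` lets a caller rebalance them.
    """
    parts = [
        l2_normalize(text_vec),
        l2_normalize(image_vec),
        one_hot(category, vocabulary),
    ]
    fused = []
    for weight, part in zip(weights, parts):
        fused.extend(weight * x for x in part)
    return l2_normalize(fused)
```

With this layout, two items that share a category but differ in text and images still have some cosine similarity, and that contribution can be tuned through `weights`.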

Customization and Interpretability

Your business has unique needs. Our Embedding Training system offers extensive customization options, allowing you to fine-tune the embedding process to align with your specific objectives.

Whether you're optimizing for recommendation accuracy, search relevance, or user engagement, we provide the tools and expertise to tailor your embeddings for maximum impact.

Our embeddings are designed with interpretability in mind, allowing you to trace back from high-level model outputs to the underlying data features that influenced those decisions. This transparency builds trust and provides valuable insights into your data and AI processes.

Illustration: the Embedding Training system's customization options, which align embeddings with business needs such as recommendation accuracy, search relevance, and interpretability.


Model Distillation in FinTech

Context: FinTech Data

  • One of our clients is a large FinTech enterprise
  • Billions of credit card transactions
  • Unstructured use of abbreviations

Example

The merchant "New York Home Hardware Distributors" can appear in transaction records as:

  • NY HOME HARDWARE D
  • NEW YORK HHDW DISTR

Task: Entity Deduplication with LLM

  • Extract and clean the merchant name using a strong LLM (Claude 3.5 Sonnet)
  • Distill the strong teacher LLM into a cheap student LLM (Claude 3 Haiku) while preserving accuracy
  • Generate 4k training examples with Claude 3.5 Sonnet (the teacher)
  • Fine-tune the student on this 4k dataset

Method              Accuracy    Total Time (min)
Static Few Shots    0.65        5
Fine Tuning         0.88        660
RAGSys              0.91        6
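The page does not explain how RAGSys builds its prompts. A common retrieval-based alternative to static few-shot prompting is to pick, for each input, the labeled examples most similar to it and place those in the prompt. The sketch below illustrates that general idea only. Character-trigram overlap stands in for a learned embedding, the second entry in `LABELED` and the query string in the usage note are made up, and all function names are hypothetical.

```python
def trigrams(text):
    """Character trigrams of a lowercased, space-padded string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard overlap of trigram sets (a stand-in for embedding similarity)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def retrieve_examples(query, labeled, k=2):
    """Return the k labeled (raw, clean) pairs closest to the query."""
    ranked = sorted(labeled, key=lambda ex: similarity(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(query, labeled, k=2):
    """Assemble a few-shot prompt from the retrieved examples."""
    blocks = ["Clean the merchant name."]
    for raw, clean in retrieve_examples(query, labeled, k):
        blocks.append(f"Raw: {raw}\nClean: {clean}")
    blocks.append(f"Raw: {query}\nClean:")
    return "\n\n".join(blocks)


# Labeled pool: the first pair comes from the example above,
# the second is a made-up entry for illustration.
LABELED = [
    ("NY HOME HARDWARE D", "New York Home Hardware Distributors"),
    ("SEATTLE COFFEE RSTRS", "Seattle Coffee Roasters"),
]
```

Calling `build_prompt("NY HOME HARDWARE DIS", LABELED, k=1)` places the New York hardware pair in the prompt, because it shares far more trigrams with the query than the coffee example does. Because the examples are selected at query time, this kind of approach needs no training run.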

eCommerce Product Catalog Enrichment

Context

  • Leading B2C Marketplace
  • Millions of end-users are creating hundreds of millions of products
  • Extract product tags to boost search and recommendation

Example

  • Use an LLM to extract product tags from the product details
  • Use a manually curated set of tags for training and validation

LLM Tags completion

Stone Nail File, Nail Art, Manicure
Amazing Stone Nail File
"best nail file i have ever used".
its never wears outs, and the tapered chiseled end is
wonderful
Measures 4" long x 1/4" wide
you get 1
Pink or Green

Hidden Tags:

stone nail file, nail art, nail polish, nail tools, manicure, manipedi, pedicure, gift, nail health, nail file

Generated Tags:

stone nail file, nail art, manicure, nail tools, nail care, nail file, pink nail file, green nail file, nail grooming, nail accessories, nail health, pedicure

Method              Precision    Recall
Zero Shot           0.1196       0.1326
Static Few Shots    0.1295       0.1415
RAGSys              0.2286       0.2559

LLM Tags Completion: Results. Average over 1k items in the test set. LLM: gpt-4o
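The page does not state how precision and recall are computed for tag completion. One plausible reading, sketched below, treats both tag lists as sets and counts exact matches after lowercasing and trimming whitespace. The matching rule and the function names are assumptions; the tag lists are the hidden and generated tags from the nail file example above.

```python
def normalize(tags):
    """Lowercase and trim tags, dropping empties and duplicates."""
    return {tag.strip().lower() for tag in tags if tag.strip()}


def precision_recall(generated, hidden):
    """Exact-match precision and recall of generated tags against hidden tags."""
    gen, ref = normalize(generated), normalize(hidden)
    hits = len(gen & ref)
    precision = hits / len(gen) if gen else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


hidden_tags = (
    "stone nail file, nail art, nail polish, nail tools, manicure, "
    "manipedi, pedicure, gift, nail health, nail file"
).split(",")
generated_tags = (
    "stone nail file, nail art, manicure, nail tools, nail care, nail file, "
    "pink nail file, green nail file, nail grooming, nail accessories, "
    "nail health, pedicure"
).split(",")

precision, recall = precision_recall(generated_tags, hidden_tags)
# 7 of the 12 generated tags appear among the 10 hidden tags:
# precision = 7/12 (about 0.583), recall = 7/10 = 0.700
```

Under exact matching, near-synonyms such as "nail care" and "nail health" count as misses, so a stricter or looser matching rule would change the reported numbers.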

Live Tuning for Entity Deduplication

Context: B2B Marketplace Catalog

  • One of our clients is a leading B2B marketplace in its industry
  • 5000+ merchants, 1M+ items
  • All merchants create items manually, which produces many duplicates
  • They need to consolidate the item catalog

Task: Entity Deduplication with LLM

Given two item titles, are they the same product?

Example 1

  • Product A: Google Chromecast Ultra 4K Streaming Media Player
  • Product B: Chromecast Ultra
  • Answer: Yes

Example 2

  • Product A: Dell Alienware M15
  • Product B: Alienware M17
  • Answer: No
