

Product-to-Product Recommendations in Sylius with Embedding-Based Filtering

August 21, 2025

Imagine entering an online electronics store: you’re looking for headphones, and under the product description you see “similar models” perfectly tailored to your needs. It’s the same kind of magic you’ve been enjoying on Spotify or Netflix for years, and today we want to bring it to a Sylius-based online shop.

At the core of such services lie millions of vectors: numerical representations of products or tracks, typically spanning hundreds or even thousands of dimensions. Each vector is a condensed “signature” of an item, capturing its features, description, tags, or even user reviews. Nearest-neighbor search algorithms then let you quickly retrieve the items whose vectors are closest to a given one, that is, the most similar items.
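To make “closest” concrete, here is a minimal sketch of what a nearest-neighbor query computes, using cosine similarity over a tiny in-memory catalog. It is an exact, brute-force scan meant only for illustration; the function names are hypothetical, and engines such as Qdrant use approximate indexes rather than scoring every vector.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity: dot(a, b) / (|a| * |b|).
 * 1.0 means the vectors point the same way, 0.0 means they are unrelated.
 *
 * @param list<float> $a
 * @param list<float> $b
 */
function cosineSimilarity(array $a, array $b): float
{
    $n = count($a);
    if ($n !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same dimension.');
    }

    $dot = $normA = $normB = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dot += $a[$i] * $b[$i];
        $normA += $a[$i] * $a[$i];
        $normB += $b[$i] * $b[$i];
    }

    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0; // a zero vector has no direction
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Exact k-nearest-neighbor search by scanning every vector.
 *
 * @param list<float>             $query
 * @param array<int, list<float>> $items product ID => embedding
 * @return array<int, float>      product ID => similarity, most similar first
 */
function nearestNeighbors(array $query, array $items, int $k, ?int $excludeId = null): array
{
    $scores = [];
    foreach ($items as $id => $vector) {
        if ($id !== $excludeId) {
            $scores[$id] = cosineSimilarity($query, $vector);
        }
    }
    arsort($scores); // highest similarity first, keys preserved

    return array_slice($scores, 0, $k, true);
}
```

The cost of this scan grows linearly with the catalog size, which is exactly what a dedicated vector engine avoids.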

In our solution, we’ll employ two main technologies:

  • Embedding Generation: We’ll use the OpenAI API (the platform behind ChatGPT) to create high-quality text vectors. As a local alternative, we’ll also cover the SentenceTransformer("all-MiniLM-L6-v2") model.
  • Qdrant Vector Engine: A lightweight, open-source database optimized for vector search with low latency and high scalability.

2. Technologies and Approach

To build an effective “product-to-product” recommendation system, we need two key components: a way to transform a product’s description into a comparable form, and a performant engine to search those representations.

The first step is vectorization, that is, converting text such as a product’s name, description, or tags into a sequence of numbers (an embedding). There are many options, but we’ll focus on two: the remote OpenAI API (model text-embedding-ada-002, which returns 1536-dimensional vectors) and a locally hosted, lightweight SentenceTransformer model (all-MiniLM-L6-v2). The OpenAI model generally produces richer semantic representations: it can distinguish subtle differences between products and handles multiple languages well. On the other hand, running all-MiniLM-L6-v2 on your own server avoids network latency and per-request costs, at the price of shorter (384-dimensional) embeddings and a model trained mainly on English text.
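Because the two models return vectors of different lengths (1536 for text-embedding-ada-002, 384 for all-MiniLM-L6-v2), and a Qdrant collection is created for one fixed vector size, it can help to derive that size from the configured model and to check every vector before sending it. A minimal sketch; the constant and function names are hypothetical:

```php
<?php
declare(strict_types=1);

// Output sizes of the two models discussed in this article.
const EMBEDDING_DIMENSIONS = [
    'text-embedding-ada-002' => 1536,
    'all-MiniLM-L6-v2' => 384,
];

/** Returns the vector size to use when creating the Qdrant collection. */
function embeddingDimension(string $model): int
{
    if (!array_key_exists($model, EMBEDDING_DIMENSIONS)) {
        throw new InvalidArgumentException(sprintf('Unknown embedding model "%s".', $model));
    }

    return EMBEDDING_DIMENSIONS[$model];
}

/** Rejects a vector whose length does not match the configured model. */
function assertVectorMatchesModel(array $vector, string $model): void
{
    $expected = embeddingDimension($model);
    if (count($vector) !== $expected) {
        throw new LengthException(sprintf(
            'Expected %d dimensions for %s, got %d.',
            $expected,
            $model,
            count($vector),
        ));
    }
}
```

Switching models therefore means creating a new collection with the new size and re-indexing the catalog, rather than mixing vectors in one collection.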

Once we have our embeddings, they go into the vector engine, in our case Qdrant. This open-source project has been gaining traction since 2020 thanks to its simple REST API and strong performance. Qdrant indexes vectors with HNSW (Hierarchical Navigable Small World), an approximate nearest-neighbor algorithm that can find the closest “neighbors” in milliseconds even among millions of vectors. You can also attach arbitrary metadata (Qdrant calls it the payload) to each vector, such as the product ID, price, or category, and use it to filter results at query time.
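To illustrate how payload filtering is expressed, here is a sketch of a request body for Qdrant’s REST search endpoint (`POST /collections/products/points/search`), built as a PHP array. It assumes the points carry `main_taxon` and `price` payload keys (the indexing code later in this article stores `main_taxon` but not a price), and the function name is hypothetical:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: JSON body for a similarity search restricted
 * to one taxon and, optionally, a maximum price.
 */
function buildSimilarProductsSearch(
    array $vector,
    int $excludeProductId,
    string $mainTaxon,
    ?float $maxPrice = null,
    int $limit = 5,
): array {
    // Every condition in "must" has to match.
    $must = [
        ['key' => 'main_taxon', 'match' => ['value' => $mainTaxon]],
    ];
    if ($maxPrice !== null) {
        $must[] = ['key' => 'price', 'range' => ['lte' => $maxPrice]];
    }

    return [
        'vector' => $vector,
        'filter' => [
            'must' => $must,
            // Don't recommend the product the customer is already viewing.
            'must_not' => [['has_id' => [$excludeProductId]]],
        ],
        'limit' => $limit,
        'with_payload' => true,
    ];
}
```

Passing the result through `json_encode()` gives the request body; the filter is applied during the vector search itself rather than on the returned results.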

In short, the whole process looks like this:

  1. Data Extraction – we fetch the product’s textual attributes from Sylius.
  2. Embedding Generation – we call the model (remote or local) for each description.
  3. Indexing in Qdrant – we upsert the embeddings plus metadata into the products collection.
  4. Similar-item Search – for a given product, we retrieve its vector and ask Qdrant for its nearest neighbors.
  5. Presentation – we pass the results to the Sylius template to render a “Similar Products” section on the product page.

3. Data Extraction

The first task is to extract all the textual information from Sylius that effectively describes a product. The better the “raw material” we use for vectorization, the more accurate the recommendations will be. Typically, it’s worth collecting:

  • Product name
  • Description
  • Category list
  • Selected attributes (e.g., color, size, material)
  • Any other custom fields we consider useful for recommendations

Next, we perform simple text normalization:

  1. Trim leading and trailing whitespace
  2. Convert to lowercase – this helps maintain consistency
  3. Optional: remove HTML/Markdown, replace multiple spaces with single ones
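The normalization steps can be sketched as one function that also performs the optional HTML cleanup (Markdown is left untouched). The function name is hypothetical, and the factory shown later uses a simpler `normalize()` without HTML stripping:

```php
<?php
declare(strict_types=1);

function normalizeForEmbedding(string $text): string
{
    // Optional: remove HTML. Pad each tag with a space so that
    // "<p>a</p><p>b</p>" does not collapse into "ab", then strip the tags
    // and decode entities such as &amp; or &nbsp;.
    $text = strip_tags(str_replace('<', ' <', $text));
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Lowercase with a multibyte-aware function so accented letters are handled.
    $text = mb_strtolower($text, 'UTF-8');

    // Collapse runs of whitespace, including non-breaking spaces, then trim.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text;

    return trim($text);
}
```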

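We then concatenate all the text into one long sequence, ready to be sent to the embedding model. To keep this extensible, we’ll use the Strategy + Factory pattern: each field is handled by its own extractor (the strategy), and a factory combines their output, so registering a new data source only requires a new extractor class.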

Let’s define a DTO (Data Transfer Object):

namespace App\Dto;

final class EmbeddingRequest
{
    public function __construct(
        public readonly int $productId,
        public readonly string $combinedText,
    ) {}
}

The EmbeddingRequest stores only:

  • productId
  • combinedText - merged, normalized text

Next, let’s implement a few simple extractors:

use Sylius\Component\Core\Model\ProductInterface;
use Sylius\Component\Core\Model\TaxonInterface;

interface FieldExtractorInterface
{
    /** Checks if this extractor supports the field with the given name */
    public function supports(string $field): bool;

    /** Returns the text for the given product and field */
    public function extract(ProductInterface $product): string;
}

final class NameExtractor implements FieldExtractorInterface
{
    public function supports(string $field): bool
    {
        return 'name' === $field;
    }

    public function extract(ProductInterface $product): string
    {
        return $product->getName() ?? ''; // getName() is nullable in Sylius
    }
}

final class TaxonsExtractor implements FieldExtractorInterface
{
    public function supports(string $field): bool
    {
        return 'taxons' === $field;
    }

    public function extract(ProductInterface $product): string
    {
        $taxons = $product->getTaxons()
            // getName() is nullable, so cast to satisfy the callback's string return type
            ->map(static fn (TaxonInterface $taxon): string => (string) $taxon->getName())
            ->toArray()
        ;

        return implode(' ', $taxons);
    }
}

final class AttributeExtractor implements FieldExtractorInterface
{
    public function supports(string $field): bool
    {
        return 'attributes' === $field;
    }

    public function extract(ProductInterface $product): string
    {
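        $parts = [];
        foreach ($product->getAttributes() as $attr) {
            $value = $attr->getValue();
            if (is_array($value)) {
                // e.g. select attributes, which store an array of option keys rather than labels
                $value = implode(' ', $value);
            } elseif (!is_scalar($value)) {
                // skip values without an obvious text form, such as dates
                continue;
            }
            $parts[] = trim(($attr->getName() ?? '') . ' ' . $value);
        }

        return implode(' ', $parts);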
    }
}

// ... and similarly a DescriptionExtractor, etc.

We combine everything in the factory.

The ProductEmbedRequestFactory accepts a list of available extractors and a list of fields in the order we want to combine them:

final class ProductEmbedRequestFactory
{
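    /**
     * @param iterable<FieldExtractorInterface> $extractors all available extractors
     * @param list<string> $fieldsOrder fields to combine, in this order
     */
    public function __construct(
        private iterable $extractors,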
        private array $fieldsOrder = [
            'name',
            'description',
            'taxons',
            'attributes',
            // ... your custom fields
        ],
    ) {}

    public function create(ProductInterface $product): EmbeddingRequest
    {
        $texts = [];

        foreach ($this->fieldsOrder as $field) {
            foreach ($this->extractors as $ext) {
                if ($ext->supports($field)) {
                    $texts[] = $this->normalize($ext->extract($product));
                    break;
                }
            }
        }

        $combined = implode(' ', array_filter($texts, fn($t) => $t !== ''));

        return new EmbeddingRequest($product->getId(), $combined);
    }

    private function normalize(string $text): string
    {
        $clean = trim($text);
        $clean = mb_strtolower($clean, 'UTF-8');
        // /u makes the pattern UTF-8 aware; fall back to the input if preg_replace fails
        $clean = preg_replace('/\s+/u', ' ', $clean) ?? $clean;

        return $clean;
    }
}

4. Generating Embeddings

With a ready EmbeddingRequest object containing the concatenated product text, we can move on to the core task - converting it into a vector of numbers. Below, you'll see how to create a simple EmbeddingGenerator service, which takes the data from an EmbeddingRequest and returns an embedding ready to be inserted into Qdrant.

The EmbeddingGenerator Service

use App\Dto\EmbeddingRequest;

final class EmbeddingGenerator
{
    public function __construct(
        private OpenAiClientInterface $openAiClient
    ) {}

    /**
     * Returns the embedding as an array of floats using the OpenAI API.
     */
    public function generateEmbedding(EmbeddingRequest $request): array
    {
        $response = $this->openAiClient->embeddings([
            'model' => 'text-embedding-ada-002',
            'input' => $request->combinedText,
        ]);

        return $response['data'][0]['embedding'];
    }
}
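Calling the API once per product is easy to follow but slow for large catalogs. The OpenAI embeddings endpoint also accepts an array of strings as `input` and returns one item per string in `data`, each with an `index` pointing back to its input. A minimal sketch of batching under that assumption; the helper names are hypothetical, and the decoded JSON response is assumed to be available as an array, as `generateEmbedding()` above expects:

```php
<?php
declare(strict_types=1);

/**
 * @param array<int, string> $textsByProductId product ID => combinedText
 */
function buildBatchEmbeddingBody(array $textsByProductId, string $model = 'text-embedding-ada-002'): array
{
    return ['model' => $model, 'input' => array_values($textsByProductId)];
}

/**
 * @param array<int, string> $textsByProductId the same array passed to buildBatchEmbeddingBody()
 * @param array              $response        decoded JSON response
 * @return array<int, list<float>>            product ID => embedding
 */
function mapEmbeddingsToProducts(array $textsByProductId, array $response): array
{
    $productIds = array_keys($textsByProductId);
    $result = [];
    foreach ($response['data'] as $item) {
        // Pair vectors with inputs via "index" instead of relying on array order.
        $result[$productIds[$item['index']]] = $item['embedding'];
    }

    if (count($result) !== count($productIds)) {
        throw new UnexpectedValueException('Expected exactly one embedding per input.');
    }

    return $result;
}
```

The API limits how much input a single request may contain, so batches should stay moderately sized.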

5. Vector Indexing in Qdrant

Once we have the embeddings, it’s time to create a collection in Qdrant and insert each vector along with the product ID and any metadata (such as category or price) that can later help filter the results.

The QdrantService

A simple service that wraps the most important operations on the products collection in Qdrant.

use Qdrant\Models\Request\VectorParams;

final class QdrantService
{
    public function __construct(
        private QdrantClientInterface $client, 
        private readonly int $vectorSize
    ) {}

    /**
     * Creates a 'products' collection if it doesn't already exist.
     */
    public function ensureProductsCollection(): void
    {
        if (!$this->client->collectionExists('products')) {
            $this->client->createCollection('products', [
                'vectorSize' => $this->vectorSize,
                'distance' => VectorParams::DISTANCE_COSINE, // Qdrant also supports 'Euclid' and 'Dot'
            ]);
        }
    }

    /**
     * Sends a single embedding to Qdrant.
     */
    public function upsertProductVector(
        int $productId,
        array $vector,
        array $metadata = [],
    ): void {
        $this->client->upsert('products', [
            [
                'id' => $productId,
                'vector' => $vector,
                'payload' => $metadata,
            ]
        ]);
    }
}
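If you talk to Qdrant over plain HTTP rather than through a client library, the two operations wrapped by QdrantService correspond to the request bodies below. A sketch with hypothetical helper names; note that the REST API spells the Euclidean distance `Euclid`:

```php
<?php
declare(strict_types=1);

// Body for PUT /collections/products
function createCollectionBody(int $vectorSize, string $distance = 'Cosine'): array
{
    // Distance names as spelled in Qdrant's REST API.
    if (!in_array($distance, ['Cosine', 'Euclid', 'Dot'], true)) {
        throw new InvalidArgumentException(sprintf('Unsupported distance "%s".', $distance));
    }

    return ['vectors' => ['size' => $vectorSize, 'distance' => $distance]];
}

/**
 * Body for PUT /collections/products/points
 * Point IDs must be unsigned integers or UUIDs; a positive product ID qualifies.
 *
 * @param list<array{id: int, vector: list<float>, payload?: array<string, mixed>}> $points
 */
function upsertPointsBody(array $points): array
{
    $body = ['points' => []];
    foreach ($points as $point) {
        if ($point['id'] < 0) {
            throw new InvalidArgumentException('Qdrant point IDs cannot be negative.');
        }
        $body['points'][] = [
            'id' => $point['id'],
            'vector' => $point['vector'],
            // Cast so that an empty payload is encoded as {} rather than [].
            'payload' => (object) ($point['payload'] ?? []),
        ];
    }

    return $body;
}
```

Upserting with an existing ID overwrites that point, so re-running the indexing simply refreshes each product’s vector and payload.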

Updating all products

The EmbeddingUpdater class combines all the steps, from preparing the text through generating the embedding to uploading it. This service can be used, for example, in a CLI command that loads all vectors into Qdrant.

use Sylius\Component\Core\Model\TaxonInterface;
use Sylius\Component\Core\Repository\ProductRepositoryInterface;

final class EmbeddingUpdater
{
    public function __construct(
        private ProductRepositoryInterface $productRepository,
        private ProductEmbedRequestFactory $requestFactory,
        private EmbeddingGenerator $generator,
        private QdrantService $qdrantService,
    ) {}

    public function updateAll(): void
    {
        $this->qdrantService->ensureProductsCollection();

        foreach ($this->productRepository->findAll() as $product) {
            // a) Build an object with ID and merged text
            $request = $this->requestFactory->create($product);

            // b) Generate the embedding by OpenAI
            $vector = $this->generator->generateEmbedding($request);

            // c) Prepare metadata. This is simplified for the example; ideally the metadata
            //    would be built like the text, with dedicated strategy classes.
            $metadata = [
                'name' => $product->getName(),
                'main_taxon' => $product->getMainTaxon()?->getName() ?? '',
                'code' => $product->getCode(),
                'taxons' => $product->getTaxons()
                    ->map(static fn (TaxonInterface $taxon): string => (string) $taxon->getName())
                    ->getValues(),
                // ...
            ];

            // d) Send to Qdrant
            $this->qdrantService->upsertProductVector(
                $request->productId,
                $vector,
                $metadata
            );
        }
    }
}
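`findAll()` loads the whole catalog into memory, and the loop sends one API request per product, which becomes slow for large shops. One option is to process products in fixed-size batches. A minimal sketch of a lazy chunking helper (hypothetical name); how each batch is then embedded and upserted is left to the service:

```php
<?php
declare(strict_types=1);

/**
 * Splits any iterable into arrays of at most $size items, lazily.
 *
 * @template T
 * @param iterable<T> $items
 * @return Generator<int, list<T>>
 */
function chunked(iterable $items, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    $chunk = [];
    foreach ($items as $item) {
        $chunk[] = $item;
        if (count($chunk) === $size) {
            yield $chunk;
            $chunk = [];
        }
    }

    if ($chunk !== []) {
        yield $chunk; // the last, possibly smaller, chunk
    }
}

// Usage: a generator stands in for a lazily loaded product list.
$products = (static function (): Generator {
    for ($id = 1; $id <= 5; $id++) {
        yield $id;
    }
})();

foreach (chunked($products, 2) as $batch) {
    // In EmbeddingUpdater, each batch would become one embedding request and one upsert.
    echo implode(',', $batch), PHP_EOL; // prints 1,2 then 3,4 then 5
}
```

With Doctrine, a query’s `toIterable()` method can feed such a helper without hydrating all products at once; in long runs you would typically also clear the entity manager between batches.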

In part 2 of this article, we will cover nearest-neighbor queries in Qdrant: how to retrieve the most similar products based on the embedding of any product, and how to display them in a panel on the product details page.

If you have any questions about Sylius, please contact us; our team will be happy to help.
