Building Multilingual Search Indexes in Jekyll

Why Multilingual Search Matters

If you're serving a global audience, your knowledge base should speak their language—literally. While Jekyll is not a CMS with built-in i18n (internationalization) support, it can be extended to support multiple languages through clever use of collections, data files, and routing. The challenge lies in making your search functionality multilingual too, so users can find answers in their preferred language.

Common Use Cases

  • Product documentation in English, Spanish, and French
  • Localized support pages for international markets
  • Multilingual blog posts with different structures per region

We'll walk through how to create separate search indexes per language and load them dynamically based on user preference or site structure.

Setting Up Language-Specific Collections

Let’s say you have two languages: English and Spanish. You’ll want mirrored collections for each:


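# _config.yml
collections:
  guides_en:
    output: true
    permalink: /en/guides/:name/
  guides_es:
    output: true
    permalink: /es/guides/:name/
  faqs_en:
    output: true
    permalink: /en/faqs/:name/
  faqs_es:
    output: true
    permalink: /es/faqs/:name/

This keeps language-specific content fully separate while staying organized. Jekyll reads each collection's documents from a matching underscore-prefixed directory (_guides_en, _guides_es, _faqs_en, _faqs_es), and the permalink settings publish every page under an /en/ or /es/ URL prefix, which makes the language easy to detect from the path.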

Creating Search Indexes Per Language

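Next, generate separate search_en.json and search_es.json files. Because each template starts with front matter, Jekyll processes its Liquid and writes the resulting JSON to the built site. For English: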



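---
layout: null
---

{% comment %} jsonify escapes quotes and backslashes; normalize_whitespace turns newlines into single spaces so words on adjacent lines do not merge {% endcomment %}
[
  {% assign en_docs = site.guides_en | concat: site.faqs_en %}
  {% for doc in en_docs %}
  {
    "title": {{ doc.title | jsonify }},
    "url": {{ doc.url | relative_url | jsonify }},
    "collection": {{ doc.collection | jsonify }},
    "content": {{ doc.content | strip_html | normalize_whitespace | jsonify }}
  }{% unless forloop.last %},{% endunless %}
  {% endfor %}
]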

Repeat this structure for Spanish with search_es.json, substituting guides_es and faqs_es.
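A malformed or misplaced entry in these files usually fails silently in the browser, so a quick check after `jekyll build` can save debugging time. The sketch below is a hypothetical Node helper, not part of Jekyll or Lunr: it takes the parsed contents of one index file and reports entries with empty fields, duplicate URLs, or URLs outside the expected language prefix. It assumes language-prefixed URLs such as /en/guides/setup/.

```javascript
// Hypothetical helper: sanity-checks the parsed contents of a search_<lang>.json file.
// Assumes every URL for `lang` starts with "<baseurl>/<lang>/".
function validateIndex(entries, lang, baseurl = "") {
  if (!Array.isArray(entries)) {
    return ["index is not a JSON array"];
  }

  const problems = [];
  const seen = new Set();
  const prefix = `${baseurl}/${lang}/`;

  entries.forEach((entry, i) => {
    // Every entry needs a non-empty title, url, and content string.
    for (const field of ["title", "url", "content"]) {
      if (typeof entry[field] !== "string" || entry[field] === "") {
        problems.push(`entry ${i}: missing or empty "${field}"`);
      }
    }
    if (typeof entry.url === "string") {
      // Lunr uses the url as the document ref, so it must be unique.
      if (seen.has(entry.url)) problems.push(`entry ${i}: duplicate url ${entry.url}`);
      seen.add(entry.url);
      // A URL outside the prefix usually means another language leaked in.
      if (!entry.url.startsWith(prefix)) {
        problems.push(`entry ${i}: url ${entry.url} is outside ${prefix}`);
      }
    }
  });

  return problems;
}
```

You could call it as `validateIndex(JSON.parse(fs.readFileSync("_site/search_en.json", "utf8")), "en")` in a small Node script and fail the build when the returned list is not empty; `JSON.parse` itself throws first if the template produced invalid JSON.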

Detecting Language Context

You have several options to determine which language index to load:

  • URL path: /en/ or /es/ prefixes
  • Subdomains: for example, es.example.com
  • User preference: the browser's language settings or a language selector
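For the browser-preference option, a small function can map the browser's ordered language list to one of the languages your site actually has. This is a minimal sketch; the `pickLang` name, the supported list, and the English fallback are assumptions for illustration.

```javascript
// Languages this site publishes (assumption for the example).
const SUPPORTED = ["en", "es"];

// Return the first supported language from an ordered preference list,
// such as navigator.languages (e.g. ["es-MX", "es", "en"]).
function pickLang(preferred, supported = SUPPORTED, fallback = "en") {
  for (const tag of preferred || []) {
    // Reduce a BCP 47 tag like "es-MX" to its primary subtag "es".
    const primary = String(tag).toLowerCase().split("-")[0];
    if (supported.includes(primary)) return primary;
  }
  return fallback;
}
```

In a browser you would call `pickLang(navigator.languages)`. If the visitor has already chosen a language explicitly, that choice should take precedence over this guess.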

For example, with path-based routing:


const lang = window.location.pathname.startsWith("/es/") ? "es" : "en";
const indexUrl = `/search_${lang}.json`;

This snippet only decides which index file matches the current page; the next step fetches and indexes it.
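The one-line check above assumes the site is served from the domain root and has exactly two languages. On a project site with a baseurl (for example /docs), the language segment comes after the base path. Here is a slightly more general sketch; `detectLang`, `indexUrlFor`, and their options are hypothetical names.

```javascript
// Detect the language from the first path segment after an optional baseurl.
function detectLang(pathname, { baseurl = "", supported = ["en", "es"], fallback = "en" } = {}) {
  let path = pathname;
  // Strip the baseurl only when it is a whole leading path segment.
  if (baseurl && (path === baseurl || path.startsWith(baseurl + "/"))) {
    path = path.slice(baseurl.length);
  }
  // "/es/guides/setup/" -> ["es", "guides", "setup"] -> "es"
  const first = path.split("/").filter(Boolean)[0];
  return supported.includes(first) ? first : fallback;
}

// Build the URL of the generated index for a language.
function indexUrlFor(lang, baseurl = "") {
  return `${baseurl}/search_${lang}.json`;
}
```

For example, `detectLang(window.location.pathname, { baseurl: "/docs" })` on /docs/es/faqs/ returns "es", and `indexUrlFor("es", "/docs")` gives /docs/search_es.json.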

Loading Language-Specific Index in JavaScript

Here’s how your Lunr initialization might look with language switching support:


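// Assumes lunr.js is loaded and `lang` / `indexUrl` come from the snippet above.
let idx = null;
const documents = new Map(); // url -> document, for fast lookups

fetch(indexUrl)
  .then(res => {
    if (!res.ok) throw new Error(`Failed to load ${indexUrl}: ${res.status}`);
    return res.json();
  })
  .then(data => {
    data.forEach(doc => documents.set(doc.url, doc));

    idx = lunr(function () {
      // Non-English stemming needs a plugin such as lunr-languages
      // (lunr.stemmer.support.js plus lunr.es.js); skipped if not loaded.
      if (lang !== "en" && lunr[lang]) this.use(lunr[lang]);

      this.ref("url");
      this.field("title");
      this.field("content");

      data.forEach(function (doc) {
        this.add(doc);
      }, this);
    });
  })
  .catch(err => console.error(err));

document.getElementById("search-input").addEventListener("input", function () {
  const output = document.getElementById("search-results");
  output.innerHTML = "";

  const query = this.value.trim();
  if (!idx || query === "") return; // index still loading, or nothing to search

  let results = [];
  try {
    results = idx.search(query);
  } catch (e) {
    return; // Lunr throws on incomplete query syntax, e.g. "title:"
  }

  results.forEach(result => {
    const match = documents.get(result.ref);
    if (!match) return;
    const item = document.createElement("li");
    const link = document.createElement("a");
    link.href = match.url;
    link.textContent = match.title; // textContent avoids injecting HTML from titles
    item.appendChild(link);
    output.appendChild(item);
  });
});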
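Showing a short excerpt under each title helps readers judge a result before clicking. This sketch finds the first case-insensitive occurrence of the raw query text in a document's content field and trims around it; `makeSnippet` and the 60-character radius are illustrative choices, not Lunr features. Because Lunr matches stemmed terms, the literal query text may not appear in the content, so the function falls back to the opening of the text.

```javascript
// Escape characters that have special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return roughly `radius` characters of context on each side of the first match.
function makeSnippet(content, term, radius = 60) {
  // Case-insensitive search on the original string, so offsets stay valid.
  const pos = term ? content.search(new RegExp(escapeRegExp(term), "i")) : -1;
  // No literal match (e.g. Lunr matched a stemmed form): show the opening text.
  const start = pos === -1 ? 0 : Math.max(0, pos - radius);
  const end = pos === -1
    ? Math.min(content.length, radius * 2)
    : Math.min(content.length, pos + term.length + radius);
  // Mark truncation on either side with an ellipsis.
  return (start > 0 ? "…" : "") + content.slice(start, end) + (end < content.length ? "…" : "");
}
```

Lunr can also report exact match positions if you set `this.metadataWhitelist = ["position"]` inside the builder function, which is more precise for stemmed matches; the plain text search above is simpler and needs no index changes.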

Optimizing Content for Language Indexes

When creating multilingual content, consistency matters. Make sure that:

  • Each language version includes similar structure and metadata
  • You use the same layout files if applicable
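  • All content files are saved as UTF-8, which matters most for non-Latin scripts such as Arabic or Japanese

Also, avoid mixing languages in the same index. Stemming and stop-word filtering are language-specific, so a shared index tends to degrade relevance for every language in it.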
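To catch pages that exist in one language but not the other, you can compare the two indexes after a build. This sketch assumes mirrored file names across languages, so /en/guides/setup/ and /es/guides/setup/ count as translations of each other, and no baseurl; `keyFor` and `findMissingTranslations` are hypothetical helpers.

```javascript
// Drop the leading language segment: "/es/guides/setup/" -> "guides/setup".
// Adjust this if your slugs are translated or your site uses a baseurl.
function keyFor(url) {
  return url.split("/").filter(Boolean).slice(1).join("/");
}

// Compare two parsed indexes and list keys present in only one of them.
function findMissingTranslations(enDocs, esDocs) {
  const enKeys = new Set(enDocs.map(d => keyFor(d.url)));
  const esKeys = new Set(esDocs.map(d => keyFor(d.url)));
  return {
    missingInEs: [...enKeys].filter(k => !esKeys.has(k)),
    missingInEn: [...esKeys].filter(k => !enKeys.has(k)),
  };
}
```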

Optional: Switch Index Dynamically with a Language Selector

If you're allowing users to change the site language manually via a dropdown, you can also switch indexes accordingly:


document.getElementById("language-select").addEventListener("change", function () {
  const selectedLang = this.value;
  window.location.href = `/${selectedLang}/`;
});

Changing the selection loads the chosen language's home page, where the path-based detection then picks the matching index.
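Sending everyone to the language's home page loses their place. If your translated pages share slugs, you can swap only the language segment of the current path instead. A minimal sketch, assuming mirrored slugs and no baseurl; `switchLangPath` is a hypothetical name.

```javascript
// Replace the language segment of a path, e.g. "/en/guides/setup/" -> "/es/guides/setup/".
function switchLangPath(pathname, targetLang, supported = ["en", "es"]) {
  const segments = pathname.split("/");
  // segments[0] is "" because the path starts with "/"; segments[1] is the language.
  if (supported.includes(segments[1])) {
    segments[1] = targetLang;
    return segments.join("/");
  }
  // Not a language-prefixed page: go to the target language's home page.
  return `/${targetLang}/`;
}
```

In the change handler, the navigation line would become `window.location.href = switchLangPath(window.location.pathname, this.value);`. If a page has no translation, this leads to a 404, so the home-page redirect remains the safer default while translations are incomplete.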

Adding Localized UI Elements

Make sure your search interface elements—placeholders, labels, and feedback messages—are also localized. Use _data/lang/en.yml and _data/lang/es.yml to manage language keys:


# _data/lang/en.yml
search_placeholder: "Search..."
no_results: "No results found"

# _data/lang/es.yml
search_placeholder: "Buscar..."
no_results: "No se encontraron resultados"

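Then reference these keys in your templates with Liquid. Assuming each page sets a lang variable in its front matter (for example through front matter defaults), the search box could look like this:

<input type="search" id="search-input" placeholder="{{ site.data.lang[page.lang].search_placeholder }}">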

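The no_results message is displayed by the search script rather than by a template, so the strings also need to reach JavaScript. One option, assumed here rather than required, is to emit the data files into a script tag with `{{ site.data.lang | jsonify }}`; the sketch below inlines the same values so it runs on its own, and falls back to English when a key is missing. `UI_STRINGS` and `t` are hypothetical names.

```javascript
// Mirrors _data/lang/en.yml and _data/lang/es.yml (inlined for this example).
const UI_STRINGS = {
  en: { search_placeholder: "Search...", no_results: "No results found" },
  es: { search_placeholder: "Buscar...", no_results: "No se encontraron resultados" },
};

// Look up a key for the current language, falling back to English,
// then to the key itself so a missing translation is visible but harmless.
function t(lang, key, strings = UI_STRINGS) {
  return strings[lang]?.[key] ?? strings.en?.[key] ?? key;
}
```

For example, `t(lang, "no_results")` could fill the results list when `idx.search()` returns an empty array.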
SEO Considerations for Multilingual Search

To keep your site Google-friendly across languages:

  • Use hreflang tags in your HTML head to define language versions
  • Ensure all versions are crawlable and linked from a central sitemap
  • Don’t rely solely on JavaScript to expose content
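hreflang annotations are link elements in each page's head that use fully qualified URLs, and every language version should list all versions, including itself, so the references are reciprocal. Since they need to be in the served HTML, they belong in a layout or a build step rather than in client-side script; this hypothetical build-time helper only shows the markup for one page, with x-default pointing to the English version. The example.com URLs are placeholders.

```javascript
// Build hreflang <link> tags for one page from a map of language -> absolute URL.
function hreflangLinks(alternates, defaultLang = "en") {
  const links = Object.entries(alternates).map(
    ([lang, href]) => `<link rel="alternate" hreflang="${lang}" href="${href}">`
  );
  // x-default names the version to show when no listed language matches the visitor.
  if (alternates[defaultLang]) {
    links.push(`<link rel="alternate" hreflang="x-default" href="${alternates[defaultLang]}">`);
  }
  return links;
}
```

Both /en/guides/setup/ and /es/guides/setup/ would carry the same set of tags in their heads.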

Limitations of Lunr for i18n

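Out of the box, Lunr splits text on whitespace and hyphens and runs an English-oriented trimmer, stemmer, and stop-word filter, so it handles other languages poorly, especially those written without spaces between words, such as Japanese or Chinese. Consider the following:

  • Use language-specific stemmers or tokenizers (the lunr-languages plugin package covers a number of languages)
  • Keep indexes small and focused per language
  • Fall back to fuzzy search or tag-based navigation for low-resource languages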
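Two of these fallbacks can be sketched without any plugin. Because Lunr's default tokenizer splits on whitespace and hyphens, a Japanese sentence becomes a single token; splitting such text into overlapping two-character chunks (character bigrams) is a common workaround, applied to both the indexed content and the query. Lunr's query syntax also supports edit-distance matching with a `~` suffix, which tolerates small typos. The helper names are hypothetical, and the bigram approach is a general technique rather than something Lunr provides.

```javascript
// Split text into overlapping two-character chunks, ignoring whitespace.
// Array.from iterates by code point, so surrogate pairs stay intact.
function bigrams(text) {
  const chars = Array.from(text.replace(/\s+/g, ""));
  if (chars.length < 2) return chars; // a single character is returned as-is
  const grams = [];
  for (let i = 0; i < chars.length - 1; i++) {
    grams.push(chars[i] + chars[i + 1]);
  }
  return grams;
}

// Turn free text into a Lunr fuzzy query: each term gets "~distance".
function toFuzzyQuery(input, distance = 1) {
  return input
    .replace(/[:~^+\-*]/g, " ") // replace characters Lunr treats as query syntax
    .split(/\s+/)
    .filter(Boolean)
    .map(term => `${term}~${distance}`)
    .join(" ");
}
```

For instance, `idx.search(toFuzzyQuery("jekyl serch"))` sends `jekyl~1 serch~1` to Lunr. For bigrams, you would generate them at build time for the content field and search with `bigrams(query).join(" ")`; that index would also need Lunr's default pipeline cleared (for example with `this.pipeline.reset()` and `this.searchPipeline.reset()` in the builder), since the trimmer, stemmer, and stop-word filter would otherwise alter or drop those tokens.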

Conclusion

Adding multilingual search to a Jekyll-based knowledge base isn’t as difficult as it sounds. With language-specific collections, separate search indexes, and smart JavaScript routing, you can deliver a seamless multilingual experience to your users. Adding another language later follows the same pattern: a new set of collections, one more index template, and one more UI strings file.

Next, we’ll dive into using taxonomy filters—like tags and categories—to enhance your interactive search interface with more precise results.