Building Multilingual Search Indexes in Jekyll
Why Multilingual Search Matters
If you're serving a global audience, your knowledge base should speak their language—literally. While Jekyll is not a CMS with built-in i18n (internationalization) support, it can be extended to support multiple languages through clever use of collections, data files, and routing. The challenge lies in making your search functionality multilingual too, so users can find answers in their preferred language.
Common Use Cases
- Product documentation in English, Spanish, and French
- Localized support pages for international markets
- Multilingual blog posts with different structures per region
We'll walk through how to create separate search indexes per language and load them dynamically based on user preference or site structure.
Setting Up Language-Specific Collections
Let’s say you have two languages: English and Spanish. You’ll want mirrored collections for each:
collections:
  guides_en:
    output: true
  guides_es:
    output: true
  faqs_en:
    output: true
  faqs_es:
    output: true
This allows full separation of language-specific content while maintaining organizational clarity. Create directories like _guides_en and _guides_es for each collection.
Creating Search Indexes Per Language
Next, generate separate search_en.json and search_es.json files using custom templates. For English:
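---
layout: none
---
{% comment %}
jsonify adds the quotes and escapes JSON-special characters, so values are not wrapped in quotes by hand.
normalize_whitespace collapses line breaks into single spaces so adjacent words are not glued together.
{% endcomment %}
[
{% assign en_docs = site.guides_en | concat: site.faqs_en %}
{% for doc in en_docs %}
  {
    "title": {{ doc.title | jsonify }},
    "url": {{ doc.url | relative_url | jsonify }},
    "collection": {{ doc.collection | jsonify }},
    "content": {{ doc.content | strip_html | normalize_whitespace | jsonify }}
  }{% unless forloop.last %},{% endunless %}
{% endfor %}
]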
Repeat this structure for Spanish with search_es.json, substituting guides_es and faqs_es.
Detecting Language Context
You have several options to determine which language index to load:
- URL path: Use /en/ or /es/ prefixes.
- Subdomains: Like es.example.com
- User preference: Detect via browser language or selector
For example, with path-based routing:
const lang = window.location.pathname.startsWith("/es/") ? "es" : "en";
const indexUrl = `/search_${lang}.json`;
This snippet only works out the URL of the matching index; the fetch itself happens in the next step.
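The three detection options listed above can also be combined into one fallback chain. The following is a minimal sketch, not a definitive implementation: the names SUPPORTED, DEFAULT_LANG, and detectLanguage are hypothetical, and the order of precedence (path, then subdomain, then browser preference) is an assumption you may want to change.

```javascript
// Hypothetical configuration: the languages this site publishes.
const SUPPORTED = ["en", "es"];
const DEFAULT_LANG = "en";

// loc needs only { pathname, hostname }, so window.location works as-is.
// browserLanguages is an array such as navigator.languages.
function detectLanguage(loc, browserLanguages) {
  // 1. URL path prefix, e.g. /es/guides/setup/
  const pathMatch = loc.pathname.match(/^\/([a-z]{2})(\/|$)/);
  if (pathMatch && SUPPORTED.includes(pathMatch[1])) {
    return pathMatch[1];
  }

  // 2. Subdomain, e.g. es.example.com
  const subdomain = loc.hostname.split(".")[0];
  if (SUPPORTED.includes(subdomain)) {
    return subdomain;
  }

  // 3. Browser preference, e.g. ["es-MX", "en-US"]
  for (const tag of browserLanguages || []) {
    const base = tag.toLowerCase().split("-")[0];
    if (SUPPORTED.includes(base)) {
      return base;
    }
  }

  // Nothing matched: fall back to the default language.
  return DEFAULT_LANG;
}
```

In the browser this would be called as detectLanguage(window.location, navigator.languages), and the result can feed the indexUrl template shown earlier.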
Loading Language-Specific Index in JavaScript
Here’s how your Lunr initialization might look with language switching support:
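let idx = null;
let documents = [];

// Load the index for the detected language and build the Lunr index from it.
fetch(indexUrl)
  .then(res => res.json())
  .then(data => {
    documents = data;
    idx = lunr(function () {
      this.ref("url");
      this.field("title");
      this.field("content");
      data.forEach(function (doc) {
        this.add(doc);
      }, this);
    });
  })
  .catch(err => console.error("Could not load search index:", err));

document.getElementById("search-input").addEventListener("input", function () {
  const output = document.getElementById("search-results");
  output.innerHTML = "";
  const query = this.value.trim();
  // Do nothing until the index has loaded and there is something to search for.
  if (!idx || !query) return;

  let results = [];
  try {
    results = idx.search(query);
  } catch (err) {
    // Lunr throws on queries it cannot parse; treat those as "no results".
    return;
  }

  results.forEach(result => {
    const match = documents.find(doc => doc.url === result.ref);
    const item = document.createElement("li");
    // Link each result to its page.
    const link = document.createElement("a");
    link.href = match.url;
    // textContent keeps index data from being interpreted as HTML.
    link.textContent = match.title;
    item.appendChild(link);
    output.appendChild(item);
  });
});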
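If visitors can change language without a full page reload, it may help to download each index only once. Here is a minimal sketch, assuming a hypothetical helper fetchJson(url) that resolves to parsed JSON; neither createIndexLoader nor fetchJson is part of Lunr or Jekyll.

```javascript
// Returns a loadIndex(lang) function that caches one promise per language,
// so each search_<lang>.json file is requested at most once.
// fetchJson is a hypothetical helper: (url) => Promise of parsed JSON.
function createIndexLoader(fetchJson) {
  const cache = new Map();

  return function loadIndex(lang) {
    if (!cache.has(lang)) {
      cache.set(lang, fetchJson(`/search_${lang}.json`));
    }
    return cache.get(lang);
  };
}
```

In the browser this could be wired up as createIndexLoader(url => fetch(url).then(res => res.json())). A rejected promise stays cached in this sketch, so a production version would probably evict failed entries and retry.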
Optimizing Content for Language Indexes
When creating multilingual content, consistency matters. Make sure that:
- Each language version includes similar structure and metadata
- You use the same layout files if applicable
- Content in non-Latin scripts (e.g. Arabic, Japanese) is UTF-8 encoded
Also, avoid mixing languages in the same index to keep search relevance high.
Optional: Switch Index Dynamically with a Language Selector
If you're allowing users to change the site language manually via a dropdown, you can also switch indexes accordingly:
document.getElementById("language-select").addEventListener("change", function () {
  const selectedLang = this.value;
  window.location.href = `/${selectedLang}/`;
});
This navigates users to the correct path and triggers the correct index load automatically.
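The handler above sends users to the language home page. If translated pages share the same path beneath the language prefix (an assumption, not something the setup here guarantees), a variant can keep users on the equivalent page instead. The function name switchLanguagePath is hypothetical.

```javascript
// Swap (or insert) the language prefix in a path such as "/en/guides/setup/".
// supported is the list of language codes used as URL prefixes.
function switchLanguagePath(pathname, newLang, supported) {
  // "/en/guides/setup/" -> ["", "en", "guides", "setup", ""]
  const parts = pathname.split("/");

  if (supported.includes(parts[1])) {
    // A known prefix is present: replace it.
    parts[1] = newLang;
  } else {
    // No prefix yet: insert one right after the leading slash.
    parts.splice(1, 0, newLang);
  }

  return parts.join("/");
}
```

Inside the change handler this would replace the template string, for example window.location.href = switchLanguagePath(window.location.pathname, selectedLang, ["en", "es"]). If a page has no translation, the target URL will not exist, so a fallback to the language home page is still worth having.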
Adding Localized UI Elements
Make sure your search interface elements—placeholders, labels, and feedback messages—are also localized. Use _data/lang/en.yml and _data/lang/es.yml to manage language keys:
# _data/lang/en.yml
search_placeholder: "Search..."
no_results: "No results found"
# _data/lang/es.yml
search_placeholder: "Buscar..."
no_results: "No se encontraron resultados"
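Then inject the appropriate language strings in your templates using Liquid. Jekyll exposes these files under site.data.lang, so {{ site.data.lang.en.search_placeholder }} outputs the English placeholder and {{ site.data.lang.es.search_placeholder }} the Spanish one.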
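Messages that the search script renders at runtime, such as the empty-result notice, need the same keys on the JavaScript side. One possible sketch follows. The STRINGS object and the t helper are hypothetical, and the strings are inlined by hand here; how they reach the page in your build is an assumption, not something Jekyll does for you.

```javascript
// Mirrors the keys in _data/lang/en.yml and _data/lang/es.yml.
const STRINGS = {
  en: { search_placeholder: "Search...", no_results: "No results found" },
  es: { search_placeholder: "Buscar...", no_results: "No se encontraron resultados" }
};

// Look up a UI string, falling back to English when the language
// or the key is missing.
function t(lang, key) {
  const table = STRINGS[lang] || STRINGS.en;
  return table[key] !== undefined ? table[key] : STRINGS.en[key];
}
```

In the search handler, an empty result list could then show t(lang, "no_results") via textContent instead of leaving the list blank.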
SEO Considerations for Multilingual Search
To keep your site Google-friendly across languages:
- Use hreflang tags in your HTML head to define language versions
- Ensure all versions are crawlable and linked from a central sitemap
- Don’t rely solely on JavaScript to expose content
Limitations of Lunr for i18n
Lunr has limited support for languages with complex grammar or non-Latin alphabets. Consider the following:
- Use language-specific stemmers or tokenizers (Lunr supports some via plugins such as lunr-languages)
- Keep indexes small and focused per language
- Fall back to fuzzy search or tag-based navigation for low-resource languages
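As a rough illustration of the first point, the lunr-languages plugin registers per-language pipelines such as lunr.es, which an index enables with this.use(...). The sketch below assumes Lunr and the relevant lunr-languages scripts are already loaded. The helper name buildIndex is hypothetical, and lunr is passed in as a parameter only to keep the function self-contained.

```javascript
// Build a Lunr index for one language.
// lunr: the Lunr library object; lang: a code such as "es";
// docs: the array loaded from search_<lang>.json.
function buildIndex(lunr, lang, docs) {
  return lunr(function () {
    // English is Lunr's default pipeline. Other languages are enabled only
    // if the matching lunr-languages plugin has registered lunr[lang].
    if (lang !== "en" && typeof lunr[lang] === "function") {
      this.use(lunr[lang]);
    }

    this.ref("url");
    this.field("title");
    this.field("content");

    docs.forEach(function (doc) {
      this.add(doc);
    }, this);
  });
}
```

When no plugin is registered for a language, this sketch silently falls back to the default English pipeline, which still indexes the text but stems it with English rules.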
Conclusion
Adding multilingual search to a Jekyll-based knowledge base isn’t as difficult as it sounds. With language-specific collections, separate search indexes, and smart JavaScript routing, you can deliver a seamless multilingual experience to your users. This method also scales well as you add more languages over time.
Next, we’ll dive into using taxonomy filters—like tags and categories—to enhance your interactive search interface with more precise results.