Self-Hosted Search Engine Setup Guide

What Is a Self-Hosted Search Engine?

A self-hosted search engine runs on your own server, giving you full control over your search data and infrastructure. There are two main categories:

  1. Application search engines (Meilisearch, Typesense, Elasticsearch, OpenSearch, ManticoreSearch, Sonic) — add search functionality to your websites and applications. They index your data and serve search queries through APIs.

  2. Web metasearch engines (SearXNG, Whoogle) — aggregate results from public search engines (Google, Bing, DuckDuckGo) without tracking. They replace Google as your daily search engine.

This guide covers the concepts, architecture, and setup patterns common to all self-hosted search engines, helping you make the right choice and get running quickly.

Prerequisites

  • A Linux server (Ubuntu 22.04+ recommended)
  • Docker and Docker Compose installed
  • Basic understanding of REST APIs (for application search)
  • At least 256 MB free RAM (more for Elasticsearch/OpenSearch)

Choosing the Right Search Engine

Application Search Decision Matrix

If You Need…                  | Choose          | Why
------------------------------|-----------------|--------------------------------------------
Simple, fast app search       | Meilisearch     | Zero-config, typo-tolerant, instant results
Lowest possible latency       | Typesense       | In-memory indexes, sub-millisecond search
Complex queries + analytics   | Elasticsearch   | Full query DSL, aggregations, ELK stack
Elasticsearch but open source | OpenSearch      | Apache 2.0 fork, API-compatible
SQL-based search              | ManticoreSearch | MySQL protocol, familiar syntax
Minimal resources             | Sonic           | 20 MB RAM, returns IDs only

Web Search Decision Matrix

If You Need…                   | Choose  | Why
-------------------------------|---------|-------------------------------
Private multi-engine search    | SearXNG | 70+ engines, zero tracking
Simple Google without tracking | Whoogle | Google results, no ads/tracking

Core Concepts

Indexes and Documents

Application search engines store data in indexes (also called collections or tables depending on the engine). An index holds documents — JSON objects with fields.

{
  "id": 1,
  "title": "Getting Started with Docker",
  "content": "Docker is a containerization platform...",
  "category": "foundations",
  "date": "2026-02-15"
}

You push documents into an index, then query that index. The search engine tokenizes text, builds inverted indexes, and returns results ranked by relevance.
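
The tokenize-and-index step can be pictured with a toy inverted index. This is a minimal sketch for illustration only, not how any of the engines above implement it; the function names and the whitespace/punctuation tokenizer are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents, fields=("title", "content")):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc in documents:
        for field in fields:
            for token in tokenize(doc.get(field, "")):
                index[token].add(doc["id"])
    return index


def search(index, query):
    """Return ids of documents containing every query token (AND semantics)."""
    token_sets = [index.get(token, set()) for token in tokenize(query)]
    if not token_sets:
        return set()
    return set.intersection(*token_sets)


docs = [
    {"id": 1, "title": "Getting Started with Docker",
     "content": "Docker is a containerization platform..."},
    {"id": 2, "title": "Nginx Proxy Manager",
     "content": "A reverse proxy with a web UI"},
]
index = build_inverted_index(docs)
print(search(index, "docker platform"))  # {1}
```

A lookup only touches the token entries named in the query, which is why queries stay fast even when the document set grows.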

Indexing vs Querying

  • Indexing = pushing data into the search engine. This happens when you create or update content. Most engines handle this asynchronously — they accept the data and index it in the background.
  • Querying = searching the indexed data. This happens on every user search. Latency here matters most — users expect results in under 100ms.
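
Because indexing is usually asynchronous, client code often has to wait for a background task before it can rely on new documents being searchable. The sketch below shows one way to poll for completion. The status names are modeled on Meilisearch's task statuses and will differ for other engines; `fetch_status` is a hypothetical callable standing in for whatever HTTP call your engine provides.

```python
import time

# Terminal states modeled on Meilisearch task statuses (an assumption for other engines).
TERMINAL_STATES = ("succeeded", "failed", "canceled")


def wait_for_task(fetch_status, timeout_s=30.0, interval_s=0.5, sleep=time.sleep):
    """Poll fetch_status() until it reports a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Stand-in for a real status request: the task moves through three states.
states = iter(["enqueued", "processing", "succeeded"])
print(wait_for_task(lambda: next(states), interval_s=0))  # succeeded
```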

Relevance and Ranking

Search engines rank results by relevance. The default ranking typically considers:

  • Term frequency — how often the search term appears in the document
  • Field weighting — matches in title rank higher than matches in body
  • Typo tolerance — “dokcer” still finds “docker”
  • Exact vs prefix match — “docker” ranks higher than “dockerize”

Most engines let you customize ranking rules. Meilisearch and Typesense provide sensible defaults that work for most applications.
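
To make the ranking factors concrete, here is a toy scorer that combines field weighting, exact vs prefix matching, and a simple edit-distance typo check. The weights, thresholds, and function names are invented for illustration; real engines use their own, more elaborate rules.

```python
import re

# Hypothetical weights chosen for illustration only.
FIELD_WEIGHTS = {"title": 3.0, "content": 1.0}
MATCH_SCORES = {"exact": 1.0, "prefix": 0.6, "typo": 0.4}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def match_kind(term, token):
    """Classify how a query term matches a document token, or return None."""
    if token == term:
        return "exact"
    if token.startswith(term):
        return "prefix"
    # Only tolerate typos on longer terms, and at most two edits.
    if len(term) >= 5 and edit_distance(term, token) <= 2:
        return "typo"
    return None


def score(document, query):
    """Sum weighted match scores over every field, token, and query term."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        for token in tokenize(document.get(field, "")):
            for term in tokenize(query):
                kind = match_kind(term, token)
                if kind:
                    total += weight * MATCH_SCORES[kind]
    return total


docs = [
    {"title": "Dockerize Your App", "content": "Packaging guide"},
    {"title": "Container Basics", "content": "Docker runs containers"},
    {"title": "Docker Basics", "content": "An introduction"},
]
ranked = sorted(docs, key=lambda d: score(d, "docker"), reverse=True)
print([d["title"] for d in ranked])
# ['Docker Basics', 'Dockerize Your App', 'Container Basics']
```

With these example weights, a prefix match in the title outranks an exact match in the body, which shows how much field weighting can dominate the result order.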

Schemas

Some search engines require a schema (field definitions and types) before indexing:

Engine          | Schema Required?
----------------|---------------------------------------------
Meilisearch     | No (infers from first document)
Typesense       | Yes (define collection schema upfront)
Elasticsearch   | Optional (auto-maps, but explicit is better)
OpenSearch      | Optional (same as Elasticsearch)
ManticoreSearch | Yes (CREATE TABLE with types)
Sonic           | No (schema-less, text only)

Recommendation: Even when schemas are optional, define them explicitly. Auto-inference can mistype fields (a "123" field might be mapped as text or integer depending on the engine).
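
The mistyping risk is easy to reproduce with a naive inference routine. The sketch below is a simplified stand-in, not any engine's actual inference logic: it infers types from the first document and then reports later documents that disagree.

```python
def infer_type(value):
    """Guess a field type from one sample value (bool is checked before int)."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    return "object"


def infer_schema(document):
    """Build a field -> type mapping from a single document."""
    return {field: infer_type(value) for field, value in document.items()}


def find_type_conflicts(schema, document):
    """List (field, expected, actual) for values that disagree with the schema."""
    conflicts = []
    for field, value in document.items():
        expected = schema.get(field)
        actual = infer_type(value)
        if expected is not None and expected != actual:
            conflicts.append((field, expected, actual))
    return conflicts


first = {"id": 1, "title": "Docker Basics", "views": "123"}
later = {"id": 2, "title": "Nginx Proxy Manager", "views": 456}

schema = infer_schema(first)
print(schema["views"])                     # string
print(find_type_conflicts(schema, later))  # [('views', 'string', 'integer')]
```

An explicit schema avoids this class of problem because the type is decided once, by you, rather than by whichever document happens to arrive first.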

Common Setup Pattern

All application search engines follow the same basic setup:

1. Deploy with Docker

Every engine in this guide has an official Docker image. The setup is always a docker-compose.yml with volumes for persistent data:

services:
  search:
    image: [engine-image]:[version]
    ports:
      # Bind to loopback only; use a reverse proxy for external access
      - "127.0.0.1:[port]:[port]"
    volumes:
      - search_data:/var/lib/[engine]/data
    restart: unless-stopped

volumes:
  search_data:

See the individual guides for complete Docker Compose configurations.

2. Create an Index

After deployment, create an index (or collection/table):

Meilisearch:

curl -X POST http://localhost:7700/indexes \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"uid": "articles", "primaryKey": "id"}'

Typesense:

curl -X POST http://localhost:8108/collections \
  -H "Content-Type: application/json" \
  -H "X-TYPESENSE-API-KEY: YOUR_API_KEY" \
  -d '{
    "name": "articles",
    "fields": [
      {"name": "title", "type": "string"},
      {"name": "content", "type": "string"},
      {"name": "category", "type": "string", "facet": true}
    ]
  }'

Elasticsearch:

curl -X PUT http://localhost:9200/articles \
  -H "Content-Type: application/json" \
  -d '{
    "mappings": {
      "properties": {
        "title": {"type": "text"},
        "content": {"type": "text"},
        "category": {"type": "keyword"}
      }
    }
  }'

3. Index Documents

Push your data into the search engine:

Meilisearch:

curl -X POST http://localhost:7700/indexes/articles/documents \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '[
    {"id": 1, "title": "Docker Basics", "content": "...", "category": "foundations"},
    {"id": 2, "title": "Nginx Proxy Manager", "content": "...", "category": "reverse-proxy"}
  ]'

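Bulk import formats differ by engine: Meilisearch accepts a JSON array, Typesense's import endpoint expects JSON Lines (one document per line), and the Elasticsearch and OpenSearch _bulk API expects newline-delimited JSON with an action line before each document. For large datasets (100K+ documents), batch your imports in chunks of 10,000-50,000 documents.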
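
Chunking a large import can be sketched with a small generator. This is one possible approach using only the standard library; the function name and the default batch size are illustrative, and each yielded batch would be sent to the engine in its own request.

```python
from itertools import islice


def batched(documents, batch_size=10_000):
    """Yield lists of at most batch_size documents from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A generator source means the full dataset never has to sit in memory at once.
documents = ({"id": i, "title": f"Article {i}"} for i in range(25_000))
sizes = [len(batch) for batch in batched(documents, batch_size=10_000)]
print(sizes)  # [10000, 10000, 5000]
```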

4. Search

Query your indexed data:

Meilisearch:

curl -X POST http://localhost:7700/indexes/articles/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"q": "docker compose"}'

Typesense:

curl "http://localhost:8108/collections/articles/documents/search?q=docker+compose&query_by=title,content" \
  -H "X-TYPESENSE-API-KEY: YOUR_API_KEY"

Elasticsearch:

curl -X POST http://localhost:9200/articles/_search \
  -H "Content-Type: application/json" \
  -d '{"query": {"match": {"title": "docker compose"}}}'
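
From application code, the same search call can be made without an SDK. The sketch below builds a Meilisearch search request with Python's standard library, mirroring the index name, port, and placeholder key used in this guide; the helper names are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_search_request(base_url, index_uid, query, api_key, limit=20):
    """Build (but do not send) a POST request for a Meilisearch index search."""
    body = json.dumps({"q": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/indexes/{index_uid}/search",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def run_search(request, timeout=5):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_search_request(
    "http://localhost:7700", "articles", "docker compose", "YOUR_API_KEY"
)
print(request.get_method(), request.full_url)
# POST http://localhost:7700/indexes/articles/search
```

Against a running instance, `run_search(request)["hits"]` would hold the matching documents. Keeping request construction separate from sending makes the code easy to unit test without a live server.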

5. Integrate with Your Application

Use official SDKs to integrate search into your application:

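Engine        | JavaScript                     | Python        | PHP               | Go               | Ruby
--------------|--------------------------------|---------------|-------------------|------------------|-------------------
Meilisearch   | meilisearch                    | meilisearch   | meilisearch-php   | meilisearch-go   | meilisearch-ruby
Typesense     | typesense                      | typesense     | typesense-php     | typesense-go     | typesense-ruby
Elasticsearch | @elastic/elasticsearch         | elasticsearch | elasticsearch-php | go-elasticsearch | elasticsearch-ruby
OpenSearch    | @opensearch-project/opensearch | opensearch-py | opensearch-php    | opensearch-go    | N/A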

Frontend Search UI

For user-facing search interfaces, InstantSearch.js (Algolia’s open-source frontend library) works with Meilisearch, Typesense, and Elasticsearch through adapters:

Meilisearch:

npm install @meilisearch/instant-meilisearch instantsearch.js

Typesense:

npm install typesense-instantsearch-adapter instantsearch.js

These provide ready-made components: search boxes, hit lists, facet filters, pagination — with minimal custom code.

Security

Authentication

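Most application search engines support API key or credential-based authentication; ManticoreSearch is the exception and relies on network-level controls. Always enable authentication in production: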

Engine          | Auth Method
----------------|---------------------------------------------------
Meilisearch     | Master key + generated API keys (search-only, admin)
Typesense       | API keys with scoped permissions
Elasticsearch   | Built-in security (username/password + role-based)
OpenSearch      | Security plugin with RBAC (enabled by default)
ManticoreSearch | No built-in auth (use network-level security)
Sonic           | Password-based (config file)

Network Security

Search engines should never be directly exposed to the internet. Standard security setup:

  1. Bind to localhost or Docker network only. Don’t expose ports 7700, 8108, 9200, etc. on 0.0.0.0 unless behind a reverse proxy.
  2. Use a reverse proxy for external access. See Reverse Proxy Setup.
  3. Separate search keys from admin keys. Frontend search uses read-only keys. Admin keys stay server-side.
  4. Firewall rules. Only allow access from your application servers.
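
The first rule can be checked mechanically. The sketch below inspects a Compose short-syntax port mapping and reports whether it publishes on every interface, which is Docker's default when no host IP is given. It handles IPv4 host addresses only, and the function name is illustrative.

```python
def binds_all_interfaces(port_mapping):
    """Return True if a short-syntax port mapping publishes on all interfaces.

    "7700:7700" has no host IP, so Docker publishes it on every interface.
    "127.0.0.1:7700:7700" is restricted to loopback.
    IPv6 host addresses are not handled by this sketch.
    """
    mapping = port_mapping.split("/")[0]  # drop an optional "/tcp" or "/udp" suffix
    parts = mapping.split(":")
    if len(parts) < 3:
        return True  # no host IP given
    host_ip = parts[0]
    return host_ip in ("", "0.0.0.0")


for mapping in ["7700:7700", "127.0.0.1:7700:7700", "0.0.0.0:9200:9200"]:
    print(mapping, "->", "exposed" if binds_all_interfaces(mapping) else "restricted")
```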

Backup and Recovery

Search indexes should be backed up alongside your application data:

  • Meilisearch: Built-in dump/snapshot feature via API
  • Typesense: Snapshot API for point-in-time backups
  • Elasticsearch: Snapshot and restore API with repository support
  • OpenSearch: Same snapshot API as Elasticsearch
  • ManticoreSearch: BACKUP SQL command
  • Sonic: Back up the data volume directly

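For all engines, the simplest backup is a Docker volume backup. Stop the container first (or rely on the engine's snapshot feature) so files are not copied mid-write, and check docker volume ls for the actual volume name, since Compose prefixes it with the project name:

docker run --rm -v search_data:/data:ro -v "$(pwd)":/backup alpine \
  tar czf /backup/search-backup-$(date +%Y%m%d).tar.gz -C /data .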
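
If the data lives in a host directory (for example a bind mount) rather than a named volume, the same date-stamped archive can be produced and rotated from a script. This is a sketch under that assumption; the function names and the seven-archive retention default are illustrative choices, not a recommendation from any engine's documentation.

```python
import tarfile
from datetime import date
from pathlib import Path


def backup_directory(data_dir, backup_dir, today=None):
    """Archive data_dir as backup_dir/search-backup-YYYYMMDD.tar.gz; return the path."""
    data_dir = Path(data_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (today or date.today()).strftime("%Y%m%d")
    archive_path = backup_dir / f"search-backup-{stamp}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(data_dir, arcname="data")
    return archive_path


def prune_old_backups(backup_dir, keep=7):
    """Delete all but the newest `keep` archives and return the removed paths.

    Date-stamped names sort chronologically, so a plain sort is enough.
    """
    archives = sorted(Path(backup_dir).glob("search-backup-*.tar.gz"))
    stale = archives[:-keep] if keep > 0 else archives
    for path in stale:
        path.unlink()
    return stale
```

A call such as `backup_directory("/srv/search/data", "/srv/backups")` followed by `prune_old_backups("/srv/backups")` could run from a daily cron job; both paths are placeholders.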

See Backup Strategy for a comprehensive backup approach.

Common Mistakes

1. Using :latest Docker Tags

Pin your search engine to a specific version. A surprise major version upgrade can break your index format, API compatibility, or configuration.

# Bad
image: getmeili/meilisearch:latest

# Good
image: getmeili/meilisearch:v1.35.1
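
A small check like the one below can flag unpinned images in a review or CI step. It is a heuristic sketch: the list of floating tags is illustrative, and treating any tag with at least major.minor numbers as pinned is an assumption that will not hold for every image.

```python
import re

# Tags that move over time and therefore do not pin a version (illustrative list).
FLOATING_TAGS = {"latest", "stable", "edge", "nightly"}

# Accepts tags such as v1.35.1, 27.1, or 2.11.0-rc1; rejects major-only tags like "8".
VERSION_TAG = re.compile(r"v?\d+(\.\d+)+(-[0-9A-Za-z.]+)?")


def is_pinned(image):
    """Return True if the image reference names a digest or a versioned tag."""
    if "@sha256:" in image:
        return True
    name, _, tag = image.rpartition(":")
    if not name or "/" in tag:
        return False  # no tag given (a colon, if present, belongs to a registry port)
    if tag in FLOATING_TAGS:
        return False
    return VERSION_TAG.fullmatch(tag) is not None


print(is_pinned("getmeili/meilisearch:latest"))   # False
print(is_pinned("getmeili/meilisearch:v1.35.1"))  # True
```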

2. No Authentication in Production

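Several engines ship with no authentication or only weak defaults: Meilisearch runs without a master key in development mode, and ManticoreSearch has no built-in auth at all. Before exposing any search endpoint, configure proper API keys and restrict access.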

3. Not Setting Resource Limits

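The JVM behind Elasticsearch and OpenSearch can claim a large share of host memory if left unconstrained. Always set the heap size explicitly, using ES_JAVA_OPTS for Elasticsearch or OPENSEARCH_JAVA_OPTS for OpenSearch:

environment:
  - "ES_JAVA_OPTS=-Xms512m -Xmx512m"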

Meilisearch and Typesense self-manage memory but can still grow with large indexes. Monitor usage.
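
A heap value can be derived from the memory you give the container. The sketch below applies a common rule of thumb, heap at no more than half of available memory and capped below the JVM's compressed-pointer threshold. The 50% fraction and the 26 GB cap are assumptions to adjust for your workload, and the function name is illustrative.

```python
def heap_opts(container_memory_mb, fraction=0.5, cap_mb=26 * 1024):
    """Return a JVM options string with equal -Xms and -Xmx values."""
    if container_memory_mb <= 0:
        raise ValueError("container_memory_mb must be positive")
    heap_mb = min(int(container_memory_mb * fraction), cap_mb)
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m"


print(heap_opts(1024))  # -Xms512m -Xmx512m
```

Setting the minimum and maximum to the same value avoids heap resizing at runtime; the remaining memory is left for the operating system's file cache.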

4. Exposing Admin APIs Publicly

Separate your search (read-only) and admin (write) API keys. Never expose admin keys to the frontend. A leaked admin key lets anyone modify or delete your index.

5. Not Re-Indexing After Schema Changes

Changing field types or adding new searchable fields often requires a full re-index. Plan for this — keep your source data accessible so you can rebuild indexes when needed.
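
One way to plan for this is to compare the old and new field definitions before deploying a change. The sketch below flags the two cases named above, changed types and added fields. The schema dictionaries are a simplified, hypothetical representation, and whether a given change truly forces a rebuild depends on the engine.

```python
def reindex_reasons(old_schema, new_schema):
    """Return human-readable reasons why new_schema may call for a full re-index."""
    reasons = []
    for field in sorted(new_schema):
        if field not in old_schema:
            reasons.append(f"new field: {field}")
        elif old_schema[field] != new_schema[field]:
            reasons.append(
                f"type changed: {field} ({old_schema[field]} -> {new_schema[field]})"
            )
    return reasons


old = {"title": "text", "category": "keyword"}
new = {"title": "text", "category": "text", "tags": "keyword"}
print(reindex_reasons(old, new))
# ['type changed: category (keyword -> text)', 'new field: tags']
```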

Next Steps

  1. Choose your engine using the decision matrices above
  2. Deploy it following the individual setup guide
  3. Index your data using the engine’s SDK or REST API
  4. Add a search UI with InstantSearch.js or a custom frontend
  5. Secure it with API keys and network restrictions
  6. Back it up on a regular schedule