Quickstart

Get the MCP server running locally in a few minutes.

  • Qdrant — a running Qdrant instance (gRPC port 6334). The quickest path is Docker:
    docker run -d -p 6334:6334 qdrant/qdrant
  • Embeddings endpoint — an OpenAI-compatible embeddings endpoint. Options:
    • LiteLLM in front of any model
    • OpenAI API directly (set MEM_LITELLM_URL=https://api.openai.com/v1 and MEM_EMBED_MODEL=text-embedding-3-small)
  • OIDC issuer (optional): an OIDC issuer URL, needed only if you want bearer-token enforcement. Without one, the server accepts all requests and logs a prominent warning.
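
Before starting the server, it can help to confirm that the embeddings endpoint responds. The sketch below assumes the endpoint follows the OpenAI convention of accepting POST requests at <base URL>/embeddings; whether the server builds its request URL the same way is an assumption. EMBED_API_KEY is a hypothetical variable name used only for this check, and the fallback values are the documented defaults.

```shell
# Base URL and model fall back to the documented defaults when unset.
EMBED_URL="${MEM_LITELLM_URL:-http://localhost:4000}/embeddings"
EMBED_BODY="{\"model\":\"${MEM_EMBED_MODEL:-ollama/bge-m3}\",\"input\":\"hello\"}"

# Send one embeddings request. EMBED_API_KEY is a hypothetical name for
# this check only; endpoints that need no key ignore the header.
embed_check() {
  curl -sS -X POST "$EMBED_URL" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${EMBED_API_KEY:-none}" \
    -d "$EMBED_BODY"
}
```

Calling embed_check should print a JSON response containing an embedding vector if the endpoint and model name are valid; an error body usually points to a wrong URL, model name, or missing key.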

Pull and run the latest image from GHCR:

docker run -d \
--name engram \
-p 8080:8080 \
-e MEM_QDRANT_ADDR=host.docker.internal:6334 \
-e MEM_LITELLM_URL=http://host.docker.internal:4000 \
-e MEM_EMBED_MODEL=ollama/bge-m3 \
ghcr.io/seanb4t/engram:latest

(host.docker.internal resolves on macOS and Windows. On Linux, add --add-host host.docker.internal:host-gateway to the docker run command, or replace host.docker.internal with the host's IP address.)
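
The platform difference can be handled in a small helper. This is a minimal sketch, not part of the project: engram_run_cmd is a hypothetical function name, and it prints the command instead of running it so the result can be reviewed first. It adds the --add-host flag only when the platform is Linux.

```shell
# Print the docker run command for a given platform (defaults to the
# current one, as reported by uname -s). engram_run_cmd is a
# hypothetical helper name, not something shipped with the image.
engram_run_cmd() {
  os="${1:-$(uname -s)}"
  host_flag=""
  if [ "$os" = "Linux" ]; then
    # Linux does not define host.docker.internal by default.
    host_flag="--add-host host.docker.internal:host-gateway "
  fi
  printf '%s\n' "docker run -d --name engram ${host_flag}-p 8080:8080 -e MEM_QDRANT_ADDR=host.docker.internal:6334 -e MEM_LITELLM_URL=http://host.docker.internal:4000 -e MEM_EMBED_MODEL=ollama/bge-m3 ghcr.io/seanb4t/engram:latest"
}

# Show the command for this machine.
engram_run_cmd
```

To run the printed command directly, use eval "$(engram_run_cmd)".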

The MCP endpoint is served at http://localhost:8080/mcp by default. Set MEM_MCP_PATH=/ to restore the pre-0.7 behavior where the transport answered at the bare root.
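
One way to check that the endpoint is live is to send it an MCP initialize request. The sketch below assumes the server speaks the MCP Streamable HTTP transport (a single POST endpoint that answers with JSON or an event stream), which this page does not state. The clientInfo values are placeholders, and mcp_check is a hypothetical helper name.

```shell
# Port 8080 and the /mcp default come from the text above; MEM_MCP_PATH
# overrides the path if it was changed on the server.
MCP_URL="http://localhost:8080${MEM_MCP_PATH:-/mcp}"

# JSON-RPC initialize request in the shape defined by the MCP
# specification. The clientInfo values are placeholders.
INIT_BODY='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl-check","version":"0.0.0"}}}'

# Send the request and print whatever the server returns.
mcp_check() {
  curl -sS -X POST "$MCP_URL" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    -d "$INIT_BODY"
}
```

Calling mcp_check should print a JSON-RPC response, possibly framed as a server-sent event, if the assumption about the transport holds. If an OIDC issuer is configured, the request also needs an Authorization: Bearer header with a valid token.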

Key environment variables (see Configure for the full list):

Variable           What it does
MEM_QDRANT_ADDR    Qdrant gRPC address (host:port); default localhost:6334
MEM_LITELLM_URL    Embeddings endpoint (OpenAI-compatible); default http://localhost:4000
MEM_EMBED_MODEL    Model name forwarded to the endpoint; default ollama/bge-m3
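
These settings can also be kept in a file and passed to Docker with --env-file instead of repeated -e flags. The sketch below uses the host.docker.internal values from the docker run command above, not the localhost defaults, because localhost inside a container refers to the container itself. The file name engram.env is an arbitrary choice.

```shell
# Write the three settings to an env file in the current directory.
# Docker's env-file format is one VAR=value pair per line.
cat > engram.env <<'EOF'
MEM_QDRANT_ADDR=host.docker.internal:6334
MEM_LITELLM_URL=http://host.docker.internal:4000
MEM_EMBED_MODEL=ollama/bge-m3
EOF

# Then start the container with:
#   docker run -d --name engram -p 8080:8080 --env-file engram.env \
#     ghcr.io/seanb4t/engram:latest
```

Keeping the values in one file makes it easier to switch embedding models or Qdrant instances without editing the run command.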

Once the server is running, add it to Claude Code with /engram-setup. See the Claude Code Plugin guide for details.

With the server registered, use store_memory to persist a fact and search_memory to retrieve it. See the Tools reference for full parameter docs.

store_memory — persist a decision, convention, preference, or gotcha
search_memory — semantic search over stored memories