I’ve been thinking about adding this to my “Fuck it, I’ll do it myself” / SHTF pile. I have a spare 10-15GB for a good selection of basic articles (across sciences, history, pop culture trivia etc).
https://get.kiwix.org/en/solutions/hotspots/content-bundles/
https://get.kiwix.org/en/solutions/hotspots/imager-service/
There’s something inherently cool about having Wikipedia in a box (yes, you’d likely need to refresh it once a year), but I’ve never heard of anyone actually self-hosting a Kiwix instance.


Do you actually train the LLM or use RAG? I have been looking for a local LLM + Wikipedia RAG solution for a while now.
For now I just have kiwix-serve + searxng doing a simple search but the Kiwix search is…questionable.
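If it helps, you can skip the UI and hit the full-text endpoint directly instead of going through searxng. A rough sketch, stdlib only; the host/port and book name are assumptions about your setup, and the parameter names (`books.name`, `pattern`, `pageLength`) should be double-checked against your kiwix-serve version:

```python
from urllib.parse import urlencode

def search_url(base, book, query, page_length=5):
    """Build a kiwix-serve full-text search URL.

    `base` is wherever kiwix-serve listens; `book` is the ZIM's book
    name. Both are placeholders for your own deployment.
    """
    qs = urlencode({"books.name": book, "pattern": query, "pageLength": page_length})
    return f"{base.rstrip('/')}/search?{qs}"

if __name__ == "__main__":
    url = search_url("http://localhost:8080", "wikipedia", "haber process")
    print(url)
    # Against a live kiwix-serve you could then fetch it:
    #   import urllib.request
    #   html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
```

The results page is HTML with snippets, so it still needs scraping, but it is at least deterministic and local.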
Found it. Old chat with ChatGPT. Lemme know if you want the how to
Here’s the clean handoff note.
ZIM / Kiwix idea — summary
Core idea: use Wikipedia ZIMs as a local retrieval substrate to make a small 4B model act smarter without touching weights. The win is not “the model now knows Wikipedia.” The win is that the system can consult a large local corpus before priors, under deterministic rules, with no internet dependency. Kiwix uses ZIM as compressed offline content, and `kiwix-serve` can expose that content over HTTP; `libzim` can also address entries directly by title or path. (Kiwix)

Important distinction: Kiwix/ZIM can absolutely be part of an always-on layer, but the true hot path should not be “read full article over HTTP every turn.” The lowest-latency design is to query the archive directly in-process via `libzim`/`python-libzim`: use exact title/path resolution first, keep the archive handle hot, and lean on the built-in title index plus dirent/cluster caches. `kiwix-serve` is still useful beside that as a human-facing browser/search layer. (libzim Documentation)

Minimal-latency tricks
- Go `getEntryByTitle`/`getEntryByPath` first. That avoids HTTP overhead and keeps lookup deterministic. `libzim` supports exact title/path fetches, and `python-libzim` exposes whether the archive has a title index or full-text index. (libzim Documentation)
- `kiwix-serve` exposes `/search` and `/raw` as public endpoints, plus `/suggest`, `/content`, and `/viewer` as private endpoints. That means you can answer fast from direct lookup, then attach a browser link to the full page for the human. (Kiwix Tools)
- `kiwix-serve`’s `/suggest` uses the title index and can add a full-text-search option when the ZIM includes a full-text index. `/search` performs full-text search and returns links with snippets, which is useful for ambiguous/descriptive queries, but it should not be the first move on the hot path. (Kiwix Tools)
- Use `--nodatealiases` for cleaner stable links. That lets `wikipedia_en_all_2026-03` also resolve as `wikipedia_en_all`, which is handy if you want stable URLs across snapshot refreshes. (Kiwix Tools)

How the model should use it
Do not let the model answer by regurgitating a full page. That is the dumb path.
The adapter should turn a page into bounded evidence units, for example:
- `full_article_url`

Then the model answers from those bounded units. That keeps ctx use sane and forces the system to populate an answer rather than vomiting page text back at the operator.
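A thin adapter along these lines is easy to sketch. Purely illustrative: the note only names `full_article_url`, so the other unit fields (`title`, `snippet`) and the paragraph-splitting heuristic are my own assumptions:

```python
import re

def evidence_units(title, text, url, max_units=3, max_chars=400):
    """Turn plain article text into a few bounded evidence units,
    instead of dumping the whole page into the model's context.

    Splits on blank lines into rough paragraphs, keeps the first
    `max_units`, and truncates each to `max_chars` characters.
    """
    paras = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    units = []
    for p in paras[:max_units]:
        units.append({
            "title": title,              # illustrative field
            "snippet": p[:max_chars],    # bounded, never a full page
            "full_article_url": url,     # the one field named above
        })
    return units
```

The point is the bound, not the exact splitting rule: whatever the adapter does, the model only ever sees a fixed number of fixed-size units plus a link for the human.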
In other words:
That is how you get useful “smarts” from a 4B model instead of article-scale mush.
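The exact-lookup hot path above might look roughly like this with `python-libzim`. A sketch, not a drop-in: method names follow the libzim reader API, but the ZIM filename and article path are placeholders:

```python
def lookup(archive, title_or_path):
    """Exact-match lookup against a hot libzim Archive handle.

    Tries path first, then title; returns decoded content or None.
    No HTTP, no full-text search: deterministic and in-process.
    """
    if archive.has_entry_by_path(title_or_path):
        entry = archive.get_entry_by_path(title_or_path)
    elif archive.has_entry_by_title(title_or_path):
        entry = archive.get_entry_by_title(title_or_path)
    else:
        return None  # fall back to /suggest or /search from here
    return bytes(entry.get_item().content).decode("utf-8", "replace")

# Real usage (requires `pip install libzim` and a ZIM on disk):
#   from libzim.reader import Archive
#   zim = Archive("wikipedia_en_all_nopic.zim")  # keep this handle hot
#   print(zim.has_title_index, zim.has_fulltext_index)
#   print(lookup(zim, "Haber_process"))
```

Keeping the `Archive` handle alive between turns is the whole trick; opening the file per query throws away the dirent/cluster caches the note mentions.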
Corpus reality
`mini` is not full article text. Kiwix defines `mini` as only the introduction plus infobox. `nopic` is full articles without images. `maxi` is full fat. Also, Kiwix currently says incremental ZIM updates are not available; operationally, updates are snapshot-swap, not rolling in-place refreshes. (Kiwix)

For English Wikipedia right now:
- `wikipedia_en_all_mini_2026-03` is about 12.4 GB raw bytes, roughly 11.6 GiB, and only gives you intros + infoboxes. (Wikimedia Downloads)
- `wikipedia_en_all_nopic_2026-03` is about 51.9 GB raw bytes, roughly 48.4 GiB, and gives you full article text without images. (Wikimedia Downloads)
- `wikipedia_en_wp1-0.8_nopic_2026-04` is about 2.0 GB and is the nicest “broad but sane” full-text English subset we discussed.
- `wikipedia_en_top_nopic_2026-03` is about 2.22 GB and is another broad compact option. (Wikimedia Downloads)

Best bang-for-buck bundles
~5 GB class, full text, no images
A very sane bundle is:
That totals about 4.68 GB decimal. Practical read: broad core plus medicine/history/hard science/geography, all as full article text without images. (Wikimedia Downloads)
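Side note on the size figures in this note: the GB numbers are decimal bytes as listed on the download pages, while GiB is what your filesystem reports. The conversion is trivial, but worth having on hand when budgeting the 10–15 GB pile:

```python
def gb_to_gib(gb):
    """Decimal gigabytes (download-page style) to binary GiB (on-disk style)."""
    return gb * 1e9 / 2**30
```

For example, `gb_to_gib(12.4)` and `gb_to_gib(51.9)` land close to the 11.6 GiB and 48.4 GiB figures quoted above for the mini and nopic snapshots.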
~10 GB class
A clean upgrade is the same bundle plus movies:
That brings the bundle to about 6.81 GB decimal, leaving plenty of room under a 10 GB target for one or two more topic packs later. (Wikimedia Downloads)
Operational model
The sane maintenance model is:
That is because Kiwix currently says no incremental updates for Wikipedia ZIMs. So your biannual plan is viable, but it is a snapshot-swap regime, not “keep matching pages and only patch deltas.” (Kiwix)
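The snapshot-swap regime is simple to make atomic so readers never see a half-updated corpus. A sketch under assumed conventions: a stable alias filename (the thing kiwix-serve and the adapter open) pointing at dated snapshot files beside it; all names here are illustrative:

```python
import os
from pathlib import Path

def swap_snapshot(new_zim, alias):
    """Atomically repoint a stable alias at a freshly downloaded snapshot.

    Builds a temporary symlink to the new file, then renames it over the
    alias; os.replace is atomic on POSIX, so the alias is never broken.
    """
    new_zim = Path(new_zim)
    alias = Path(alias)
    tmp = alias.with_suffix(".tmp")
    if tmp.exists() or tmp.is_symlink():
        tmp.unlink()
    tmp.symlink_to(new_zim.resolve())
    os.replace(tmp, alias)
    return alias.resolve()

# e.g. swap_snapshot("wikipedia_en_all_2026-03.zim", "wikipedia_en_all.zim")
```

Anything holding the old file open (kiwix-serve, a hot `Archive` handle) keeps reading the old snapshot until it reopens, so a restart/reopen after the swap is still likely needed; the old dated file can then be deleted.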
Bottom line
Yes, this can increase the effective intelligence of your 4B model without touching weights. But only if you treat ZIM/Kiwix as a deterministic local corpus with a thin adapter, not as “giant article dump goes straight into context.”
The winning design is:
- `/viewer` or `/raw`
- `nopic` topic packs

PS: ZIM files are here btw
https://dumps.wikimedia.org/kiwix/zim/wikipedia/
Somewhere in my documents, I have a scoped ticket for how to use Kiwix as the source for the LLM to pull information from directly, populate its answer organically, and respond naturally to the question at hand, without word-vomiting a complete wiki entry. The last I looked, you can poll the Kiwix DB directly without going through the search engine.
I can dig that up for you if it still exists; it’s actually why I’m looking at kiwix (back burner project for now but the spirit moved me).
PS: You’re aware of LLM-wiki? That might suit your purposes better, if your corpus is bespoke and updating. Works nicely.
https://tinyurl.com/llmwiki
So this is actively in progress, and right now I’m having trouble getting my Tesla P4s working in my Proxmox environment. The P4 is supported for vGPU out of the box, allegedly, but the installer I used forces a kernel version pin, which isn’t making me happy:
https://github.com/anomixer/proxmox-vgpu-installer/issues/16
So at this time, I’m just connecting APIs.