Offline Wikipedia (and more!) with Kiwix

Kiwix is a multi-platform content browser designed to provide offline access to large content websites, like Wikipedia or Stack Overflow, particularly in under-developed countries. These sites hold an enormous amount of content and are invaluable to researchers, professionals, and hobbyists. Offline access keeps the wealth of knowledge they contain available when online access can't be counted on: during power outages, long airplane trips, remote ventures, and so on. More importantly, to the paranoid data hoarder, Kiwix offers an opportunity to scratch the itch of possessing the Library of Alexandria, served from a spare budget computer squirreled away in a dark corner of your basement.

The Kiwix server reads ZIM files (a file format designed to store large-scale wiki websites) and serves their content over HTTP to any connecting browser. The Kiwix organization provides pre-packaged ZIM files in an online library. Because content can be quite large (a full Wikipedia ZIM file is ~100GB), different ZIM configurations exist for various sources. For example, a pared-down Wikipedia without images reduces the file size while preserving the valuable written content. A configured Kiwix server can serve content from several ZIM files at once.
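
Because a single ZIM can run to ~100GB, it may be worth confirming there is room for it before starting a long download. The following is a minimal sketch, assuming GNU coreutils `df` (its `--output` and `-B` options are GNU extensions). The function name `has_free_space` is a hypothetical helper, not part of Kiwix:

```shell
#!/usr/bin/env bash
# has_free_space DIR REQUIRED_GB
# Hypothetical helper: succeeds if the filesystem holding DIR has at least
# REQUIRED_GB gigabytes available. Assumes GNU coreutils df.
has_free_space() {
  local dir="$1" required_gb="$2" avail_gb
  # --output=avail prints a header line and then the value; -BG reports the
  # value in gigabytes with a trailing "G", which tr strips along with padding.
  avail_gb=$(df --output=avail -BG "$dir" | tail -n 1 | tr -dc '0-9')
  [ "$avail_gb" -ge "$required_gb" ]
}
```

For example, `has_free_space /var/local/kiwix 110 || echo "not enough room for a full Wikipedia ZIM"` leaves a little headroom above the ~100GB file.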

Kiwix installers exist for Android, Windows, Mac, iOS, Linux and Raspberry Pi. A small suite of command line tools can be downloaded for headless servers, and a Docker image is available for machines running Docker containers.

For my personal setup I installed the command line tools on a spare Debian server I run for experimental projects. In my setup:

  • the Kiwix server runs automatically once the system has fully booted, launched as a systemd service
  • the server runs under a special kiwix system user account, and only has access to the directory that contains the Kiwix library file (an XML catalog of available ZIMs) and the downloaded ZIM files

In Debian it is relatively easy to create a system user and directory for the Kiwix libraries:

$ sudo groupadd --gid 200 kiwix
$ sudo adduser --system --no-create-home --disabled-password --disabled-login --uid 200 --gid 200 kiwix
$ sudo mkdir -p /var/local/kiwix
$ sudo chown -R kiwix:kiwix /var/local/kiwix

Now to download some ZIMs. I browsed the Kiwix online library, found the Wikipedia ZIM, and downloaded it to my data directory. Because the file is so large, I used aria2, which supports resumable downloads, and throttled it to keep bandwidth usage low (the download server is provided gratis, after all).

$ cd /var/local/kiwix
$ sudo -u kiwix aria2c --max-download-limit=512K \
https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
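
The file is large and the download may have been resumed several times, so it may be worth verifying it before use. The following is a minimal sketch. It assumes the mirror publishes a SHA-256 digest next to the file, and that the digest is saved alongside the ZIM as `<name>.zim.sha256` with the hex digest as its first field. I believe appending `.sha256` to a download.kiwix.org ZIM URL returns such a file, but treat that as an assumption and check the listing. `verify_zim_sha256` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# verify_zim_sha256 ZIM_PATH
# Hypothetical helper: compares the SHA-256 of ZIM_PATH with the digest stored
# in ZIM_PATH.sha256 (assumed format: "<hex digest>  <filename>").
verify_zim_sha256() {
  local zim="$1" expected actual
  # First field of the first line is taken as the expected digest.
  expected=$(awk '{print $1; exit}' "$zim.sha256") || return 1
  actual=$(sha256sum "$zim" | awk '{print $1}')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

Comparing only the digest field sidesteps the filename recorded inside the `.sha256` file, which may not match the local path and would trip up `sha256sum -c`. Hashing a ~100GB file takes a while. Usage would look like `verify_zim_sha256 wikipedia_en_all_maxi_2024-01.zim && echo OK`.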

Once the ZIM file finished downloading, I created a library file with kiwix-manage and added the Wikipedia file as an entry. (By convention the library file created by kiwix-manage should be named library_zim.xml.)

$ sudo -u kiwix kiwix-manage /var/local/kiwix/library_zim.xml \
add /var/local/kiwix/wikipedia_en_all_maxi_2024-01.zim

Later I experimented with adding more ZIM files in exactly the same way. (To remove a ZIM from the library, use the remove command instead of add. Note that remove expects the book's ID, which kiwix-manage's show command lists, rather than the file path.)
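
Adding several ZIMs one at a time gets repetitive, so the add step can be looped over a directory. The following is a minimal sketch. `add_all_zims` and the `KIWIX_MANAGE` override variable are illustrative names, not part of kiwix-tools. The only Kiwix command the sketch relies on is the `kiwix-manage LIBRARY add ZIM` form used above:

```shell
#!/usr/bin/env bash
# The kiwix-manage binary to call. Set KIWIX_MANAGE=echo for a dry run that
# only prints the commands that would be executed.
KIWIX_MANAGE="${KIWIX_MANAGE:-kiwix-manage}"

# add_all_zims LIBRARY_XML ZIM_DIR
# Hypothetical helper: registers every *.zim file in ZIM_DIR with the library,
# making one "kiwix-manage LIBRARY add ZIM" call per file.
add_all_zims() {
  local library="$1" zim_dir="$2" zim
  for zim in "$zim_dir"/*.zim; do
    [ -e "$zim" ] || continue  # no matches: the glob stays unexpanded
    "$KIWIX_MANAGE" "$library" add "$zim" || return 1
  done
}
```

Run this as the kiwix user so the library file stays owned by kiwix, for example from a shell started with `sudo -u kiwix bash`: `add_all_zims /var/local/kiwix/library_zim.xml /var/local/kiwix`. Afterwards, `kiwix-manage /var/local/kiwix/library_zim.xml show` lists the books the library contains. Re-running the loop passes already-registered ZIMs to add again, and how kiwix-manage treats those isn't covered here.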

Before setting up my systemd service, I ran the kiwix-serve command directly to see if I could actually connect to my Kiwix process.

$ sudo -u kiwix kiwix-serve --library --port=8000 --verbose \
/var/local/kiwix/library_zim.xml

I pointed my desktop web browser at my Debian server (http://192...:8000) and verified that my Wikipedia instance could be browsed.
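
On a headless machine, the same check can be scripted with curl instead of a browser. The following is a minimal sketch. `wait_for_kiwix` is a hypothetical helper, and the URL and retry count in the usage line are placeholders:

```shell
#!/usr/bin/env bash
# wait_for_kiwix URL [TRIES]
# Hypothetical helper: polls URL once per second until it answers with a
# non-error status, giving up after TRIES attempts (default 10).
wait_for_kiwix() {
  local url="$1" tries="${2:-10}" i
  for ((i = 1; i <= tries; i++)); do
    # -f: treat HTTP errors (4xx/5xx) as failure; -s: no progress output;
    # --max-time stops a hung connection from stalling the loop.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

Polling instead of making a single request helps because kiwix-serve can take a moment to start listening. On the server itself, `wait_for_kiwix http://localhost:8000/ && echo "Kiwix is up"` would do.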

[Screenshot: Wikipedia, along with other ZIMs I installed later.]

[Screenshot: browsing the Wikipedia ZIM file.]

Having verified that my Kiwix instance was configured correctly, I created a systemd service file to launch the instance at machine boot, running as the kiwix user…

$ sudo vim /etc/systemd/system/kiwix.service

…which looks like this:

[Unit]
Description=Kiwix service
After=network.target network-online.target
Wants=network-online.target

[Service]
Type=simple
Restart=always
RestartSec=15
User=kiwix
Group=kiwix
WorkingDirectory=/var/local/kiwix
ExecStart=/usr/bin/kiwix-serve --library --port=8000 --verbose /var/local/kiwix/library_zim.xml

[Install]
WantedBy=multi-user.target

The User, Group, and ExecStart options control how the kiwix-serve process starts, and Restart and RestartSec tell systemd to restart it 15 seconds after it exits for any reason. The After and WantedBy options control when, and under what conditions, the service is started: in this case after the server has network access, and whenever the system reaches multi-user mode (that is, normal operating conditions).

Next, systemd needed to be told about the new unit file, and the service itself needed to be enabled (so it starts at boot) and started.

$ sudo systemctl daemon-reload
$ sudo systemctl enable kiwix.service
$ sudo systemctl start kiwix.service
$ sudo journalctl -u kiwix

The journalctl command shows the service's output in the systemd journal. If kiwix-serve fails for any reason, such as a path or permission problem, the error is recorded here. (I did have a few false starts.) Refreshing my desktop browser confirmed that the service was actually running.
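
Running as the unprivileged kiwix user limits what the process can write, but it can still read any world-readable file on the system. To have systemd enforce more of that isolation, one option is a drop-in override with a few sandboxing directives. The following is an untested sketch. Each directive is standard systemd, but check the `systemd.exec(5)` manual before relying on it:

```ini
# Created with: sudo systemctl edit kiwix.service
# (saved as /etc/systemd/system/kiwix.service.d/override.conf)
[Service]
# Mount the OS read-only for this service (kiwix-serve only needs to read ZIMs).
ProtectSystem=strict
# Make /home, /root and /run/user inaccessible to the service.
ProtectHome=true
# Give the service its own private /tmp.
PrivateTmp=true
# Prevent the process and its children from gaining new privileges.
NoNewPrivileges=true
```

`systemctl edit` reloads systemd's configuration when the editor closes, so afterwards only `sudo systemctl restart kiwix.service` is needed. If a directive gets in the way, `journalctl -u kiwix` is the place to look.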

Overall, Kiwix is a very neat project, and I’m particularly impressed that the service runs so fast while reading data from a single (or several!) enormous file. Unfortunately, the data refresh period seems a bit slow: the latest full Wikipedia ZIM file is over a year old, which is an eternity in the information age. But for someone who loves to read and references Wikipedia often, having offline information available in a pinch is nevertheless a luxury.