Although it is not necessarily the intended use case, Docker makes it very easy to package large amounts of data together with the applications that serve it (such as APIs), which simplifies provisioning. This is particularly helpful when, for example, authority data from different sources is to be merged for data enrichment.
GND
The Gemeinsame Normdatei (GND, Integrated Authority File) contains authority data on persons, corporate bodies etc. and is maintained at the German National Library.
The image is based on Apache Jena and utilises the HDT dump. However, the necessary module does not yet support Jena / Fuseki 5.0.
The container can be started easily:
docker run -it -p3030:3030 ghcr.io/cmahnke/data-containers/gnd:latest /bin/sh
After starting, the database can be conveniently searched in the browser: http://localhost:3030/#/dataset/gnd/query
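Besides the browser interface, the dataset can presumably also be queried from the command line. The following is a minimal sketch, assuming that Fuseki exposes the `gnd` dataset under its usual service path `/gnd/query`; the endpoint URL, the helper name `gnd_query` and the sample query are illustrative assumptions and are not taken from the image itself.

```shell
# Assumption: the dataset "gnd" is reachable at Fuseki's default query service path.
GND_ENDPOINT="http://localhost:3030/gnd/query"

# Sample query: fetch any five triples, just to check that the store answers.
GND_SAMPLE_QUERY='SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5'

# Hypothetical helper: send a SPARQL query via HTTP GET and ask for JSON results.
gnd_query() {
  curl --silent --get \
    --header 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$1" \
    "$GND_ENDPOINT"
}
```

With the container running, `gnd_query "$GND_SAMPLE_QUERY"` should return the bindings as JSON. `--data-urlencode` takes care of escaping the query, so it can be written in plain SPARQL.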
GeoNames
GeoNames contains a lot of information on many geographical entities, including coordinates, spelling variants and hierarchisation by area.
However, the container only contains the coordinates and spelling variants, e.g. to be able to retrieve the coordinates for a location. For this purpose the data is loaded into an Apache Solr instance, which can be accessed after starting the container:
docker run -p 8983:8983 -it ghcr.io/cmahnke/data-containers/geonames
Simply query with curl and the result is returned as JSON:
curl 'http://localhost:8983/solr/geonames/query?debug=query&q=n:G%C3%B6ttingen'
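To avoid percent-encoding place names by hand, curl can do the encoding itself. This is a small sketch that reuses the URL and the field `n` from the query above; the helper name `geonames_lookup` is hypothetical, and the name is wrapped in double quotes as a Solr phrase so that multi-word names stay together.

```shell
# Same Solr endpoint as in the query above.
GEONAMES_ENDPOINT="http://localhost:8983/solr/geonames/query"

# Hypothetical helper: look up a place name in the field "n".
# The name is passed as a quoted phrase; curl URL-encodes it (umlauts, spaces).
geonames_lookup() {
  curl --silent --get \
    --data-urlencode "q=n:\"$1\"" \
    "$GEONAMES_ENDPOINT"
}
```

With the container running, `geonames_lookup "Göttingen"` sends the same kind of request as the manual call. Names that themselves contain double quotes would need additional escaping, which this sketch does not handle.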
Further use
The commands for starting can also be combined / automated using docker-compose, e.g. together with tools for further analysis or enrichment.
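As a rough sketch of such a setup, the two `docker run` commands from above can be translated into a Compose file. The service names `gnd` and `geonames` are freely chosen; `stdin_open`, `tty` and `command` simply mirror the `-it` and `/bin/sh` arguments of the original commands and may not be needed, depending on how the images are built.

```shell
# Write a Compose file that mirrors the two docker run commands above.
cat > docker-compose.yml <<'EOF'
services:
  gnd:                      # service name is an arbitrary choice
    image: ghcr.io/cmahnke/data-containers/gnd:latest
    command: /bin/sh        # mirrors the trailing /bin/sh of the docker run call
    stdin_open: true        # equivalent of -i
    tty: true               # equivalent of -t
    ports:
      - "3030:3030"
  geonames:                 # service name is an arbitrary choice
    image: ghcr.io/cmahnke/data-containers/geonames
    stdin_open: true
    tty: true
    ports:
      - "8983:8983"
EOF
```

Both containers can then be started together with `docker compose up -d` (or `docker-compose up -d` with the older standalone tool). Further services for analysis or enrichment could be added to the same file.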