For the video metadata of 911 TV there was an interesting challenge to solve…
As a rule, I try to design applications to be as static as possible: fewer moving parts and dependencies greatly reduce potential problems.
This also includes payloads and metadata that don't change very often. In the case of 911TV, this is the mapping of channel and time code to each video file; but when describing 3,000 hours of international TV news from 20 channels over 7 days[^1] (usually split into blocks of 30 minutes), a fair amount of data accumulates.
For the application, this data is extracted from the website of the Internet Archive, and the headers of the video files are also read, as the material has many small gaps. The result is a 1.62 MB JSON file, which in turn becomes part of the application bundle in the next step…
That’s a lot, and every bundler will emit a few warnings about it. Of course, you can remove a few more spaces and line breaks, but the difference is not very big. An attempt with JSONCrush failed: the algorithm scales very poorly and was not able to compress the JSON in a reasonable amount of time; it also doesn’t seem to be able to use multiple CPUs.
The second idea was to use a more classic compression format and then package the result in a JSON snippet, so that you can keep working with the usual bundler tooling.
Methods
`lz-string`
is the JavaScript variant of the LZ algorithm. Since the compressed data is binary, it must be encoded using Base64 for storage as a character string.

`brotli-unicode`
is a JavaScript variant of the Brotli algorithm with a very clever addition: it uses the complete Unicode character set to encode binary data.
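A minimal sketch of the two round trips, assuming Node.js as an ES module and the Promise-based `brotli-unicode` interface that the preparation script below also uses:

```js
import { compressToBase64, decompressFromBase64 } from "lz-string";
import Brotli from "brotli-unicode";

const text = JSON.stringify({ channel: "BBC", start: "2001-09-11T12:00:00Z" });

// lz-string: the binary result is Base64-encoded so it survives as a string
const lz = compressToBase64(text);
console.log(decompressFromBase64(lz) === text); // true

// brotli-unicode: the binary result is mapped onto Unicode code points
const br = await Brotli.compress(Buffer.from(text));
const roundTrip = new TextDecoder().decode(await Brotli.decompress(br));
console.log(roundTrip === text); // true
```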
Results
| Method | Size |
|---|---|
| Uncompressed | 1.62 MB |
| `lz-string` | 216 KB |
| `brotli-unicode` | 87.9 KB |
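The sizes above come from the actual 911TV data. For reference, here is a sketch of how such a comparison can be measured, assuming the same `brotli-unicode` interface as elsewhere in this post; the input file name is a placeholder, and sizes are taken as UTF-8 bytes, since that is what ends up in the bundle:

```js
import fs from "fs";
import { compressToBase64 } from "lz-string";
import Brotli from "brotli-unicode";

const raw = fs.readFileSync("metadata.json").toString(); // placeholder file
const lz = compressToBase64(raw);
const br = await Brotli.compress(Buffer.from(raw));

// Measure as UTF-8 bytes: brotli-unicode output uses multi-byte characters,
// so plain string length would understate its size on disk
console.log("uncompressed:  ", Buffer.byteLength(raw, "utf8"));
console.log("lz-string:     ", Buffer.byteLength(lz, "utf8"));
console.log("brotli-unicode:", Buffer.byteLength(br, "utf8"));
```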
Integration
Preparation
Since there is a preparatory step to collect the data in the first place, the compressed file can also be created there. Here is a simple script for execution with Node.js; the necessary modules must of course be installed:
```js
import fs from "fs";
// Needed to resolve the CommonJS modules from the app's node_modules
import { createRequire } from "module";
import { parseArgs } from "node:util";

const require = createRequire(import.meta.url);
const LZString = await import(require.resolve("../site/node_modules/lz-string"));
const lzCompress = LZString.default.compressToBase64;
const Brotli = await import(require.resolve("../site/node_modules/brotli-unicode"));
const brotliCompress = Brotli.default.compress;
// JSONCrush was ruled out above, but the branch is kept for completeness
const JSONCrush = await import(require.resolve("../site/node_modules/jsoncrush"));
const crush = JSONCrush.default.crush;

const defaultMethod = "lz-string";

const {
  values: { input, output, type },
} = parseArgs({
  options: {
    input: { type: "string", short: "i" },
    output: { type: "string", short: "o" },
    type: { type: "string", short: "t" },
  },
});

const method = type ?? defaultMethod;

if (input !== undefined) {
  console.error(`Reading file ${input}`);
  const fileContents = fs.readFileSync(input).toString();
  let compressed;
  if (method === "jsoncrush") {
    console.error(`Crushing JSON`);
    compressed = crush(fileContents);
  } else if (method === "lz-string") {
    console.error(`Compressing JSON`);
    compressed = lzCompress(fileContents);
  } else if (method === "brotli") {
    console.error(`Compressing JSON`);
    compressed = await brotliCompress(Buffer.from(fileContents));
  } else {
    console.error(`unknown compression type ${method}`);
    process.exit(1);
  }
  const out = { type: method, content: compressed };
  if (output !== undefined) {
    console.error(`Writing file ${output}`);
    fs.writeFileSync(output, JSON.stringify(out));
  } else {
    console.log(out);
  }
}
```
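Assuming the script is saved as `compress.mjs` (a hypothetical name; because of the top-level `await` it has to run as an ES module), an example invocation would be `node compress.mjs -i metadata.json -t brotli -o json-compressed.json`, with all file names being placeholders.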
In principle, however, it is also conceivable to realise the whole thing as a Vite or Rollup plugin…
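A rough sketch of what such a plugin could look like; the plugin name, the `.compressed.json` suffix convention, and the choice of `lz-string` are assumptions for illustration:

```js
import { compressToBase64 } from "lz-string";

// Hypothetical build-time plugin: wraps selected JSON files in the
// { type, content } envelope so the application code can decode them.
export default function compressedJson() {
  return {
    name: "compressed-json",
    enforce: "pre", // Vite: run before the built-in JSON handling
    transform(code, id) {
      // Only touch files that opt in via a naming convention
      if (!id.endsWith(".compressed.json")) return null;
      const payload = { type: "lz-string", content: compressToBase64(code) };
      // Emit an ES module exporting the envelope directly
      return {
        code: `export default ${JSON.stringify(payload)};`,
        map: { mappings: "" },
      };
    },
  };
}
```

With `enforce: "pre"`, Vite runs this transform before its built-in JSON plugin, so the hook still sees the raw file contents; in plain Rollup, the plugin would simply have to be ordered before `@rollup/plugin-json`.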
In the application
The JSON for the integration contains a very simple header and the compressed content:
{"type":"brotli","content":"[COMPRESSED DATA]"}
In the application file that then needs the JSON (note that `parseJson` is asynchronous here, because the Brotli decoder is loaded on demand):
```js
import { decompressFromBase64 } from "lz-string";
import jsonImport from "./assets/json/json-compressed.json";

async function parseJson(json) {
  if (typeof json === "object" && Object.keys(json).length === 2) {
    if ("type" in json && json.type === "lz-string") {
      return JSON.parse(decompressFromBase64(json["content"]));
    } else if ("type" in json && json.type === "brotli") {
      // Load the Brotli decoder lazily; it is only needed for this format
      const Brotli = await import("brotli-unicode/js");
      const decompressed = await Brotli.decompress(json["content"]);
      return JSON.parse(new TextDecoder().decode(decompressed));
    }
  }
  // Not a recognised envelope: assume plain, uncompressed JSON
  return json;
}

const json = await parseJson(jsonImport);
```
Summary
JSON files can be compressed and bundled very efficiently using `brotli-unicode`. It is important to note, however, that the data is decompressed at runtime and therefore occupies its full, uncompressed size in the client’s working memory. Up to a certain size of the output file, though, this is tolerable compared to the other advantages.
[^1]: Understanding 9/11 - A Television News Archive, Internet Archive.