
Processing 3GiB of JSON in the browser in 2 seconds via WebAssembly

2024-06-24

There was a bit of a buzz recently about Discord analyzing your data to guess your age and gender (an unfortunately rather common practice). You can "easily" view the resulting predictions in your data dump’s analytics files:

  {
    "user_id": "000000000000000000",
    "predicted_gender": "male",
    "probability": 0.85245013236999512,
    "prob_male": 0.85245013236999512,
    "prob_female": 0.066451840102672577,
    "prob_non_binary_gender_expansive": 0.081098064780235291,
    "model_version": "2023-03-22T00:00:00.000000Z",
    "day_pt": "2023-03-29 00:00:00 UTC"
  }

Well, okay, "easily" might be a bit of a lie: these analytics files can run to several GiB (mine is 3.2GiB) and contain millions of events, which makes examining them a bit tricky. This left me thinking about the prospect of a tool that could digest them into pretty charts and graphs for your amusement, ideally without taking too long or using too many resources.
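To make the "without using too many resources" goal concrete, here is a minimal Python sketch of pulling the prediction records out of such a file without loading it all into memory. It assumes the analytics file stores one JSON event per line (newline-delimited JSON); that layout, the function name `find_predictions`, and the sample events are illustrative assumptions, not the browser/WebAssembly approach this post is about.

```python
import io
import json
from typing import IO, Iterator, Optional, Tuple


def find_predictions(stream: IO[str]) -> Iterator[Tuple[str, Optional[float]]]:
    """Yield (predicted_gender, probability) for each event that carries a prediction.

    Assumes (hypothetically) one JSON object per line. Iterating over the
    stream reads it line by line, so memory use stays roughly constant
    no matter how many GiB the file is.
    """
    for line in stream:
        # Cheap substring check before the much more expensive json.loads():
        # presumably only a tiny fraction of the millions of events are predictions.
        if '"predicted_gender"' not in line:
            continue
        event = json.loads(line)
        yield event["predicted_gender"], event.get("probability")


if __name__ == "__main__":
    # Made-up sample: one unrelated event followed by one prediction record.
    sample = io.StringIO(
        '{"some_other_event": true}\n'
        '{"user_id": "000000000000000000", "predicted_gender": "male", '
        '"probability": 0.85245013236999512}\n'
    )
    for gender, probability in find_predictions(sample):
        print(gender, probability)
```

On a real dump you would pass `open(path, encoding="utf-8")` instead of the `StringIO` sample. Even with the substring prefilter, a pure-Python scan like this is bounded by how fast the interpreter can read and split lines, which is part of why a faster approach is worth exploring.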

After all, how hard can it be to parse 3GiB of JSON anyway?