---
title: "Package architecture"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Package architecture}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(eval = FALSE)
```

This article documents tidybrreg's internal architecture. It is
intended for contributors and users who want to understand the
data flow from the brreg API through parsing, storage, and panel
construction.

For the full technical specification (390 lines), see
[ARCHITECTURE.md](https://github.com/sondreskarsten/tidybrreg/blob/main/ARCHITECTURE.md)
in the repository root.

## Data flow overview

```
┌─────────────────────────────────────────────────────────────────┐
│                     brreg API (data.brreg.no)                   │
│                                                                 │
│  /enheter/{orgnr}          Single entity JSON                   │
│  /enheter?navn=...         Filtered search (paginated)          │
│  /enheter/lastned/csv      Bulk CSV  (152 MB, ~90 cols)         │
│  /enheter/lastned          Bulk JSON (196 MB, ~67 cols)         │
│  /roller/totalbestand      Bulk roles JSON (131 MB)             │
│  /oppdateringer/enheter    CDC stream (HAL format)              │
│  /oppdateringer/roller     CDC stream (CloudEvents format)      │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                ┌──────────────┴──────────────┐
                │         HTTP layer          │
                │  R/request.R: brreg_req()   │
                │  httr2 + retry + throttle   │
                └──────────────┬──────────────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
   ┌──────┴──────┐     ┌──────┴──────┐     ┌──────┴──────┐
   │ Pipeline 1  │     │ Pipeline 2  │     │ Pipeline 2  │
   │ Single JSON │     │ Bulk CSV    │     │ Bulk JSON   │
   │             │     │             │     │             │
   │ flatten_json│     │ readr::     │     │ jsonlite::  │
   │ rename_dict │     │ read_csv    │     │ fromJSON    │
   │ coerce_types│     │             │     │ flatten_    │
   │             │     │             │     │ list_cols   │
   │             │     │             │     │ drop_links  │
   └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
          │                   │                    │
          │            ┌──────┴────────────────────┘
          │            │
          │     ┌──────┴──────┐
          │     │ rename_and_ │
          │     │ coerce()    │
          │     │ field_dict  │
          │     │ + to_snake  │
          │     └──────┬──────┘
          │            │
          └────────────┤
                       │
              ┌────────┴────────┐
              │  Flat tibble    │
              │  atomic cols    │
              │  English names  │
              └────────┬────────┘
                       │
          ┌────────────┼────────────┐
          │            │            │
   ┌──────┴──────┐    │     ┌──────┴──────┐
   │  Return to  │    │     │  Snapshot   │
   │  user       │    │     │  engine     │
   │             │    │     │  write_     │
   │             │    │     │  parquet    │
   └─────────────┘    │     │  + manifest │
                      │     └──────┬──────┘
                      │            │
               ┌──────┴──────┐    │
               │  Label      │    │
               │  system     │    │
               │  brreg_     │    │
               │  label()    │    │
               └─────────────┘    │
                                  │
                     ┌────────────┴────────────┐
                     │     Snapshot store       │
                     │  Hive-partitioned        │
                     │  Parquet files           │
                     │                          │
                     │  enheter/                │
                     │    snapshot_date=.../     │
                     │      data.parquet        │
                     │      raw/*.gz            │
                     │  manifest.json           │
                     └────────────┬─────────────┘
                                  │
                 ┌────────────────┼────────────────┐
                 │                │                │
          ┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
          │ brreg_panel │ │ brreg_      │ │ brreg_      │
          │ multi-snap  │ │ events()    │ │ series()    │
          │ LOCF        │ │ diff two    │ │ arbitrary   │
          │             │ │ snapshots   │ │ aggregation │
          └──────┬──────┘ └──────┬──────┘ └──────┬──────┘
                 │               │               │
                 └───────────────┼───────────────┘
                                 │
                        ┌────────┴────────┐
                        │  as_brreg_      │
                        │  tsibble()      │
                        │  → tidyverts    │
                        └─────────────────┘
```

## Key design decisions

**Flat tidy output.** JSON nested objects and list columns are
algorithmically flattened to atomic types. Character vectors
are collapsed with `"; "`, data frames serialized to JSON strings,
HAL links dropped. Both CSV and JSON paths share `rename_and_coerce()`.

**Zero-drop policy.** Unknown API fields pass through with auto
snake_case names. `arrow::open_dataset(unify_schemas = TRUE)`
handles schema evolution across snapshot dates.

**Raw file provenance.** Every snapshot stores the original `.gz`
alongside processed Parquet. The JSON manifest records HTTP headers
(`Last-Modified` for data vintage, ETag for change detection),
file hashes, and record counts.

**Two panel paths.** Multi-snapshot diff (`brreg_panel()`) for
historical analysis with old+new values. CDC replay
(`brreg_replay()`) for forward reconstruction from a single base.

See [ARCHITECTURE.md](https://github.com/sondreskarsten/tidybrreg/blob/main/ARCHITECTURE.md)
for the complete specification.