datasus-etl
Stable v0.1.11 GPL-3.0

Brazilian public-health data,
from FTP to queryable parquet.

An open-source ETL pipeline for DATASUS. It downloads files from the DATASUS FTP, converts DBC to DBF to DuckDB, enriches records with IBGE and CID-10 reference data, and writes partitioned parquet, all in one command. Built for researchers who need accurate data with minimal effort.

Or install with pip: pip install datasus-etl
Subsystems 02
States + DF 27
Municipalities 5,571
Output parquet · duckdb
License GPL-3.0
What it does

Four stages, one command.

01

Downloads

Pulls DBC files straight from the DATASUS FTP, selected by subsystem, date range, and state: nothing more, nothing less.

02

Converts

DBC → DBF → DuckDB → Parquet, all in-process with no CSV intermediates. Streaming inserts keep memory predictable on multi-GB reads.
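As a rough illustration of the streaming-insert idea, here is a minimal Python sketch, assuming rows arrive as a lazy iterator (for example, records decoded one at a time from a DBF file). It hands rows to a sink in fixed-size batches, so only one batch sits in memory at once. The names batched, stream_insert, and insert_batch and the default batch size are hypothetical and are not part of the datasus-etl API.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` rows, pulling lazily from `rows`."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))  # take the next chunk without materializing the rest
        if not batch:
            return
        yield batch


def stream_insert(
    rows: Iterable[T],
    insert_batch: Callable[[List[T]], None],
    size: int = 50_000,
) -> int:
    """Send rows to `insert_batch` one batch at a time; return the row count."""
    total = 0
    for batch in batched(rows, size):
        # Hypothetical sink: in a real pipeline this could be a DuckDB executemany call.
        insert_batch(batch)
        total += len(batch)
    return total


# Usage: record the size of each batch the sink receives.
sizes: List[int] = []
inserted = stream_insert(range(10), lambda b: sizes.append(len(b)), size=4)
print(inserted, sizes)  # 10 [4, 4, 2]
```

Peak memory is bounded by the batch size rather than the file size. On Python 3.12+, itertools.batched covers the batching step (it yields tuples instead of lists).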

03

Enriches

Automatically joins IBGE municipality data (5,571 municipalities), validates CID-10 codes, and applies categorical mappings. Output ships with clean schemas.

04

Exposes

Browse in the local web UI, query with DuckDB SQL, or read the partitioned parquet from any tool that supports it: polars, pandas, R, Arrow.

Audiences

Built for two kinds of users.

Researcher · clinician

Click, pick, go.

Install the app, click the desktop shortcut, and pick a folder. Choose a subsystem and a date range, and the app downloads and processes everything locally. Then query through a dropdown-driven web UI, with no SQL needed.

Read the tutorial →
Developer · data team

A real CLI, real parquet.

The same installer also provides the full datasus CLI: scriptable pipelines, a Python API, and DuckDB as the query surface. The output is Hive-partitioned parquet, ready to plug into your existing stack.
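Hive partitioning stores column values in directory names of the form key=value. Below is a minimal Python sketch, assuming a hypothetical layout such as output/sihsus/uf=SP/year=2020/part-0.parquet (the real directory and key names may differ), showing how a reader recovers those partition columns from a file path. Query engines do this for you; in DuckDB, for instance, read_parquet with hive_partitioning = true exposes the keys as columns.

```python
from pathlib import PurePosixPath
from typing import Dict


def hive_partitions(path: str) -> Dict[str, str]:
    """Return the key=value partition columns encoded in a Hive-style path."""
    columns: Dict[str, str] = {}
    # Every directory segment except the final file name may carry a partition.
    for segment in PurePosixPath(path).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:  # plain directories like "output" have no "="
            columns[key] = value
    return columns


# Hypothetical path; actual datasus-etl output names may differ.
print(hive_partitions("output/sihsus/uf=SP/year=2020/part-0.parquet"))
# {'uf': 'SP', 'year': '2020'}
```

Partition values come back as strings; engines infer or cast types on read. This sketch also skips the percent-decoding that full Hive implementations apply to escaped values.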

Read the docs →
Research context

Built inside a CNPq research group.

Developed by Nycholas Maia in technical collaboration with Paulo Alves Maia (FUNDACENTRO) within the CNPq research group "Mudanças Climáticas e Segurança e Saúde no Trabalho" (Climate Change and Occupational Safety and Health).

CNPq research group ↗
Install

Ready when you are.

The download button detects your operating system. The download page has the full platform table, checksums, and install notes. Each release is cut from the VERSION file in the repository, so the same number appears in the app footer and in the output of datasus version.

Heads-up
  • First launch: ~20s to warm up parquet indexes.
  • The installer is unsigned, so Windows needs a one-time "Run anyway" click and macOS a one-time right-click → Open.
  • SIM (mortality records) is published with a lag of about two years; the app flags this for you.