datasus-etl
Tutorial (beginner-friendly)

From zero to your first query.

Eight steps. No terminal. No SQL. Written for researchers and clinicians who want the data without the data-engineering detour.

  1. Install the app

    Go to the Download page, pick the installer for your system, and run it. On Windows, click Next a few times. On macOS (Apple Silicon), drag the app to the Applications folder. On Linux, mark the AppImage as executable (in most file managers this is a checkbox under Properties, Permissions) and open it. You do not need to install Python or a database separately.

  2. Open it from the shortcut

    After installing, find DataSUS ETL in your Start Menu (Windows), Applications folder (macOS), or app launcher (Linux). Click it. A window opens in your default browser — that is the app.

  3. Choose where your data will live

    The first screen asks where to store the downloaded files. Pick a folder with plenty of free space: a full subsystem covering many years can take 10–40 GB. External drives work. The app will create a subfolder named datasus_db/ inside the folder you choose.

  4. Pick a subsystem

    The app supports SIHSUS (hospital admissions) and SIM (mortality). Choose whichever matches the question you are trying to answer. For a quick first experiment, SIM for a single state is the smallest and fastest.

  5. Pick dates and states

    Choose a date range (start and end) and one or more Brazilian states. The app shows you an estimate of how many files will be downloaded and their total size before you commit to anything.

  6. Watch the pipeline work

    Click Start. A progress screen shows each step in real time: downloading the raw files, converting them from DATASUS's compressed DBC format to a readable one, loading them into DuckDB, transforming and enriching, and finally writing Parquet files. You can leave it running in a background tab.

  7. Where your files end up

    When the pipeline finishes, your data lives in the folder you picked, inside datasus_db/<subsystem>/. Files are saved in the Parquet format — organized by state so you can grab just the slice you need.

  8. Explore through the app, or with SQL

    The Query page lets you explore tables, filter by column, and export to CSV or Excel without writing SQL. If you are comfortable with SQL, open the DuckDB shell from the Status page and write your own queries against the Parquet files directly.
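
Step 7 describes the output layout only loosely, so before querying it can help to see what was actually written. The sketch below is not part of the app. It uses only Python's standard library to count the Parquet files, and their total size, in each folder under datasus_db/<subsystem>/. The function name summarize_parquet and the example path are hypothetical, and the way the app names its per-state folders is an assumption to check against your own disk.

```python
from collections import defaultdict
from pathlib import Path


def summarize_parquet(subsystem_dir: Path) -> dict[str, tuple[int, int]]:
    """Map each folder below subsystem_dir to (file count, total bytes) of its Parquet files."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for path in sorted(subsystem_dir.rglob("*.parquet")):
        # Group by the folder that directly contains the file, relative to the root.
        group = path.parent.relative_to(subsystem_dir).as_posix()
        totals[group][0] += 1
        totals[group][1] += path.stat().st_size
    return {group: (count, size) for group, (count, size) in totals.items()}


# Example call. The path is hypothetical: use the folder you picked in step 3.
# for group, (count, size) in summarize_parquet(Path("D:/research/datasus_db/sim")).items():
#     print(f"{group}: {count} files, {size / 1e6:.1f} MB")
```

If the folders do turn out to be one per state, the output is a quick way to confirm that every state you selected in step 5 was written.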

Something broke?

The app is under active development. If a screen doesn't match this tutorial, or an error message makes no sense, open an issue on GitHub — include what you clicked and the exact text of the error. You will get a response.

Going deeper

When you are ready to automate (cron jobs, batch processing, direct Parquet access from R or Python), read the technical documentation. The same app exposes a full command-line interface.
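
As one illustration of direct Parquet access from Python, the sketch below counts the rows across every Parquet file under a subsystem folder using the duckdb Python package (installed separately, for example with pip install duckdb). It is a minimal example under stated assumptions, not the app's own interface: the helper names and the example path are hypothetical, and it avoids naming any column because this tutorial does not list them.

```python
from pathlib import Path


def parquet_glob(subsystem_dir: Path) -> str:
    """Glob pattern matching every Parquet file below subsystem_dir."""
    return (subsystem_dir / "**" / "*.parquet").as_posix()


def count_rows_sql(subsystem_dir: Path) -> str:
    """Build SQL that counts rows across all Parquet files below subsystem_dir."""
    # Double any single quote so the path is safe inside a SQL string literal.
    pattern = parquet_glob(subsystem_dir).replace("'", "''")
    return f"SELECT COUNT(*) AS n_rows FROM read_parquet('{pattern}')"


def count_rows(subsystem_dir: Path) -> int:
    """Run the count with DuckDB and return the number of rows."""
    import duckdb  # imported here so the helpers above work without the package

    con = duckdb.connect()  # in-memory database; the Parquet files are only read
    try:
        return con.execute(count_rows_sql(subsystem_dir)).fetchone()[0]
    finally:
        con.close()


# Example call. The path is hypothetical: point it at your own datasus_db/<subsystem>/ folder.
# print(count_rows(Path("D:/research/datasus_db/sim")))
```

The SQL string returned by count_rows_sql can also be pasted into the DuckDB shell mentioned in step 8, which makes it easy to compare the result with what the Query page shows.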