Tutorial Beginner-friendly

From zero to your first query.

Eight steps. No terminal. No SQL. Written for researchers and clinicians who want the data without the data-engineering detour.

01

Install the app

Go to the Download page, pick the installer for your system, and run it. On Windows, click Next a few times. On macOS (Apple Silicon), drag the app to the Applications folder. On Linux, give the AppImage execute permission and open it. No Python or database needs to be installed separately.

note If your Mac has an Intel processor, there is no native installer — follow the Intel Mac walkthrough on the Download page instead (it uses Python from source). Unsigned-app warnings on Windows/macOS are explained on the same page.
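
For Linux users who would rather script the "give the AppImage execute permission" step than use the file manager, here is a minimal sketch in Python. The file name in the comment is hypothetical; use the name of the file you actually downloaded.

```python
import stat
from pathlib import Path


def make_executable(path: str) -> None:
    """Add execute permission for the file's owner, leaving other bits untouched."""
    p = Path(path)
    current_mode = p.stat().st_mode
    p.chmod(current_mode | stat.S_IXUSR)


# Hypothetical file name, shown only as an illustration:
# make_executable("DataSUS-ETL.AppImage")
```

Most Linux file managers offer the same setting as a checkbox in the file's Properties dialog, so this is optional.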
02

Open it from the shortcut

After installing, find DataSUS ETL in your Start Menu (Windows), Applications folder (macOS), or app launcher (Linux). Click it. A window opens in your default browser — that is the app.

note The address bar will show 127.0.0.1:8787. That means the app is running on your own computer, not on the internet.
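
If the browser tab does not open, one way to check whether anything is answering at that local address is a small connection test. This is a sketch, assuming the app listens on 127.0.0.1 port 8787 as shown in the address bar; the function name is made up for illustration.

```python
import socket


def is_app_listening(host: str = "127.0.0.1", port: int = 8787, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False
```

A True result only says that some program is listening on that port, not that it is this particular app.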
03

Choose where your data will live

The first screen asks where to store the downloaded files. Pick a folder with plenty of free space: a full subsystem covering many years can take 10–40 GB. External drives work. The app will create a subfolder named datasus_db/ inside the folder you choose.

note You can change this later from the Settings page.
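
To check free space before committing to a folder, a short Python sketch like the following can help. The function name and the 40 GB figure in the comment are illustrative; the figure is taken from the upper end of the range mentioned above.

```python
import shutil


def has_room(folder: str, needed_gb: float) -> bool:
    """Return True if the drive holding `folder` has at least `needed_gb` GiB free."""
    free_bytes = shutil.disk_usage(folder).free
    return free_bytes >= needed_gb * 1024**3


# Example: has_room("/path/to/your/folder", 40)
```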
04

Pick a subsystem

The app supports SIHSUS (hospital admissions) and SIM (mortality). Choose whichever matches the question you are trying to answer. For a quick first experiment, SIM for a single state is the smallest and fastest.

note SIM is published with about a two-year delay. The app tells you which months are actually available before starting.
05

Pick dates and states

Choose a date range (start and end) and one or more Brazilian states. The app shows you an estimate of how many files will be downloaded and their total size before you commit to anything.

note Narrow first, widen later. Two months of SIM for one state takes under a minute on a typical connection.
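
To get a feel for how quickly a selection grows, the arithmetic can be sketched as months times states. This assumes one file per state per month, which is an assumption made here for illustration and may not match how every subsystem is published; the estimate shown in the app is the one to trust.

```python
from datetime import date
from typing import Iterable


def months_in_range(start: date, end: date) -> int:
    """Count the calendar months touched by the range, both ends included."""
    if end < start:
        raise ValueError("end must not be before start")
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def estimate_file_count(start: date, end: date, states: Iterable[str]) -> int:
    """Rough file count, ASSUMING one file per state per month."""
    return months_in_range(start, end) * len(set(states))
```

Under that assumption, two months for one state is 2 files, while ten years for all 27 federative units is 3,240, which is why narrowing first pays off.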
06

Watch the pipeline work

Click Start. A progress screen shows each step in real time: downloading the raw files, converting from DBC to a readable format, loading them into DuckDB, transforming and enriching, and finally writing Parquet files. You can leave it running in a background tab.

note If something goes wrong (network drop, disk full), the log panel tells you which file and why. Restart the app and run the same range again — it picks up where it left off.
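
The "picks up where it left off" behavior can be pictured as skipping any file whose output already exists. The sketch below illustrates that general idea only. It is not the app's actual implementation, and the file naming is hypothetical.

```python
from pathlib import Path
from typing import List


def pending_files(expected_names: List[str], output_dir: str) -> List[str]:
    """Return the expected output names that are not yet present in output_dir."""
    # Hypothetical convention: one finished .parquet file per expected name.
    done = {p.name for p in Path(output_dir).glob("*.parquet")}
    return [name for name in expected_names if name not in done]
```

With this approach, re-running the same range does work only for the names that are still missing.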
07

Where your files end up

When the pipeline finishes, your data lives in the folder you picked, inside datasus_db/<subsystem>/. Files are saved in the Parquet format — organized by state so you can grab just the slice you need.

note Parquet is a widely used open columnar format that pandas, polars, Excel (via Power Query), R, and DuckDB can read. It is not a DataSUS-ETL proprietary format.
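
To see what the pipeline produced without opening the app, a short Python sketch can count the Parquet files and add up their sizes. It assumes only the datasus_db/<subsystem>/ layout described above. The exact subsystem folder name (for example "sim") is a guess, so check the folder on disk.

```python
from pathlib import Path
from typing import Tuple


def parquet_summary(data_folder: str, subsystem: str) -> Tuple[int, int]:
    """Return (number of Parquet files, total size in bytes) under datasus_db/<subsystem>/."""
    root = Path(data_folder) / "datasus_db" / subsystem
    # rglob searches every subfolder, so the per-state layout does not matter here.
    files = [p for p in root.rglob("*.parquet") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Example (folder and subsystem name are placeholders):
# count, size = parquet_summary("/path/to/your/folder", "sim")
```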
08

Explore through the app — or by SQL

The Query page lets you explore tables, filter by column, and export to CSV or Excel without writing SQL. If you are comfortable with SQL, open the DuckDB shell from the Status page and write your own queries against the Parquet files directly.

note The app never uploads your data anywhere. Everything stays on your computer.
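
For the SQL route, a first query is often just a row count across all the Parquet files. The sketch below builds such a query as text, using DuckDB's read_parquet function with a glob pattern. The helper name and the example path are illustrative and not part of the app.

```python
def count_rows_sql(parquet_glob: str) -> str:
    """Build a DuckDB query that counts rows across all files matching the glob."""
    # Double any single quotes so the path is a valid SQL string literal.
    escaped = parquet_glob.replace("'", "''")
    return f"SELECT count(*) AS n_rows FROM read_parquet('{escaped}')"


# Example with a placeholder path:
# print(count_rows_sql("/path/to/your/folder/datasus_db/sim/**/*.parquet"))
```

The resulting text can be pasted into the DuckDB shell. The `**` in the pattern tells DuckDB to search subfolders as well.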

Something broke?

The app is under active development. If a screen doesn't match this tutorial, or an error message makes no sense, open an issue on GitHub — include what you clicked and the exact text of the error. You will get a response.

Going deeper

When you are ready to automate (cron jobs, batch processing, direct Parquet access from R or Python), read the technical documentation. The same app exposes a full command-line interface.
