Running Apache Airflow 3.2.0 with PostgreSQL and Executing Your First DAG
A post on DEV Community walked through installing Apache Airflow 3.2.0 with PostgreSQL and running a first DAG.
It covers pip-installing Airflow on a Linux server, trying both standalone and individual process startup, reconfiguring airflow.cfg for PostgreSQL, and loading a DAG.
The original guide is screenshot-heavy and easy to follow, but cross-referencing the official docs revealed that the “just get it running” steps and the “use PostgreSQL as your metadata DB going forward” steps were somewhat interleaved. This article reorganizes the flow and adds Airflow 3.2.0-specific notes.
standalone is the SQLite entry point
The official Airflow installation page suggests pipx run apache-airflow standalone or uvx apache-airflow standalone for a quick first look.
This spins up a minimal SQLite-backed configuration automatically, which is fast for local verification.
pipx run apache-airflow standalone
# or
uvx apache-airflow standalone
The official docs explicitly state that standalone is not for production.
The reason to switch to PostgreSQL is not to store DAG data itself, but to move Airflow’s metadata database (DAG Runs, Task Instances, connections, variables, scheduler state) off SQLite.
flowchart TD
A[DAG files<br/>Python code] --> B[dag-processor<br/>parses DAGs]
B --> C[metadata DB<br/>PostgreSQL]
D[scheduler<br/>creates execution plans] --> C
D --> E[executor<br/>runs tasks]
F[api-server<br/>Web UI and API] --> C
What goes into PostgreSQL here is Airflow’s own state, not your business data.
If you want Airflow to operate on a separate PostgreSQL or analytics database, you use Connections and Providers for that.
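For example, a separate analytics database can be registered as a Connection through nothing more than an environment variable, using the AIRFLOW_CONN_&lt;CONN_ID&gt; convention; the analytics_db connection id and credentials below are hypothetical placeholders.

# Registers a Connection with conn_id "analytics_db" without storing it in the metadata DB or the UI
export AIRFLOW_CONN_ANALYTICS_DB="postgresql://analytics_user:analytics_pass@analytics-host:5432/sales"

Tasks then refer to it by conn_id through the relevant provider's hooks and operators, keeping business data access separate from Airflow's own metadata.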
pip install with matching constraints
When installing with pip, Airflow must be installed with a constraints file.
The original guide used the following for Python 3.12.
python -m venv airflow_venv
source airflow_venv/bin/activate
pip install --upgrade pip
pip install "apache-airflow[postgres]==3.2.0" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.2.0/constraints-3.12.txt"
The original guide uses apache-airflow[celery], but for just running your first DAG locally, CeleryExecutor is overkill.
If the goal is PostgreSQL as your metadata DB, starting with the postgres extra keeps dependencies aligned with the purpose.
CeleryExecutor brings in broker design decisions (Redis, RabbitMQ, etc.) that are separate from the PostgreSQL question.
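If you want to be explicit about the executor rather than rely on the default, it is a [core] setting; a minimal sketch, with LocalExecutor as the usual single-machine choice:

# Pin the executor explicitly and confirm what Airflow actually resolves
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
airflow config get-value core executor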
Match the constraints file to your Python version.
Python 3.11 uses constraints-3.11.txt, Python 3.12 uses constraints-3.12.txt. A mismatch here can let installation succeed while causing Provider dependency breakage later.
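One way to keep the two from drifting apart is to derive the constraints URL from the interpreter that will actually run Airflow; a sketch of that pattern:

# Build the constraints URL from the running interpreter's major.minor version
AIRFLOW_VERSION=3.2.0
PYTHON_VERSION="$(python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
pip install "apache-airflow[postgres]==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"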
Create the PostgreSQL user and database first
The official Airflow database setup page lists PostgreSQL 13-17 as supported.
Create a dedicated database and user for Airflow.
CREATE DATABASE airflow_db;
CREATE USER airflow_user WITH PASSWORD 'airflow_pass';
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
\c airflow_db
GRANT ALL ON SCHEMA public TO airflow_user;
On PostgreSQL 15+, the public schema permission is easy to forget.
If airflow db migrate fails with a permission error, check schema-level grants in addition to database-level ones.
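A quick way to confirm the grants took effect is to create and drop a throwaway table as airflow_user; the table name below is arbitrary.

# Fails with "permission denied for schema public" on PostgreSQL 15+ if the schema grant is missing
psql -h localhost -U airflow_user -d airflow_db \
  -c "CREATE TABLE _perm_check (id int); DROP TABLE _perm_check;"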
You can edit airflow.cfg directly, but environment variables are more reproducible.
export AIRFLOW_HOME=~/airflow
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db"
export AIRFLOW__CORE__LOAD_EXAMPLES=False
export AIRFLOW__CORE__DAGS_FOLDER=~/workflows
Verify current values with:
airflow config get-value database sql_alchemy_conn
airflow config get-value core dags_folder
After changing the DB connection, run the metadata migration.
airflow db migrate
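If the migration fails immediately, airflow db check isolates whether the problem is connectivity or permissions before you dig into grants.

airflow db check   # only verifies that the metadata DB is reachable with the configured URI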
If you previously ran standalone with SQLite, leftover ~/airflow/airflow.cfg and database files can cause confusion.
For a clean PostgreSQL setup, either use a separate AIRFLOW_HOME from the start or verify what’s actually being read with airflow config get-value.
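A quick check for leftovers from an earlier standalone run; the paths assume the default AIRFLOW_HOME of ~/airflow.

# An airflow.db file here means SQLite was used in this AIRFLOW_HOME at some point
ls -la ~/airflow/airflow.db ~/airflow/airflow.cfg 2>/dev/null
airflow config get-value database sql_alchemy_conn   # should print the postgresql+psycopg2 URI, not sqlite:///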
Airflow 3 splits api-server and dag-processor
Airflow 2-era articles still assume webserver as the main component.
In Airflow 3, the UI/API side is airflow api-server and DAG parsing is airflow dag-processor.
Each of the commands below is a long-running process, so run them in separate terminals or under a process manager.
airflow users create \
--username admin \
--firstname Admin \
--lastname User \
--role Admin \
--email admin@example.com
airflow api-server --port 8080
airflow scheduler
airflow dag-processor
airflow triggerer
As the original guide notes, the user creation command requires the Flask AppBuilder Auth Manager.
If you get ModuleNotFoundError: No module named 'airflow.providers.fab', install the FAB provider.
pip install apache-airflow-providers-fab
airflow db migrate
This is specifically about using legacy user management in Airflow 3.
For just testing DAGs locally, the auto-created user from standalone is sufficient.
What is a DAG anyway
DAG stands for Directed Acyclic Graph. In graph theory, it means a structure with directed edges and no cycles.
Airflow uses this structure to define workflows.
Nodes are tasks, edges are dependencies between tasks.
“Run task B after task A finishes” is expressed as a graph, and because there are no cycles, a valid execution order always exists.
flowchart LR
A[Download CSV] --> B[Validate]
B --> C[Load to DB]
B --> D[Generate report]
C --> E[Slack notification]
D --> E
In this diagram, CSV download runs first, then validation, then DB load and report generation run in parallel.
Once both finish, Slack notification fires.
If you tried to add an edge from Slack notification back to CSV download, that would create a cycle, and Airflow rejects it.
A DAG is defined in ordinary Python code; one file usually holds one DAG, though a single file can define several.
Place a .py file in dags_folder, and dag-processor parses it and registers it in the metadata DB.
The DAG itself only defines tasks and their ordering; it holds no actual data.
Two concepts tied to DAGs are DAG Runs and Task Instances.
A DAG Run represents one execution of a DAG, whether triggered by schedule or manually.
A Task Instance is one task within one DAG Run.
In the diagram above, one DAG Run creates 5 Task Instances.
The Grid view in the Web UI shows tasks vertically and DAG Runs chronologically, with each cell showing a Task Instance’s status.
The minimum parameters when defining a DAG are dag_id, start_date, schedule, and catchup.
dag_id is a string that uniquely identifies the DAG within Airflow, independent of the filename.
start_date is the point from which the scheduler starts generating DAG Runs. Combined with schedule, it means “run every 5 minutes starting from this date.”
schedule sets the interval between DAG Runs. It accepts timedelta(minutes=5) for fixed intervals, cron expressions, or Timetable objects.
catchup controls whether to backfill unexecuted runs between start_date and now. Airflow 2 defaulted this to True; Airflow 3 defaults it to False, but it is still worth setting explicitly, because a start_date six months in the past combined with catchup=True generates a flood of DAG Runs on first startup.
Task dependencies use Python’s >> operator.
task_a >> task_b >> task_c
This means sequential execution: A, then B, then C.
For parallel execution, use a list.
task_a >> [task_b, task_c] >> task_d
After task_a, task_b and task_c run in parallel; once both finish, task_d runs.
set_upstream / set_downstream methods do the same thing, but >> is shorter.
Write your first DAG with airflow.sdk
The Airflow 3.2.0 release notes position airflow.sdk as the stable interface for DAG authoring.
Older samples use from airflow import DAG, but for new code, prefer the SDK.
Example placed at ~/workflows/simple_dag.py:
from datetime import datetime, timedelta

from airflow.providers.standard.operators.python import PythonOperator
from airflow.sdk import DAG


def say_hello():
    print("Hello from Airflow")


def say_goodbye():
    print("Goodbye from Airflow")


with DAG(
    dag_id="simple_dag",
    start_date=datetime(2026, 1, 1),
    schedule=timedelta(minutes=5),
    catchup=False,
) as dag:
    hello_task = PythonOperator(
        task_id="hi",
        python_callable=say_hello,
    )
    goodbye_task = PythonOperator(
        task_id="bye",
        python_callable=say_goodbye,
    )

    hello_task >> goodbye_task
Check whether the DAG is loaded via CLI, not just the UI.
airflow dags list
airflow dags test simple_dag 2026-01-01
If it doesn’t appear in airflow dags list, check dags_folder and the dag-processor logs first.
It’s usually a Python import error, missing Provider, or wrong file location.
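Two checks that usually narrow this down faster than clicking around the UI:

# Show tracebacks for DAG files that failed to parse
airflow dags list-import-errors
# Importing the file directly also surfaces missing packages and syntax errors immediately
python ~/workflows/simple_dag.py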
3.2.0: the boundary that matters is operational, not procedural
Major additions in Airflow 3.2.0 include Asset Partitioning, Multi-Team Deployments, Deadline Alerts, improved DAG processing visibility, and Grid view performance improvements.
None of these need to be understood just to run your first DAG.
However, once you’ve switched to a PostgreSQL configuration, the surface area increases.
The official database setup page notes that Airflow opens many connections to the metadata DB, so production PostgreSQL configurations should use PGBouncer.
On managed PostgreSQL services (RDS, Cloud SQL, Azure Database for PostgreSQL), idle connections may be terminated, resulting in SSL SYSCALL error: EOF detected, so keepalive settings also need attention.
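One way to handle this is to append libpq keepalive parameters to the connection URI, since psycopg2 passes query-string parameters through to libpq; the values below are illustrative, not recommendations.

# TCP keepalives on the metadata DB connection; tune the values to the managed service's idle timeout
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db?keepalives=1&keepalives_idle=60&keepalives_interval=10&keepalives_count=5"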
| Configuration | Why it matters |
|---|---|
| sql_alchemy_conn | Verify Airflow is actually pointing to PostgreSQL |
| dags_folder | Align file location with what dag-processor watches |
| executor | LocalExecutor, CeleryExecutor, KubernetesExecutor each bring different infrastructure |
| load_examples | Remove sample DAGs so only your DAGs are visible |
| DB connection count | scheduler, api-server, dag-processor, triggerer all hit the same metadata DB |
| Auth and exposure | Don’t expose port 8080 directly to the internet |
I previously wrote about an unauthenticated Dagu API leading to RCE, in an article on CISA KEV additions.
Airflow and Dagu are different products, but the risk of exposing a management UI/API that accepts workflow definitions is the same.
If running Airflow on a VPS, set up a reverse proxy, authentication, firewall, and non-root execution first.
For PostgreSQL performance and kernel-level considerations, the Linux kernel 7.0 PREEMPT_NONE removal halving PostgreSQL throughput article is relevant.
Airflow’s metadata DB won’t be as large as a production business database, but DAG count, Task Instance volume, and history retention duration add up.
When cron stops being enough
If the job is “download a CSV at 2am daily and load it into a DB,” a single cron line with a shell script handles it fine.
No need for Airflow.
DAGs become worthwhile when workflows have branching and convergence.
Consider a sequence of CSV download, validation, DB load, report generation, and Slack notification, where a validation failure should prevent the DB load, and a report-generation failure should allow re-running just that step.
Trying to do this with a cron script means growing if and exit statements inside the script, with success tracking dependent on log parsing.
“Resume from step 3 after a failure” requires building that mechanism yourself.
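A sketch of where that tends to end up; the script names are hypothetical.

#!/bin/sh
# The cron version: exit codes stand in for dependencies, and a late failure forces a full rerun
./download_csv.sh || exit 1
./validate.sh     || exit 1
./load_db.sh      || exit 1
./report.sh       || exit 1
./notify_slack.sh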
Airflow holds this structure in the DAG itself.
Success, failure, and retry states are recorded per task in the metadata DB, so you can re-run just the failed task from the Web UI.
The cron-script pattern of “start everything over from scratch” goes away.
Another advantage is parallel execution.
In the earlier Mermaid diagram, DB load and report generation run in parallel after validation.
Parallelizing in a cron script means backgrounding with & and wait, but handling a failure in just one branch immediately gets complicated.
With a DAG, write dependencies with >> and the scheduler handles parallel execution and convergence automatically.
Backfill is also quietly useful.
When you need to reprocess all of last month’s data, a backfill generates DAG Runs for the specified period; in Airflow 3 the CLI is airflow backfill create --dag-id simple_dag --from-date 2026-04-01 --to-date 2026-04-30, replacing the Airflow 2-era airflow dags backfill.
With a cron script, you’d write date parameters into the script and loop over them yourself.
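A sketch of that manual loop, assuming a hypothetical load script that accepts a date argument and GNU date on the box:

# Reprocess April by hand: generate each date and pass it to the script
for offset in $(seq 0 29); do
  day=$(date -d "2026-04-01 + ${offset} days" +%F)
  ./load_csv.sh "${day}"
done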
Conversely, for a single straight-line process where full restart on failure is acceptable and no backfill is needed, Airflow is overkill.
Running scheduler, api-server, dag-processor, and PostgreSQL as four or more always-on processes is far more maintenance than a single cron line.