16 changes: 8 additions & 8 deletions db-cleanup/README.md
@@ -10,22 +10,22 @@ A maintenance workflow that you can deploy into Airflow to periodically clean ou

3. Copy the airflow-db-cleanup.py file to this dags directory

a. Here's a fast way:
a. Here's a fast way:
```
wget https://raw.githubusercontent.com/teamclairvoyant/airflow-maintenance-dags/master/db-cleanup/airflow-db-cleanup.py
```
5. Update the global variables (SCHEDULE_INTERVAL, DAG_OWNER_NAME, ALERT_EMAIL_ADDRESSES and ENABLE_DELETE) in the DAG with the desired values

$ wget https://raw.githubusercontent.com/teamclairvoyant/airflow-maintenance-dags/master/db-cleanup/airflow-db-cleanup.py

4. Update the global variables (SCHEDULE_INTERVAL, DAG_OWNER_NAME, ALERT_EMAIL_ADDRESSES and ENABLE_DELETE) in the DAG with the desired values

5. Modify the DATABASE_OBJECTS list to add/remove objects as needed. Each dictionary in the list features the following parameters:
6. Modify the DATABASE_OBJECTS list to add/remove objects as needed. Each dictionary in the list features the following parameters:
- airflow_db_model: Model imported from airflow.models corresponding to a table in the airflow metadata database
- age_check_column: Column in the model/table to use for calculating max date of data deletion
- keep_last: Boolean to specify whether to preserve last run instance
- keep_last_filters: List of filters to preserve data from deleting during clean-up, such as DAG runs where the external trigger is set to 0.
- keep_last_group_by: Option to specify column by which to group the database entries and perform aggregate functions.
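
As a rough illustration of the parameters listed above, one entry of the DATABASE_OBJECTS list might be shaped like the sketch below. This is not the DAG's actual code: `DagRunStandIn`, `REQUIRED_KEYS`, and `missing_keys` are hypothetical names introduced here so the snippet runs without Airflow installed, and the filter is a plain string placeholder rather than a real query expression. In the DAG itself, the model would be imported from airflow.models.

```python
# Hypothetical stand-in for a model imported from airflow.models.
# In the real DAG you would import the model instead of defining this class.
class DagRunStandIn:
    execution_date = "execution_date"
    external_trigger = "external_trigger"
    dag_id = "dag_id"


# The five parameters each DATABASE_OBJECTS dictionary is described as having.
REQUIRED_KEYS = {
    "airflow_db_model",
    "age_check_column",
    "keep_last",
    "keep_last_filters",
    "keep_last_group_by",
}

DATABASE_OBJECTS = [
    {
        # Model corresponding to a table in the metadata database
        "airflow_db_model": DagRunStandIn,
        # Column used to calculate the max date for deletion
        "age_check_column": DagRunStandIn.execution_date,
        # Preserve the last run instance
        "keep_last": True,
        # Placeholder filter: only consider runs where external trigger is 0
        "keep_last_filters": ["external_trigger = 0"],
        # Column to group entries by before finding the last run
        "keep_last_group_by": DagRunStandIn.dag_id,
    },
]


def missing_keys(entry):
    """Return the set of expected parameters absent from one entry."""
    return REQUIRED_KEYS - entry.keys()


for db_object in DATABASE_OBJECTS:
    assert not missing_keys(db_object), missing_keys(db_object)
```

A check like `missing_keys` is optional; it is only a convenient way to catch a misspelled parameter name before the DAG is deployed.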

6. Create and Set the following Variables in the Airflow Web Server (Admin -> Variables)
7. Create and Set the following Variables in the Airflow Web Server (Admin -> Variables)

- airflow_db_cleanup__max_db_entry_age_in_days - integer - Maximum age, in days, of the database entries to retain, used if a value is not already provided in the conf. If this is set to 30, the job will remove entries that are 30 days old or older.
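
To make the age threshold concrete, the sketch below shows one way a value of 30 could translate into a cutoff date. The function names `cutoff_date` and `is_old_enough_to_delete` are hypothetical and are not taken from the DAG; the comparison assumes "30 days old or older" means the entry's date is on or before the cutoff.

```python
from datetime import datetime, timedelta


def cutoff_date(now, max_db_entry_age_in_days):
    """Return the newest date that is still old enough to be removed."""
    return now - timedelta(days=max_db_entry_age_in_days)


def is_old_enough_to_delete(entry_date, now, max_db_entry_age_in_days):
    """True when the entry is max_db_entry_age_in_days old or older."""
    return entry_date <= cutoff_date(now, max_db_entry_age_in_days)


# Example with a fixed "now" so the result does not depend on the clock.
now = datetime(2024, 1, 31)
print(cutoff_date(now, 30))                                    # 2024-01-01 00:00:00
print(is_old_enough_to_delete(datetime(2024, 1, 1), now, 30))  # True
print(is_old_enough_to_delete(datetime(2024, 1, 2), now, 30))  # False
```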

7. Enable the DAG in the Airflow Webserver
8. Enable the DAG in the Airflow Webserver