10 releases

new 0.3.2	May 17, 2024
0.3.1	May 15, 2024
0.2.3	May 10, 2024
0.1.2	May 5, 2024

#221 in Database interfaces

855 downloads per month

MIT license

195KB
4.5K SLoC

Careful with That Lock, Eugene

eugene is a linter and command line tool for reviewing SQL migration scripts for postgres.

Careful with That Lock, Eugene: Part 2 is a blog post that serves as an introduction to the tool and the problem it is trying to solve. Since that was written, eugene has gained SQL parser support and can do syntax tree analysis to discover some problematic migration patterns without running the SQL scripts.

Installation

You can install eugene from crates.io using cargo from rustup:

cargo install eugene --bin

Releases are published to github as a binary with no dependencies, so you can also download the binary from the release page

The binary isn't signed and notarized, so on macos it'll give you a warning. If you want to proceed anyway, you can use xattr -d com.apple.quarantine eugene to remove it.

To perform a local installation of your checkout out repository, you can use:

cargo install --path . --bin

Usage

eugene has a help command that should be fairly intuitive and can show you how to use the tool:

eugene help

Docker images

You can use the docker image ghcr.io/kaaveland/eugene to run the tool. For example:

docker run --rm -it \
  -ePGPASS=postgres \
  -v./examples/:/examples \
  ghcr.io/kaaveland/eugene:latest \
  lint /examples/alter_column_not_null.sql

Viewing migration hints

eugene knows about some common migration patterns that can cause problems with locks and in many cases, it can suggest workarounds. To see what hints are available, run:

eugene hints

I have shamelessly stolen many such hints from inspirational projects like strong_migrations and blog posts like PostgreSQL at Scale: Database Schema Changes Without Downtime.

Explaining lock modes

eugene knows about the lock modes in postgres, and can explain them to you:

eugene modes

eugene explain AccessExclusive

Use eugene modes or refer to the postgres documentation to learn more about lock modes.

Lint reports

You can use eugene lint to check a SQL script for common problems that can cause unexpected downtime when running the script in a production database, in this repository:

eugene lint --ignore E9 examples/add_check_constraint.sql

Which should produce output like this:

{
  "lints": [
    {
      "statement_number": 1,
      "sql": "alter table books add constraint check_title_not_null check (title is not null) not valid",
      "lints": []
    },
    {
      "statement_number": 2,
      "sql": "alter table books validate constraint check_title_not_null",
      "lints": [
        {
          "id": "E4",
          "name": "Running more statements after taking `AccessExclusiveLock`",
          "condition": "A transaction that holds an `AccessExclusiveLock` started a new statement",
          "effect": "This blocks all access to the table for the duration of this statement",
          "workaround": "Run this statement in a new transaction",
          "help": "Running more statements after taking `AccessExclusiveLock`"
        }
      ]
    }
  ]
}

eugene lint works by parsing the SQL script with pg_query.rs, which is the actual postgres parser. However, it can't know the state of your database, since it only does syntax tree analysis, so it is more prone to false positives than the other option, which is to run the migration script and investigate its effects, before rolling back.

Lints can also be ignored by commenting the SQL statement that triggers it:

-- eugene: ignore E3, E4
alter table books add column meta json; -- would normally trigger E3 adding a json column

Note that such comments must appear before the statement they're affecting.

Lock tracing reports

eugene can produce reports in a verbose markdown that is suitable for human reading. Take a look at the examples to see how the output looks.

eugene can also produce a json output that is more suitable for machine processing.

To review a SQL script for locks, you will need to run eugene trace and provide some connection information to a database. For example, for the local docker-compose setup:

# You can use ~/.pgpass for the password, or set it in the environment
export PGPASS=postgres 
docker compose up -d
sleep 5 # wait for the database to start
eugene trace --database example-db \
  --format json \ # or markdown
  examples/add_authors.sql

You should see some output that looks like this, but with a lot more details:

{
  "name": "add_authors.sql",
  "start_time": "2024-05-05T21:27:09.739410+02:00",
  "total_duration_millis": 1015,
  "all_locks_acquired": []
}

Note that eugene only logs locks that target relations visible to other transactions, so it does not log any lock for the author table in this instance. By default, eugene trace will roll back transactions, and you should pass -c or --commit if this is not what you want.

Like eugene lint, eugene trace can also ignore lints by commenting the SQL statement that triggers it, or by providing --ignore E3,E4 on the command line.

Complex SQL scripts and variables

eugene trace supports simple placeholders in the SQL script, so that statements like set role ${dba}; can be used by providing --var dba=postgres on the command line. Any number of vars may be provided by repeating the option.

Compatibility

eugene is tested with postgres versions >= 12 on linux, and is also tested on macos and windows for a narrower range of versions. It doesn't intentionally use any platform specific features or new features and should work with all of those. We build images for linux on debian:slim with the gnu toolchain.

Contributing

Contributions are welcome, but there's no roadmap for this project yet. Feel free to open an issue, ideas and discussion are very welcome. If you see an issue you'd like to fix, but don't know where to start, feel free to ping @kaaveland to ask for help, or just to let him know you're working on it.

Building

You can build the project with cargo build and run the tests with cargo test. The tests need to connect to a postgres database. The easiest way to do this is to use the docker-compose setup at the root of the repository:

docker-compose up -d
cargo test

Documentation

You can browse this locally with:

cargo doc --open

Docs are also hosted at docs.rs.

Releasing

To release a new version:

Update the version in Cargo.toml
Make sure to build so that Cargo.lock is updated
Commit the changes and push to the main branch
Tag the commit and push the tag
GitHub Workflows pick up the tag and build and release the new version to crates.io

High level design

Lock tracing

The central idea is to run the SQL script statements in a transaction, and check what effects they have on the state of the database:

What locks are taken
What changes are done tables, constraints, columns
What indexes are created or dropped

The tracing module is responsible for storing this kind of state after running SQL statements in a transaction. Other principles are:

src/bin/eugene.rs should contain only code related to the command line interface and standard in/err/out.
Structs that are serializable go in output
Structs that have public fields go somewhere in output::output_format
We prefer not to expose public fields of anything in tracing
That means we need to map from tracing to output to serialize output or expose fields
- We .clone() liberally for this purpose, because eventually we'd like make the structs Deserialize.

Linting

pg_query.rs breaks the script into statements and we convert each statement into its syntax tree. These trees are pretty complex, because they can contain all possible syntax in postgres, so they're converted to a more lightweight representation that fits better for writing linting rules, see src/linting/ast.rs. Lint rules need a context, which is built gradually from statements within each script, in addition to the lightweight syntax tree to work. This avoids some false positives, by allowing the lints to skip checking statements that affect objects that can't be visible to concurrent transactions. This means that eugene lint will not trigger on a create index statement where the table was created in the same transaction.

Tests

Tests are welcome and come in two flavors:

Unit tests go in the same file as the code they test. They are allowed to use a database connection, corresponding to the docker-compose setup or the github workflow for the tests
Integration tests go in the tests directory. These can only access public interfaces and therefore would the the right place to gauge how dependents would see the tool. In particular, we take snapshots of markdown reports that go in the examples directory, which we can use to track changes in the output format.

Migration tool

eugene is not a migration tool like flyway or liquibase, and isn't intended to be one. There are many excellent migration tools already, and the scope of eugene is only to help develop migrations that are less likely to cause problems in a database that is in use by application code.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Dependencies

~30–44MB
~810K SLoC