Does anyone have experience making tests against real databases fast?
I resonate with the sentiment of this article, but I’ve struggled to find an alternative that’s fast enough as the test suite grows, isn’t flaky in CI, and can share the production schema definition for the relevant relations.
I’d love to hear more from anyone who’s solved for some of these constraints!
I think one helpful thing is what Rails calls transactional tests (https://guides.rubyonrails.org/testing.html#transactions). It does the database setup (migrations and seeds) once, then runs each test inside a transaction that is rolled back at the end (and thus never committed). This helps with speed, and also keeps tests from accumulating as much accidental interdependence as they might otherwise.
If you use read replicas in production code this can become tricky, though: since the transactions never commit, the writes are never visible to the replica, or even to other connections to the same database.
A real database should not be slow. Even with our tests running against a hosted SQL Server on a separate server, the database is never the slow part. For other tests, we run with the same database in a local Docker container with Docker Compose and it is fast and isolated/resettable.
Most tests should be unit tests, which are super fast. Integration and UI tests that hit the database should be fewer, and if the database is slow, it might be down to your specific application or unoptimized queries; our database calls are usually < 10ms.
What I've done is make a clone of the real database, with a sample of data that has enough variety/size to test whatever it is you need to test, but no bigger. It definitely takes some thinking, planning, and writing of code, but it's worth doing.
Unfortunately I maintain an app where the database (read-only) is Snowflake, and being more of a "warehouse" than "database" there's always a lot of overhead in running any query at all. Even just `select 1` can take a few seconds. So there's only so much you can do with that, but setting up your data so that tests can be parallelized helps as well.
However, your tests against a proper OLTP database should be plenty fast, unless your app itself is slow or your test fixtures require some really complicated setup.
On my machine -- which is quite mid-range at this point, not even high end -- I get by just fine up to 3000 tests, with Elixir at least. When I was on that contract, the ~3200 tests ran in something like 40 seconds.
What kinds of troubles do you have with using a real DB for testing?
If you happen to use Postgres, the approach is ultimately portable: it uses Pg database templates (also, regarding perf, the author recommends using a ramdisk and turning off fsync on your test DBs; you'll see this in the project readme). But you’ll have to write the code yourself.
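For reference, the template trick boils down to a single statement. A hypothetical helper that builds it (the function name is mine, not pgtestdb's):

```python
def clone_test_db_sql(template: str, test_db: str) -> str:
    # CREATE DATABASE ... TEMPLATE does a fast file-level copy of an
    # already-migrated template database, so no migrations run per test.
    return f'CREATE DATABASE "{test_db}" TEMPLATE "{template}"'

# e.g. one clone per test, named after the test:
sql = clone_test_db_sql("app_template", "app_test_users_create")
print(sql)
```

Note that in Postgres, CREATE DATABASE can't run inside a transaction block, so you'd execute this on an autocommit connection, and drop the clone when the test finishes.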
This is so great Peter-- first I've heard of pgtestdb and it's immediately useful for me. How can people donate money to the pgtestdb project? Or hire you for consulting for pgtestdb? I'm joel@joelparkerhenderson.com and would love to help fund your work.
I sincerely appreciate the sentiment and the offer — but pgtestdb is MIT license, actual, for real, not kidding, open source. No payment necessary; please enjoy.
(I'm always open to discuss potential contracts or consulting opportunities. If you have one that you think might be a good fit, my email is in my profile here and on github and on my homepage.)
Really glad to hear it's been working for you with zero flakiness! If you ever do run into any trouble, or have any suggestions for improvements, come on over to the github issues page :)
My app: Kotlin, Ktor, Exposed
Databases:
- Production/Dev: Postgresql
- Test suite/CI: SQLite
Performance:
- ~1000 tests
- ~5 seconds
For testing anything below the Ktor layer, I create and roll back transactions for everything, including DB table creation (though I should probably fix that; bigger fish to fry in this project).
For the SQLite / PostgreSQL differences that Exposed doesn't naturally handle (namely jsonb), I have my test harness create a regular json version of the table rather than a jsonb one during the CREATE TABLE phase. Exposed's ORM seamlessly handles that swap after table creation. There's a bit of manual upkeep in keeping the *Json test versions of the tables up to date with the production non-Json versions, but that's the sort of thing that's caught by the very first test and fixed in minutes, most of the time (sometimes cascading table creation bites me).
I will eventually probably add a flag or something so the test suite can run against a separate testing partition in my PostgreSQL docker container, but I haven't done that yet.
I've been bitten by using an in-memory database instead of the one production uses. If you're using GitHub Actions, it's as easy as using "hoverkraft-tech/compose-action" as a step and providing a docker compose yaml with a postgres image.
Oh certainly! The problem is that it's yet another thing to add to the pile of things I need to configure, etc, at this point and it's of lower priority than many, many, many other things - and will easily take as long as so many other of the steps, which are all taking a long time due to learning curve.
Tests are usually embarrassingly parallel. Instead of creating one test db (app_test) create many (app_test_0, app_test_1, ...). Run tests in many threads/processes, db per thread.
This works in a lot of cases. In some cases this might not address your bottleneck.
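One way to sketch the db-per-worker mapping, assuming pytest-xdist (which sets the PYTEST_XDIST_WORKER environment variable to gw0, gw1, ...); the helper name is made up:

```python
import os

def worker_db_name(base: str = "app_test") -> str:
    """Pick a database per parallel worker so tests never share state.
    Falls back to a single db when not running under pytest-xdist."""
    worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
    # 'gw0' -> '0', 'gw1' -> '1', ...
    return f"{base}_{worker.lstrip('gw') or '0'}"
```

Each worker then connects only to its own `app_test_N`, so no locking or cleanup coordination is needed between them.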
Also someone should write a real, performant, in-memory postgres storage driver. Then we can all be happy (with pg at least).
I've taken a stab at making a solution for it via https://github.com/data-catering/data-caterer. It focuses on making integration tests easier by generating data across batch and real-time data sources, whilst maintaining any relationships across the datasets. You can automatically set it to pick up the schema definition from the metadata in your database to generate data for it. Once your app/job/data consumer(s) use the data, you can run data validations to ensure it runs as expected. Then you can clean up the data at the end (including data pushed to downstream data sources) if run in a shared test environment or locally. All of this runs within 60 seconds.
It also gives you the option of running other types of tests such as load/performance/stress testing via generating larger amounts of data.
We use Testcontainers (https://testcontainers.com/) in our node.js / vitest / kysely stack. Really fast to spin up a temporary postgres instance, and we use kysely migrations to init and seed the db.
Not sure if this is still a valid approach, but on a large Django site with a lot of unit tests, the cumulative setup/teardown/reset cycles were killing us. We found that wrapping each test in a transaction that was aborted on teardown made the per-test cleanup cost drop radically. We also kept a canned database for testing, so running the test suite didn't pay a large startup penalty to populate the database with test fixtures. Keeping that ready db between runs also sped things up a lot for devs.
Doesn’t directly answer your question, but at least in Postgres I am curious about UNLOGGED tables and whether they result in faster specs. The trade-off is that crash recovery doesn’t work, but that’s fine in CI.
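If you want to try it on existing tables, Postgres (9.5+) supports flipping them with `ALTER TABLE ... SET UNLOGGED`, which skips WAL writes. A hypothetical helper that builds those statements:

```python
def make_unlogged_sql(tables):
    # UNLOGGED tables skip write-ahead logging: writes are faster, but
    # the contents are truncated on crash recovery -- fine for CI dbs.
    return [f'ALTER TABLE "{t}" SET UNLOGGED' for t in tables]

for stmt in make_unlogged_sql(["users", "orders"]):
    print(stmt)
```

You'd run these once against the test database after migrations, not in production.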
There is also something to be said for keeping database transactions atomic (no 3rd-party network calls, etc.) to keep flaky specs to a minimum. I have some ad hoc thoughts on this; I'll try to frame them properly in a post.
Depends on how fast you need them to be. I've had success using Docker Postgres, set up the same as production, where each test either creates a connection to its own test-specific database or runs inside a transaction that cleans up after itself on a shared database. In my experience, slower tests that use a real db almost always save net-positive dev time.
Tests are embarrassingly parallel. If you can split up the load for CI then at the limit the slowest response is your slowest running test. I wish more tooling would expose this out of the box, but with some effort it is possible.
This is better than nothing but if you have to run migrations before your database is usable, you want an approach like the one I use in pgtestdb — use one server for all your tests, but give them each different databases by cloning from one that's already set up with all the schemas correctly migrated.
I did something similar with pg_tmp as well. I also had a lot of tests around migrations, so there were a lot of tests that would 1) ask for a DB at version x, 2) insert things into it, 3) migrate to version y, and 4) assert the DB looks right (note: this was not a SaaS).
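That migration-test shape can be sketched like this, using an in-memory sqlite3 database as a stand-in (the migration list and names are invented for illustration; the real thing ran against Postgres via pg_tmp):

```python
import sqlite3

# Hypothetical ordered migration list; a real setup would use a migration tool.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]

def migrate(conn, frm, to):
    # Apply migrations [frm, to) in order.
    for sql in MIGRATIONS[frm:to]:
        conn.execute(sql)

conn = sqlite3.connect(":memory:")
migrate(conn, 0, 1)                                    # 1) DB at version x
conn.execute("INSERT INTO users (name) VALUES ('a')")  # 2) insert data
migrate(conn, 1, 2)                                    # 3) migrate to version y
row = conn.execute("SELECT name, email FROM users").fetchone()
print(row)  # 4) pre-migration rows survive: ('a', None)
```

The valuable assertions are in step 4: that data written at version x is still intact and sensible after the migration to y.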