Fixing data corruption in a PostgreSQL database running in kubernetes as a non-root container
I am running the Dependency Track application in my homelab microk8s cluster. One day I started seeing error messages on INSERT’s like this:
ERROR: invalid page in block 3693 of relation base/16384/16531
Take note of the final number (16531 in my case), this is the file inode causing a problem. You’ll need this later.
Bitnami PostgreSQL background
But first - a little background on postgresql in the bitnami container:
- superuser account is
postgres
- Doesn’t have a password
- Configured to use md5 to protect passwords
- Can’t login
- What to do?
Finding out what’s wrong
This is how I resolved my problem.
First we need to reconfigure the PostgreSQL server to allow unauthenticated connections.
Unauth’d postgresql connections
WARNING: This will open up your PostgreSQL database for all. Please be careful.
This is a bit heavy handed, but it’s simple and it works.
Even though the pod is running as a non-root user AND I was connected to the pod as the running user (UID 1001) AND the pg_hba.conf
file was owned by root
with 664
permissions, I was still able to update the file with the following sed
command. Perhaps I’m missing something, but I didn’t expect that to work - yet it did. I’m content that it worked, but I’m not happy (and more than a little confused/worried) that it worked.
- Connect to the pod
kubectl --namespace <namespace> exec -ti <pgsql pod> -- /bin/bash
- Update the
pg_hba.conf
file to trust all connections
sed -i 's/md5/trust/g' /opt/bitnami/postgresql/conf/pg_hba.conf
- Now reload postgresql to apply the changes
pg_ctl reload
Now you can connect locally
psql -U <username> <database>
Querying which objects are corrupted
Connect to the database using a local connection from within the PostgreSQL pod
psql -U postgres postgres
- Now to find out which objects are affected (you don’t need to do this, but it is good to know)
SELECT c.oid, n.nspname AS schema, c.relname,
CASE c.relkind
WHEN 'r' THEN 'table'
WHEN 'i' THEN 'index'
WHEN 't' THEN 'TOAST table'
WHEN 'm' THEN 'materialized view'
WHEN 'S' THEN 'sequence'
ELSE 'other'
END AS type
FROM pg_class AS c
JOIN pg_namespace AS n ON c.relnamespace = n.oid
WHERE c.relfilenode = <file inode id from error message>;
Fixing the problem
Right, now we know where the problem is, we can start to repair things.
- First - tell the database there’s no damaged pages, so the repair work (and application) can continue
set zero_damaged_pages to on;
This allows you to actually repair (vacuum) the affected objects without throwing the same error again
- Second - vacuum. This does cleanup (as the name implies)
vacuum full verbose analyse;
- Third - reindex things
reindex database <your database>
Bringing it back to normal
Finally, set your pg_hba.conf
file to do md5 auth again
sed -i 's/trust/md5/g' /opt/bitnami/postgresql/conf/pg_hba.conf
pg_ctl reload
Everything should be working just fine again. Assuming you don’t have any other issues :/
References
PostgreSQL Tutorial: Dealing with corrupted blocks - Redrock Postgres
Explore other related articles:
k8s /
postgresql /
data corruption /
Penned by Paul Macdonnell on 2025-04-19
Things do, stuffs get