As a test, I ran this on a very early backup of lemm.ee images, from when we had very little federation and very few uploads, and unfortunately it is finding a whole bunch of false positives. A few examples of what it flagged as CSAM:
- Calvin and Hobbes comic
- The default Lemmy logo
- Some random user’s avatar, which is just a digital drawing of a person’s face
- A Pikachu image
Do you think the parameters of the script should be tuned? I’m happy to test it further on my backup, as I am reasonably certain that it doesn’t contain any actual CSAM, which means anything it flags there can be treated as a false positive.
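To make that kind of testing concrete, here is a rough sketch of how a known-clean backup could be used to pick a stricter cutoff. I don't know how the script scores images internally, so everything here is an assumption: it supposes the script can be made to output one numeric score per image, where higher means more suspicious, and the scores and candidate thresholds below are made-up placeholders, not real output.

```python
def false_positive_rate(scores, threshold):
    """Fraction of known-clean images whose score meets or exceeds the threshold."""
    if not scores:
        return 0.0
    flagged = sum(1 for s in scores if s >= threshold)
    return flagged / len(scores)


def lowest_threshold_within_budget(scores, candidates, max_fpr):
    """Smallest candidate threshold whose false positive rate is <= max_fpr, else None."""
    for t in sorted(candidates):
        if false_positive_rate(scores, t) <= max_fpr:
            return t
    return None


# Placeholder scores for images from a backup assumed to be clean (hypothetical values).
clean_scores = [0.10, 0.35, 0.62, 0.71, 0.93]

for t in [0.5, 0.7, 0.9, 0.95]:
    print(t, false_positive_rate(clean_scores, t))
```

This only measures false positives. A higher threshold will also miss more real hits, and that side can't be measured from a clean backup.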
If I have several backends that more or less depend on each other anyway (for example: Lemmy + pict-rs), then I will create separate databases for them within a single postgres instance. The reason is that if something bad happens to the database for one of them, the other one is affected anyway, so there isn’t much to gain from isolating them.
Conversely, for completely unrelated services, I will always set up separate postgres instances, for full isolation.
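As a minimal sketch of that layout: the grouping can be written down as a mapping from postgres instance to the services that share it, and the per-service setup SQL generated from it. The instance names and the unrelated service name are hypothetical, and giving each service its own role is my assumption rather than part of the setup described above.

```python
def provisioning_sql(groups):
    """For each postgres instance, build the SQL that creates one role and one database per service."""
    plan = {}
    for instance, services in groups.items():
        statements = []
        for svc in services:
            # Identifiers are double-quoted because names like pict-rs contain a hyphen.
            statements.append(f'CREATE ROLE "{svc}" LOGIN;')
            statements.append(f'CREATE DATABASE "{svc}" OWNER "{svc}";')
        plan[instance] = statements
    return plan


# Hypothetical grouping: coupled services share an instance, unrelated ones get their own.
groups = {
    "postgres-lemmy": ["lemmy", "pict-rs"],
    "postgres-other": ["some-unrelated-app"],
}

for instance, statements in provisioning_sql(groups).items():
    print(f"-- {instance}")
    for stmt in statements:
        print(stmt)
```

The generated statements would be run once against each instance. Passwords and connection settings are left out on purpose.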