Very understandable and valid. I find that Prometheus’ query language makes a lot of sense to me, so I like it. Have you tried Cacti or Nagios?
What about switching to Prometheus for metrics and snagging some premade dashboards in Grafana? Since it’s pull-based, the up metric is a freebie, especially if you expose the node_exporter via your reverse proxy.
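To give a sense of why up comes for free, here’s a minimal sketch (assuming a Prometheus instance reachable at http://localhost:9090 and the requests library) that pulls the up series from the HTTP API; every scrape target shows up there automatically as a 1 or 0:

```python
# Minimal sketch: list every scrape target's up/down status from Prometheus.
# Assumes Prometheus is reachable at http://localhost:9090 (adjust as needed).
import requests

PROM_URL = "http://localhost:9090/api/v1/query"

resp = requests.get(PROM_URL, params={"query": "up"}, timeout=5)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    _, value = series["value"]  # value is "1" (up) or "0" (down)
    print(f"{labels.get('job')} @ {labels.get('instance')}: {'up' if value == '1' else 'DOWN'}")
```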
Midriff and motorcycles don’t mix. ATGATT.
The only reason that I tend to use it is because of the included webserver. It’s not bad, but the paywalling of functionality needed for it to be a proper LB left a bad taste in my mouth. That, and HAProxy blows it out of the water in all of the tests that I’ve done over the years where availability is at all a concern. HAProxy is also much more useful when routing TCP.
Honestly, from your description, I’d go with Debian, likely with btrfs. It would be better if you had 3 slots so that you could swap a bad drive, but 2 will work.
If you want to get adventurous, you can see about a Fedora Atomic distro.
Previously, I’ve recommended Proxmox but I’m not sure that I still can at the moment, at least if they haven’t fixed their kernel funkiness. Right now, I’m back to libvirt.
Hope they don’t start gutting budgets, loading up the balance sheet with debt, and siphoning off profits.
Oh, they will.
You’re welcome. My understanding of German is not so good. What is “TechnikNein” in English? Is that “techno” music?
Try a traceroute to something like 9.9.9.9 and Google’s IP. You’re able to resolve things OK, so it’s not DNS. You need to find out where the traffic is dying.
Also, try a curl https://google.com -vvv. This should give some more info on what is happening to the TCP traffic.
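If you’d rather script that check, here’s a rough sketch (the targets are just examples) that separates the DNS step from raw TCP reachability, which is the distinction the traceroute/curl combo is meant to expose:

```python
# Rough sketch: separate DNS resolution from TCP reachability.
# Targets are just examples; swap in whatever hosts you're debugging.
import socket

TARGETS = [("9.9.9.9", 443), ("google.com", 443)]

for host, port in TARGETS:
    try:
        addr = socket.gethostbyname(host)  # DNS step (a raw IP passes straight through)
        print(f"{host}: resolves to {addr}")
    except socket.gaierror as e:
        print(f"{host}: DNS failure ({e})")
        continue
    try:
        with socket.create_connection((addr, port), timeout=5):
            print(f"{host}: TCP connect to {addr}:{port} OK")
    except OSError as e:
        print(f"{host}: TCP connect to {addr}:{port} failed ({e})")
```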
Why not turn it into a Desktop app via Electron or a similar solution?
This is a feature that makes me inclined to try it. I really don’t like Electron.
This right here. Just found out about this last week after a long debug.
If you don’t need UI, I prefer Podman. Rancher Desktop is good though.
I’ve found that a single database service uses fewer total resources (especially memory) than running separate DB stacks for each service.
This should indeed be the expected result. Each DB server will have a set amount of overhead from the runtime before data overhead comes in. Ex (made up numbers):
storage subsystem=256MB
config subsystem=128MB
auth subsystem=280MB
api subsystem=512MB
user tables=xMB
The subsystem resource usage would be incurred by every instance of the DB server. Additionally, you have platform-level overhead, especially if you are running VMs or containers, as those require additional resources to coordinate with the kernel, etc.
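To make the arithmetic concrete, here’s a quick back-of-the-envelope sketch using those same made-up numbers, plus a hypothetical 5 services with 300MB of table data each; the point is just that the fixed runtime overhead is paid once per instance, not once per dataset:

```python
# Back-of-the-envelope comparison using the made-up numbers above.
# The data itself is the same size either way; only the per-instance overhead differs.
SUBSYSTEM_OVERHEAD_MB = 256 + 128 + 280 + 512  # storage + config + auth + api = 1176 MB
SERVICES = 5                                   # hypothetical number of services needing a DB
DATA_PER_SERVICE_MB = 300                      # hypothetical per-service table data

separate = SERVICES * (SUBSYSTEM_OVERHEAD_MB + DATA_PER_SERVICE_MB)
shared = SUBSYSTEM_OVERHEAD_MB + SERVICES * DATA_PER_SERVICE_MB

print(f"Separate DB per service: {separate} MB")  # 5 * (1176 + 300) = 7380 MB
print(f"Single shared DB server: {shared} MB")    # 1176 + 5 * 300  = 2676 MB
```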
It’s very much like micro-kernels vs monoliths. On the surface, a lean micro-kernel seems like it should be more performant since less is happening during kernel time, but the significant increase in operations needed to perform basic tasks eats that advantage. For example, if storage access was in userspace, an application would need to call back to the kernel to request communication, which would need to call up to the storage driver, then back… and it becomes a death by a thousand cuts. In a monolithic kernel, the application just tells the kernel that it wants to access storage and in what mode, and provides either the input or a buffer to receive data.
My recommendation, if practical, is a single, potentially containerized DB server that is backed by storage that provides high availability and redundancy. This is supposing that you are using the same sort of DB (e.g. SQL, NoSQL, etc.) and that you are targeting a smallish number of services that are on-premises.
My reasoning here is that you can treat the DB server effectively as a storage API service and run it via some orchestration service like K8S. This lets you offload your DB stability and data integrity to the FS and/or other low-level stuff that is simple to configure once and that you only need to worry about when hardware fails. This in turn greatly reduces DB server configuration and deployment work and lets you treat the servers like livestock, not pets.
Now, if you are using a public cloud provider, my view changes slightly. Generally, I’d suggest offloading the DB to a managed service that is compatible with a FOSS alternative so that you can avoid vendor lock-in. This means that you get the HA, etc. without having to worry about maintenance and configuration overhead. Just be aware of cost modeling - it’s easy to run up large bills.
Observability stack?
That makes sense. I like the idea of combining a physical key with physical/KVM access so that there is no password auth (at least, not without a second factor).
I’m still trying to come up with a good one to allow more self-hosting. Probably a SHTF security key kept in a safe that can be used with physical access.
My “plan” is to SSH in and figure out what’s wrong.
The problem here is that you have a circular dependency: if SSH auth depends on the very system that’s down, you can’t SSH in to fix it.
I’d suggest something like Keycloak, or earning the wizard robe and beard by buckling down and learning OpenLDAP. The biggest suggestion that I have, though, is to have a disaster recovery plan for when even your auth system goes down. Don’t be like Facebook and lock yourself out without any hope of regaining entry (or, if you’re a fan of Russian Roulette, do).
I maintained a CEPH cluster a few years back. I can verify that speeds under 10GbE will cause a lot of weird issues. Ideally, you’ll even want a dedicated 10GbE network purely for CEPH to do its automatic maintenance stuff without impacting storage clients.
The PGs are a separate issue. Each PG is like a disk partition. There’s some funky math and guidelines to calculate the ideal number for each pool, based upon disks, OSDs, capacity, replicas, etc. Basically, more PGs means that there are more (but smaller) places for CEPH to store data. This means that balancing over a larger number of nodes and drives is easier. It also means that there’s more metadata to track. So, really, it’s a bit of a balancing act.
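For a rough sense of the funky math, here’s a sketch of one common rule of thumb (roughly 100 PGs per OSD, divided by the replica count, rounded to a power of two); treat it as a starting point only and check the current CEPH guidance or the pg_autoscaler before committing:

```python
# Rough sketch of a common PG-count rule of thumb (~100 PGs per OSD,
# divided by the replica count, rounded to the nearest power of two).
# The numbers below are examples only; verify against current CEPH guidance.
def suggested_pg_count(num_osds: int, replicas: int, target_pgs_per_osd: int = 100) -> int:
    raw = (num_osds * target_pgs_per_osd) / replicas
    # Round to the nearest power of two, as the old PG calculators did.
    power = 1
    while power * 2 <= raw:
        power *= 2
    return power * 2 if (raw - power) > (power * 2 - raw) else power

print(suggested_pg_count(num_osds=12, replicas=3))  # e.g. 12 OSDs, 3x replication -> 512
```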