How Two Characters Reminded Me Of Fragility

I was performing regularly scheduled maintenance today on one of our client’s servers. Because we’ve been building and managing servers for clients for so long, our build-outs differ in slight ways as our processes have evolved.

I had decided to start reining in those differences and get each server we manage to a stable and predictable place.

What happened?

In this instance our maintenance user was NOT a member of the adm group. I wanted it to be, so that I wouldn’t have to log in as the root user (or use sudo) to read system logs.

The command is supposed to be this: usermod -a -G newgroup username

The -G sets the supplementary groups for the user, and the -a stands for “append”: it adds the listed groups to the ones the user already has instead of replacing them.

What I ended up entering was usermod -G newgroup username, which removed the user’s membership in all the other groups it had been a part of (root, sudo, staff, admin, etc.). I could now read the logs I wanted, but I couldn’t do anything else on the system, most importantly administer it.
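One way to catch this kind of slip right away is to record the user's groups before the change and compare them afterwards. The following is a minimal sketch, not the process we actually run: lost_groups is a hypothetical helper name, and maint is a placeholder user name. It relies only on id -nG, which prints a user's group names on one line.

```shell
#!/bin/sh
# Hypothetical helper: print every group that appears in the "before"
# list but is missing from the "after" list, one per line.
lost_groups() {
    before="$1"
    after="$2"
    for g in $before; do
        case " $after " in
            *" $g "*) ;;                 # still a member, nothing to report
            *) printf '%s\n' "$g" ;;     # membership was dropped
        esac
    done
}

# Usage sketch (needs root; "maint" is a placeholder user name):
#   before=$(id -nG maint)
#   usermod -a -G adm maint
#   after=$(id -nG maint)
#   lost_groups "$before" "$after"   # prints nothing if no group was lost
```

Running the check in the same root or sudo session that made the change matters, because that session keeps its privileges even after the user's group list has been rewritten. As an alternative to usermod, gpasswd -a username group adds a user to a single group and has no replace mode to fall into.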

Clarity

This mistake shone a bright light on a glaring misstep that we’ve been making for years when setting up these small servers for clients. There was NO way to recover from this error. The root user wasn’t allowed to log in to the system remotely, leaving sudo as the only way to access root privileges, and I couldn’t use sudo anymore.

Backups stored on the server were inaccessible due to permissions (we also save them nightly to AWS, which saved me here).

The fix

I had to recreate the server from scratch, using backups from our off-server storage (AWS S3).

This was the only fix for this situation.

Takeaways

This was a glaring business continuity issue for our client. Luckily, in this instance, the site is fairly low traffic currently and hadn’t had any transactions that day since the nightly backup was stored off-site.

I’ve now updated every server whose installation we are actively or passively responsible for so that root has its own password. In dire situations like this, that will allow us to log in via our hosting provider’s console. At the very least we can rebuild and regain access for normal operations.

This also helps our clients in the case of Maje Media going *poof*. Since our clients all pay their own hosting bills and give us team access to their accounts to manage, we CANNOT be a single point of failure.

The next step is to figure out how to securely share these credentials with clients in a way that they’ll remember how to use them if the time comes that they need them.
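To audit which servers already have a usable root password, the status line from passwd -S root can be inspected. This is a rough sketch under assumptions: root_password_state is a hypothetical helper name, and the status codes it recognizes (P or PS for a usable password, L or LK for locked, NP for none) are the ones shadow-utils commonly prints, so check them against your distribution before relying on this.

```shell
#!/bin/sh
# Hypothetical helper: classify the second field of a `passwd -S` status line.
# Assumed codes: P/PS = usable password, L/LK = locked, NP = no password.
root_password_state() {
    status=$(printf '%s\n' "$1" | awk '{print $2}')
    case "$status" in
        P|PS) echo "usable" ;;
        L|LK) echo "locked" ;;
        NP)   echo "none" ;;
        *)    echo "unknown" ;;
    esac
}

# Usage sketch (needs root privileges):
#   root_password_state "$(sudo passwd -S root)"
```

A result of "locked" or "none" would mean the provider's console offers no way in as root, which is exactly the situation described above.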
