
OSE Server


Introduction

The OSE Server is a critical piece of the OSE Development Stack: the OSE Software Stack and the OSE Server Stack are the two critical components of OSE's development infrastructure.

Uptime & Status Checks

If you think one of the OSE websites or services may be offline, you can verify their status at the following site:

* http://status.opensourceecology.org/

Note that this URL is just a convenient CNAME to uptime.statuscake.com, which is configured to redirect requests for our CNAME to our Public Reporting Dashboard here:

* https://uptime.statuscake.com/?TestID=itmHX7Pfj2

It may be a good idea to bookmark the above URL in case our site goes down, including the possibility of DNS issues preventing the CNAME redirect from status.opensourceecology.org.

Note that Statuscake also tracks uptime over months, and can send monthly uptime reports, as well as immediate email alerts when the site(s) go down. If you'd like to receive such alerts, contact the OSE System Administrator.
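
If you suspect a DNS problem with the status.opensourceecology.org CNAME itself, you can check the record directly from the command line. A minimal sketch (this assumes the `dig` utility, from bind-utils, is installed on your machine):

# check that the status CNAME resolves to statuscake
dig +short CNAME status.opensourceecology.org

# or skip DNS entirely and query the statuscake dashboard directly
curl -I 'https://uptime.statuscake.com/?TestID=itmHX7Pfj2'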

Adding Statuscake Checks

To modify our Statuscake checks, log in to the Statuscake website using the credentials stored in keepass.

If you want the test to be public (appearing on http://status.opensourceecology.org), you should add it by editing the Public Reporting Dashboard.

OSE Server Management

Working Doc - edit

Working Presentation - edit

2016

Ordered with CentOS 7.2, and installing Webmin for server admin.

Image: New OSE server, June 2016. The older server had 4 GB RAM compared to the 64 GB here.
Image: Filezilla login and directory structure on Hetzner, September 2016.

Assessment of Server Options

  • As of 6/2016, the setup on the old (2011) Hetzner server is inadequate and needs updating - AMD Athlon 64 X2 5600+ processor, 4 GB RAM, 2x 400 GB hard disks, 1 Gbit/s connection
  • The main figure of merit is RAM capacity - how many pages can be held in memory before falling back to the hard disks - since RAM access is nearly instantaneous while hard disk access is slow.

Proposed Solution

  • Upgrade the hardware/plan on Hetzner
  • Document sysadmin procedures so that sysadmin work can be done in-house

Provisioning

OSE does not use any provisioning software to manage, for example, the users/packages/files on our server. This is intentional.

As of 2018, we have no need to scale beyond 1 server; this makes both the benefits & complexity of a load balancer & stateless web servers that can be spun-up as needed (which is the best use-case for provisioning solutions) irrelevant.

The biggest con of not using a provisioning tool is that rebuilding our server from backups after catastrophic failure is an annoyingly manual & time-consuming process. However, with our current architecture, the reality is that--if we were to put our configs in a provisioning tool--it would be just as manual & time-consuming (if not worse!). This is because of config rot. Unless nodes are actively being destroyed & launched with the provisioning tool, there will end up being changes made to the node directly, which will not be checked-into the provisioning tool. Unfortunately, this configuration drift is highly likely in a small nonprofit organization with sysadmins coming & going and when managing a single server that is never destroyed & re-provisioned.

I (Michael Altfield) am very familiar with provisioning tools. I've written one from scratch. I've used Puppet, Chef, Ansible, etc. I love them. But the inevitable config rot/drift described above would mean that use of a provisioning tool would make our maintenance *more* complex, not less.

Therefore, the source of truth for our server's users/packages/files is our backups.

If our server experiences a catastrophic failure requiring a rebuild, the restore will necessarily be time-consuming (taking maybe a few days of work), but the data will be in exactly 1 trustworthy place. This is better than trying to restore from provisioning files, finding that things are broken because some files were missing (or different because someone just commented-out the puppet cron to "make it work") from the provisioned configs, trying to diff the backups from the provisioner's files, and then just giving up & going with the backups anyway.

If we get to the point where we actually autoscale stateless servers behind a load balancer, and we can ensure that our stateless servers are being intentionally destroyed & rebuilt at least a few times per week to prevent provisioning config rot/drift, then we *should* use a provisioning tool.

In the meantime, rebuilding our server after catastrophic failure means:

  1. Downloading the most recent full backup of our server (Hopefully nightlies are available. Maybe we have to fall-back on our once-a-month backups stored to Glacier)
  2. Installing a fresh server with the OS matching the previous server's OS (ie: CentOS), perhaps using a newer version
  3. Installing the packages needed on the new server
  4. Copying the config files from the backups to the new server
  5. Copying & restoring the db contents to the new server
  6. Copying & restoring the web roots to the new server
  7. Test, fix, reboot, etc., until it can reboot & work as expected.
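
As a rough sketch of steps 4-6, the restore commands might look something like this (all paths and filenames here are illustrative assumptions; the actual layout must be read from the backups themselves):

# 4. copy config files from the extracted backup onto the new server
rsync -av /path/to/extracted/backup/etc/ /etc/

# 5. restore all databases from the mysqldump taken by the backup script
mysql -u root -p < /path/to/extracted/backup/mysqldump_all_databases.sql

# 6. restore the web roots
rsync -av /path/to/extracted/backup/var/www/ /var/www/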

Here are some package hints that you'll want to ensure are installed (probably in this order). Be sure to grep the backups for their config files, and restore the configs. But, again, this doc itself is going to rot; the source-of-truth is the backups.

  1. sshd
  2. iptables
  3. OSSEC
  4. our backup scripts
  5. crond
  6. mariadb
  7. certbot (let's encrypt)
  8. nginx
  9. varnish
  10. php
  11. apcu
  12. httpd (apache)
  13. logrotated
  14. awstats
  15. munin

Don't forget to test & verify backups are working!
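
One quick, hedged way to spot-check this from your workstation (the dreamhost hostname is a placeholder; see the Backup Server section below for the actual paths):

# confirm that a new timestamped backup dir appeared on the dreamhost backup server last night
ssh marcin_ose@<dreamhost_backup_host> 'ls -lht /home/marcin_ose/backups/hetzner2/ | head -n 5'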

SSH

Our server is accessible via ssh. If you require ssh access, contact the OSE System Administrator with subject "ssh access request," and include the following information in the body of the email:

  1. An explanation as to why you need ssh access
  2. What you need access to
  3. A link to a portfolio of prior command-line linux work that demonstrates your experience & competency using the command line safely
  4. A few references from previous work in which you administered linux systems over the command line

Add new users

The following steps will add a new user to the OSE Server.

First, create the new user. Generate & set a temporary, 100-character, random, alpha-numeric password for the user.

useradd <new_username>
passwd <new_username>
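
One way to generate such a temporary password (a sketch; any equivalent method is fine):

# print a 100-character random alpha-numeric string to paste at the passwd prompts
< /dev/urandom tr -dc 'a-zA-Z0-9' | head -c 100; echo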

Only if it's necessary, send this password to the user through a confidential/encrypted medium (ie: the Wire app). They would need it if they want to reset their password. Note that they will not be able to authenticate with their password over ssh, and this is intentional. In fact, it is unlikely they will need their password at all, unless perhaps they require sudo access. For this reason, it's best to set this password "just in case," not save it, and not send it to the user--it's more likely to confuse them. If they need their password for some reason in the future, you can reset it to a new random password as the root user, and send it to them over an encrypted medium.

If the user needs ssh access, add them to the 'sshaccess' group.

gpasswd -a <new_username> sshaccess

Have the user generate a strong rsa keypair using the following command. Make sure they encrypt it with a strong passphrase--this effectively gives them 2FA (the key file plus the passphrase). Then have them send you their new public key. The following commands should be run on the new user's computer, not the server:

ssh-keygen -t rsa -b 4096 -o -a 100
cat /home/<username>/.ssh/id_rsa.pub

The output from the `cat` command above is their public key. Have them send this to you. They can use an insecure medium such as email, as there is no reason to keep the public key confidential. They should never, ever send their private key (/home/<username>/.ssh/id_rsa) to anyone. Moreover, the private key should not be copied to any other computer, except in an encrypted backup. Note this means that the user should not copy their private key to OSE servers--that's what ssh agents are for.

Now, add the ssh public key provided by the user to their authorized_keys file on the OSE Server, and set the permissions:

cd /home/<new_username>
mkdir /home/<new_username>/.ssh
vim /home/<new_username>/.ssh/authorized_keys
chown -R <new_username>:<new_username> /home/<new_username>/.ssh
chmod 700 /home/<new_username>/.ssh
chmod 644 /home/<new_username>/.ssh/authorized_keys
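
Once the key is in place, the user can verify their access from their own computer (this assumes the same non-standard ssh port, 32415, used in the sshfs example later on this page):

# run on the new user's computer to confirm key-based login works
ssh -p 32415 <new_username>@opensourceecology.org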

If the user needs sudo permissions, edit the sudoers file. This should only be done in very, very, very rare cases for users who have >5 years of experience working as a Linux Systems Administrator. Users with sudo access must be able to demonstrate a very high level of trust, experience, and competence working on the command line in a linux environment.
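
If sudo really is warranted, always edit the sudoers file with visudo so that syntax errors are caught before the file is saved. A minimal sketch (the grant shown in the comment is an illustration only and should be tailored to actual needs):

# visudo syntax-checks the sudoers file before installing it
visudo

# example of a line granting full sudo to a single user (illustration only)
#   <username>    ALL=(ALL)    ALL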

Backups

We actively back up our server's data on a daily basis.

Important Files & Directories

The following files/directories are related to the daily backup process:

  1. /root/backups/backup.sh This is the script that performs the backups
  2. /root/backups/sync/ This is where backup files are stored before they're rsync'd to the storage server. '/root/backups/sync*' is explicitly excluded from backups itself to prevent a recursive nightmare.
  3. /root/backups/sync.old/ This is where the files from the previous backup are stored; they're deleted by the backup script at the beginning of a new backup, and replaced by the files from 'sync'
  4. /root/backups/backup.settings This holds important variables for the backup script. Note that this file should be on heavy lockdown, as it contains critical credentials (passwords).
  5. /etc/cron.d/backup_to_dreamhost This file tells the cron server to execute the backup script at 07:20 UTC, which is roughly midnight in North America--a time of low traffic for the OSE Server
  6. /var/log/backups/backup.log The backup script logs to this file
  7. /root/.ssh/id_rsa The private ssh key used to rsync files to the dreamhost server. This file should be on lockdown, as it's a critical credential that allows read/write access to our dreamhost server over ssh.
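
For illustration, the /etc/cron.d/backup_to_dreamhost file might contain something along these lines (a guess at the exact format; the real file on the server and in the backups is authoritative):

# run the backup script daily at 07:20 UTC as root, appending output to the backup log
20 7 * * * root /root/backups/backup.sh >> /var/log/backups/backup.log 2>&1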

What's backed-up

Here is what is being backed-up:

  1. mysqldump of all databases
  2. all files in /etc/*
  3. all files in /home/*
  4. all files in /var/log/*
  5. all files in /root/* (except the 'backups/sync*' dirs)
  6. all files in /var/www/*

Backup Server

As a nonprofit, we're eligible for an "unlimited" storage account with dreamhost. Therefore, we rsync our backup files to our dreamhost server at the end of the backup script.

Note that we don't actually have unlimited storage on this server, and archives of TBs of data would surely be a violation of their policy. Therefore, we should be respectful of this free service & keep our total usage below 500G.

The following files/directories are related to the daily backup process on the backup server:

  1. /home/marcin_ose/backups/hetzner2/ This directory holds a set of dirs that are timestamped & hold the contents of the 'sync' directory from the hetzner2 server
  2. /home/marcin_ose/bin/cleanLocal.pl This script deletes files older than a specified age from a specified directory
  3. /home/marcin_ose/logs/cleanBackups.log This is the log file that cleanLocal.pl writes to
  4. /home/marcin_ose/.ssh/authorized_keys This file lists the public key as found in /root/.ssh/id_rsa.pub on the hetzner2 server, and permits the backup script to write files to the dreamhost server over ssh (rsync).

Because we don't have root access to the dreamhost backup server, the cron responsible for deleting old backups is stored in the crontab. Execute `crontab -l` to see the cron config.

Note that the cleanLocal.pl script does *not* delete backup files that were created on the 1st of every month. These should periodically be cleared out manually, if space becomes an issue. Otherwise, cron is configured to call cleanLocal.pl to preserve backups for 3 days back, deleting files older than this.
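
The corresponding crontab entry on the dreamhost server might look roughly like this (the arguments to cleanLocal.pl are assumptions; run `crontab -l` there to see the real configuration):

# hypothetical daily cleanup call preserving the last 3 days of backups
0 8 * * * /home/marcin_ose/bin/cleanLocal.pl /home/marcin_ose/backups/hetzner2 3 >> /home/marcin_ose/logs/cleanBackups.log 2>&1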

Restore from Glacier

We use Amazon Glacier for cheap long-term backups. Glacier is one of the cheapest options for storing about a TB of data, but it can also be very difficult to use, and data retrieval costs are high.

Glacier has no notion of files & dirs. Archives are uploaded to Glacier into vaults. Archives are identified by a long UID & a description. At OSE, we use the tool 'glacier-cli' to simplify large uploads; this tool uses the description field as a file name. For each tarball, I uploaded a corresponding metadata text file that lists all the files that were uploaded (this should save costs if someone doesn't know which archive to download, since the metadata file is significantly smaller than the tarball archive itself).

Archives >4G require splitting into multiple parts & providing the API with a tree checksum of the parts. This is a very nontrivial process, and most of our backups are >4G. Therefore, we use the tool glacier-cli, which does most of this tedious work for you.

Install glacier-cli

If you don't already have this installed (try executing `glacier.py`), you can install the glacier-cli tool as follows:

# install glacier-cli prereqs
yum install python-boto python2-iso8601 python-sqlalchemy

# install glacier-cli
mkdir -p /root/sandbox
cd /root/sandbox
git clone https://github.com/basak/glacier-cli.git
cd glacier-cli
chmod +x glacier.py
./glacier.py -h

# create symlink in $PATH
mkdir -p /root/bin
cd /root/bin
ln -s /root/sandbox/glacier-cli/glacier.py

Sync Vault Contents

The AWS console will show you the vaults you have, the number of archives in each, and the total size in bytes. It does *not* show you the archives in your vault (ie: their IDs, descriptions, & individual sizes). In order to get this information, you have to pay (and wait ~4 hours) for an inventory job. glacier-cli keeps a local copy of this inventory data, but--if you haven't updated it recently--you should probably refresh it anyway. Here's how:

# set creds (check keepass for 'ose-backups-cron')
export AWS_ACCESS_KEY_ID='CHANGEME'
export AWS_SECRET_ACCESS_KEY='CHANGEME'

# query glacier to get an up-to-date inventory of the given vault (this will take ~4 hours to complete)
# note: to determine the vault name, it's best to check the aws console
glacier.py --region us-west-2 vault sync --max-age=0 --wait <vaultName>

# now list the contents of the vault
glacier.py --region us-west-2 archive list <vaultName>

Restore Archives

The glacier-cli tool uses the archive description as the file name. You cannot restore by the archive id using glacier-cli. Here's how to restore by the "name" of the archive:

# create tmp dir (make sure not to download big files into dirs that are themselves being backed-up daily!)
stamp=`date +%Y%m%d_%T`
tmpDir=/var/tmp/glacierRestore.$stamp
mkdir $tmpDir
chown root:root $tmpDir
chmod 0700 $tmpDir
pushd $tmpDir

# download the encrypted archive
time glacier.py --region us-west-2 archive retrieve --wait <vaultName> <archive1> <archive2> ...

The above command will take many hours to complete. When it does, the file(s) will be present in your cwd.

Decrypt Archive Contents

OSE's backup data holds very sensitive content (ie: passwords, logs, etc.), so the backups are encrypted before being uploaded to 3rd parties.

Use gpg and the 4K 'ose-backups-cron.key' keyfile (which can be found in keepass) to decrypt this data as follows:

Note: Depending on the version of `gpg` installed, you may need to omit the '--batch' option.

[root@<server> glacierRestore]# gpg --batch --passphrase-file /root/backups/ose-backups-cron.key --output hetzner1_20170901-052001.fileList.txt.bz2 --decrypt hetzner1_20170901-052001.fileList.txt.bz2.gpg
gpg: AES encrypted data
gpg: encrypted with 1 passphrase
[root@<server> glacierRestore]#

There should now be a decrypted file. Depending on the file, you can decompress it (ie: with `bunzip2`) or extract it with `tar` to view the contents.
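
For example, to inspect the metadata file decrypted above (a sketch; exact archive names & formats will vary):

# the fileList metadata is just bzip2-compressed text
bunzip2 hetzner1_20170901-052001.fileList.txt.bz2
less hetzner1_20170901-052001.fileList.txt

# a full backup tarball would instead be listed with tar, eg:
#   tar -tf <decrypted_archive>.tar | less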

https

In 2017, Michael Altfield migrated OSE sites to use https with Let's Encrypt certificates.

Nginx's https config was hardened using Mozilla's ssl-config-generator and the Qualys ssllabs.com SSL Server Test.

For more information on our https configuration, see Web server configuration#Nginx

Keepass

Whenever possible, we should utilize per-user credentials for logins so there is a user-specific audit trail and we have user-specific authorization-revocation abilities. However, where this is not possible, we should store usernames & passwords that our OSE Server infrastructure depends on in a secure & shared location. At OSE, we store such passwords in an encrypted keepass database that lives on the server.

passwords.kdbx file

The passwords.kdbx file is encrypted; if an attacker obtains this file, they will not be able to access any useful information. That said, we keep it in a central location on the OSE Server behind lock & key for a few reasons:

  1. The OSE Server already has nightly backups, so keeping the passwords.kdbx on the server simplifies maintenance by reusing existing backup procedures for the keepass file
  2. By keeping the file in a central location & updating it with sshfs, we can prevent forks & merges of per-person keepass files, which would complicate maintenance. Note that writes to this file are extremely rare, so multi-user access to the same file is greatly simplified.
  3. The keepass file is available on a need-to-have basis to those with ssh authorization who have been added to the 'keepass' group.

The passwords.kdbx file should be owned by the user 'root' and the group 'keepass'. It should have the file permissions of 660 (such that it can be read & written by 'root' and users in the 'keepass' group, but not accessible in any way from anyone else).

The passwords.kdbx file should exist in a directory '/etc/keepass', which is owned by the user 'root' and the group 'keepass'. This directory should have permissions 770 (such that it can be read, written, & executed by 'root' and users in the 'keepass' group, but not accessible in any way from anyone else).
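
A minimal sketch of setting this up as root (assuming the 'keepass' group does not exist yet):

# create the shared group & the directory
groupadd keepass
mkdir -p /etc/keepass

# apply the ownership & permissions described above
chown root:keepass /etc/keepass
chmod 770 /etc/keepass
chown root:keepass /etc/keepass/passwords.kdbx
chmod 660 /etc/keepass/passwords.kdbx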

Users should not store a copy of the passwords.kdbx file on their local machines. This file should only exist on the OSE Server (and therefore also in backups).

Unlocking passwords.kdbx

In order to unlock the passwords.kdbx file, you need:

  1. Keepass software on your personal computer capable of reading Keepass 2.x DB files
  2. sshfs installed on your personal computer
  3. ssh access to the OSE Server with a user account added to the 'keepass' group
  4. the keepass db password
  5. the keepass db key file

Note that the "Transform rounds" has been tuned to '87654321', which makes the unlock process take ~5 seconds. This also significantly decreases the effectiveness of brute-forcing the keys if an attacker obtains the passwords.kdbx file.

KeePassX

OSE developers are encouraged to use a linux personal computer. In this case, we recommend the KeePassX client, which (on Debian/Ubuntu-based distros) can be installed with the following command:

sudo apt-get install keepassx

sshfs

OSE developers are encouraged to use a linux personal computer. In this case, sshfs can be installed (on Debian/Ubuntu-based distros) with the following command:

sudo apt-get install sshfs

Now create a local directory on your personal computer where you can mount directories from the OSE Server onto your local filesystem. We'll also store your personal keepass file & the ose passwords key file in '$HOME/keepass', so let's lock down the permissions as well:

mkdir -p $HOME/keepass/mnt/ose
chown -R `whoami`:`whoami` $HOME/keepass
find $HOME/keepass/ -type d -exec chmod 700 {} \;
find $HOME/keepass/ -type f -exec chmod 600 {} \;

ssh access

If you're working on a task that requires access to the passwords.kdbx file, you'll need to present your case to the OSE System Administrator and request ssh access with a user that's been added to the 'keepass' group. Send an email to the OSE System Administrator explaining:

  1. Why you require access to the OSE passwords.kdbx file and
  2. Why you can be trusted with all these credentials.

The System Administrator with root access can execute the following command on the OSE Server to add a user to the 'keepass' group:

gpasswd -a <username> keepass

Once you have an ssh user in the 'keepass' group on the OSE Server, you can mount the passwords.kdbx file to your personal computer's filesystem with the following command:

sshfs -p 32415 <username>@opensourceecology.org:/etc/keepass $HOME/keepass/mnt/ose
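
When you're finished working with the file, unmount it again:

# unmount the sshfs mount when finished
fusermount -u $HOME/keepass/mnt/ose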

keepass db password

OSE developers are encouraged to use a linux personal computer & to store their personal OSE-related usernames & passwords in a personal password manager, such as KeePassX.

If you don't already have one, open KeePassX and create your own personal keepass db file. Save it to '$HOME/keepass/keepass.kdbx'. Be sure to use a long, secure passphrase.

After the OSE System Administrator grants you access to the OSE shared keepass file, they will establish a secure channel with you to send you the keepass db password, which is a long, randomly generated string. When you receive this password, store it in your personal keepass db.

This password, along with the key file, is a key to unlocking the encrypted passwords.kdbx file. You should use extreme caution to ensure that this string is kept secret & secure. Never give it to anyone through an unencrypted channel, write it down, or save it to an unencrypted file.

keepass db key file

After the OSE System Administrator grants you access to the OSE shared keepass file, they will establish a secure channel with you to send you the keepass db key file, which is a randomly generated 4096 byte file.

This key file is the most important key to unlocking the encrypted passwords.kdbx file. You should use extreme caution to ensure that this file is kept secret & secure. Never give this key file to anyone through an unencrypted channel, save it on an unencrypted storage medium, or keep it on the same disk as the passwords.kdbx file.

This key file should never be stored or backed-up in the same location as the passwords.kdbx file. It would be a good idea to store it on an external USB drive kept in a safe, rather than keeping it stored on your computer.

TODO

As of 2017-06, the goal in the next few months is to migrate all services off of Hetzner 1, and terminate our Hetzner 1 plan entirely. The following is a set of tasks to reach this goal:

  1. Backups
  2. Harden SSH
  3. Document how to add ssh users to Hetzner 2
  4. Statuscake
  5. Awstats
  6. OSSEC
  7. Harden Apache
  8. Harden PHP
  9. Harden Mysql
  10. iptables
  11. Let's Encrypt for OBI
  12. Organize & Harden Wordpress for OBI
  13. Qualys SSL labs validation && tweaking
  14. Varnish Cache
  15. Disable Cloudflare
  16. Fine-tune Wiki config
  17. time series data graphs (RRDTool? Cacti?)
  18. Keepass solution + documentation
  19. Migrate forum to hetzner2
  20. Migrate oswh to hetzner2
  21. Migrate fef to hetzner2
  22. Migrate wiki to hetzner2
  23. Migrate osemain to hetzner2
  24. Block hetzner1 traffic to all services (though easily revertible)
  25. Harden forum
  26. Harden oswh
  27. Harden fef
  28. Harden osemain
  29. Harden wiki
  30. End Hetzner1 contract
  31. Install Jitsi Videobridge
  32. Better alerting (nagios?). At least email alerts when backup files haven't been written to the backup server in >=48 hours.
  33. Update backup solution to use duplicity
  34. LibreOffice Online (CODE) POC

Links