Amazon Glacier

From Open Source Ecology

OSE briefly used Amazon Glacier to store some old backups of our data after Dreamhost notified us on 2018-03-20 that we were violating their unlimited storage policy by storing backups on their servers.

In 2019, we left Amazon Glacier for Backblaze for the following reasons:

  1. Backblaze is cheaper once you account for the minimum archive retention requirements in Glacier's fine print
  2. Backblaze is way, way easier to use

Actual Storage Quotas and Costs

Account

  • ops at opensourceecology.org

Restore from Glacier

We use Amazon Glacier for cheap long-term backups. Glacier is one of the cheapest options for storing about a TB of data, but it can be very difficult to use, and data retrieval costs are high.

Glacier has no notion of files & dirs. Archives are uploaded to Glacier into vaults. Archives are identified by a long UID & a description. At OSE, we use the tool 'glacier-cli' to simplify large uploads; this tool uses the description field as a file name. For each tarball, I uploaded a corresponding metadata text file that lists all the files in that tarball. This should save costs if someone doesn't know which archive to download, since the metadata file is significantly smaller than the tarball archive itself.

Archives >4G require splitting into multiple parts & providing the API with a tree checksum of the parts. This is a very nontrivial process, and most of our backups are >4G. Therefore, we use the tool glacier-cli, which does most of this tedious work for you.
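For reference, the tree checksum mentioned above is a SHA-256 tree hash: the data is split into 1 MiB chunks, each chunk is hashed, and adjacent digests are hashed pairwise until a single digest remains. The following is a minimal bash sketch of that computation, for illustration only, since glacier-cli already does this for you. The function names `hash_hex_pair` and `glacier_tree_hash` are hypothetical, and the sketch assumes bash, `split`, `sha256sum`, and a non-empty input file.

```shell
# Illustration only (bash). Helper names are hypothetical, not part of glacier-cli.

# Hash the raw bytes represented by two hex digests concatenated together.
hash_hex_pair() {
  # Turn "ab12..." into "\xab\x12..." so bash's printf emits the raw bytes.
  printf "$(printf '%s%s' "$1" "$2" | sed 's/../\\x&/g')" | sha256sum | cut -d' ' -f1
}

# Print the SHA-256 tree hash (hex) of a non-empty file, using 1 MiB chunks.
glacier_tree_hash() {
  local file=$1 tmp f i
  local -a level=() next=()
  tmp=$(mktemp -d) || return 1

  # Leaf level: one SHA-256 digest per 1 MiB chunk, in file order.
  split -b 1048576 -a 6 "$file" "$tmp/chunk."
  for f in "$tmp"/chunk.*; do
    [ -e "$f" ] && level+=("$(sha256sum "$f" | cut -d' ' -f1)")
  done
  rm -rf "$tmp"

  # Combine adjacent digests until one remains; an odd one out is carried up as-is.
  while [ "${#level[@]}" -gt 1 ]; do
    next=()
    for ((i = 0; i < ${#level[@]}; i += 2)); do
      if ((i + 1 < ${#level[@]})); then
        next+=("$(hash_hex_pair "${level[i]}" "${level[i+1]}")")
      else
        next+=("${level[i]}")
      fi
    done
    level=("${next[@]}")
  done
  printf '%s\n' "${level[0]}"
}

# Usage (file name is a placeholder):
#   glacier_tree_hash some_backup.tar.gpg
```

For a file of 1 MiB or less there is only one chunk, so the tree hash is simply the file's plain SHA-256 digest.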

Install glacier-cli

If you don't already have this installed (try executing `glacier.py`), you can install the glacier-cli tool as follows:

# install glacier-cli prereqs
yum install python-boto python2-iso8601 python-sqlalchemy

# install glacier-cli
mkdir -p /root/sandbox
cd /root/sandbox
git clone https://github.com/basak/glacier-cli.git
cd glacier-cli
chmod +x glacier.py
./glacier.py -h

# create symlink in $PATH
mkdir -p /root/bin
cd /root/bin
ln -s /root/sandbox/glacier-cli/glacier.py

Sync Vault Contents

The AWS console will show you the vaults you have, the number of archives each one holds, and its total size in bytes. It does *not* show you the archives you have in your vault (ie: their IDs, descriptions, & individual sizes). In order to get this, you have to pay (and wait ~4 hours) for an inventory job. glacier-cli keeps a local copy of this inventory data, but--if you haven't updated it recently--you should probably refresh it anyway. Here's how:

# set creds (check keepass for 'ose-backups-cron')
export AWS_ACCESS_KEY_ID='CHANGEME'
export AWS_SECRET_ACCESS_KEY='CHANGEME'

# query glacier to get an up-to-date inventory of the given vault (this will take ~4 hours to complete)
# note: to determine the vault name, it's best to check the aws console
glacier.py --region us-west-2 vault sync --max-age=0 --wait <vaultName>

# now list the contents of the vault
glacier.py --region us-west-2 archive list <vaultName>
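Because an inventory job takes hours, it can help to confirm that the credentials are actually set before starting one. The following is a small sketch of such a pre-flight check. `require_aws_creds` is a hypothetical helper, not part of glacier-cli, and it only verifies that the two variables from the steps above are non-empty and no longer the 'CHANGEME' placeholder; it does not validate them against AWS.

```shell
# Hypothetical pre-flight check (not part of glacier-cli).
# Returns non-zero if either AWS credential variable is unset, empty, or still 'CHANGEME'.
require_aws_creds() {
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ "${AWS_ACCESS_KEY_ID:-}" = 'CHANGEME' ]; then
    echo "error: AWS_ACCESS_KEY_ID is not set (check keepass for 'ose-backups-cron')" >&2
    return 1
  fi
  if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ] || [ "${AWS_SECRET_ACCESS_KEY:-}" = 'CHANGEME' ]; then
    echo "error: AWS_SECRET_ACCESS_KEY is not set (check keepass for 'ose-backups-cron')" >&2
    return 1
  fi
  return 0
}

# Usage (the vault name is a placeholder, as above):
#   require_aws_creds && glacier.py --region us-west-2 vault sync --max-age=0 --wait <vaultName>
```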

Restore Archives

The glacier-cli tool uses the archive description as the file name. You cannot restore by the archive id using glacier-cli. Here's how to restore by the "name" of the archive:

# create tmp dir (make sure not to download big files into dirs that are themselves being backed-up daily!)
stamp=$(date +%Y%m%d_%T)
tmpDir="/var/tmp/glacierRestore.$stamp"
mkdir "$tmpDir"
chown root:root "$tmpDir"
chmod 0700 "$tmpDir"
pushd "$tmpDir"

# download the encrypted archive
time glacier.py --region us-west-2 archive retrieve --wait <vaultName> <archive1> <archive2> ...

The above command will take many hours to complete. When it does, the file(s) will be present in your cwd.
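Since a retrieval can run for hours, it is worth confirming that every requested archive actually landed in the working directory before moving on. The following is a minimal sketch, assuming the archive names passed to `archive retrieve` are also the local file names (as described above). `check_retrieved` is a hypothetical helper, not part of glacier-cli.

```shell
# Hypothetical helper: report whether each named archive exists in the cwd and is non-empty.
# Returns non-zero if any of them is missing or empty.
check_retrieved() {
  local name missing
  missing=0
  for name in "$@"; do
    if [ -s "$name" ]; then
      printf 'ok       %s (%s bytes)\n' "$name" "$(wc -c < "$name")"
    else
      printf 'MISSING  %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (archive names are placeholders, as above):
#   check_retrieved <archive1> <archive2>
```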

Decrypt Archive Contents

OSE's backup data holds very sensitive content (eg: passwords, logs, etc), so it is encrypted before being uploaded to 3rd parties.

Use gpg and the 4K 'ose-backups-cron.key' keyfile (which can be found in keepass) to decrypt this data as follows:

Note: Depending on the version of `gpg` installed, you may need to omit the '--batch' option.

[root@hetzner2 glacierRestore]# gpg --batch --passphrase-file /root/backups/ose-backups-cron.key --output hetzner1_20170901-052001.fileList.txt.bz2 --decrypt hetzner1_20170901-052001.fileList.txt.bz2.gpg
gpg: AES encrypted data
gpg: encrypted with 1 passphrase
[root@hetzner2 glacierRestore]# 

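There should now be a decrypted file. The file list in the example above is a bzip2-compressed text file, so you can view it with `bzcat`; the tarball archives themselves can be extracted with `tar`.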
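When several archives were retrieved at once, the same gpg invocation can be looped over every `.gpg` file in the current directory. The following is a sketch under a few assumptions: `decrypt_all` and the `DRY_RUN` variable are hypothetical names, the default keyfile path is the one from the example above, and each output name is simply the input name with `.gpg` stripped. As noted above, some versions of `gpg` may need the '--batch' option removed.

```shell
# Hypothetical helper: decrypt every *.gpg file in the cwd with the backup keyfile.
# Set DRY_RUN=1 to print the gpg commands instead of running them.
decrypt_all() {
  local keyfile enc out
  keyfile=${1:-/root/backups/ose-backups-cron.key}
  for enc in ./*.gpg; do
    [ -e "$enc" ] || continue        # no .gpg files in this directory
    out=${enc%.gpg}                  # e.g. ./foo.tar.gpg -> ./foo.tar
    if [ -e "$out" ]; then
      echo "skip: $out already exists" >&2
      continue
    fi
    if [ -n "${DRY_RUN:-}" ]; then
      echo gpg --batch --passphrase-file "$keyfile" --output "$out" --decrypt "$enc"
    else
      gpg --batch --passphrase-file "$keyfile" --output "$out" --decrypt "$enc"
    fi
  done
}

# Usage:
#   DRY_RUN=1 decrypt_all      # preview the commands
#   decrypt_all                # decrypt using the default keyfile path
```

Skipping files whose output already exists keeps the loop safe to re-run after a partial failure.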

Hetzner 1

On 2018-07-06, we deprecated our managed hosting hetzner1 server, replacing it with hetzner2, a dedicated server with root access that had more resources _and_ cost less per month.

All of the files from hetzner1 were uploaded to Glacier for safe long-term storage in case they ever need to be recovered.

Support

Awssupport.jpg

Page Maintainers

See Also