Saving endangered YouTube videos

2021-07-20


Remember that video you thought was funny ten years ago but that didn’t age well enough to stay on YouTube? I do, and I hate that I can’t find it anymore! To keep this from happening again, I’ve decided to save my favorites to a playlist and back it up regularly.

I have a short shell script, run by cron every morning on a Scaleway server, that downloads new additions to the playlist and saves them to Scaleway Object Storage.
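
A crontab entry for this could look like the one below. The script path and the 06:00 start time are placeholders. The PATH line is also an assumption. Cron typically runs jobs with a minimal PATH, so tools installed under /usr/local/bin (such as pip3 or aws) may not be found without it.

```crontab
# m h dom mon dow  command
PATH=/usr/local/bin:/usr/bin:/bin
# hypothetical script location; run every day at 06:00
0 6 * * * /mnt/data/youtube/yt-backup.sh
```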

Downloading videos

Downloading videos is super simple with youtube-dl:

youtube-dl -i "https://www.youtube.com/playlist?list=XXXXXX"

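This downloads every video in the playlist to the working directory, with filenames generated from the video’s ID and title. The -i (--ignore-errors) flag makes it carry on past errors, for example unavailable videos in the playlist. Over the last few years I’ve run into various hiccups that required some fine-tuning of youtube-dl: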

Hitting YouTube API limits: I need to space out the downloads over a longer period (--sleep-interval). I’ve also decided to download the playlist in random order (--playlist-random), so even if I always get blocked after N requests, I’ll eventually download every video.
Funky filenames: at some point YouTube started allowing emojis in video titles. I don’t want emojis in filenames, though. Luckily, youtube-dl has you covered with --restrict-filenames.
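
To get a feel for what --restrict-filenames does to a title, here is a rough approximation in shell. It is a sketch, not youtube-dl’s implementation, and its exact rules differ in details. It replaces every byte outside a small ASCII whitelist with an underscore, squeezes repeated underscores, and trims them from the ends. restrict_name is a hypothetical helper name.

```bash
#!/bin/bash
# restrict_name TITLE
# Hypothetical helper: a rough approximation of --restrict-filenames,
# NOT youtube-dl's actual code.
restrict_name() {
    # 1. In the C locale, turn every byte outside [A-Za-z0-9._-] into "_"
    #    (spaces, punctuation, and each byte of a multi-byte emoji).
    # 2. Squeeze runs of "_" into a single "_".
    # 3. Trim one leading and one trailing "_".
    printf '%s' "$1" |
        LC_ALL=C tr -c 'A-Za-z0-9._-' '_' |
        tr -s '_' |
        sed 's/^_//; s/_$//'
}

restrict_name 'Funny cat 😹 video!'   # prints Funny_cat_video
```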

Here’s what the download command looks like now:

youtube-dl \
        --no-color \
        --playlist-random \
        --restrict-filenames \
        --sleep-interval 9 \
        -i "https://www.youtube.com/playlist?list=XXXXXX" \
        > "../log/yt-backup-$(date +%F-%H-%M-%S)" 2>&1

Downloading everything over and over?
Will this download every video over and over again every day? Fortunately not! youtube-dl skips a video if its output file already exists. So do I need to keep every video in local storage? Keep on reading to find out. 🙂
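
The rule above only cares whether a file with the expected name exists, not how big it is. The toy model below makes that explicit. It is an illustration, not youtube-dl’s code. The function name and the demo filenames are hypothetical.

```bash
#!/bin/bash
# download_decision FILE
# Toy model (NOT youtube-dl's code) of the skip rule: a video counts as
# "already downloaded" as soon as a file with its output name exists,
# whatever its size, so a zero-byte file is enough.
download_decision() {
    local file=$1
    if [ -e "$file" ]; then
        echo "skip"
    else
        echo "download"
    fi
}

# Demo in a throwaway directory (hypothetical filenames).
demo=$(mktemp -d)
touch "$demo/placeholder.mp4"                  # zero-byte file
download_decision "$demo/placeholder.mp4"      # prints "skip"
download_decision "$demo/not-yet-fetched.mp4"  # prints "download"
rm -r "$demo"
```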

Uploading videos

Uploading to Scaleway Object Storage is really easy. It has an S3-compatible API, so you can use the AWS CLI to interact with it.
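
One way to point the AWS CLI at Scaleway instead of AWS is to pass the global --endpoint-url and --region options on every call. The sketch below wraps that in a small function. It assumes the bucket lives in the fr-par region, overridable through a hypothetical SCW_REGION variable. It also assumes a Scaleway API key pair is stored as the CLI’s credentials, for example via `aws configure`. scw_s3 is a hypothetical helper name.

```bash
#!/bin/bash
# scw_s3 ARGS...
# Hypothetical wrapper: run the aws CLI against Scaleway Object Storage.
# Assumptions: region defaults to fr-par (override with SCW_REGION), and
# Scaleway credentials are configured for the aws CLI.
scw_s3() {
    local region=${SCW_REGION:-fr-par}
    aws --endpoint-url "https://s3.${region}.scw.cloud" --region "$region" "$@"
}
```

Usage would then look like `scw_s3 s3 ls s3://my-bucket/videos/` or `scw_s3 s3 cp video.mp4 s3://my-bucket/videos/video.mp4`.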

To avoid uploading every video every day, I could delete each file once it’s uploaded. But then youtube-dl would download it again the next day (see “Downloading everything over and over?” above). So instead of deleting uploaded videos, I replace each one with an empty file of the same name:

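# upload every non-empty file, then truncate it to a zero-byte placeholder
find videos -type f -size +0 -print0 |
while IFS= read -r -d '' f; do
    # only replace the file if the upload succeeded
    aws s3 cp "$f" "s3://my-bucket/$f" && : > "$f"
done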
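
The same step can be sketched as a function that takes the upload command as arguments. That makes it easy to swap in a different uploader or to try it out with a stub. upload_and_stub and put_to_bucket are hypothetical helper names, and my-bucket is the placeholder bucket from above. A file is only truncated when its upload command succeeds.

```bash
#!/bin/bash
# upload_and_stub DIR UPLOAD_CMD...
# Hypothetical helper: run UPLOAD_CMD on every non-empty regular file under DIR,
# then truncate that file to zero bytes so its name stays behind as a placeholder.
upload_and_stub() {
    local dir=$1
    shift
    # -print0 with read -d '' keeps unusual filenames intact
    find "$dir" -type f -size +0 -print0 |
        while IFS= read -r -d '' f; do
            if "$@" "$f"; then
                : > "$f"    # truncate; youtube-dl will still see the file
            else
                echo "upload failed, keeping $f" >&2
            fi
        done
}

# Hypothetical uploader: copy one file to the placeholder bucket, keeping its path as the key.
put_to_bucket() {
    aws s3 cp "$1" "s3://my-bucket/$1"
}
```

Run from /mnt/data/youtube, `upload_and_stub videos put_to_bucket` behaves like the loop above. Running it a second time uploads nothing, because only zero-byte placeholders are left.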

Conclusion

With the right tools (youtube-dl + aws CLI) it really is this simple. The full script looks like this:

#!/bin/bash

# update youtube-dl to keep up with the latest YouTube changes
pip3 install --upgrade youtube-dl

# download (stop if the directory is missing rather than downloading elsewhere)
cd /mnt/data/youtube/videos || exit 1
/usr/local/bin/youtube-dl \
        --no-color \
        --playlist-random \
        --restrict-filenames \
        --sleep-interval 9 \
        -i \
        "https://www.youtube.com/playlist?list=XXXXXX" \
        > "../log/yt-backup-$(date +%F-%H-%M-%S)" 2>&1

# upload new (non-empty) files, then truncate each one to a zero-byte
# placeholder so youtube-dl skips it on the next run
cd /mnt/data/youtube || exit 1
find videos -type f -size +0 -print0 |
while IFS= read -r -d '' f; do
        # only replace the file if the upload succeeded
        aws s3 cp "$f" "s3://my-bucket/$f" && : > "$f"
done