Saving endangered YouTube videos
Remember that video that you thought was funny ten years ago but didn’t age well enough to stay on YouTube? I do too and I hate that I can’t find it anymore! To avoid this from happening again, I’ve decided to save my favorites to a playlist and regularly back it up.
I have a short shell script cron
’d to run every morning on a
Scaleway server that saves new additions to
Object Storage.
Downloading videos
Downloading videos is super-simple with youtube-dl:
1youtube-dl -i https://www.youtube.com/playlist?list=XXXXXX
This will download videos to the working directory with filenames generated from
the video’s ID and title. During the last few years I’ve run into various
hiccups that required some fine tuning of youtube-dl
:
- Hitting YouTube API limits: need to space out the downloads over a longer period. I’ve also decided to download the playlist in random order so even if I always get blocked after N requests, I’ll eventually download every video.
- Funky filenames: At some point YouTube decided to allow emojis to appear in
video titles. I don’t want emojis in filenames though. Luckily enough
youtube-dl
has you covered.
Here’s what the download command looks like now:
1youtube-dl \
2 --no-color \
3 --playlist-random \
4 --restrict-filenames \
5 --sleep-interval 9 \
6 -i https://www.youtube.com/playlist?list=XXXXXX \
7 > ../log/yt-backup-`date +%F-%H-%M-%S` 2>&1
Downloading everything over and over?
Will this download every video over and over again every day? Fortunately not! Youtube-dl will skip videos if the output file already exists. So do I need to keep every video in local storage? Keep on reading to find out. 🙂
Uploading videos
Uploading to Object Storage is
really easy. It has an S3-compatible API so you can use the aws
CLI to
interact with it.
To avoid uploading every video every day I could delete everything once uploaded. But it would be re-downloaded the next day then… (see blue box above) So instead of deleting I’m replacing uploaded videos with an empty file of the same name:
1NEW=`find videos -size +0 -type f`
2for f in $NEW; do
3 aws s3 cp "$f" "s3://my-bucket/$f"
4 rm "$f"
5 touch "$f"
6done
Conclusion
With the right tools (youtube-dl + aws CLI) it really is this simple. The full script looks like this:
1#!/bin/bash
2
3# update youtube-dl to adapt to latest YT APIs
4pip3 install --upgrade youtube-dl
5
6# download
7cd /mnt/data/youtube/videos
8/usr/local/bin/youtube-dl \
9 --no-color \
10 --playlist-random \
11 --restrict-filenames \
12 --sleep-interval 9 \
13 -i \
14 https://www.youtube.com/playlist?list=XXXXXX \
15 > ../log/yt-backup-`date +%F-%H-%M-%S` 2>&1
16
17# upload
18cd /mnt/data/youtube
19NEW=`find videos -size +0 -type f`
20for f in $NEW; do
21 aws s3 cp "$f" "s3://my-bucket/$f"
22 rm "$f"
23 touch "$f"
24done
25