NOTE: Since Twitter shut down the free-to-use Twitter API on 14th February 2023, this will likely be the last version of the parser script. From now on, it will only work if you have previously run the script and have the data cached. I am now working on keeping the Jekyll theme up to date.
Jekyll framework for housing your Twitter archive.
source
.scripts
, run python3 parser.py
._config.yml
to avoid duplicate settings.
As noted above, as of February 14th 2023, Twitter has shut down the free-to-use Twitter API. This means that the parser script will no longer be able to download data from the Twitter API as is. However, if you have previously run the parser script, it will have cached the data from the Twitter API, and you will be able to run the parser script again to generate the Jekyll pages.
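If you're not sure whether an earlier run left cached data behind, a quick check along these lines might help. This is only a sketch and not part of the project: the function name `missing_cache_files` is made up for illustration, and it only looks for the three cache files this README names in the folder scripts. Whether the parser script relies on other cached data is not something this check can tell you.

```python
from pathlib import Path

# Cache file names as listed in this README; that these three are
# sufficient for a cached run is an assumption.
CACHE_FILES = ("following.json", "followers.json", "profile.json")


def missing_cache_files(scripts_dir="scripts"):
    """Return the names of expected cache files that are absent or empty."""
    base = Path(scripts_dir)
    missing = []
    for name in CACHE_FILES:
        path = base / name
        # Treat a zero-byte file as missing, since it holds no cached data.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list from `missing_cache_files()` (run from the project root) suggests the named cache files are in place.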
However, if you are lucky enough to have paid access to the Twitter API, you can run the parser script if you get a session bearer token. The variable name is SESSION_BEARER_TOKEN, set at line 44 in scripts/parser.py.
Note that how much data you can download from the Twitter API is limited by your access level. See Twitter's documentation for more information.
_status holds each tweet as a Jekyll page.
_thread holds each thread of tweets in your Twitter archive as a Jekyll page.
media holds each image and video in your Twitter archive.
media/avatars holds the avatars of the people you follow and your followers.
archive holds the indexes for the days, months and years of your tweets.
scripts holds the downloaded data from the Twitter API in the files following.json, followers.json and profile.json (your profile). This is so that you have cached data for future runs of the parser script, and also in case the Twitter API gets shut down. (COUGH!)
_data holds the YML files following.yml and followers.yml, which contain the data of the people you follow and who follow you.
_config.yml.
The Jekyll pages generated by the parser script contain front matter to allow for flexible use of the data.
All pages have the following front matter:
layout: The layout of the page.
id: The ID of the tweet. (This is also the format of the filename.)
created: The date of the tweet.
year: The year of the tweet.
month: The month of the tweet.
day: The day of the tweet.
original_url: The URL of the tweet on Twitter.
repy_to_url: (optional) The URL of the tweet this tweet is a reply to.
reply_to_names: (optional) The names of the people this tweet is a reply to.
reply_to_id: (optional) The ID of the tweet this tweet is a reply to if it's a reply to your own tweet.
is_in_thread: (optional) If the tweet is in a thread, this is set to true.
thread_id: (optional) The ID of the thread the tweet is in.

id: The ID of the thread. (This is also the format of the filename.)
tweets: The IDs of the tweets in the thread, separated by a space.
The YML files in the folder _data contain the data of the people you follow and who follow you. The format in both followers.yaml and following.yaml is as follows:
handle: The handle of the Twitter user.
name: The screen name of the Twitter user.
url: The URL of the Twitter user's profile.
avatar: (optional) The filename of the Twitter user's avatar if they have one, and you have instructed the parser script to download users' avatars. Any downloaded avatars are stored in the folder media/avatars.
Once you have run the parser script, you can make a copy of this project for your use, and use the Norwegian Blue framework to generate a static website from your Twitter archive.
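After a run, it can be handy to check what a generated page actually contains before building the whole site. The following is a minimal sketch, not part of the project. The helper names `read_front_matter` and `thread_tweet_ids` are made up, the sample page uses invented values (including the layout name and the date format), and it assumes the front matter is written as flat `key: value` lines between `---` delimiters. For anything more complex, a proper YAML library would be the safer choice.

```python
# Hypothetical sample page: every value here is invented for illustration.
SAMPLE_PAGE = """---
layout: single-tweet
id: 1234567890
created: 2020-01-02 03:04:05
year: 2020
month: 1
day: 2
original_url: https://twitter.com/example/status/1234567890
is_in_thread: true
thread_id: 1234567000
---
Tweet text goes here.
"""


def read_front_matter(text):
    """Return the front matter of a Jekyll page as a dict of strings.

    Only flat `key: value` lines are handled (an assumption about how
    the parser script writes its pages); nested YAML is not supported.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so URLs and times stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # no closing delimiter found: treat as no front matter


def thread_tweet_ids(fields):
    """Split the space-separated `tweets` field of a thread page into IDs."""
    return fields.get("tweets", "").split()
```

For example, `read_front_matter(SAMPLE_PAGE)["thread_id"]` gives the thread a tweet belongs to, and `thread_tweet_ids` turns a thread page's `tweets` value into a list of IDs.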
To generate a static website from your project, go to the root directory of the project and type the command:
bundle exec jekyll build --verbose --config _config.yml,_config.dev.yml
WARNING! Unless you have been very quiet on Twitter, this site generation process will take a LONG time. This is why I put --verbose in the command, so you can see what's going on, and you don't think your computer has crashed.
Once finished, the generated site will be in the folder _site.
You might want to add your Jekyll project to a repository. To do this, you need to:
.git from the project.
.gitignore, as it hides all the generated posts and downloaded media from git. Of the lines to keep in this file, you will need to keep _site and source.
I should note that, with the number of status posts combined with the amount of media downloaded, this project will tend to be very large. If you're thinking of hosting this repository on the likes of GitHub, never mind hosting it on GitHub Pages, I would caution against it, or at least suggest using Git LFS to store the media files for the project, as standard GitHub projects have a soft limit of 1GB. (There is also a file size limit of 100MB per file, and a warning for files of 50MB or over, which could happen with, say, video files.) Even if the resulting project is smaller than that, it will still be really pushing it to host it on GitHub Pages. There's a good discussion of GitHub storage limits here.
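To get a rough idea of where a project stands against the limits mentioned above, something like the following sketch could be run before the first push. It is not part of the project. The function name `size_report` is made up, and it treats 1MB as 1024 * 1024 bytes, which is an assumption about how the limits are counted.

```python
from pathlib import Path

WARN_BYTES = 50 * 1024 * 1024    # file size at which a warning is expected
LIMIT_BYTES = 100 * 1024 * 1024  # per-file size limit
REPO_SOFT_LIMIT = 1024 ** 3      # 1GB soft limit for the whole repository


def size_report(root, warn_bytes=WARN_BYTES, limit_bytes=LIMIT_BYTES):
    """Walk `root` and return (total_bytes, warn_files, blocked_files).

    Files inside a .git folder are skipped. `warn_files` are at or over
    `warn_bytes`; `blocked_files` are over `limit_bytes`.
    """
    root = Path(root)
    total = 0
    warn, blocked = [], []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        size = path.stat().st_size
        total += size
        if size > limit_bytes:
            blocked.append(path)
        elif size >= warn_bytes:
            warn.append(path)
    return total, warn, blocked
```

Comparing the returned total against `REPO_SOFT_LIMIT`, and looking at the two file lists, should show quickly whether Git LFS or self-hosting is worth considering.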
Another issue is that, even if you have a repository host that can handle a project of this size, it's going to have difficulty coping with the sheer number of files on the first push. It might be better to either break down the push into smaller steps, or host it yourself on your own server if you have one, as in that case, you could FTP the whole project to the server, set up a Git service on there, and then do a git init on your project in situ.
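One way to break the first push into smaller steps is to group the files into fixed-size batches and handle one batch at a time. The sketch below only covers the grouping. The function name `batches` is made up, and what batch size suits a given host is something you would have to find out by trial.

```python
def batches(paths, batch_size):
    """Yield successive lists of at most `batch_size` paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for path in paths:
        batch.append(path)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Whatever is left over forms a final, shorter batch.
        yield batch
```

Each batch could then be staged with git add, committed, and pushed before moving on to the next one.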
As noted above, it takes a long time to generate a site from your project, which makes developing the site difficult. To make this easier, I've found that having a copy of the folders _status and _thread with only a small number of posts in them makes it much easier, and when I'm done I can slot the original folders back in. Obviously make sure you have made a copy of these folders, and not just moved them, as you will need the originals to generate the site.
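Making the cut-down copies by hand gets tedious, so here is a minimal sketch of how it could be scripted. It is not part of the project. The function name `make_sample` and the default of 50 files are made up, and it simply takes the first files in sorted filename order, so a sampled thread may point at tweets that did not make it into the sampled _status folder.

```python
import shutil
from pathlib import Path


def make_sample(project_root, dest_root, collections=("_status", "_thread"), keep=50):
    """Copy the first `keep` files of each collection folder into dest_root.

    The originals are only read, never moved or changed. Returns a dict
    mapping each folder name to the number of files copied.
    """
    copied = {}
    for name in collections:
        src = Path(project_root) / name
        dst = Path(dest_root) / name
        dst.mkdir(parents=True, exist_ok=True)
        files = sorted(p for p in src.iterdir() if p.is_file())[:keep]
        for path in files:
            shutil.copy2(path, dst / path.name)
        copied[name] = len(files)
    return copied
```

For instance, `make_sample(".", "../dev-sample", keep=20)` would leave small versions of both folders in a sibling folder, ready to be swapped in while working on the layouts.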
The following layouts are provided:
base.html: The base layout for all pages.
compress.html: A utility layout to compress the HTML output.
single-tweet.html: The layout for a single tweet.
thread.html: The layout for a thread of tweets.
tweetlist.html: The layout for a list of tweets.
nb_day_index.html: The layout for the index of tweets for a day.
nb_month_index.html: The layout for the index of tweets for a month.
nb_year_index.html: The layout for the index of tweets for a year.

tweet.html: HTML for a single tweet.
twitter-user.html: HTML for a Twitter user profile.