Fetch Stack Exchange posts for publishing on a Jekyll-powered blog.
Usage: ./se2jekyll.rb -s SITE post_id ...
-s, --site SITE Site name
-t, --tags TAG(S) Space-delimited lowercase tags
-h, --help Display this screen
Example (http://meta.puzzling.stackexchange.com/a/3020):
se2jekyll.rb -s Meta.Puzzling 3020 > site_evaluations.md
All posts to Stack Exchange acquire a Creative Commons license that allows republication with attribution. This script uses the Stack Exchange API to get a copy of a post for publication via Jekyll, with the primary benefit being added front matter. It's intended to be mostly automatic, but you might want to put the output in the _drafts folder for revision before publishing.
site
The API site parameter can be extracted from the API itself, or you can make a guess based on the URL of the site. For instance, Movies & TV's URL is http://movies.stackexchange.com/ so its site parameter is just movies. The meta site is Meta.Movies. Note that capitalization does not matter to the API, but the string will be used in the attribution text as-is.
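The URL-based guess described above can be sketched in a few lines. This is only an illustration, not code from the script: guess_site is a hypothetical helper, and it assumes the site lives under stackexchange.com. Sites on their own domains would need the API lookup instead.

```ruby
require 'uri'

# Hypothetical helper: guess the API site parameter from a site URL.
# Assumes a *.stackexchange.com host; returns nil for anything else.
def guess_site(url)
  host = URI.parse(url).host.to_s.downcase
  suffix = '.stackexchange.com'
  return nil unless host.end_with?(suffix)

  # Everything in front of the shared suffix is the guess.
  host.delete_suffix(suffix)
end

puts guess_site('http://movies.stackexchange.com/')
puts guess_site('http://meta.puzzling.stackexchange.com/a/3020')
```

The result is lowercase. Since the string is reused in the attribution text, you may prefer to capitalize it yourself before passing it to the script.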
post_id
Every question and answer has a unique (to the site) ID. The second parameter* is that number, which may be found in the URL or by examining the share link at the bottom of a post. You can find your post_ids by your display name or via the API.
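To show how the site and post_id values fit together, here is a minimal sketch that builds a request URL for the API's /posts/{ids} method. The helper name posts_url, the API version, and the choice of the built-in withbody filter are assumptions made for illustration; the script itself may build its request differently.

```ruby
require 'uri'

API_ROOT = 'https://api.stackexchange.com/2.3' # assumed API version

# Hypothetical helper: build the URL for one or more post IDs on a site.
# The API accepts several IDs separated by semicolons.
def posts_url(site, post_ids, filter: 'withbody')
  ids = Array(post_ids).join(';')
  query = URI.encode_www_form(site: site, filter: filter)
  URI("#{API_ROOT}/posts/#{ids}?#{query}")
end

puts posts_url('Meta.Puzzling', 3020)
```

No request is made here. The resulting URI could be handed to an HTTP client such as Net::HTTP and the JSON response parsed from there.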
tags
If you want to customize tags, this is the option for you. Pass it any number of space-delimited strings like this:
se2jekyll.rb -s stackoverflow 55885729 -t "libcurl curl locked"
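For readers curious how options like these can be handled, the following is a sketch using Ruby's standard OptionParser. The parse_args helper and the shape of its return value are hypothetical; only the flags, the usage line, and the meta-post default come from this README.

```ruby
require 'optparse'

# Hypothetical helper: turn an argument list into site, tags, and post IDs.
def parse_args(argv)
  options = { site: nil, tags: ['meta-post'] } # meta-post is the documented default tag
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: ./se2jekyll.rb -s SITE post_id ...'
    opts.on('-s', '--site SITE', 'Site name') { |site| options[:site] = site }
    opts.on('-t', '--tags TAGS', 'Space-delimited lowercase tags') do |tags|
      options[:tags] = tags.split # one quoted string becomes a list of tags
    end
  end
  # parse returns the leftover non-option arguments: the post IDs.
  post_ids = parser.parse(argv)
  options.merge(post_ids: post_ids)
end

p parse_args(['-s', 'stackoverflow', '55885729', '-t', 'libcurl curl locked'])
```

OptionParser supplies -h/--help on its own, so the sketch does not define it.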
If all goes well, a converted version of the post will be sent to STDOUT. It will include some Jekyll front matter (tuned to my blog's configuration), a short attribution notice, and the body of the post. It's your responsibility to redirect the output to an appropriate file.
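The three parts of the output can be pictured with a small sketch. Everything here is illustrative: render_post is a hypothetical helper, the hash keys mirror field names I am assuming the API response uses, and the attribution wording is invented rather than copied from the script.

```ruby
LICENSE_URL = 'http://creativecommons.org/licenses/by-sa/3.0/'

# Hypothetical helper: assemble front matter, attribution, and body.
# `post` is assumed to be a hash shaped like a parsed API post object.
def render_post(post, site:, tags: ['meta-post'])
  # The API reports creation_date as Unix epoch seconds.
  date = Time.at(post['creation_date']).utc.strftime('%Y-%m-%d %H:%M:%S %z')
  front_matter = <<~FRONT
    ---
    title: #{post['title']}
    date: #{date}
    tags: #{tags.join(' ')}
    license: #{LICENSE_URL}
    comments: no
    ---
  FRONT
  # Invented attribution text, for illustration only.
  attribution = "*Originally posted by #{post['owner']['display_name']} " \
                "on [#{site}](#{post['link']}).*"
  "#{front_matter}\n#{attribution}\n\n#{post['body_markdown']}\n"
end

sample = {
  'title' => 'Site evaluations',
  'creation_date' => 1_400_000_000,
  'link' => 'http://meta.puzzling.stackexchange.com/a/3020',
  'owner' => { 'display_name' => 'Example User' },
  'body_markdown' => 'Body text.'
}
puts render_post(sample, site: 'Meta.Puzzling')
```

The sketch does no YAML escaping of the title, which is exactly why the title quirks described in this README matter.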
Depending on which Markdown renderer you use, you might find some oddities in the HTML output. For instance, Stack Exchange parses two block quotes separated by a blank line as a single block, while GitHub Flavored Markdown parses them as two blocks. Many other quirks won't matter too much, but that one is pretty visible to me. The moral of the story is to leave room for edits.
I've tried to fill in sensible values for the front matter. A few quirks to note:
I use the question title as the post title, which is often a reasonable choice. Not everyone has the titling skill, however.
If a title includes a #, I encode it as an HTML entity, since # usually begins a comment in the YAML front matter block.
I also encode the colons in titles with multiple colons as &colon for a similar reason. But I interpret titles with one colon as being a title and a subtitle. This might not work for your blog, but it works for mine. I suspect this should be optional behavior if anyone besides me uses this script.
Currently, the default tag is meta-post, which works for my blog's tagging system but might not for yours.
The date is set to the creation date of the post on Stack Exchange, not the current date.
I include two custom variables:
license: http://creativecommons.org/licenses/by-sa/3.0/
comments: no
These should not cause problems unless your blog layout has conflicting definitions of these variables.
Feel free to fork my code if these choices don't make sense for your purposes.
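If you do fork it, the title rules above are a natural place to start. The sketch below restates them as a hypothetical title_fields helper. The exact entity strings (&#35; and &colon;) and the subtitle key are assumptions on my part, so check them against the script before relying on them.

```ruby
HASH_ENTITY = '&#35;'    # assumed replacement for '#'
COLON_ENTITY = '&colon;' # assumed replacement for ':'

# Hypothetical helper: map a question title to front matter fields.
def title_fields(raw)
  text = raw.gsub('#', HASH_ENTITY) # '#' would start a YAML comment
  if text.count(':') == 1
    # Exactly one colon: treat it as "title: subtitle".
    title, subtitle = text.split(':', 2).map(&:strip)
    { 'title' => title, 'subtitle' => subtitle }
  else
    # Zero colons pass through; several colons are all encoded.
    { 'title' => text.gsub(':', COLON_ENTITY) }
  end
end

p title_fields('Puzzles: a review')
p title_fields('C# tips')
```

Making the one-colon rule optional would only take a keyword argument that skips the first branch.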
There are several things we could look up with an extra API call or two. In particular:
Obtain multiple posts based on some criteria such as author.
Error checking.
Tests.
Maybe make use of OAuth identification somehow.
There's a Ruby library for the Stack Exchange API. Should I use it?
* Technically, you can pass multiple post_ids. Currently, only one is really supported since the output is sent to STDOUT rather than individual files. It's not terribly hard to break the posts out based on front matter, however, so I left this as a hidden option.
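One way to break a combined stream back into posts is to watch for the front matter fences. The sketch below is a rough illustration with a hypothetical split_posts helper. It assumes every line consisting solely of --- is a front matter fence, so a post whose body contains a Markdown horizontal rule written that way would be split incorrectly.

```ruby
# Hypothetical helper: split concatenated Jekyll documents into separate posts.
# Assumes '---' lines appear only as front matter fences, two per post.
def split_posts(stream)
  posts = []
  fences = 0
  stream.each_line do |line|
    if line.chomp == '---'
      # Fences 0, 2, 4, ... open the front matter of a new post.
      posts << +'' if fences.even?
      fences += 1
    end
    posts.last << line unless posts.empty?
  end
  posts
end

combined = "---\ntitle: A\n---\nBody A\n---\ntitle: B\n---\nBody B\n"
split_posts(combined).each_with_index do |post, index|
  puts "post #{index + 1}: #{post.lines.length} lines"
end
```

Each returned string could then be written to its own file, for example one named after the title in its front matter.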