Technology Solutions for Everyday Folks

A Twitter Bot for Dad Jokes

About a month ago, I discovered this gem of a tweet:

Obviously, such unique APIs must be "exploited," as it were, so the next day I decided to build a Twitter bot to use this newfound dataset. And so, the Dad Joke Generator Bot was born.

But...Why?

Well, the answer is simple: dad jokes.

The slightly more nuanced answer is that such a thing was also an "excuse" to play around with different angles of the Twitter API than I'd done with my first bot at the end of last year. Specifically, the difference with this bot is that it'd respond to mentions asking for a joke.

And so, I spent an afternoon-and-a-half pulling something together.

The Bot's Mechanics

The first hurdle of sorts I needed to address was the "how." Twitter has a standard API, but also a streaming API from which data can be sourced. At first my Google searches pointed me in the direction of the streaming API, but I very quickly decided against going that route, and for two practical reasons:

  1. The streaming API requires a constant listener/open connection, which I don't really want to run (or have to monitor the status of) on a production host; but more so
  2. I'd never expect this particular bot to go viral, so the anticipated volume of data/requests/mentions is very low.

Further making this decision more obvious were the additional caveats of data from the stream: field order and presence discrepancies, lack of sorting, and duplicate messages. If for some reason I had a huge volume of tweets to parse, and that demand continued, moving to the streaming API would make some sense. Out of the gate, this route really would break my "as simple as possible, as complex as necessary" mantra.

Dad Joke Data

The fatherhood.gov endpoint in use has a pretty limited number of dad jokes (~50), and at the time I developed the bot no new jokes had been added since late 2017. I decided to add a bot configuration value to determine the "maximum age" of the dad joke data (one day), and to pull a refreshed dataset only if a new joke request comes in after the data file has reached that age. This means there's never more than one request for dad joke data per day, and realistically there are likely several days (or weeks) between requests.
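For illustration, here is a minimal sketch of that "maximum age" check, written in Python rather than the bot's PHP. The function names, the cache path, and the fetch callable are all hypothetical stand-ins; the only detail taken from above is the one-day maximum age.

```python
import json
import os
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # the "maximum age" config value: one day


def is_stale(path, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the cache file is missing or at least max_age seconds old."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) >= max_age
    except OSError:
        # No readable cache file yet, so treat it as expired.
        return True


def load_jokes(path, fetch, max_age=MAX_AGE_SECONDS):
    """Return cached joke data, calling fetch() only when the cache is stale.

    fetch is any zero-argument callable returning JSON-serializable data;
    in a real bot it would wrap the HTTP request to the joke endpoint.
    """
    if is_stale(path, max_age):
        data = fetch()
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(data, fh)
        return data
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Because the check only happens when a joke is actually requested, a quiet bot makes no requests at all, which matches the "several days (or weeks) between requests" expectation.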

The dad joke data itself is pretty simple JSON. Slightly buried in each array element's data (the attributes) are the setup and punchlines, nicely split out for us to use:

  • field_joke_opener
  • field_joke_response

I really appreciate the thoughtfulness of having these pieces split apart, because it allows whoever is using the data to more easily use it in a way they see fit. Forcing a format (or having an already stitched-together joke) might well cause more work in a bot like this. Simple is good!
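As a rough Python sketch of pulling those two fields out: the payload shape below (a top-level "data" array whose elements hold an "attributes" object) is an assumption based on the description above, and the sample joke text is a placeholder. Only the two field names come from the actual data.

```python
import json

# Assumed payload shape; only the two field names are taken from the post.
SAMPLE = json.loads("""
{"data": [
  {"attributes": {"field_joke_opener": "Placeholder setup?",
                  "field_joke_response": "Placeholder punchline."}}
]}
""")


def extract_jokes(payload):
    """Return a list of (setup, punchline) tuples, skipping incomplete entries."""
    jokes = []
    for item in payload.get("data", []):
        attrs = item.get("attributes", {})
        setup = attrs.get("field_joke_opener")
        punchline = attrs.get("field_joke_response")
        if setup and punchline:
            jokes.append((setup.strip(), punchline.strip()))
    return jokes


def format_joke(setup, punchline):
    """Stitch the two halves together; the blank-line layout is this sketch's choice."""
    return f"{setup}\n\n{punchline}"
```

Keeping extraction and formatting separate is what the split fields make possible: the same tuples could just as easily be rendered on one line or in two tweets.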

Tweet Mentions

The "new" thing with this bot over the first one is the querying of Twitter for mentions. Fortunately, the mentions_timeline.json endpoint accepts a since_id argument: the unique identifier of a Tweet, which limits the results to mentions newer than that Tweet. Since the bot was going to have to keep track of some timestamp anyway, the ability to use a tweet id really simplified that part of the bot. Just write out the "last" (most recent) tweet id and use it in subsequent requests.
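A small Python sketch of that idea follows. The full v1.1 URL and the count parameter are assumptions on my part (the post only names mentions_timeline.json and since_id), and the helper names are hypothetical. Authentication is left out here entirely.

```python
from urllib.parse import urlencode

# Assumed full URL for the v1.1 endpoint named above.
MENTIONS_URL = "https://api.twitter.com/1.1/statuses/mentions_timeline.json"


def mentions_request_url(since_id=None, count=20):
    """Build the mentions request URL, adding since_id only when one is known."""
    params = {"count": count}
    if since_id is not None:
        params["since_id"] = since_id
    return f"{MENTIONS_URL}?{urlencode(params)}"


def newest_id(mentions, previous=None):
    """Return the largest tweet id seen so far, or None if there is none.

    Each mention is expected to be a decoded tweet object with an "id_str" key.
    """
    ids = [int(m["id_str"]) for m in mentions]
    if previous is not None:
        ids.append(previous)
    return max(ids) if ids else None
```

On the very first run there is no stored id, so the request simply goes out without since_id; after that, whatever newest_id returns becomes the next run's since_id.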

Looking for Keywords

The bot source monitors tweet mentions including the words dad and/or joke, only responding if those keywords are found. In PHP it's pretty simple to handle that action with stripos (for case-insensitive searching).

One thing I discovered later on "going live" is that I also had to exclude the bot's own mentions (or replies to itself), because the production bot name includes both keywords. While the situation is pretty specific, leaving the bot able to always respond to its own mentions with a joke in essence creates a never-ending reply thread (infinite loop), with a new joke being replied to on each bot code invocation. Helpful for testing, but bad in production.
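The keyword check and the self-mention guard might look something like this in Python, where a lowercase substring test stands in for PHP's stripos. The function name, its arguments, and the example handle in the usage note are hypothetical.

```python
KEYWORDS = ("dad", "joke")


def wants_joke(text, author, bot_name):
    """Return True if a mention should get a joke in reply.

    Mentions written by the bot itself are always skipped, so the bot
    can never start replying to its own replies.
    """
    if author.lower() == bot_name.lower():
        return False
    lowered = text.lower()  # case-insensitive match, like stripos
    return any(word in lowered for word in KEYWORDS)
```

For example, wants_joke("Tell me a DAD joke", "someone", "ExampleBot") is True, while the same text authored by "examplebot" is rejected before the keywords are even checked.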

Generating a Joke, Replying, and Recording State

On a keyword match, the bot refreshes the dad joke data (if it's expired) and pulls a "random" joke from those available. It then formats the joke with the setup and punchline and pushes out the tweet with a mechanism identical to the progress bot, but with one specific addition: the in_reply_to_status_id argument. This is the unique id of the tweet being replied to, and it's the same kind of id that we record and use for future mention searches.
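As a hedged Python sketch of that step: the function below picks a joke and builds the two request parameters for the reply. The function name and the tweet layout are hypothetical. Prefixing the reply with the asker's @username is my assumption, based on the v1.1 documentation's note that in_reply_to_status_id is ignored unless the original author is mentioned in the status text.

```python
import random


def build_reply(jokes, mention_id, mention_author, rng=random):
    """Pick a random (setup, punchline) pair and build the reply parameters.

    rng only needs a choice() method, so a seeded random.Random instance
    can be passed in to make the selection repeatable in tests.
    """
    setup, punchline = rng.choice(jokes)
    return {
        "status": f"@{mention_author} {setup}\n\n{punchline}",
        "in_reply_to_status_id": str(mention_id),
    }
```

These parameters still have to be signed and sent like any other status update; this sketch stops short of the HTTP request itself.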

I got a bit stuck on the OAuth signature for the reply. Since I'm not using a library to generate the OAuth headers, these details matter more than they might for others. The TL;DR version is that for the signature, remember the arguments (the request parameters and the oauth_* values together) must be sorted alphabetically by name. Anything else will result in a signature error, and those are not fun to triage or troubleshoot/debug.
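To make the ordering requirement concrete, here is a minimal Python sketch of an OAuth 1.0a HMAC-SHA1 signature as I understand the scheme. It is not the bot's actual PHP code, and the function names are hypothetical. The params dict is assumed to hold every request parameter plus the oauth_* values, minus oauth_signature itself.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode a value, leaving only RFC 3986 unreserved characters as-is."""
    return quote(str(value), safe="-._~")


def signature_base_string(method, url, params):
    """Build the string to sign: METHOD&encoded-url&encoded-sorted-params."""
    # Encode first, then sort by encoded name; this is the alphabetical step.
    encoded = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), pct(url), pct(param_string)])


def sign(method, url, params, consumer_secret, token_secret):
    """Return the base64-encoded HMAC-SHA1 signature for the request."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    base = signature_base_string(method, url, params)
    digest = hmac.new(key.encode("ascii"), base.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

Sorting inside signature_base_string means the caller can add in_reply_to_status_id (or any other argument) in whatever order is convenient, and the base string still comes out the same.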

Finally, the "state" is recorded by dropping the last/most recent tweet id in a file.
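A state file like that needs very little code. In this Python sketch, the file path and function names are hypothetical, and writing via a temporary file is this sketch's own design choice (so that a crash mid-write can't leave a half-written id behind), not something the bot is known to do.

```python
import os


def read_last_id(path):
    """Return the stored tweet id as an int, or None if there isn't a valid one."""
    try:
        with open(path, encoding="utf-8") as fh:
            text = fh.read().strip()
    except FileNotFoundError:
        return None
    return int(text) if text.isdigit() else None


def write_last_id(path, tweet_id):
    """Store the most recent tweet id, replacing the old file in one step."""
    tmp = f"{path}.tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(str(tweet_id))
    os.replace(tmp, path)
```

Returning None for a missing or garbled file lets the next run fall back to a request with no since_id rather than failing outright.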

Basic Bot Operation

Beyond the mechanics, the only real configuration was to determine what a proper response cadence might be. I chose (somewhat arbitrarily) a five-minute interval for cron to run the bot code, which does the search, pattern match, joke randomization, and response. Therefore, any mention of the bot account with the keyword(s) will receive a response within five minutes, like so:

So that's it. It was a fun project to fiddle with for a few hours...and in typical fashion it took an afternoon-and-a-half or so to deal with the code bits, and another day or so to get the repo, bot site, and other bits (README, etc.) all buttoned up for show.