twssbot: An intelligent adapting That’s What She Said joke detector

So one of the basic things they hammer you with in a discrete math or statistics course is Bayesian inference. Now if you are a bored student, you have probably wondered how the heck you can put this stuff to use. Sure, “your spam filter uses it, blah blah,” but if your inbox looks like mine, you probably don’t like your spam filter too much.

But if you’re a The Office fan AND a bored student, now we’re cooking. In a lot of the IRC channels I hang around in, That’s What She Said (twss) jokes are extremely common. In fact, I even have a /twss alias for making them. So… I wonder, can I make a bot that recognizes innuendos? After 2 hours of Python coding, the results speak for themselves:


14:10 < jdong_> blah blah this isn't funny at all
14:11 < jdong_> I hope you aren't coming in here
14:11 < twssbot> jdong_: That's what she said!
14:11 < jdong_> can you make it straighter?
14:11 < jdong_> That's what she said!
14:11 < twssbot> Autolearning (cancel: 'twssbot: wrong'): can you make it straighter?
14:12 < jdong_> this is easier to train than I thought
14:12 < jdong_> It just comes naturally
14:12 < twssbot> jdong_: That's what she said!

It still needs some training, but it’s already showing great promise. You can play with it (twss) in ##friedcpu on irc.freenode.net. Its source is more or less available via bzr branch http://jdong.mit.edu/~jdong/pytwss/. Since this is a quick hack job, don’t expect the code to be that clean or the setup to be that intuitive. Sorry.

UPDATE: I was asked if this is how MIT students spend their free time. Well I can’t speak on their behalf, but… probably?

How does it work

I won’t cover how Bayesian classification works, as I trust that if you remotely care, you have already read the Wikipedia page. As it applies to the bot: it was given examples of “that’s what she said” jokes and examples of ordinary, nonsuggestive sentences as a starting point. OK, fine, admittedly its initial examples were just the lines said right before someone said TWSS in an IRC channel I log (which was 60MB last year…), so I expect 50% of that to be random noise.
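To make the training idea concrete, here is a minimal sketch of a two-category Bayesian text classifier. It is not the Reverend library and not the bot's actual code: the TinyBayes class, the tokenizer, and the add-one smoothing are all assumptions chosen to keep the example short, and the training sentences are borrowed from elsewhere in this post.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a line and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyBayes:
    """Toy multinomial naive Bayes with add-one smoothing (illustration only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.line_counts = Counter()             # label -> number of training lines

    def train(self, label, text):
        self.word_counts[label].update(tokenize(text))
        self.line_counts[label] += 1

    def classify(self, text):
        """Return {label: probability}; the values sum to 1."""
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_lines = sum(self.line_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log P(label) + sum over words of log P(word | label)
            score = math.log(self.line_counts[label] / total_lines)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalize in log space so long sentences do not underflow.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(weights.values())
        return {k: w / norm for k, w in weights.items()}


bayes = TinyBayes()
bayes.train("twss", "can you make it straighter")
bayes.train("twss", "this is long and hard")
bayes.train("normal", "this is not funny")
bayes.train("normal", "blah blah this isn't funny at all")
print(bayes.classify("it is so hard"))  # "twss" should come out ahead
```

Unlike the scores the bot prints for its query command, this toy version normalizes its output so the two numbers always add up to 1.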

In addition, the bot can:

  1. Be commanded to learn a phrase as an innuendo or a normal sentence through a direct command
  2. Be given feedback (yes or no) by others after it makes a joke, so it can train itself based on that feedback.
  3. Detect when someone in the channel says TWSS, and try to find and learn the joke that was made.

With these capabilities, I expect it to be able to train and adapt and become better as people correct it more.
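The three learning paths above could be wired together roughly like this. This is a sketch under assumptions, not the bot's real code: WordCounts is a stand-in for a real classifier, and FeedbackTrainer and all of its method names are made up for illustration. The only requirement on the model is that it can both add a training example and take one back out.

```python
from collections import Counter, defaultdict


class WordCounts:
    """Stand-in model: per-label word counts that can be added or removed."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, label, text):
        self.counts[label].update(text.lower().split())

    def untrain(self, label, text):
        self.counts[label].subtract(text.lower().split())
        # Unary plus drops entries whose count fell to zero or below.
        self.counts[label] = +self.counts[label]


class FeedbackTrainer:
    """Routes the three learning paths into train/untrain calls."""

    def __init__(self, model):
        self.model = model
        self.last_joke = None       # last line the bot answered with TWSS
        self.last_autolearn = None  # last line learned from a human TWSS

    def learn(self, text, is_innuendo):
        # Path 1: a direct command from a user.
        self.model.train("twss" if is_innuendo else "normal", text)

    def bot_joked(self, text):
        # Remember what the bot just joked about so feedback can refer to it.
        self.last_joke = text

    def feedback(self, was_funny):
        # Path 2: yes/no feedback on the bot's most recent joke.
        if self.last_joke is None:
            return False
        self.learn(self.last_joke, was_funny)
        self.last_joke = None
        return True

    def autolearn(self, text):
        # Path 3: a human said TWSS, so learn the line they reacted to.
        self.model.train("twss", text)
        self.last_autolearn = text

    def wrong(self):
        # Roll back the most recent auto-learned line, if there is one.
        if self.last_autolearn is None:
            return False
        self.model.untrain("twss", self.last_autolearn)
        self.last_autolearn = None
        return True


trainer = FeedbackTrainer(WordCounts())
trainer.learn("this is long and hard", is_innuendo=True)
trainer.bot_joked("it just comes naturally")
trainer.feedback(was_funny=True)
trainer.autolearn("can you make it straighter")
trainer.wrong()  # the auto-learned words are removed again
print(dict(trainer.model.counts["twss"]))
```

Keeping only the single most recent joke and the single most recent auto-learned line is a simplification; a busy channel would probably want a short history instead.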

Command Reference

The bot has a pretty crappy command set that gets the job done. I will attempt to document it here. All commands MUST be directed at the bot using the form “twssbot: command parameters”.

  • twssbot: learn this is long and hard – Train the bot that “this is long and hard” is an innuendo.
  • twssbot: forget this is not funny — Train the bot that “this is not funny” is NOT an innuendo.
  • twssbot: yes — If the bot recently said TWSS, reinforce that the joke it last made was funny.
  • twssbot: no — Opposite of above. Tells the bot the last joke it made was not funny.
  • twssbot: query some funny sentence — Dumps some debugging info about “some funny sentence” as a dictionary of conditional probabilities. For example, its reply {'twss': 0.39338996689047478, 'normal': 0.7999888342822} tells you that it is roughly 40% confident the sentence is an innuendo and 80% confident it is a normal sentence. Note that the two numbers do not have to add up to 100%.
  • (not a command): Saying “That’s what she said”, “That’s what he said”, “twss!”, “twhs!”, “(twhs)”, or “(twss)” in a channel causes the bot to take a best guess at the sentence someone found funny. The bot will then tell the channel it is auto-learning that expression.
    • If the trigger phrase was prefixed with a nick (e.g. “jdong: twss!”), the bot will ONLY consider things said by that nick.
    • If it finds no candidate phrases it will tell the channel it didn’t get the joke. You should probably manually train it with the learn command.
  • twssbot: wrong — Tells the bot that its auto-learn guess (see above) is NOT correct; this rolls back the training above.
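The auto-learn trigger described in the list above might look something like the following. This is a hedged sketch: the regular expression, the parse_trigger and find_candidate helpers, and the rule "pick the most recent matching line" are all assumptions, since the post only says the bot takes a best guess. The nick alice is invented for the example.

```python
import re
from collections import deque

# Trigger phrases from the list above, with an optional "nick:" prefix.
TRIGGER = re.compile(
    r"^(?:(?P<nick>[^\s:,]+)[:,]\s*)?"
    r"(?:that['’]?s what s?he said!?|tw[sh]s!|\(tw[sh]s\))\s*$",
    re.IGNORECASE,
)


def parse_trigger(text):
    """Return (is_trigger, nick_or_None) for one channel line."""
    match = TRIGGER.match(text.strip())
    if not match:
        return False, None
    return True, match.group("nick")


def find_candidate(history, target_nick=None):
    """Pick the most recent non-trigger line, optionally limited to one nick."""
    for nick, text in reversed(history):
        if parse_trigger(text)[0]:
            continue  # never learn a trigger phrase itself
        if target_nick is not None and nick.lower() != target_nick.lower():
            continue
        return text
    return None  # no candidate: the bot "didn't get the joke"


# Recent channel lines as (nick, text), oldest first.
history = deque(maxlen=20)
history.append(("jdong_", "I hope you aren't coming in here"))
history.append(("alice", "lunch anyone?"))

is_trigger, nick = parse_trigger("jdong_: twss!")
print(is_trigger, nick)
print(find_candidate(history, nick))  # only jdong_'s lines are considered
print(find_candidate(history))        # no nick given: most recent line wins
```

A fancier version could score each recent line with the classifier and learn the most suggestive one rather than simply the latest.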

In addition to commands, the bot also has some tunable parameters:

  • twssbot: threshold 25 — Sets a confidence-margin percentage between 11 and 49 for triggering. In this example, the bot must be 25% MORE confident that a sentence is an innuendo than a normal sentence for it to trigger. If you don’t give a number, it will return the current threshold. Increase this number to reduce false alarms, decrease it for comic relief.
  • twssbot: trigger_length 3 — Sets the minimum length (in words) of a line for it to be processed. Sometimes really short statements (a word or two) set off the bot even though they are not suggestive.
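Put together, the two tunables amount to a small decision rule. The sketch below assumes the threshold is a difference in percentage points between the two scores and that a line exactly at the threshold still triggers; neither detail is confirmed by the post. The should_trigger name is hypothetical, and the defaults of 25 and 3 are just the example values from above, not necessarily the bot's real defaults.

```python
def should_trigger(scores, text, threshold=25, trigger_length=3):
    """Decide whether a line is worth a TWSS reply.

    scores: {'twss': float, 'normal': float}, each between 0 and 1.
    threshold: required lead of 'twss' over 'normal', in percentage points.
    trigger_length: minimum number of words in the line.
    """
    if len(text.split()) < trigger_length:
        return False  # too short to judge reliably
    margin = (scores.get("twss", 0.0) - scores.get("normal", 0.0)) * 100
    return margin >= threshold


# The query output quoted earlier: 'normal' leads, so the bot stays quiet.
quoted = {"twss": 0.39338996689047478, "normal": 0.7999888342822}
print(should_trigger(quoted, "some funny sentence"))

# Made-up scores with a 40-point lead for 'twss'.
print(should_trigger({"twss": 0.9, "normal": 0.5}, "it just comes naturally"))
```

Raising threshold shrinks the set of score pairs that pass, which is why a higher value means fewer false alarms.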

Practical Uses

This bot is silly and pointless, though entertaining. However, I think this same framework for Bayesian classification is really easy to use (see the Reverend library in the acknowledgements) and can be applied to a variety of everyday uses:

  • Detecting the language of an article
  • Detecting whether something said is on-topic or off-topic for a channel
  • Label incoming e-mails or RSS feeds
  • In IRC, detect trolling users or unusual behavior patterns
  • Making a That’s What He Said bot (kidding!)

Acknowledgements

I’d like to thank the authors of the Reverend library, without whom it would have taken me several more hours to write a (crappier) Bayesian inference library. I’d also like to thank pyircalib for making such an easy-to-use IRC library. Both of these were ridiculously simple to set up, and if you ever want to use Bayesian classifiers or write a simple Python IRC app, I highly recommend them!
