This post is aimed at an intermediate programmer. If you have any feedback, hit me up at @BrettWitty on Twitter.

Introduction

Procedural text generation is about creating interesting text at random. You can use it to spice up user interactions, or make it a core creative input to a game.

In Python you can do this straight out of the box with the random library and string interpolation (formatting):

import random

colour = [ 'red', 'green', 'blue' ]
colour_choice = random.choice(colour)

# Using string formatting
text = "The {c} ball bounced down the hall.".format(c=colour_choice)

# Using Python 3.6+ format-strings
text = f"The {colour_choice} ball bounced down the hall."

This is fine, but a little cumbersome, especially if you want cascading or repeating structures. The best way to attack this problem is via “grammars”. Kate Compton’s JavaScript library “Tracery” makes grammars extremely easy to author and manipulate, and thus excellent for procedural text generation.

I want to write a little tutorial on using Tracery in your Python programs. I’ll use pytracery, Allison Parrish’s great Python port of tracery.js.

Installing

Installing pytracery is easy with pip:

pip install tracery

Basic Tracery

The foundation of Tracery is the “grammar”. A grammar is a set of rules.

A rule is very simple. It says “When you see this symbol, replace it with one of these options (randomly chosen).”

We specify Tracery grammars in JSON, which is converted into dictionaries in Python. The same JSON would work with the JavaScript library.

Our colour example above looks like this in JSON:

{
    "colour" : [ "red", "green", "blue" ],

    "text" : [ "The #colour# ball bounced down the hall." ]

}

As a Python dictionary, it is basically identical:

rules = {
    "colour" : [ "red", "green", "blue" ],

    "text" : [ "The #colour# ball bounced down the hall." ]

}

If you have the grammar in a JSON file, you can read it in:

import json

with open('rules.json', 'r') as f:
    rules = json.load(f)

Easy! These few lines are equivalent to the first few lines of our example in the introduction. To generate random text, we have to convert the rules into a Tracery grammar and call flatten:

import tracery

grammar = tracery.Grammar(rules)
grammar.flatten("#text#")

This loads the rules into a grammar, which can interpret the connections between rules.

Any symbol is given by something surrounded by hash marks like #text#. You don’t put the hash marks on the left side of the rules because it’s understood that it is a symbol. Inside the rules on the right-hand side, you need to highlight what is just text and what is a symbol.

grammar.flatten("#text#") is where all the magic happens. It starts with the symbol #text# and tries to turn it into text with no more symbols, which is called “flattening”.

Step by step our text looks like this during flattening:

  1. #text# - This is what we input into grammar.flatten().
  2. The #colour# ball bounced down the hall. - Rules say replace the thing on the left with one of the ones on the right. Since the text rule only has one option, we choose that.
  3. The red ball bounced down the hall. - There’s still a symbol to flatten so we do the same for the colour rule: replace #colour# with one of the options on the right of the colour rule.
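
The steps above can be sketched in a few lines of plain Python. This is only a toy illustration of the replace-until-no-symbols idea, not how pytracery is actually implemented. The name toy_flatten and the regular expression are made up for this example, and it ignores modifiers and state entirely.

```python
import random
import re

# Matches a plain symbol such as #colour# (no modifiers, no state).
SYMBOL = re.compile(r"#(\w+)#")

def toy_flatten(rules, text, rng=random):
    """Repeatedly replace the first symbol in `text` until none are left."""
    while True:
        match = SYMBOL.search(text)
        if match is None:
            return text
        # Pick one option for this symbol at random and splice it in.
        option = rng.choice(rules[match.group(1)])
        text = text[:match.start()] + option + text[match.end():]

rules = {
    "colour": ["red", "green", "blue"],
    "text": ["The #colour# ball bounced down the hall."],
}

print(toy_flatten(rules, "#text#"))
```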

This seems complicated for what was easy to achieve in the intro. But now you can create rules that reference other rules, cascading downwards, or even referring to themselves.

rules = {
    "colour" : [ "red", "green", "blue" ],
    "location" : [ "hall", "road" ],
    "text" : [ "The #colour# ball bounced down the #location#." ]
}

grammar = tracery.Grammar(rules)
grammar.flatten("#text#")

This can produce all six combinations:

  • The red ball bounced down the hall.
  • The blue ball bounced down the hall.
  • The green ball bounced down the hall.
  • The red ball bounced down the road.
  • The blue ball bounced down the road.
  • The green ball bounced down the road.

For self-reference:

rules = {
    'text' : [ "The man was #very#hungry." ],
    'very' : [ 'very #very#', '' ]
}
grammar = tracery.Grammar(rules)
grammar.flatten("#text#")

This can produce:

  • The man was hungry.
  • The man was very hungry.
  • The man was very very hungry.
  • Or a variant with dozens of verys if you are extremely lucky!
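
To get a feel for how lucky you would need to be: each time #very# is expanded it has a one-in-two chance of stopping, so n verys should turn up with probability (1/2)^(n+1), assuming the two options are picked uniformly. Here is a quick simulation of just that one rule in plain Python (no Tracery involved; count_verys is a made-up helper):

```python
import random
from collections import Counter

def count_verys(rng):
    """Mimic the 'very' rule: keep picking until the empty option comes up."""
    count = 0
    while rng.choice(["very #very#", ""]) != "":
        count += 1
    return count

rng = random.Random(2024)
tally = Counter(count_verys(rng) for _ in range(10_000))

# Show the observed share of results with 0, 1, 2, ... verys.
for n in sorted(tally)[:5]:
    print(n, "verys:", tally[n] / 10_000)
```

Roughly half the results should have no "very" at all, a quarter should have one, and so on.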

If you want to explore how rules interact, check out Kate Compton’s live tutorial.

More advanced Tracery

The input to flatten doesn’t have to be a rule. It can be an arbitrary text string using other symbols:

grammar.flatten('I prefer a #colour# #colour#.')

This can yield "I prefer a green blue."

Modifiers

One catch with rule replacement is that it replaces symbols verbatim. If our #text# example was "#colour# balls bounce down the hall.", then you’ll get "red balls bounce down the hall.", with a lowercase first letter. We can edit the replaced symbols using “modifiers”.

The modifier follows the symbol like: #colour.capitalize#. If you do this, then the first letter of the replaced symbol will be capitalized.

Other basic and useful modifiers:

  • #animal.s# :: For plurals (cats, foxes)
  • #animal.a# :: For a/an indefinite articles (a tiger, an antelope)
  • #verb.ed# :: For past tense verbs (jumped, tried)

These rules use very simple heuristics that work most of the time, so don’t expect the library to have a nuanced understanding of English.

If you need to chain modifiers, just add them one after another: #animal.s.capitalize#

To use them in pytracery:

import tracery
from tracery.modifiers import base_english

rules = { ... }

grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)

Easy! You can add your own modifiers by writing a function that takes in text and parameters and outputs text. For example:

import random

def drunkcase(text, *params):
    # Randomly upper- or lower-case each letter
    newtext = ""
    for letter in text:
        if random.randint(0, 1):
            newtext += letter.upper()
        else:
            newtext += letter.lower()

    return newtext

To add it to the grammar, you need to give it a text name, so you provide a dictionary of { "ruletext" : function }:

my_mods = { "drunk" : drunkcase }

grammar.add_modifiers(my_mods)

And in action:

rules = { "greeting" : [ "Hi all!", "Hi everybody!" ] }
grammar = tracery.Grammar(rules)
grammar.add_modifiers(my_mods)

grammar.flatten("#greeting.drunk#")

This might yield 'hI EvERybODY!' Luckily modifiers happen after symbol substitution, so you can’t break symbols with an errant modifier (since symbols are case-sensitive).

Modifiers can have parameters: #object.colourtext('red')# might produce the text <font color='red'>object</font>. The modifier function just grabs that in the params positional arguments.
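
A modifier like that could be sketched as below. colourtext is a hypothetical function written for this example. It assumes the colour arrives as the first positional parameter, and it strips any surrounding quotes in case they are passed through verbatim. The fallback colour is an arbitrary choice.

```python
def colourtext(text, *params):
    """Wrap text in a font tag, using the first parameter as the colour."""
    # Fall back to black if no colour was given (an arbitrary default).
    colour = params[0].strip(" '\"") if params else "black"
    return "<font color='{}'>{}</font>".format(colour, text)

# Called directly here; in a grammar it would be registered with
# grammar.add_modifiers({"colourtext": colourtext}).
print(colourtext("object", "'red'"))
# <font color='red'>object</font>
```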

Remember also that these rules produce text and very many cool things are in text form: HTML, SVG, Markdown, Emoji, LaTeX… You can enforce the file format through careful use of rules (<html><body>#bodyText#</body></html>) or modifiers.

Of course, modifiers are code and trickier to write than JSON, so see what you can achieve in basic Tracery before going nuts with modifiers.

Storing state

So far Tracery is quite powerful and easy to use, but it is effectively choosing everything at random with no memory of what it has chosen. There is extra syntax available for storing state to make it remember some of its choices.

The format is: [symbol:value]. The value can be some text, or even another symbol. For example:

rules = {
    'story' : "One day #name1# found #name2# and gave them a hug. #name2# liked it.", 
    'name' : [ 'John Smith', 'Anne Jones', 'Thomas First', 'Jane Second'],
    'text' : [ "#[name1:#name#][name2:#name#]story#"]
}

This will produce stories where it has two characters with randomly chosen names name1 and name2, and will then create story with those names used consistently.

When you create state like this, it persists as the grammar goes deeper into the rules. Once it completes the symbol that those states were attached to, they are popped out of knowledge. Furthermore, if the grammar hits the story bit again, it’ll add two new characters onto its knowledge stack until they are popped.

To grok this, I recommend playing around with some grammars that are recursive but specify state. Kate Compton’s tutorial has an example of nested stories that demonstrates this.

Things Tracery can’t (easily) do

Tracery is very powerful, but it has limitations. It can’t easily make decisions like giving a female name to a female character, without you specifying two different paths for rules: #maleName# and #femaleName#.

It also has to make its decisions one step at a time, and commits to them. You can’t encode something like “Create a character with all the information, but make sure it doesn’t conflict with another character!”

Similarly, Tracery can’t easily mess with the probabilities for the random selection. You can fake it by duplicating entries a few times:

{ 'biassed_coin' : [ 'heads', 'tails', 'tails' ] }

But this is a bit messy, and once you have interacting rules, you lose much of your ability to control the probabilities.
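
If you do go down the duplication route, you can at least generate the repeated entries instead of typing them out. A minimal sketch, where weighted_options is a hypothetical helper and the weights are assumed to be whole numbers:

```python
def weighted_options(weights):
    """Expand {option: weight} into a list with each option repeated."""
    options = []
    for option, weight in weights.items():
        options.extend([option] * weight)
    return options

rules = {
    "biassed_coin": weighted_options({"heads": 1, "tails": 2}),
}

print(rules["biassed_coin"])  # ['heads', 'tails', 'tails']
```

This keeps the intent (tails is twice as likely) readable, though it does nothing for the harder problem of probabilities across interacting rules.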

Sometimes achieving a certain effect yields some very messy-looking JSON. Try to separate out rules and reuse them - kinda like code brevity.

Tips for Python

I have a few little tips for implementing procedural text generation in Python with tracery.

Multiple files

The first is to split your rules out into separate files, which you combine at run-time. For example, you can have a whole directory of JSON-encoded rules, and read them all in with:

import tracery
import pathlib
import json

rules_dir = pathlib.Path(MY_RULES_DIR)

rules = {}

for rules_file in rules_dir.glob("*.json"):
    with open(rules_file, 'r') as f:
        rules.update( json.load(f) )

grammar = tracery.Grammar(rules)

This loads all the rules in all the files in the MY_RULES_DIR and then makes a Tracery grammar. An alternative way to do this is:

import tracery
import pathlib
import json

rules_dir = pathlib.Path(MY_RULES_DIR)

rules = {}

grammar = tracery.Grammar(rules)

for rules_file in rules_dir.glob("*.json"):
    with open(rules_file, 'r') as f:
        rules = json.load(f)
        for rule, options in rules.items():
            grammar.push_rules(rule, options)

You might use a variant on this if you need to put logic in between the rule reads, like for using a different language or integrating age-appropriate rules.

In any case, having separate files makes editing and locating errors far easier. It also helps reuse!
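
One thing to watch with the dict-merging approach: dict.update silently overwrites a rule if two files define the same symbol. Here is a hedged sketch of a loader that complains instead. load_rules is a made-up name, and the demo writes two throwaway files to a temporary directory so it can run on its own.

```python
import json
import pathlib
import tempfile

def load_rules(rules_dir):
    """Merge every *.json file in rules_dir, refusing duplicate symbols."""
    rules = {}
    for rules_file in sorted(pathlib.Path(rules_dir).glob("*.json")):
        with open(rules_file, "r") as f:
            new_rules = json.load(f)
        # Symbols defined both in earlier files and in this one.
        clashes = rules.keys() & new_rules.keys()
        if clashes:
            raise ValueError(f"{rules_file.name} redefines {sorted(clashes)}")
        rules.update(new_rules)
    return rules

# Demo with two small rule files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)
    (tmp_path / "colours.json").write_text('{"colour": ["red", "green"]}')
    (tmp_path / "places.json").write_text('{"location": ["hall", "road"]}')
    rules = load_rules(tmp_path)

print(sorted(rules))  # ['colour', 'location']
```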

Imposing State

Sometimes when we are creating things, we have some state that we want to impose. Say you were generating a book of poetry: you might decide that you are writing an ode now, which determines the entry point into the grammar you should use. Choosing an entry point doesn’t work, though, if you need to specify finer details.

We can achieve this with a combination of Tracery and string interpolation.

We can front-load the details by including it in state:

grammar.flatten("#[heroName:{name}][heroAge:{age}]story#".format( name=hero.name, age=hero.age))

This writes the detail directly into the state. If you are tricksy, you don’t have to specify every detail, so long as your hero object puts in a randomly-selected placeholder (hero.get('age', '#age#')). This method might only work with some structures.
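
The placeholder trick might look like this. It is only a sketch: story_entry is a hypothetical helper, hero is assumed to be a plain dictionary, and #name# and #age# are assumed to be rules that exist in your grammar.

```python
def story_entry(hero):
    """Build a flatten() input, using grammar symbols for missing details."""
    return "#[heroName:{name}][heroAge:{age}]story#".format(
        name=hero.get("name", "#name#"),
        age=hero.get("age", "#age#"),
    )

# The name is fixed by us; the age is left for the grammar to choose.
print(story_entry({"name": "Anne Jones"}))
# #[heroName:Anne Jones][heroAge:#age#]story#
```

The resulting string is what you would hand to grammar.flatten().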

We can also clean up details in the end so long as they don’t affect Tracery’s generation by using string interpolation afterwards:

rules = { 'story' : [ 'The {hero.name} was only {hero.age} years old when...' ] }
grammar.flatten("#story#").format(**details)

This generates the text and then has Madlib-style string-formatting entries for you to fill out. The danger with this method is that you have to specify all the values, even if they are not needed. If you are missing elements, it will raise KeyErrors.

One solution is to make a SafeFormatter which will at least yield some text, even if it’s not perfect. Missing values will be replaced with the key name in braces, {key}.

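import string

class SafeFormatter(string.Formatter):

    def __init__(self, default='{{{0}}}'):
        super().__init__()
        # Template for missing keys; '{{{0}}}' renders as '{key}'
        self.default = default

    def get_value(self, key, args, kwds):
        if isinstance(key, str):
            # Named field: fall back to the placeholder if it is missing
            return kwds.get(key, self.default.format(key))
        else:
            # Positional field: defer to the standard behaviour
            return super().get_value(key, args, kwds)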
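
If all you need is to leave unknown keys in place, a lighter alternative (a different technique from the Formatter subclass above) is str.format_map with a dict subclass that defines __missing__. KeepMissing is a made-up name for this sketch.

```python
class KeepMissing(dict):
    """A dict that returns '{key}' for keys it does not have."""

    def __missing__(self, key):
        return "{" + key + "}"

template = "{name} was only {age} years old when..."
print(template.format_map(KeepMissing(name="Anne Jones")))
# Anne Jones was only {age} years old when...
```

Either way, this only helps with simple {key} fields. A field such as {hero.name} will still fail if hero is missing, because the placeholder string has no name attribute.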

Creating State

Suppose you are doing a fantasy character generator. You can create their backstory at the same time as creating their stats sheet.

A simple way to do this is to create both as one giant text, using Tracery state as before. Unfortunately this just fills in the stats sheet with tokens from the backstory, or vice versa. While your character might be “strong”, the stats sheet doesn’t know if that means “STR 18” or not.

A more elaborate method is to use custom modifier functions to interface with Python data. The function can look at the text being substituted and then trigger side effects. So the grammar can choose that the character is strong, and the modifier, given "strong" as its text, sets the character’s strength value.

This might look like:

{
    "story" : [ "... And our hero was especially #adjective.setStats# ..." ],
    "adjective" : [ "strong", "agile", "...", "charismatic" ]
}

And the associated modifier:

class Hero():

    # ...

    def set_stats_from_adjective(self, text, *params):

        if text == "strong":
            self.STR = 18
        elif text == "agile":
            pass  # ...

        return text

This is slightly clunky in that you encode the possibilities both in the JSON and in the Python class, and it muddies the waters on what modifiers do. A better option would be to create extra actions instead of just [statename:state], but that’s beyond the scope of this blog.
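
Short of that, one way to tame the duplication is to keep the adjective-to-stat mapping in a single table on the Python side. This is a sketch only: the ADJECTIVE_STATS table, the stat names other than STR, and their values are invented for illustration.

```python
# Hypothetical mapping from adjective to (stat name, value).
ADJECTIVE_STATS = {
    "strong": ("STR", 18),
    "agile": ("DEX", 18),        # assumed stat name and value
    "charismatic": ("CHA", 18),  # assumed stat name and value
}

class Hero:
    def __init__(self):
        self.stats = {}

    def set_stats_from_adjective(self, text, *params):
        """Modifier: record the stat implied by `text`, then pass it through."""
        if text in ADJECTIVE_STATS:
            stat, value = ADJECTIVE_STATS[text]
            self.stats[stat] = value
        return text

hero = Hero()
# The bound method can be registered as a modifier, for example:
#   grammar.add_modifiers({"setStats": hero.set_stats_from_adjective})
print(hero.set_stats_from_adjective("strong"), hero.stats)
# strong {'STR': 18}
```

The "adjective" rule could then be built with list(ADJECTIVE_STATS), so the possibilities live in one place instead of two.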

Wrap-up

There’s loads of fun things you can do with procedural text generation. Tracery makes it easy, fun and powerful, whilst doing it in Python gives you access to one of the best programming languages around.