Friday, December 5, 2014

Arguments

IT is a truth universally acknowledged, that a formal language, after its very first drafts, must be in want of arguments. But arguments in the common syntax of Algol-derived languages would not fit naturally into the system I am developing. I considered using parentheses and the like, but after some trial and error the most organic option seemed to be an adaptation of my previous Tzara engine (itself "heavily inspired" by the Postmodernism Generator).

So, here is the upgraded syntax:



Each feature below lists the example grammar and the acceptable results for evaluating "@test@":

  • Basic literal. Example: test: "abc". Acceptable results: abc
  • Sequence. Example: test: "a::b::c". Acceptable results: a, b, or c
  • In-line sequence. Example: test: "[a|b|c]". Acceptable results: a, b, or c
  • Literal function. Example: test: "@letter@" with letter: "a::b::c". Acceptable results: a, b, or c
  • Declaration. Example: test: "@l<<letter@ and @l@" with letter: "a::b::c". Acceptable results: "a and a", "b and b", or "c and c"
  • Post-processing on a literal. Example: test: "text->capital". Acceptable result: Text
  • Post-processing on sequence elements. Example: test: "a::b->upper". Acceptable results: a or B
  • Post-processing on in-line sequence elements. Example: test: "[A->lower|B]". Acceptable results: a or B
  • In-line evaluation for variable naming. Example: test: "number @my_n<<num@ @full-{my_n}@" with num: "1::2::3", full-1: "one", full-2: "two", full-3: "three". Acceptable results: "number 1 one", "number 2 two", or "number 3 three"
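
A minimal sketch of how the upgraded syntax could be interpreted in Python (the function name, regexes, and structure are my own illustration, not the engine's actual code):

```python
import random
import re

def expand(text, grammar, bindings=None):
    """Expand a template against a grammar of '::'-separated rules.

    Supports @rule@ references, @var<<rule@ declarations (the chosen
    value is remembered and reused), and {var} interpolation inside a
    rule name for in-line evaluation.
    """
    bindings = {} if bindings is None else bindings

    def resolve(match):
        ref = match.group(1)
        # In-line evaluation: substitute declared variables into the rule name.
        ref = re.sub(r"\{(\w+)\}", lambda m: bindings[m.group(1)], ref)
        if "<<" in ref:                      # declaration: @var<<rule@
            var, rule = ref.split("<<", 1)
            bindings[var] = random.choice(grammar[rule].split("::"))
            return bindings[var]
        if ref in bindings:                  # reuse of a declared variable
            return bindings[ref]
        return random.choice(grammar[ref].split("::"))

    return re.sub(r"@([^@]+)@", resolve, text)

grammar = {"num": "1::2::3", "full-1": "one", "full-2": "two", "full-3": "three"}
print(expand("number @my_n<<num@ @full-{my_n}@", grammar))  # e.g. "number 2 two"
```

Declarations work because re.sub calls the replacement function left to right, so a variable is bound before later references see it.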

The language now needs finer control over the options for each variable; from some tests with the Proppian fables, two needs are immediately clear:

  • for many variables, equal probabilities do not always result in the most fluent text; I will probably need to add an optional way to specify the probability of each item;
  • there must be a way to exclude a value from the alternatives; if, for example, I want to generate something like "the monster was @ugly-synonym@ and @ugly-synonym@", there should be a way to specify that I want two different synonyms for "ugly".
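
These two controls map directly onto the standard library (a sketch; the synonym list and weights are illustrative, not from the actual grammar):

```python
import random

# Hypothetical alternatives for an @ugly-synonym@ rule.
ugly_synonyms = ["hideous", "grotesque", "repulsive", "unsightly"]

# Unequal probabilities: weight the alternatives instead of a uniform pick.
weights = [4, 2, 1, 1]
word = random.choices(ugly_synonyms, weights=weights, k=1)[0]

# Exclusion: sample two *different* synonyms for the same sentence.
first, second = random.sample(ugly_synonyms, 2)
print(f"the monster was {first} and {second}")
```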
Regarding the language, it is true that my choice of the {...} notation was guided only by ease of coding while I explore a good syntax; any decent parser should accept a nested @...@ (which is what it actually is). However, the language is starting to look "lispy", and maybe I should not reinvent the wheel (and it would be a good excuse to finally learn LISP the proper way).

Thursday, December 4, 2014

The quest for a formal language

As I started to indicate in the previous post, the project I am working on demands a formal language of its own. Before I lose my mind implementing one in a generic yacc, I need to play with it to get an idea of what I need. Thus, for the time being, I am just coding a long Python function that takes a grammar (a dictionary of rules) and generates the text.
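
The core of that long function can be sketched in a few lines (this is a toy reconstruction with illustrative names, not the actual code):

```python
import random

def generate(grammar, symbol="test"):
    """Pick one '::'-separated alternative for a rule, then recursively
    expand any @rule@ references the chosen alternative contains."""
    value = random.choice(grammar[symbol].split("::"))
    while "@" in value:
        start = value.index("@")
        end = value.index("@", start + 1)
        ref = value[start + 1:end]
        value = value[:start] + generate(grammar, ref) + value[end + 1:]
    return value

grammar = {"test": "@greeting@, world", "greeting": "hello::hi"}
print(generate(grammar))  # "hello, world" or "hi, world"
```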

(there is an XKCD for everything)

In terms of formal languages and parsers, I never really did more than play. There probably is some language out there that solves the problem, but (to keep with programmer-related clichés) it is always fun to reinvent the wheel.

So, this is the syntax the polished version is accepting so far (all passing my hand-written unit tests!). With a good thesaurus, you can already be dangerous. ;)


Each feature below lists the example grammar and the acceptable results for evaluating "@test@":

  • Basic literal. Example: test: "abc". Acceptable results: abc
  • Sequence. Example: test: "a::b::c". Acceptable results: a, b, or c
  • In-line sequence. Example: test: "[a|b|c]". Acceptable results: a, b, or c
  • Literal function. Example: test: "@letter@" with letter: "a::b::c". Acceptable results: a, b, or c
  • Declaration. Example: test: "@l<<letter@ and @l@" with letter: "a::b::c". Acceptable results: "a and a", "b and b", or "c and c"
  • Post-processing on a literal. Example: test: "text->capital". Acceptable result: Text
  • Post-processing on sequence elements. Example: test: "a::b->upper". Acceptable results: a or B
  • Post-processing on in-line sequence elements. Example: test: "[A->lower|B]". Acceptable results: a or B

The post-processing functions already defined are lower, upper, title, and capital.
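
A sketch of the four filters as a dispatch table (the table and function names are my assumption about the wiring, not the actual implementation). Note that "capital" only upper-cases the first letter, unlike str.title():

```python
# Map filter names to string transformations.
POST_PROCESSORS = {
    "lower": str.lower,
    "upper": str.upper,
    "title": str.title,
    "capital": lambda s: s[:1].upper() + s[1:],  # first letter only
}

def post_process(value, name):
    """Apply the named post-processing filter to a value."""
    return POST_PROCESSORS[name](value)

print(post_process("two words", "title"))    # Two Words
print(post_process("two words", "capital"))  # Two words
```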

Wednesday, December 3, 2014

My model: Propp

Vladimir Yakovlevich Propp is a household name in the field of narratology. As given by Wikipedia:

Vladimir Propp was born on April 17, 1895 in St. Petersburg to a German family. He attended St. Petersburg University (1913–1918) majoring in Russian and German philology. Upon graduation he taught Russian and German at a secondary school and then became a college teacher of German.
His Morphology of the Folktale was published in Russian in 1928. Although it represented a breakthrough in both folkloristics and morphology and influenced Claude Lévi-Strauss and Roland Barthes, it was generally unnoticed in the West until it was translated in 1958. His character types are used in media education and can be applied to almost any story, be it in literature, theatre, film, television series, games, etc.
In 1932, Propp became a member of Leningrad University (formerly St. Petersburg University) faculty. After 1938, he chaired the Department of Folklore until it became part of the Department of Russian Literature. Propp remained a faculty member until his death in 1970.
 (source)

In fact, the narrative functions of fairy tales as identified by Propp have been used by many "Propp generators", which build a random tale by selecting some or all of the functions. One example is the Proppian fairy tale generator once hosted at Brown and now available at archive.org, here.

The generator is a simple piece of JavaScript that randomly selects one of the alternatives for each function requested by the user. While simple, it works very well thanks to the basic text snippets written by Laura Tan and Celeste Lim -- most of the resulting tales have enough cohesion. I decided to start by adapting their texts as models.

And so, without further ado, here are the results for the first sentence (from the "absentation" function):

It is said in the mountain where I live that the ground is made of our flesh and blood. Old Parents who toiled, sweat, cried, and screamed all bled into the soil and made us who we are today.

It is said in the place where we live that our dust is made of our flesh and blood. Flesh And Blood who toiled, sweat, cried, and screamed all bled into the earth and made us who we are today.

It is said in the hills where I live that our earth is made of our relatives. Old Relatives who toiled, sweat, cried, and screamed all bled into the earth and made us who we are today.
The grammar is still spartan (in fact, it is pretty much a list of synonyms), but the system is working. I tested the basic features of the language and even caught a bug (I shouldn't have confused Python's .title() with a proper capitalization of the first letter...).
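
A rough illustration of what such a synonym-list grammar looks like in the "::" syntax (the rule names and alternatives here are my guesses, not the actual Propp grammar):

```python
import random

# Illustrative synonym lists; each rule is a '::'-separated sequence.
grammar = {
    "place": "mountain::hills::place",
    "soil": "ground::earth::dust",
}

def pick(rule):
    """Choose one alternative from a '::'-separated rule."""
    return random.choice(grammar[rule].split("::"))

sentence = (f"It is said in the {pick('place')} where I live "
            f"that the {pick('soil')} is made of our flesh and blood.")
print(sentence)
```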

Tuesday, December 2, 2014

How it started

It started a week ago, when I was surfing the web on a poor mobile connection, on a bus from Porto Alegre to Rio Grande. It was impossible to load Boing Boing, so I headed to the next best (and lighter) thing, Hacker News. At the very bottom of the first page, there was a link to an article that immediately caught my attention, The strange world of computer-generated novels. Doing a PhD in history of literature, and having been playing with natural language processing and linguistics since I was a teenager (which makes me the only person insane enough to do this), I started reading immediately.

The article was about NaNoGenMo, the National Novel Generation Month, an event that in 2013 flew under my radar. It is a hacker version of the much more famous NaNoWriMo, the National Novel Writing Month, "an annual event that encourages people to churn out a 50,000-word book on deadline". Darius Kazemi started NaNoGenMo in 2013, when he "tweeted out an off-the-cuff idea":
It seemed the perfect hobby -- hey, maybe I could even crack interactive narrative and computational narratology (the academic jargon for what Kazemi was proposing), build a bot that writes novels and publishes them, automagically! The perfect passive income; I wouldn't have to work anymore! And maybe I could even manage a Pulitzer or two.

I decided to set up this blog to narrate my adventure in the field. The title, as you probably noticed, is a snowclone, and an old one at that: the "got X?" formula that originated with a 1993 California Milk Processor Board ad ending in "got milk?". Why snowclones? Because most of what this computational narratology seems to do nowadays is substitution, just as in snowclones.

And, in fact, I had a starting point. All programmers are lazy, and I had somewhere to start from: a "generator of post-modernism" that I wrote back in 2008, my first attempt at JavaScript, essentially translating into Brazilian Portuguese and expanding the famous Postmodernism Generator (you can still find my work here -- it keeps amazing me, and says something about the quality of academic writing in the humanities, how many people write to tell me they loved it).

The original code is a bit messy (I didn't want to spend time learning how to write a proper parser in JavaScript, so I just went with dumb string replacements), and my conversion of it into Python for this project made it even messier. But I have a bare-bones system that lets me test how this kind of narration would work while I develop a better language for the generative grammar and the much-needed parser. Despite its problems (such as the possibility of falling into an infinite loop), the language of the Tzara engine, as I had called my dumb JavaScript replacements, already has some interesting features: named variables, declaration of constants, and on-the-fly evaluation.
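
The dumb string-replacement approach, and the infinite-loop risk it carries, can be sketched like this (a toy reconstruction, not the actual Tzara code):

```python
import random

def dumb_replace(text, rules, max_passes=100):
    """Expand @name@ markers by plain string replacement, one marker per
    rule per pass. A rule that (directly or indirectly) expands to itself
    never converges, hence the safety cap -- the infinite-loop problem
    mentioned above."""
    for _ in range(max_passes):
        if "@" not in text:
            return text
        for name, alternatives in rules.items():
            marker = "@" + name + "@"
            if marker in text:
                choice = random.choice(alternatives.split("::"))
                text = text.replace(marker, choice, 1)
    raise RuntimeError("grammar did not converge; possible infinite loop")

rules = {"animal": "cat::dog", "pet": "a @animal@"}
print(dumb_replace("I have @pet@.", rules))  # "I have a cat." or "I have a dog."
```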

I had my engine but now, what should I ask the system to write? A detective novel? A Bildungsroman? An historical novel? Fantasy or science fiction? Comics?