This is a script to 'unspin' a spintax-generated blog post about templating
require 'spintax_parser'
class String
  include SpintaxParser
end
spintext = %{
>"Templates may seem like a terrible way to produce sentences until you consider the alternatives." --[Chris Pressey](https://github.com/dariusk/NaNoGenMo-2015/issues/16)
{How can you|Is it possible to|How on earth do you} {build|create|come up with|compose} {an algorithm|a program} {to build|to create|for making} 'human-readable' novels? While {there were|there have been} [several](https://en.wikipedia.org/wiki/Just_This_Once) [attempts](https://en.wikipedia.org/wiki/Racter) made, {people|individuals|humans|human beings} {never|have not|have not yet|haven't} {reached|arrived at|spotted|located|stumbled on|found} a good "answer" {to that particular|to this} question... yet.
{However,|But} people are still {planning to|wanting to|seeking to|trying to} {sort out|resolve|solve} this "AI hard" problem. NaNoGenMo (National Novel Generation Month) is held every November...to enter, all you have to do is write an algorithm that can generate 50,000 words and then make your source code publicly available. We previously discussed techniques to provide ["structure" to computer-generated novels](https://dev.to/tra/structure-in-computer-generated-novels) to make these novels easier for humans to read.
But what if we take a different approach to text generation? What if, instead of trying to induce structure in computer-generated works, we directly teach a {computer|computing machine|computing device|electronic computer} how to {write|compose} a novel?
There are {rules|regulations|conventions|patterns} that we have to {memorize|memorise} and use when {writing|penning} stories. We can {call|name} these {rules|regulations|conventions|patterns} "grammars". We have to teach a {human|human being} how to write. We can also 'teach' a {computer|computing machine|computing device|electronic computer} how to {write|compose}...by essentially hardcoding in the {rules|regulations|conventions|patterns} of literature. We can {call|name} this {framework|model} a template. We can also write 'nested' templates, having rules-within-rules to try and add more {variation|variance} to the generated stories.
[Calyx](https://github.com/maetl/calyx) is a Ruby gem that can be used to quickly define templates. Here's an example of how one such template can be defined:
```
class HelloWorld < Calyx::Grammar
  start '{greeting} {world_phrase}.'
  greeting 'Hello', 'Hi', 'Hey', 'Yo'
  world_phrase '{happy_adj} world', '{sad_adj} world', 'world'
  happy_adj 'wonderful', 'amazing', 'bright', 'beautiful'
  sad_adj 'cruel', 'miserable'
end
HelloWorld.new.generate
#"Hello bright world."
```
Calyx was originally built to generate the NaNoGenMo novel [The Gamebook of Dungeon Tropes](https://github.com/dariusk/NaNoGenMo-2015/issues/189), a Choose-Your-Own-Adventure with procedurally-generated dungeon rooms. I have personally used it as a good prototyping tool for generating stories. [Somewhere, Something](https://github.com/dariusk/NaNoGenMo-2014/issues/133) is another computer-generated novel based on templates, though the programmer used Python instead of Ruby. In fact, it is far easier to find examples where templates are used in some fashion...than it is to find examples where templates are *not* used at all.
Templates are probably the easiest way to produce computer-generated literature, and have already been used commercially by companies such as [Narrative Science](https://www.narrativescience.com), [Automated Insights](https://automatedinsights.com), and [Yseop](http://yseop.com/EN/home). These companies produce automated reports and news stories that are {based|founded} on real-life data and are consumed by human beings. I have also attempted to produce a similar type of system when I wrote an {algorithm|algorithmic program} to generate the novel [The Atheists Who Believe In God](https://github.com/dariusk/NaNoGenMo-2015/issues/45).
Templates have also been very popular with "black hat SEO" specialists. These specialists are interested in quick content generation to appease the search engines, no matter how spammy or repetitive this content is. Therefore, these specialists resort to [article spinning](https://en.wikipedia.org/wiki/Article_spinning): taking a prewritten article and then replacing most of the words with {synonyms|equivalent words}. As a result, one article can be used as a basis to generate hundreds of "unique" content pieces. There's even a unique {language|syntax} used for article-spinning called Spintax...and there are many parsers for this format written in languages such as [PHP](https://gist.github.com/irazasyed/11256369), [JavaScript](https://github.com/johnhenry/spintax), and [Ruby](https://github.com/flintinatux/spintax_parser). Spintax has also been written to generate [spam blog comments](https://gist.github.com/shanselman/5422230), {varying|modifying} the possible {responses|replies} in the hopes of {tricking|fooling} spam filters and human beings into treating the comments as genuine.
The main {problem|trouble|difficulty} with templates is the 'manual labor' involved. After all, you still need a {human|human being} to produce the templates that the computer uses to {write|compose} stories. Yet, this 'manual labor' can be reduced with the use of automation. Spammers have written algorithms to "automatically" generate spintax based on a human-written article. While the output of the resulting templates may be {ugly|despicable}, it can be cleaned up later by human beings. Thomson Reuters also received a patent in 2015 to [use machine learning to generate templates based on a corpus of {preexisting|preexistent|pre-existent} news material](http://www.google.com/patents/US20150261745) that will then be cleaned up by human beings.
Templates can also be criticized for avoiding the {problem|trouble|difficulty} of actually adding 'creativity/intelligence' to machines. The machine is not really being inspired to {write|compose} in the same way as a {human|human being} is...all it really does is follow orders encoded within a template. But other methods of text generation, such as [Char-RNN](https://github.com/karpathy/char-rnn) and [Markov chains](https://en.wikipedia.org/wiki/Markov_chain), have failed to produce long-form stories, though they are technically impressive and are evocative to {read|consume} in short bursts. The smarter the algorithm, the dumber the creative output...at least, [for now](http://qz.com/682814/i-want-to-talk-to-you-see-the-creepy-romantic-poetry-that-came-out-of-a-google-ai-system/).
Since templates are so effective in {producing|generating|creating} human-readable works, they will most likely be used in the near future in a variety of different fields. It is likely that the first human-readable computer-generated novel will be based on templates used in {combination|aggregation} with other algorithmic techniques.
== Appendix ==
Even this blog post has been generated using 'spintax', and this 'spintax' was generated using tools that can be easily found using Google. You can {view|see} the source Spintax [here](INSERT_LINK_TO_RAW_SOURCE_CODE). The generated spintax is {terrible|awful|dire} and required a lot of work to clean up...but AI is always advancing. Who knows what may come next?
I used spintax only as a proof of concept of templating. I strongly disapprove of article spinning, and so does Google. That's why it has [attempted to penalize the practice](http://www.seoblog.com/2014/08/panda-update-effectively-killed-autoblogs/) during its Panda updates to its search engine algorithm.
}
puts spintext.unspin
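
For readers without the gem installed, the sketch below shows roughly what "unspinning" involves: repeatedly pick one option at random from each innermost `{a|b|c}` group until no groups remain. The `naive_unspin` method and the `INNERMOST_GROUP` constant are hypothetical names made up for this illustration. This is an assumption about the general technique, not the actual implementation inside `spintax_parser`.

```ruby
# Hypothetical helper, NOT part of the spintax_parser gem.
# Matches a brace group that contains no further braces (an innermost group).
INNERMOST_GROUP = /\{([^{}]*)\}/

def naive_unspin(text, rng: Random.new)
  result = text.dup
  # Resolve innermost groups first; outer groups become innermost on a later
  # pass, so nested spintax is handled without recursion.
  while result.match?(INNERMOST_GROUP)
    result = result.gsub(INNERMOST_GROUP) do
      # Split on '|' (keeping empty options) and pick one option at random.
      Regexp.last_match(1).split('|', -1).sample(random: rng)
    end
  end
  result
end

puts naive_unspin('{Hello|Hi} {wonderful|{cruel|sad}} world')
```

Under this sketch, a group without a pipe, such as `{greeting}`, is reduced to its only option, so literal braces in the input do not survive. Passing a seeded `Random` as `rng` makes the output repeatable.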