Genetic Programming As If You Meant It: An Exercise in Mindful Practice

GECCO 2016 Workshop Proposal

November 6, 2015

Bill Tozier
email: [email protected]
twitter/github: @Vaguery

Overview

A hands-on exploration of some of the fundamental assumptions of evolutionary computing, framed as a kata or training exercise in which the audience makes collective, informed decisions to "repair" a running genetic programming process with an intentionally limited toolset.

Familiarity (but not expertise) with evolutionary algorithms and genetic programming is helpful. More importantly, participants should be comfortable taking part in collegial process design discussions. Detailed background will be provided in written materials, though it will not be the focus of the face-to-face tutorial session.

Details

In this tutorial the participants will undertake a kata I call "Genetic Programming As If You Meant It", which is aimed at theoreticians and practitioners familiar with genetic programming. It addresses the assumptions built into our "typical" framing of genetic programming by forcing an exploration of a running software system that violates several of the standard forms of the field. Because the tutorial is discussion-driven, a diverse technical audience (a mix of theoreticians and programmers, experts and students, mathematicians and data miners) will improve the collective experience.

The format is inspired by the Coding Katas popular among software developers. As in those exercises, the intent here is to surface implicit habits and assumptions "built into" the field of evolutionary algorithms and machine learning, so that we might more mindfully decide when or whether they are useful. In this case, we will walk through a brief interactive—and collective—"game" (quoted because it's a serious game).

There are three roles in this game: that of the User, which is "played" by the audience as a whole; the System, which is an autonomous and unstoppable genetic programming system running on cloud servers; and the Facilitator, who manages and interacts directly with the System, coaches and advises the User, and judges the validity of proposed User moves.

The User's goal is to "rescue" the System by incrementally adding selection criteria and search operators to the running process—without interrupting it. This incremental "tuning" is aimed at getting the System to produce useful and interesting results for the problem it's been set up to "solve".

It should go without saying that the System's goal is to run forever and generate a great deal of stored data and heat, which is apparently the goal of every Genetic Programming system. In this case, though, the User cannot stop the System, and thus aims to direct it towards particular kinds of data.

In each of its turns, the System creates an additional 500 individuals (without deletion), using the search operators it knows at the beginning of that turn, and selecting parents for breeding according to the fitness criteria ("rubrics") that it is aware of at the beginning of that turn.

The System always moves first.

On its turn, the User can examine the results and the state of the System, in order to collectively decide how to add exactly one (1) new search operator and one (1) new rubric to the system's specification.

If the Facilitator agrees that the changes are warranted, they are put in place before the next turn of the System.

At the beginning of the game, the only search operator known to the System is "random guessing", and it has no fitness criteria. That is, the System's initial behavior on each turn is simply to create 500 random solutions without evaluating them.
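The turn structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the System's actual implementation or its domain-specific language: the names `System`, `Individual`, `take_turn`, `apply_user_move`, `mutate` and `sum_sq` are hypothetical, genomes are stand-in lists of numbers, lower rubric scores are assumed to be better, and the tournament-style parent selection and the rescoring of older individuals against new rubrics are guesses made so the example runs.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Genome = List[float]  # assumption: a stand-in representation, not the real one
Operator = Callable[[List["Individual"], random.Random], Genome]
Rubric = Callable[[Genome], float]  # assumption: lower scores are better


@dataclass
class Individual:
    genome: Genome
    scores: Dict[str, float] = field(default_factory=dict)


def random_guess(parents: List[Individual], rng: random.Random) -> Genome:
    """The only operator known at the start of the game; it ignores parents."""
    return [rng.uniform(-1.0, 1.0) for _ in range(5)]


@dataclass
class System:
    operators: Dict[str, Operator]
    rubrics: Dict[str, Rubric] = field(default_factory=dict)
    population: List[Individual] = field(default_factory=list)

    def _select_parents(self, rubrics, rng, k=2, tournament=7):
        # Assumption: small tournaments, each judged on one randomly chosen rubric.
        if not self.population:
            return []
        parents = []
        for _ in range(k):
            pool = rng.sample(self.population, min(tournament, len(self.population)))
            if rubrics:
                name = rng.choice(list(rubrics))
                parents.append(min(pool, key=lambda ind: ind.scores[name]))
            else:
                parents.append(rng.choice(pool))  # no criteria yet: pick blindly
        return parents

    def take_turn(self, rng: random.Random, batch: int = 500) -> None:
        # Snapshot what the System knows at the beginning of this turn.
        operators = list(self.operators.values())
        rubrics = dict(self.rubrics)
        # Assumption: older individuals are rescored against newly added rubrics.
        for ind in self.population:
            for name, rubric in rubrics.items():
                if name not in ind.scores:
                    ind.scores[name] = rubric(ind.genome)
        newcomers = []
        for _ in range(batch):
            parents = self._select_parents(rubrics, rng)
            genome = rng.choice(operators)(parents, rng)
            scores = {name: rubric(genome) for name, rubric in rubrics.items()}
            newcomers.append(Individual(genome, scores))
        self.population.extend(newcomers)  # additive: nothing is ever deleted

    def apply_user_move(self, op_name, operator, rubric_name, rubric, approved):
        """Add exactly one operator and one rubric, if the Facilitator agrees."""
        if not approved:
            return False
        self.operators[op_name] = operator
        self.rubrics[rubric_name] = rubric
        return True


def mutate(parents: List[Individual], rng: random.Random) -> Genome:
    """Hypothetical User-supplied operator: perturb the first parent."""
    if not parents:
        return random_guess(parents, rng)
    return [g + rng.gauss(0.0, 0.1) for g in parents[0].genome]


def sum_sq(genome: Genome) -> float:
    """Hypothetical User-supplied rubric: squared distance from the origin."""
    return sum(g * g for g in genome)


if __name__ == "__main__":
    rng = random.Random(0)
    system = System(operators={"random_guess": random_guess})
    system.take_turn(rng)  # the System always moves first
    system.apply_user_move("mutate", mutate, "sum_sq", sum_sq, approved=True)
    system.take_turn(rng)
    print(len(system.population))  # 1000: two turns of 500, no deletion
```

The sketch takes its snapshot of operators and rubrics at the top of `take_turn`, so a User move only affects the System's next turn, and the first turn produces 500 unevaluated random guesses because no rubric exists yet.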

The User moves are built in a simple (and quite limited) domain-specific language, and are technically within easy reach of any attendee of a Computer Science conference. In any case, technical aspects of the problem should not be of concern to the participants; rather the pedagogic goal is their experience of learning-by-doing in the context of a complex, fully-featured problem-solving system that is like but not identical to their familiar evolutionary search systems.

The conversations and arguments among the members are the point, in other words.

The crux of the kata is a second requirement on the User's moves: the User must provide the Facilitator with a convincing warrant or justification for each move made. These cannot take the form "because we always do that", but can be of the form "because then we will be able to see if X instead of Y". In other words, moves will be judged as valid insofar as they rest on plausible contextual arguments about the situation at hand rather than on external experience or prior knowledge.

The core of this kata is to promote discussion of the cause-and-effect relationships that are our "folk knowledge" about evolutionary algorithms and machine learning practice. We will discuss our working concepts of "fitness", "validation", "algorithm design", "pathology" (including "bloat", "parsimony", and "overfitting"), and will explore qualitative and quantitative justifications for design decisions about those behaviors.

A lively discussion will bring all of this to the surface in the course of the exercise. The setup of the System and the formal turn structure mean that many of the responses evolutionary algorithm users typically make to "problems" are unavailable. While these constraints simplify the discussion and the options within the tutorial itself, they will also serve to provoke insights into more "standard" practices and situations in "normal" evolutionary algorithms.

A wide range of analytical tools will be on hand to explore and discuss the System state in every turn, and the "population" of individuals created so far, along with their scores, will be available in real time. Indeed, the System will be running in a cloud instance set to provide immediate feedback throughout the game.

Technical requirements

A reliable internet connection
A projection system
Furniture setup suitable for group work (round tables or easily moved chairs)

The "target" problems

There are four pre-built problems to date. All of them are intentionally "hard" (and would probably have won a Humie several years ago, if solved "automatically"), and have training and test data already set aside. All have also been checked to confirm that they can be solved (to some extent) by the System with sufficient prodding.

Speaker

William Tozier is a technical consultant, writer and performing engineer with 20 years' experience designing, constructing and applying Genetic Programming, Artificial Life and Agent-based systems in research and industrial settings. The focus of his work (in those fields) has been on Usability and User Experience, the practical effects of representation schemes, and loudly arguing against all mention of "optimization" whenever he encounters the idea in a Genetic Programming context. Most professional work in the last decade has focused on ameliorating the negative social effects of the imminent global collapse of the "academic lifestyle", and on fostering non-academic (and non-corporate) career paths for STEM scholars. He is not affiliated with any universities, and lives in Ann Arbor, Michigan.
