The Open Game Engine is our flagship open-source software for computational game theory. It is an embedded domain-specific language that extends the functional programming language Haskell with a powerful syntax for defining games compositionally, and then running computational analyses on them.
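To make this concrete, here is a plain-Haskell sketch of the kind of analysis the engine automates (this is not the engine's own syntax, just a self-contained illustration): brute-force checking the pure Nash equilibria of the Prisoner's Dilemma.

```haskell
-- Illustrative only: a 2x2 game and a brute-force pure-Nash check.
data Move = Cooperate | Defect deriving (Eq, Show, Enum, Bounded)

-- Prisoner's Dilemma payoffs: (row player, column player).
payoff :: Move -> Move -> (Double, Double)
payoff Cooperate Cooperate = (2, 2)
payoff Cooperate Defect    = (0, 3)
payoff Defect    Cooperate = (3, 0)
payoff Defect    Defect    = (1, 1)

-- A profile is a pure Nash equilibrium if no player gains by a
-- unilateral deviation.
isNash :: (Move, Move) -> Bool
isNash (a, b) =
     all (\a' -> fst (payoff a' b) <= fst (payoff a b)) moves
  && all (\b' -> snd (payoff a b') <= snd (payoff a b)) moves
  where moves = [minBound .. maxBound]

main :: IO ()
main = print [p | p <- (,) <$> moves <*> moves, isNash p]
  where moves = [minBound .. maxBound]
-- prints [(Defect,Defect)]
```

The engine's contribution is to make such analyses compositional: small games are defined once and plugged together into larger ones, instead of every model being written out monolithically as above.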
Supported by a grant from the Ethereum Foundation, we are developing applications of the Open Game Engine to game-theoretic analysis of smart contracts, for example for token auctions and DeFi contracts. You can read about this project here. We are also working on integration with the dapptools project to partially automate this process.
We are investigating the possible collusive behaviour of pricing algorithms in realistic training environments, using a reinforcement learning framework built on top of the Open Game Engine. You can read more about this project in this paper.
A team of academic collaborators based mainly at the MSP Group in Glasgow are studying the category-theoretic structure of general cybernetic systems, and the ways in which different real-world methods and problems map onto these structures. This includes deep learning, dynamic programming and reinforcement learning, Bayesian and variational learning, and economic game theory. You can read more in this paper.

We are an EU-based nonprofit organisation working on pure and applied research and open source software development. We are interested in all aspects of complex systems, optimization, control and intelligence, but our primary focus is on machine learning and economic theory.
We are neither academic nor startup, but sit at a nexus between the two. Our work is motivated by both curiosity and urgent social need.
The popular understanding of the word “cybernetics” has drifted over time to mean biotechnology, but its original meaning, coined in the 1940s and flourishing through the 1960s, was the interdisciplinary study of the control of complex systems, in fields such as ecology, economics and computer science. Due to the urgent need for an interdisciplinary approach to 21st-century problems, and the emergence of category theory as a new mathematical foundation, we believe the time is now right to try to reclaim the original dream of cybernetics, and its name.
Category theory is a field of mathematics that arose out of topology and geometry. Applied category theory (ACT) is a recently-emerged interdisciplinary field that applies methods of category theory to practical problems, achieving industrial success in fields such as quantum computing and natural language processing.
Categorical cybernetics, or CyberCat for short, is a sub-field of ACT focussing on optimization and control, using a small set of category-theoretic tools such as optics and open games.

Cross-posted from Oliver’s EconPatterns blog
The last EconPatterns post traced the history of economic design, focusing on the operations research group at Stanford’s business school and its role in developing auction design and market design. In this post I want to take this a bit further and describe the overlapping roles of operations research and economic design in more detail, anchoring on typical “operations research” domains, and how they quickly cross over into “economic” domains.
But a short definition of terms first: operations research is an applied branch of mathematics, mostly focusing on optimization (or “programming” in the original sense of linear programming, dynamic programming, combinatorial programming, etc.): orchestrating inputs to optimize (minimize or maximize) an output in the form of an objective variable.
The canonical formulation is a constrained programming or optimization problem: one objective function plus any number of inequalities expressing constraints or limits.
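In symbols, the canonical (linear) instance reads:

$$
\min_{x \in \mathbb{R}^n} \; c^{\top} x
\quad \text{subject to} \quad
A x \le b, \;\; x \ge 0
$$

where the vector $x$ collects the inputs to be orchestrated, $c$ their costs, and each row of $Ax \le b$ one of the limits stressed below: stocks, capacities, throughput ceilings.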
In the first post on the cybernetic economy I already stressed the role of these limits: available stocks can run out or warehouses can overflow, machines can only transform so many pieces in an hour, and pipelines, roads, and conveyor belts can reach their capacity and get congested.
“Orchestration” means putting tasks into their correct order, balancing loads and flows, minimizing stocks without risking stockouts, avoiding congestions, disruptions, or volatility in the flow of goods, people, information, targeting objectives like fastest time to completion, minimal slack times, or lowest cost of inventory.
Operations research is the formal mathematical tool used by industrial engineers to design production plants, supply chains, workforce deployment plans, transport schedules and sundry other things that require juggling many parts under tight and often volatile conditions.
Even if it’s steeped in industrial (and military) lore, it’s also used in areas like microchip design, financial engineering, and all over the place in the digital economy. It’s pretty much everywhere in those parts of the economy that typically remain invisible to the casual observer: the engine room of a modern economy.
The hallmark of operations research is that it’s set up to serve one principal, focusing mostly on operations within an organization. This distinguishes it from economics proper, which focuses on exchange between multiple principals and the resulting tension between their objectives, motivations, and desires.
Operations research cuts over to (mathematical) economics at the same juncture where decision theory crosses over to game theory: when the diverging interests of the participants move to the forefront of the analysis.
EconPatterns deliberately straddles the boundary between the disciplines for a number of reasons: operations research has much closer ties to both computer science and industrial design, offers a much richer toolset to aggregate and disaggregate processes within a hierarchical structure, has a closer connection between theory and practice, is a much better design paradigm to model complex longitudinal interactions with many specialized components, and ultimately has more tangible and straightforward objectives, typically those that can be measured with a stopwatch or a yardstick, rather than abstractions such as the idea of an equilibrium as a stable state where conflicts are resolved.
On the other hand operations research works from a paradigm of central planning, a paradigm that is losing analytical heft the more the connected process under scrutiny — the value chain — involves interaction, goal and resource conflicts between principals rather than between machines, tools, parts, information, and labor.
So roughly speaking, as soon as the tension between principals becomes the driving factor, we cross over to economics. As soon as the need to concatenate activities or to disaggregate higher-level processes into tasks and subtasks dominates, we’ll lean more on operations research.
But the core message is still this: from the EconPatterns vantage point, where the value chain is the analytical starting point for any design endeavor, all but the most trivial value chains have multiple crossings not only between machines but also between organizations, jurisdictions, and even belief systems; and since not only efficiency but also accountability is relevant to the integrity of that value chain, the formal aspects of economic design will inevitably sit on the cusp between the disciplines.
Let’s put this to use in two examples.
Machine replacement is one of the core problems in industrial engineering. In its simplest form, it means finding the ideal time to take an existing machine or process out of use and replace it with another, presumably superior, one.
The calculus, easy enough for bachelor-level exams, requires comparing the cost of the new machine (minus scrap value of the old machine) to the performance differential, most likely in a net present value calculation. If the performance benefit is higher than the cost of replacement, it’s a go.
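A minimal sketch of that calculus in Haskell (the discount rate, costs, and per-period gains in the example are illustrative placeholders):

```haskell
-- Net present value of a stream of cash flows at discount rate r.
npv :: Double -> [Double] -> Double
npv r cashflows = sum [cf / (1 + r) ^ t | (t, cf) <- zip [1 :: Int ..] cashflows]

-- Replace if the discounted performance gain exceeds the net cost
-- (purchase cost minus scrap value of the old machine).
shouldReplace :: Double   -- discount rate
              -> Double   -- purchase cost of the new machine
              -> Double   -- scrap value of the old machine
              -> [Double] -- per-period performance gain (new minus old)
              -> Bool
shouldReplace r cost scrap gains = npv r gains > cost - scrap

-- e.g. shouldReplace 0.05 10000 1000 (replicate 5 2500) ==> True
```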
From this starting point we can make the problem arbitrarily complex. What if the performances of the machines are not constant over time? What if the old one becomes gradually less efficient, with lower throughput and more frequent outages, or the new machine needs time to ramp up? What if the replacement itself doesn’t only include the purchasing cost of the new machine but also a work stoppage? What if the machine is part of a production facility? Does the whole production line have to be closed down, or are other similar machines on a different line able to take on the shortfall? What if the new machine is only able to perform better in conjunction with other replacements? What if uncertainty is involved?
Even if it’s useful to think of industrial engineering in terms of real industrial machines for milling, turning, or drilling in an industrial machine shop, these “machines” could be pretty much anything. If a bank considers a new process for checking creditworthiness, or if a college department contemplates restructuring its degree curriculum, they encounter similar planning and orchestration problems. The introduction of video assist in sports is an example of machine replacement.
Today, in most cases the “machine” is simply a computer. More abstractly, a “machine” is simply any workplace where a defined transformation is taking place.
All of this happens, if nothing goes wrong, according to a meticulously planned program, and if something goes wrong, hopefully according to a meticulously crafted contingency plan — the hallmark of central planning.
If we remove that requirement, and allow two new stakeholders in — competitors and customers — we get something we might call coordinated machine replacement, or in a more succinct and better known term: innovation.
Innovation in its most technical definition is the increase in total factor productivity, or aggregate output produced by aggregate input factors (in economics, famously labor, capital, and land, but I’ll devote another post to that). In other words: the collective replacement of machines, processes, and activities to make resource use more efficient, in a (more or less) competitive economy.
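For concreteness, the standard growth-accounting rendering of that definition (assuming a Cobb-Douglas technology in capital $K$ and labor $L$, and abstracting from land) is:

$$
Y = A\,K^{\alpha} L^{1-\alpha}
\qquad \Rightarrow \qquad
\frac{\dot{A}}{A} = \frac{\dot{Y}}{Y} - \alpha\,\frac{\dot{K}}{K} - (1-\alpha)\,\frac{\dot{L}}{L}
$$

so innovation, in this reading, is whatever output growth the growth of measured inputs cannot account for.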
In a “textbook” model of the economy, where firms are seen as singular and solitary production functions, replacement happens by Schumpeterian competition: companies which improve their efficiency by optimizing their production will gain a competitive advantage which lets them capture value, “Schumpeterian rents” in the innovation economics nomenclature, for as long as that efficiency advantage persists.
This economic pressure (technologically disadvantaged competitors see their margins evaporate until they either catch up or give up and leave the industry) is the driver of economic innovation and, in turn, of economic growth.
So much for the textbook treatment.
With the era of Henry Ford shutting down production for five months to replace the Model T with the Model A decisively over, the problem of keeping interdependencies uninterrupted while interrupting a single step in a complex value chain moves to the forefront.
Research and development for new car generations now starts long before the existing car generation gets taken off the market. There is no more reason to lay off workers, cancel orders for parts, keep dealerships waiting for new vehicles, or hope that customers are willing to wait a few months rather than wandering off to the competition.
Some of this still falls under a competitive bracket: laid-off workers and stranded dealers might also defect to the competition. In other cases, outright coordination might become necessary, such as in the adoption of shared technology standards or auditing rules. Value chains might become reintegrated, as when electric vehicle manufacturers, recognizing that market competition does not supply enough charging stations, reluctantly enter the market for charging infrastructure.
The less we think about the technological disruption of a value chain as a purely competitive event between isolated actors, the more we need to reach into the toolbox of operations research methods.
Machine scheduling is at the heart of operations research, and even if one of its synonyms, “job shop scheduling”, betrays its origin on the shop floors of the industrial era, it’s still at the heart of most algorithmic processes that try to direct inputs toward productive outputs.
The underlying idea is that jobs have to be allocated to machines on which they can be performed. In its simplest form, these jobs consist of a sequence of steps, similar to Adam Smith’s pin factory, where a prior step has to be finished before a subsequent one can be started.
This setup can be made more complicated in many ways. Machines (and their operators) might be specialized to perform only certain tasks. Jobs might require setup times which either have to wait until prior jobs are finished or can be started while the prior job is still running. Uncertainty can come into play in many ways.
The objective is typically to minimize time to completion, maximize machine utilization, or some related measure.
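As a minimal sketch of the simplest variant (identical parallel machines, no precedence constraints), here is the classic “longest processing time first” greedy heuristic in Haskell; the job lengths in the example are made up, and since exact makespan minimization is NP-hard, this is a heuristic, not a solver:

```haskell
import Data.List (foldl', sortBy)
import Data.Ord (Down (..), comparing)

-- Greedy LPT scheduling on m identical machines: sort jobs by
-- decreasing length, always assign the next job to the currently
-- least-loaded machine, and report the resulting makespan.
makespan :: Int -> [Double] -> Double
makespan m jobs = maximum (foldl' assign (replicate m 0) sorted)
  where
    sorted = sortBy (comparing Down) jobs
    assign loads job =
      let l = minimum loads
          (before, _ : after) = break (== l) loads
      in  before ++ (l + job) : after

-- e.g. makespan 3 [2,14,4,16,6,5,3] ==> 17.0
-- (which here happens to hit the lower bound: max 16 (ceiling (50/3)))
```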
Machine scheduling has successfully crossed over from the shop floor to the digital economy, especially when it comes to platform operations where the “machines” can be vehicles: taxicabs, scooters, coaches, and the “jobs” can be passengers trying to get from A to B in a timely, cheap, and secure manner.
This is again a scenario where the worlds of economics and operations research intersect. We can think of a platform as a central conductor trying to move people from A to B, which inevitably requires operations research knowledge, but we also have passengers (and in some cases, drivers) as participants with diverging interests, which requires economic and especially game theoretic knowledge.
The boundary is blurry, and the scale might tip whenever we realize that we’re better off assigning a modicum of autonomy to the many interlocking parts: that the machines might find a better solution if we let them compete for scarce resources and avoid congestion rather than insisting that coordination requires central control.
But it also helps to think of operations research as the discipline that operates bottom-up, assembling economic engines from universal elementary operations, while economics tends to operate top-down, from a highly aggregated macroeconomic perspective down to individual microperspectives. It also helps to remember that operations research moved from the shop floor into academia, while economics is still trying to move in the opposite direction.
Design is ultimately about breaking complex problems down into their constituent parts, solving them in isolation and reassembling them in the hope that the partial solutions fit together. This requires, almost regardless of the application domain, that we start with a rough outline of the potential solution and decide step by step which partial problems require particular attention to detail.
This can be done in a methodical or in a haphazard fashion. In particular, the opposing risks of not enough attention to detail or too much attention to detail loom large over failed design projects. This is certainly not restricted to economic design, but economics as a discipline suffers from a lack of conceptual rigor and increasingly an overflow of formal rigor.
This isn’t only the case for the part of the design process where we go from a “rough outline” (a conceptual understanding of the overall problem) to a fully fleshed-out formal model; once we understand that we need to apply a formal toolset, there is also often a lack of understanding of which toolset applies to the problem at hand.
In this post, we’re in the latter part of that process. Both operations research and mathematical economics are highly formalized frameworks which share a common history in the evolution of constrained optimization but which for at least two generations (roughly from the inception of the Econ Nobel and the deliberate choice by the Nobel committee to reward the economists but not the operations researchers working on the same problem) barely talked to each other.
Over the last ten years or so, we’ve seen a gradual rapprochement between the disciplines, in large part because the new players of the digital economy started to realize that their machinery is often economic in nature — auctions, matching markets, information and risk aggregators — even if they deal in abstract information goods rather than in physical objects assembled on the shop floors of the industrial economy.
In the process they’ve also recognized that the academic paper exercises which constitute the main output of modern economics aren’t sufficient to assemble production-ready economic engines. For this you also need scalability, modularity, interoperability, and an understanding of human interaction that bows as much to drab realism as it does to formal aesthetics.
To offer a simple example: in the push to succeed in the global coordinated machine replacement problem known as the transition from fossil to renewable sources of energy, we can’t just assume that we’re fine and markets clear if aggregate supply matches aggregate demand.
We also have to take into account that energy is rarely ever produced where and when it’s needed, neither in place nor in time. So we have to apply a model of an energy economy that pays attention to stocks and flows — in other words, a cybernetic economy.

Recently we held a workshop in Edinburgh titled Mathematics for Governance Design, consisting of a roughly 50/50 split between social scientists and category theorists.
The workshop was organised by Philipp and myself from the CyberCat Institute together with Seth Frey, Saba Siddiki and Josh Tan from (in an overlapping way) Metagov, the Institutional Grammar Research Initiative and the Computational Institutional Science lab. It was funded and hosted by the International Centre for Mathematical Sciences as part of their Mathematics for Humanity programme of events.
We designed the workshop to have as little scheduled time as possible and as much unstructured working group time as possible - inspired by our past experience running a workshop at Wytham Abbey and originally inspired by Dagstuhl. And it was a resounding success: the theme of the week felt like watching famous people interact whom we would never have expected to interact with each other. The danger of running a workshop like this is that the two different groups would form cliques and only interact with each other under duress, but the exact opposite happened.
Probably my personal highlight was being able to meet Matilde Marcolli and talk about our shared interest in the very hard question of how to surpass the well-known scalability barrier for human self-organisation (for example written about extensively by Elinor Ostrom). We agreed that compositional game theory and related categorical cybernetics methods could plausibly have a role to play in building models of social situations consisting, for example, of groups of groups arranged in an approximate hierarchy. (When taken as revolutionary this is an aspect of anarcho-communism, although my personal interest is a bit too theoretical to call it that.) In fact I should write a blog post on the general topic of hierarchies of lenses, which is something I’ve talked extensively about with several people, most notably Toby Smithe in the context of modelling the human cortex. I talked about it quite a bit in high-level terms in this blog post, but a more technical post might be in order.
Another highlight was being able to finally engage with institutional grammar and think seriously about how it could relate to open games. There is some past work (this and this, plus a paper I wrote earlier this year with Vincent Wang-Maścianica that isn’t released yet) on connections between open games and natural language, but to me the limiting factor has always been that the open game semantics of individual words must be hand-crafted, which will not scale beyond the tiniest of toy examples, and there has never been a plan for it besides “maybe hand-craft enough examples to fine-tune an LLM and hope for the best”. My immediate thought now is that the type of natural language texts we would like to apply this to will often factor through an institutional grammar representation, in a way that is likely to be extremely useful for game-theoretic analysis. I learned last week that institutional grammar has already been connected to agent-based models, and if it is possible to go to an agent-based model then it should not be significantly harder to go to a game-theoretic model. The benefit, of course, is that Nash equilibrium is a very standard perspective for thinking about the theoretical nature of institutions.
Of course we understand as well as anybody that interdisciplinary research is extremely hard, and we should not expect immediate technical results after bringing together two such different groups. But we felt real excitement in the room, we have every reason to expect multiple new collaborations to form, and we are already planning a successor workshop next year.
Cross-posted from Oliver’s Substack blog, EconPatterns
The fundamental economic exchange is surprises for eyeballs.
Modern economics is built around understanding the mechanics of market exchange, but it hasn’t always been that way. The etymological root of economics, the Greek oikonomia, points toward household management, or husbandry of the (largely self-sufficient) estate, the oikos. Today we would call it home economics.
After discussing the fundamental grid of the economy in the last post, it makes sense to lay out the underlying assumptions of human behavior within that economy in some detail — and both the title and the introductory statement (possibly the first pattern introduced) should make it clear that these assumptions differ somewhat from the traditional textbook treatment of economic agents.
But they also differ from the various attempts to bound the rationality assumptions of textbook economics in some way, be it in the Carnegie “satisficing” or in the Berkeley “behavioral” tradition. It nevertheless incorporates both, in addition to a variety of other behavioral quirks which we might not associate with the economic realm.
The major reason to tweak our behavioral assumptions is that, to design economic structures, we need a framework that lets us apply a varying set of behavioral assumptions across a variety of settings while still staying coherent.
So it’s not so much a behavioral assumption but a template for developing context-specific behavioral assumptions — or in other words, a design pattern. Humans behave differently in different social settings, and we should be able to pick the right model for the right circumstances, but still be able to treat it as a special instantiation of a shared underlying pattern.
This explicitly includes using the assumption of perfect rationality wherever it is warranted.
So let’s grab our opening statement and take it apart.
“Eyeballs” is marketing vernacular for attention. The term can be taken quite literally — there are devices that track eyeball movement to find out how much screentime is spent staring at ads. But for the most part I will use it metaphorically as the cognitive effort devoted to a task.
It is perfectly fine to assume away cognitive limitations in a wide variety of circumstances. It simplifies our model significantly. It deflects accusations that a given policy claim is the outcome of an opportunistically chosen (boundedly rational) behavioral model rather than an underlying economic force. And in many scenarios it creates good-enough predictions for the task at hand.
Assumptions are simplifications that ideally give us more gain in parsimony than loss in predictive accuracy. As long as that’s what they do, they do their job.
But there are also situations where such a simplifying assumption produces results that stray too far from observable reality, and we need to have a plan for how we want to adjust the behavioral model in those situations.
A fair starting assumption is that the economic actor will allocate cognitive resources economically, devoting the most attention to those tasks where she expects the most bang for the buck. And that brings us to the other part of the statement.
The economic expression for “the most bang for the buck” is “maximum expected utility”, but this requires a lot of foreknowledge, and we can’t simply assume under all circumstances that our economic actor already possesses it. Every time you see an economics paper assuming that our actor knows something about the distribution of a random variable, you know we’re on shaky ground.
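In symbols, the textbook formulation is

$$
a^{*} = \arg\max_{a \in A} \; \mathbb{E}_{x \sim p}\left[\, u(a, x) \,\right]
$$

and the distribution $p$ over the unknowns $x$ is precisely the foreknowledge in question.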
So the next level is to assume that our actor will venture to find out, acquiring this knowledge step by step in what we can call a process of discovery — which usually means a sequence of failures that terminates either with a moment of success or with the decision to call it off. In econspeak, this discovery process is known as tâtonnement.
But we shouldn’t assume that our agent just wanders around the desert aimlessly hoping to find an oasis — a stark example of such a discovery process with a life-or-death ending. Rather, there should be a plan behind those wanderings.
That plan is usually to devote the existing resources, cognitive and physical, in a way that maximizes the knowledge gained about the terrain. In our desert scenario this might translate to climbing to the top of a ridge to survey the territory, or alternatively to staying near the valley floor to limit exposure to sunlight.
We can describe this process in two ways: uncovering secrets — where a secret is anything that wasn’t known before but is known after — or hunting for surprises.
Surprise expresses the same thing — some difference between what was known before vs what is known after — but it also gives us the opportunity to express it in two ways: positive surprise and negative surprise.
Loosely translated, positive surprise is beneficial — something worth seeking out — and negative surprise is harmful — something to be avoided. On this single dimension we can build a (surprisingly) wide range of behavioral models, including differentiating individuals by their propensity to seek out positive surprise and accept negative surprise in the process, in other words by their affinity for disorder.
This has clear connections to the behavioral assumption of risk preference, and the connection definitely warrants further attention — risk is a transferable economic commodity — but it also gives us the additional angle that planning is a vehicle to mitigate negative surprise for individual actors, and contracting is a vehicle to mitigate negative surprise for collective action, including the canonical form of collective action: the organization (which will be at the center of next week’s post).
A lot of this will be fleshed out in the weeks to come, and some of the jumping-off points should already be apparent. Surprise gives us the opportunity to invoke both information entropy and ultimately thermodynamic entropy. But as already mentioned, this series will only use these ideas conceptually, and point towards formal treatments in their respective literatures.
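For readers who want the anchor anyway, the standard information-theoretic rendering is: the surprisal of an outcome $x$ under beliefs $p$, and entropy as expected surprise,

$$
s(x) = -\log p(x),
\qquad
H(p) = \mathbb{E}_{x \sim p}\left[\, s(x) \,\right] = -\sum_{x} p(x) \log p(x).
$$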
Design is a guided trial-and-error process where judgment calls have to be made about the structure of the problem, about splitting it into its constituent parts and putting the parts back together in the hope that no unwanted interaction effects emerge, about taking requirements and putting them in an order, about defining and resolving contingencies and dependencies, about the level of detail at which a problem needs to be resolved, at which precision, and how far into the future.
For this we need a flexible model of behavioral assumptions that can be adjusted to fit the task at hand, that can be experimented with. “Surprises for eyeballs”, or in other words, “secrets for attention”, gives us exactly that.
There’s an obvious objection to this treatment, and it’s a fair one. “Surprise for eyeballs” is most obviously suited to the information economy, or maybe more aptly: the attention economy, and in the trad economy we might be better off dealing with the canonical exchange of supply vs demand in its trad form of an effort (a product or service) vs a payment.
Let me use George Akerlof’s famous essay on the market for lemons to show why even a world of one-off transfers of a physical object against a simultaneous transfer of its monetary equivalent is still a special case of an attention economy full of surprises.
Akerlof’s paper kicked off the field of information economics, and is most widely associated with introducing the concept of asymmetric information. But as the second half of its title suggests, it’s actually about quality competition (a “lemon” being a colloquial term for a used car of poor quality), and the information angle is about the inability of conveying this quality — especially about the inability of an owner of a high-quality car to establish that his car is not a lemon.
But how do we find out if a car is a lemon? And how do we insure ourselves against the risk of acquiring a lemon? By finding out.
In the same sense of the stranded-in-the-desert example above, the process of finding out is a discovery process except with opposite signs. It’s a sequence of successes terminated by a failure — which is true for all machines: they run until they break down.
But there’s an inevitable random element to this process, and even if we can assume that lemon-ness correlates negatively with longevity, that relationship is far from deterministic. We cannot conclude with certainty from the time of failure whether the car was a lemon — even if the prior owner knew about its lemon-ness.
This simple recognition has a wide array of ramifications worth taking apart in detail, because most of them are central to economic design — not only of economic engines like markets, auctions, recommenders or reputation engines, but also of economic institutions. Notoriously, the business model of the Roman Catholic Church is that of a certifier of good conduct: a good old-fashioned reputation engine.
The tl;dr of this excursion is that almost all goods are experience goods in that their value only becomes apparent when they are consumed, and the consumption harbors the possibility for surprise, positive or negative.
Whether this happens over a longer time span, as with driving a car, immediately, as with eating ice cream, or with belated effects, as when immediate consumption triggers a later toothache, depends on the circumstances.
But the canonical economic trade of a perfectly substitutable commodity of perfectly equal quality is a simplifying assumption resting on a lot of institutional underpinnings. Almost all trades, in the trad economy or the digital economy, contain an element of surprise, and in turn engage our propensity to shield ourselves from it, or to embrace it.

Cross-posted from Oliver’s Substack blog, EconPatterns
In the first four posts, I tried to map out an economy structured around the need to find out. This didn’t happen by accident, but is the result of spending a couple of decades in a realm where academic economic knowledge is held in little regard, in no small part because its gatekeepers like to give off an air of having it all figured out already, even when the circumstances make clear that this is rarely the case.
It doesn’t match my own opinion, but I perfectly understand when, say, the founders of a three-person startup bid adieu to their knowledge of academic economics when they learn that there is no such thing as a demand curve unless they put in the effort to assemble it piece by piece, transaction by transaction, price change by price change.
Most of them stop at this point and direct their attention to other, more pressing concerns, and I can’t blame them for it. The “need to find out” gets short shrift in most economics classes since economic instruction at universities generally starts from a vantage point where the groundwork has already been laid by wizards behind the curtain, and all that’s needed for mere mortals is to fine-tune the preconceived machinery.
That’s also a major reason why economists find employment within large machineries (government, big banks, and, increasingly, publicly listed tech firms), but are rarely ever in demand as one of the three founders of a recently minted startup with more enthusiasm than cash — or data.
This series tries to remedy that situation, and I could have subtitled it either “economics for startuppers” or “startup thinking for economists”, except the intended scope — and my intended audience — is a bit wider than that.
The underlying idea of “finding out” pursued in EconPatterns derives ultimately from Adam Smith’s gains from specialization, which drive the division of labor and which ultimately influenced another key contribution to economic lore, Friedrich Hayek’s Use of Knowledge in Society.
Hayek’s point was that there’s no point trying to steer the whole economy from a central vantage point because there is always someone somewhere closer to the ground, steeped in operational detail, who knows better, and can put that knowledge to better use than the central planner.
This idea that there is always local knowledge that is more detailed than the aggregated knowledge on the macro level, that there is knowledge that is nested, and that all participants have a mental map of the economy that is most detailed in their own vicinity and that degrades in detail, certainty, or precision, that resorts to using coarse-grained models, aggregates, or even stereotypes, the further one moves away from one’s own location, is deeply embedded in EconPatterns.
And this isn’t only true for the physical dimensions, it’s also true for the temporal dimension. Both the past and the future get hazy very quickly, and we resort to increasingly coarse-grained knowledge the further we go in each direction: Hayek’s “knowledge of the particular circumstances of time and place”.
There is an inevitable urge to remedy this shortcoming with the magic potion of “more transparency”. Every time we hear news about another supply chain pile-up, there is the inevitable stratum of pundits opining that this (the negative surprise, that is) could have been avoided if we just magically gave every participant a detailed map of the whole economy, or at least the whole chain of events — network really — leading to the participant’s problems stemming from the unexpected supply chain outage.
This is illusory of course to anyone attuned to the operational details of supply chains, not only because these pundits habitually underestimate by several orders of magnitude just how much operational raw data is out there, most of which is of no use to anyone but the data owner, but also because the countervailing demands of privacy and transparency (usually leading to the conundrum of each side demanding transparency from the other party while insisting on privacy for itself) will inevitably lead to privacy winning out, except in those cases where the more powerful actor can compel less powerful actors to disclose their secrets.
Designing a mechanism that orchestrates the conflicting information needs of the participants in a value chain or its mapping into the physical realm, the supply chain, is still a holy grail in operations and in economic trade, but in no small part because the reasons why such a governance mechanism is hard to come by are still poorly understood.
Finding this holy grail, and mapping out the path to its discovery, is of course the goal of this series. A starting point is to arrive at a better understanding of how knowledge disseminates thru an economy; where, when, and why it forms clusters (especially belief clusters); and how to interfere in that flow in a structured, goal-oriented way.
Just to offer a simple example, novices in the field of supply chain are often surprised to learn that the bill of lading, one of the crucial documents ensuring integrity of a product thruout its transit along ports, flights, shipments, loading and unloading, handovers and often rough handling, is still legally required to be in paper form, sent by courier from station to station.
A simple impulse is to blame an overbearing bureaucracy or an industry staunchly resistant to organizational change and technological progress, but an alternative and more plausible explanation is that paper solves a few integrity requirements that electronic communication still has a hard time solving.
When handovers and handshakes are still the literal thing involving actual hands, and signatures are still done by hand in the presence of the counterparty, we are solving a few problems about identity that turn out to be quite hard once we try to shift them online, into the digital domain, where ascertaining that an individual is who they claim to be can be exceedingly tricky.
It turns out this simple example repeats all over the place, in all kinds of domains and scenarios, with idiosyncratic details added but the underlying pattern staying the same.
This is why I will come back to that example again and again. Because that is what EconPatterns is about.

Categorical cybernetics, or CyberCat to its friends, is – no surprise – the application of methods of (applied) category theory to cybernetics. The “category theory” part is clear enough, but the term “cybernetics” is notoriously fluid, and throughout history has meant more or less whatever the writer wanted it to mean. So, let’s lay down some boundaries.
I first proposed CyberCat, both as a field and as a term, in this 2019 blog post (for which this one is partly an update). There I fixed a definition that I still like: cybernetics is the control theory of complex systems. That is, cybernetics is the interaction of control theory and systems theory.
We add to this applied category theory, which has some generic benefits. Most importantly, we get compositionality by default, and a more precise way of talking about it than in fields like machine learning, where it is present but informal. Compositionality also gets us halfway to computer implementation by default, by making our models similar to programs. Finally, category theory gives us a disciplined way to talk about interaction between models in different fields.
It turns out - and this fact is at the heart of CyberCat - that the category-theoretic study of control has a huge amount of overlap with things like learning and strategic analysis. Those were also historically part of cybernetics, and can be seen as aspects of control theory with a certain amount of squinting, so we also include them.
On top of that definition, a cultural aspect of the historical cybernetics movement that we want to retain is that cybernetics is inherently interdisciplinary. Cybernetics is not just the theory but the practice: in engineering, artificial intelligence, economics, ecology, political science, and anywhere else it might be useful. (Part of the reason we created the Institute – more on that in a future post – is to make this cross-cutting collaboration easier than in a university.)
Cybernetics has been an academic dirty word for many decades now: in the 60s and 70s it went through a hype cycle, things were over-claimed, and the field eventually fell apart. As founders of the CyberCat Institute we believe that the time is right to reclaim the word cybernetics. Apart from anything else, the word is just too cool not to use. More importantly, the objects of study – and the interdisciplinary approach to studying them – are even more important now than they were 50 years ago.
Having laid out what CyberCat could potentially be, I will now narrow the scope. At the Institute we are focussing not on just any applications of category theory to cybernetics, but on a small set of very closely interrelated tools. These are, roughly, things that have a family resemblance to open games.
This post isn’t the place to go into technical details, but what these things have in common is that they model bidirectional processes: they are processes (that is, they have an extent in time) in which some information appears to flow backwards (I described the idea in more detail in this post). The best known of these is backpropagation, whose backward pass is exactly such a backwards information flow. A key technical idea behind CyberCat is the observation that many other important processes in cybernetics have a lot in common with backprop, once you take the right perspective. The category-theoretic tool used to model these processes is optics.
Besides backprop, the things we have put on a uniform mathematical foundation using optics are value iteration, Bayesian inference, filtering, and the unnamed process that is the secret sauce of compositional game theory.
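To make the shape concrete, here is the simplest such gadget, a concrete lens, in plain Haskell. The framework itself uses more general optics, so take this as an illustration of the composition pattern rather than the actual definitions we use:

```haskell
-- The simplest bidirectional gadget: a forward ("get") pass and a
-- backward ("put") pass that may use the forward context.
data Lens s t a b = Lens
  { view   :: s -> a        -- forward pass
  , update :: s -> b -> t   -- backward pass
  }

-- Sequential composition: forward passes chain left-to-right,
-- backward passes chain right-to-left through the saved context.
compose :: Lens s t a b -> Lens a b x y -> Lens s t x y
compose outer inner = Lens
  { view   = view inner . view outer
  , update = \s y -> update outer s (update inner (view outer s) y)
  }
```

In backpropagation, view is the forward evaluation and update carries gradients backwards; in an open game, the backward pass carries payoff information instead. Same shape, different payload.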
This is the academic foundation that we start from. The question that comes next is, so what? How can this knowledge be exploited to solve actual problems? This is where the CyberCat Institute comes in, but I want to leave that for a future post. In the meantime, you can look at our projects page to see the kinds of things we are working on right now.

I thank Oliver Beige for many helpful comments.
Economic theory formulates thoughts via what we call “models.” The word model sounds more scientific than the word fable or tale, but I think we are talking about the same thing. (Ariel Rubinstein) 1
Are economic models useful for making decisions? One might expect that there is a clear answer to this simple question. But in fact opinions on the usefulness or non-usefulness of models as well as what exactly makes models useful vary widely - within the economic profession and of course even more so beyond. Sometimes the question feels like a Rorschach test - telling more about the person than about the subject.
In this post, I want to explore the question of usefulness. Even more so, I want to explore how the usefulness ties into the modelling process. The reason for doing so is simple: Part of our efforts at CyberCat is to build software tools to improve and accelerate the modelling process.
The importance of this is also evident: if models are useful, and we improve the process generating useful models, we improve our decision-making. And insofar as these improvements tie into computing technology, as in our opinion they do, the gains could be significant.
My question, "are economic models useful", is quite lofty. So, let's first do some unpacking.
What do I mean by economic model? A mathematical, formal model which relates to the domain of decision-making at hand. A prototypical example is a model that tells us how to bid in an auction. Such models are often classified as applied economic models.2
Why do I emphasize "economic"? If my question was: Are mathematical models useful for decision-making, the answer would be a simple yes and we could call it a day. Operations research models are in production for a multitude of tasks (job scheduling, inventory management, revenue management etc.). In fact, many of these models are so pervasive that it is easy to forget them. Just think about the business models that have been built on the navigation and prediction functionalities of Google maps.
The distinction between operations research and economics is obviously blurry, and more due to artificial academic barriers than to fundamental differences (check out Oliver's post on this). I am making the crude distinction that economic models are about several agents interacting - most often strategically - whereas traditional operations research models are focused on single decision-makers.
Now, this is crude because operations research by now also includes auctions and other models that are interactive in this way. Moreover, as Oliver pointed out in another post, several leading economists who advanced the practical use of economic models (to which we will come) have an operations research background.
It is, I think, also no coincidence that operations research has moved into the realm of interacting agents: due to globalization and in particular the internet, companies have become more interconnected and also have much more technical leverage. 50 years ago, the idea that a regular company could be designing its own market would have been quite a thing. Today, it is part of the standard startup toolkit.
Technology and interconnectedness are driving the need for models that help us decide in such a world, as well as design the frameworks and protocols in which decisions take place. Economic models are the natural candidate for this task.
Let's turn to the central part of my question. What do I mean by useful? Opinions on this vary widely. According to Rubinstein, the question of how a model can be useful is already ill-posed: models are not useful. Models might carry a lesson and can transform our thinking. But they are of little value for concrete decisions.
In economics, Rubinstein's position is an extreme point. On the other side of the extreme, economists and even more importantly computer scientists are working on market design and mechanism design models.3 Models in this spirit are "very" practical: they do affect decisions in a concrete sense - they get implemented in the form of algorithms and are embedded in software systems.
We can think of fables and algorithms as two ends of a spectrum - from basically irrelevant to decisive for a choice we have to make. While it is hard to precisely locate a given model on this "usefulness" line, we can consider how a model becomes more useful as it moves along the spectrum. Of course, what constitutes value, and who benefits how from a model, changes along this path as well. The usefulness of a model is a matter of degree, not an absolute.
Let's begin at the fable end and start moving inward. How can a model produce value? If we are faced with infinitely many ways to think about a situation, even a simplistic model can be valuable. It helps to focus and to select a few key dimensions. This aspect becomes even more important in an organizational context, where people have to collaborate and it is very easy to get lost in a myriad of possibilities and different interpretations.
Many classic games (in the game theory sense) like the Battle of the Sexes, Matching Pennies, and of course the Prisoners' Dilemma help to focus on key issues - for instance the interdependency between actions and their consequences. To be clear, the mapping from such a model to a concrete decision is very loose in this case, and the value of the model lies in the eyes of the analyst.
These games often focus on a few actions ("defect" or "cooperate"). Moreover, agents have perfect information about the consequences of their actions and the actions of others. In many situations, e.g. in business contexts, choices are more fine-grained and information is not perfect. Models in Industrial Organization routinely incorporate these aspects, for instance when analyzing competition between companies. From a practical perspective, these models often resemble the following pattern: if we had information X, the model would help us make a decision. Consider strategic pricing: it is standard in these models to assume demand to be known, or at least drawn from a known distribution. The demand curve will then typically be a smooth, mathematically well-behaved object. Such models can produce insights - no doubt about it.
But they rarely help to make a concrete decision, e.g. what prices to charge. There are many reasons for this but let me just give an obvious one as a co-founder of a startup: I would love to maximize a demand curve and price our services accordingly. But the reality is: I do not have a curve. Hell, if I am lucky I observe a handful of points (price-quantity combinations). But these points might not even be on any actual demand curve in the model's sense. So, while useful for structuring discussions around pricing, in the actual decision to set prices, the model is only one (possibly small) input. And this is very typical. Such models provide insights and do help to inform decisions. But they are only part of a collage of inputs into a decision.
There are economic models which do play a more dominant role in shaping decisions. Consider auctions. There is a rich theory that helps to choose a specific auction format to solve a given allocation problem. Still, even in this case, there are gaps between the model and the actual implementation, for instance when it comes to multi-unit auctions.
The examples I gave are obviously not meant to be exhaustive; there are other ways in which a model can be useful. But this is not so important. The main point is that all along the usefulness line, economic models can produce value. The question is not whether a model produces a choice but whether, at the margin, it helps us make better decisions. And this can happen all along the spectrum. Moreover, ceteris paribus, the further we move towards the algorithm end, the more influence the economic model gains relative to other inputs into a decision, and the more value it produces.
If we accept this, then an immediate question comes up: How can we push models from the fable side more towards the algorithm side? Let's explore this.
I first need to discuss how models get located on a specific point on the usefulness line in the first place. But this requires digging into the actual modelling process. Note again that I am only interested in "instrumental" modelling - models that are useful for a specific decision at hand. My exposition will be simplistic and subjective. I will neither cover the full range of opinions nor be grounded in any philosophical discussions of economics. This is just me describing how I see this (and also how I have used models in my work at 20squares).
Applied models in economics are a mixture of mathematical formalism and interpretative mapping connecting the internals of the model to the outside world. Mappings are not exclusive: The same formal structure can be mapped to different domains. The Prisoner's dilemma is such an example. It has various interpretations from two prisoners in separate cells to nuclear powers facing each other.
The formal, inner workings of models are "closed" objects. What do I mean by that? Each model describes a typically isolated mechanism, e.g. connecting a specific market design with some desirable properties. The formal model has no interfaces to the outside world. And therefore it cannot be connected to other models at the formal level. In that sense a model is a self-contained story.
Let me contrast this with a completely different domain: If one thinks about functional programming, then everything is about the composability of functions (modulo types). The whole point of programming is that one program (which is a function) can be composed with another program (which is a function) to produce a new program (which is a function).4
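In miniature, and with function names made up for the example: two independent programs fuse into a third by ordinary composition, provided the types line up.

```haskell
-- Two small, independent programs...
normalize :: [Double] -> [Double]
normalize xs = map (/ sum xs) xs

expectation :: [Double] -> Double
expectation ps = sum (zipWith (*) ps [0 ..])

-- ...and a new program obtained by composing them (modulo types).
meanIndex :: [Double] -> Double
meanIndex = expectation . normalize
```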
Back to economic models. When it comes to applications, the "right" model is not god-given. So, what does the process of modelling real-world phenomena look like?
As observed by Dani Rodrik5, the evolution of applied models in economics is different from the evolution of theories in physics. In physics one theory regularly supersedes another theory. In economics, the same rarely happens. The practice of modelling is rather about developing new models, like new stories, that then get added to the canon.
One can compare this to a library where each book stands for a model that was added at some point. Applied modelling then means mapping a concrete problem onto a model from the existing stock or, if something is missing, developing a new model and adding it to the canon.
Inherent in this process is the positioning of a model at a specific point on the spectrum between fables and algorithms. Models mostly take on a fixed position on the line and stay there. There are exogenous factors that influence the positioning, and they can change over time. For instance, the domain matters: if you build a model of an intergalactic trading institution, it is safe to assume that this model will not be directly useful. Of course, this might change.
Like stories, certain models get less fashionable over time, others become prominent for a while, and a select few stay evergreens. Economists studying financial crises in 2006 were not really standing in the spotlight of attention. That changed radically one year later.6
Let me emphasize another aspect. I depicted applied models as packages of internal, formal structure and interpretative maps connecting the internals with some outside phenomenon. This interpretative mapping is subjective. And indeed discussions in economic policy often do not focus on the internal consistency of models but instead are more about the adequateness of the model's mapping (and its assumptions) for the question at hand. Ultimately, this discourse is verbal and it is structurally not that different from deciding which story in the bible (or piece of literature, or movie) is the best representation of a specific decision problem.
The more a model leans towards the fable side, the more it will be just one piece in a larger puzzle, and the more other sources of information a decision-maker will seek. These might include other economic models, but of course also sources outside economics. Different models and other sources of information need to be integrated.
As a consequence, whatever powers we gain through the formal model, much is lost the moment we move beyond the model's inner workings and need to compare and select between different models, as well as integrate them with other sources. A synthesis at the formal level is not feasible.
Let me summarize so far: A model's position on the spectrum of fable to algorithm is mostly given. There is not much we can do to push a single model along. Moreover, we have no systematic way of synthesizing different models - which would be another possibility to advance along the spectrum.
We have been mostly concerned with the type of output the modelling process generates. Let's also briefly turn to the inputs. Modelling today is, by and large, not that different from 50 years ago. Sure, co-authorships have increased, computers are used, and papers circulate online. But in the end, the modelling process is still a slow, labor-intensive craft that demands a lot from the modeller. He or she needs knowledge of the domain, must be familiar with the canon of models, needs judgment to balance the tradeoffs involved in different models, etc.
This makes the modelling process costly. And it means we cannot brute force our way to push models from fable to algorithm. In fact, in the context of policy questions many economists like Dani Rodrik7 criticize the fact that discussions focus on a single model whereas a discussion would be more robust if it could be grounded in a collage of different models. But generating an adequate model is just very costly.8
Taken together, the nature of the model-generating process and its cost function are bottlenecks that we need to overcome if we want to transform the modelling process.
Let's go back to our (functional) programming domain to see an alternative paradigm. Here, we also rely on libraries. But the process of using them is markedly different. Sure, one can simply choose a program from a library and apply it. But one can also compose programs to form new, more powerful programs. One can synthesize different programs; and one can find better abstractions through the patterns of multiple programs which do similar things. Lastly, one can refine a program by adding details. And of course, if you consider statistical modelling, this modularity is already present in many software packages.
It is modularity which gives computing scalability. And it is this missing modularity which severely limits the scalability of economic modelling.
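To make the contrast concrete, here is a minimal sketch in plain Haskell (toy functions invented for illustration, not our engine's API) of the operations listed above: choosing library components, composing them into a new program, and abstracting their shared pattern.

parseSales :: String -> [Double]   -- a small "library" program
parseSales = map read . words

average :: [Double] -> Double      -- another library program
average xs = sum xs / fromIntegral (length xs)

forecast :: String -> Double       -- composition: a new program from the two
forecast = average . parseSales

-- abstraction: the shared "parse, then analyse" pattern, reusable and
-- refinable by swapping either component
pipeline :: (String -> a) -> (a -> b) -> (String -> b)
pipeline parse analyse = analyse . parse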
Consider the startup pricing example I gave before. Say I want to use a pricing model to compute prices but lack the demand information. What am I supposed to do? Right now, I am most likely forced to abandon the model altogether and choose a different framework instead.
What I would like to do instead is to have my model in a modular shape, so that I can add a "demand" module - maybe a sampling procedure or even just a heuristic - and combine it with my pricing optimization. The feature I want is a coherent path from low to high resolution.
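A hedged sketch of that modular shape (hypothetical names, not our engine's API): the optimisation is written against an abstract demand module, so a crude heuristic can later be swapped for a demand curve estimated from data without touching the pricing code.

type Demand = Double -> Double   -- price -> expected quantity

profit :: Demand -> Double -> Double -> Double
profit demand cost p = (p - cost) * demand p

-- the pricing optimisation, independent of which demand module is plugged in
bestPrice :: Demand -> Double -> Double
bestPrice demand cost =
  snd (maximum [ (profit demand cost p, p) | p <- [0, 0.1 .. 100] ])

heuristic :: Demand              -- low resolution: just a guess
heuristic p = max 0 (50 - p)

estimated :: [(Double, Double)] -> Demand   -- higher resolution: fit to observed data
estimated obs p =
  let near = [ q | (p', q) <- obs, abs (p' - p) < 5 ]
  in  if null near then 0 else sum near / fromIntegral (length near)

The path from low to high resolution is then just the move from bestPrice heuristic to bestPrice (estimated observations), with the optimiser untouched.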
The goal behind our research and engineering efforts is to lift economic modelling to this paradigm. Yet, we do not just want to compose software packages. We want an actual composition of economic models AND the software built on top.
In short, we want to turn the manual modelling process, which mostly relies on craft, experience and judgement, into a software engineering process. But not only that: we are aiming for a framework of synthesis in which formal mathematical models can be composed.
How should we go about this? At first sight it is totally unclear! Even more, the question does not even seem to make sense. It is a bit like asking how to multiply a story by Hemingway with a story by Marquez.9
Similarly, models in economics are independent and closed objects and generally do not compose. It is here where the "Cat" in CyberCat comes in. Category theory gives us a way to consider open systems and model them by default relative to an environment. It is this feature which allows us to even consider the composition of models - for instance the composition of game theoretic models we developed.
Another central feature that is enabled through category theory is the following paradigm:
model == code
That is, the formalism can be seamlessly translated back and forth between the model and an actual (software) implementation. Thereby, instead of modelling with pen and paper, modelling itself becomes programming. It is important to note that we do not just want to translate mathematical models into simulations; the code actually symbolically represents mathematical statements.
To summarize, category theory gives us a formal language of composable economic models which can be directly implemented.
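As a toy illustration (a hypothetical Cournot duopoly in plain Haskell, deliberately simplistic and not written in our engine), the value below is the model: the same symbolic object both states the economics and is executed to solve it.

-- a symmetric Cournot duopoly with inverse demand p(Q) = a - Q,
-- where a is the intercept, and constant unit cost c
data Duopoly = Duopoly { unitCost :: Double, intercept :: Double }

-- best response, read off symbolically from the first-order condition
-- of the profit (a - qi - qOther - c) * qi
bestResponse :: Duopoly -> Double -> Double
bestResponse m qOther = max 0 ((intercept m - unitCost m - qOther) / 2)

-- analysing the same object: the symmetric Nash equilibrium quantity,
-- computed by iterating best responses to a fixed point
equilibrium :: Duopoly -> Double
equilibrium m = go 0
  where
    go q = let q' = bestResponse m q
           in if abs (q' - q) < 1e-9 then q' else go q'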
Equipped with this foundation, we can approach the programming language design task of turning the modelling process into a process of software engineering.
Modelling as programming enables the iterative refinement of models. Whereas traditional models are not only closed but also dead wood (written on paper), under this paradigm models are more like living objects which can be (automatically) updated over time.
Instead of building a library of books, in our case the models become part of a software library, which means the overall environment becomes far more powerful over time as the ecosystem grows.
Composition also means division of labor. We can build models where parts are treated superficially at first and details get filled in later. This can mean more complexity, but most importantly it means that we can build consistent models that are extended, refined, and updated over time.
These aspects resemble similar attempts in mathematics and the use of proof assistants and verification systems more generally. Here is Terence Tao on these efforts10:
One thing that changed is the development of standard math libraries. Lean, in particular, has this massive project called mathlib. All the basic theorems of undergraduate mathematics, such as calculus and topology, and so forth, have one by one been put in this library. So people have already put in the work to get from the axioms to a reasonably high level. And the dream is to actually get [the libraries] to a graduate level of education. Then it will be much easier to formalize new fields [of mathematics]. There are also better ways to search because if you want to prove something, you have to be able to find the things that it already has confirmed to be true. So also the development of really smart search engines has been a major new development.
It also means different forms of collaboration between field experts and across traditional boundaries. Need a financial component in that traditional IO model? No problem, get a finance expert to write this part - a modern pin factory equivalent. See again Terence Tao11:
With formalization projects, what we’ve noticed is that you can collaborate with people who don’t understand the entire mathematics of the entire project, but they understand one tiny little piece. It’s like any modern device. No single person can build a computer on their own, mine all the metals and refine them, and then create the hardware and the software. We have all these specialists, and we have a big logistics supply chain, and eventually we can create a smartphone or whatever. Right now, in a mathematical collaboration, everyone has to know pretty much all the mathematics, and that is a stumbling block, as [Scholze] mentioned. But with these formalizations, it is possible to compartmentalize and contribute to a project only knowing a piece of it.
Lastly, the current developments in ML and AI favor the setup of our system. We can leverage the rapid development of ML and AI to improve the tooling on both ends of the pipeline: users are supported in setting up models, and solving or analysing models becomes easier.
The common thread behind all of our efforts is to boost the modelling process. The traditional process is manual, slow, and limited by domain expertise - in other words very expensive.
Our goal is to turn manual work into mass customizable production.
What I described so far is narrowly limited to economic modelling. Where is the "Cybernetics"?
First, I focused on the composability of economic models. But the principles of the categorical approach extend beyond this domain. This includes understanding how apparently distinct approaches share common structure (e.g. game theory and learning) and how different structures can be composed (building game-theoretic models on top of some underlying structure like networks). In short, we work towards a whole "theory stack".
Second, the software engineering process depicted above focuses very narrowly on extending the economic modelling process itself. But the same approach will mirror the theory stack, with software enabling analyses at each level.
Third, once we are operating in software, we gain the ability to leverage other software to support the modelling process. This follows pragmatic needs and can range from data analytics to LLMs.
A general challenge for decision-making is the hyper-specialization of expert knowledge. But as decisions become more and more interconnected, what is lacking is the ability to synthesize this knowledge. Just consider the decision-making of governments during the COVID-19 pandemic. For instance, in the decision to close schools, one cannot simply rely on a single group of domain experts (say, physicians). One needs to synthesize the outcomes of different models following different methodologies from different domains. We want to develop frameworks in which these tradeoffs can be articulated.
- Ariel Rubinstein, Economic Fables, Open Book Publishers, 2012, p. 16. ↩
- I will focus on micro-economic models. They are simply closest to my home base and relevant for my daily work. ↩
- The view on what economists do there is markedly different from Rubinstein's. Prominently Al Roth: The Economist as Engineer: Game Theory, Experimentation, and Computation as Tools for Design Economics. ↩
- And probably most importantly, functions themselves can be input to other functions. ↩
- Dani Rodrik, Economics Rules: The Rights and Wrongs of the Dismal Science, New York: W.W. Norton, 2015. ↩
- Of course, the classification of practical and non-practical is not exclusive to economics. Mathematics is full of examples of domains that were initially seen as without any practical use and turned out to be important later on. ↩
- Ibid. ↩
- In addition, if the modelling falls to academics, then their incentives also kick in. The chances of publishing a model on a subject that has already been tackled by a prominent model can be very low, in particular in the case of a null result. ↩
- We might of course come up with a way in which these two stories can be combined or compared. But this requires extra work; there is no generic operation to achieve it. These days we might ask an LLM to do so, and indeed this might be a useful future direction to support the process. ↩
- Quoted from this interview. ↩
- Ibid. ↩

Cross-posted from Oliver’s EconPatterns blog
There’s a non-zero chance that sometime in the not-so-far future we will think of the “Bayesian Revolution” in the same way we think of the “Marginal Revolution”.
Bayesian beliefs give us the opportunity to think of expectations as an attribute linked to the observer rather than to the observed object, in the same way utilities gave us an opportunity to accept that value is not an attribute intrinsic to an object, but exists only in the eye of the beholder.
Much of modern economics rests on the recognition that differences in valuation create opportunities for mutually beneficial — and thus voluntary — exchange.
We’re still a few steps away from translating that recognition to differences in expectations, in no small part because most of the current effort seems to go into shoehorning Bayesian statistics into the domains dominated by trad statistics (aka frequentism). But on one dimension we are inching towards a better understanding of subjective probabilities: it is slowly dawning on us that there might be perfectly legitimate reasons why different individuals might attach different likelihoods to the same event — and even more, why their estimates might diverge over time.
The operative word being “legitimate” here, since if we start from the idea that whichever event we’re only partially observing was suddenly revealed to us in full, a persistent divergence in beliefs about this event surely means that at least one party must be dead wrong.
And even if we’re allowing that there might be multiple paths to the ultimate objective truth, if one party is wrong, it must surely be shown up in the future.
Economists love to use sports metaphors when teaching statistics, simply because sport is a realm where the outcome is recorded right at the conclusion of the contest, from a single authoritative source, without ambiguity.
As a teaching device this is perfectly understandable, in that it takes away a distraction which seems peripheral to the topic, but ultimately every practitioner will run into the problem that such an unfailing, immediate, and impartial single source of truth is quite rare in the real world.
(Indeed once we’re getting into the nitty-gritty of how the outcomes of sports events are tallied, it’s quickly becoming obvious that the “unfailing, immediate, and impartial” umpire is mostly a figment of our imagination. From photo finishes to disqualifications, not to mention accusations of favoritism, the idea that the true value of a random variable can suddenly be disclosed with finality is a popular ploy in economics that doesn’t even have a realistic foil in sports. Economic theory might get away with such a foreshortening of reality, but economic design doesn’t.)
We can go one step further and claim that for most uncertain events, meaning all events that are only partially observable, there might not be an underlying true value, no ground truth, at all. We’re eternally in limbo about what the underlying “true value” is, simply because there is no moment of truth.
In most scenarios the truth remains elusive at the end of a lengthy, costly, meandering, and conflicted discovery process, and we tend to swap in “truth” in the computer science meaning of “(single) source of truth” (that which is held to be true at any stage of the discovery process) as a proxy for the “ground truth” (that which would be held to be true at the end of a fully exhaustive discovery process).
In either form, whether there’s no unimpeachable source of truth or no underlying ground truth, once we make that leap to accepting that the truth is elusive, we suddenly gain access to a far richer world of behaviors, especially collective behaviors.
As a pattern, it means to start from the assumption that observed values are never perfectly true, true values are never perfectly observed, and truthfinding is an asymptotic reconciliation process between conflicting beliefs that’s not guaranteed to terminate in finite time.
I have previously called organizations tectonic plates shaped around shared beliefs broken up by fault lines where beliefs — especially beliefs about the future and the likely outcomes of actions taken — are no longer reconcilable.
In this series I am going to be a bit more precise in working out how belief convergence or divergence shapes the coordinated sequence of activities, within and across organizations, that we call a value chain. In particular, how these concepts can be applied across domains: social, political, cultural, religious, with the economic realm just a special case.
One of the first lessons of organization theory is that organizations are paradoxes: they can, under the best conditions, produce more collectively than the sum of the individual efforts of all members, but this requires that individual efforts be constrained away from what the individuals would do if left to their own devices. In other words, to achieve an organizational goal (almost) everyone will have to compromise.
The other lesson for organizations is that they take two steps: first, the aggregation of efforts, and second, the disaggregation of gains. Both have to work in the eyes of the beholders making up the organization for it to work, especially when efforts are exerted in the present but rewards are only collected and distributed in the future. If the shared belief that individual contributions are rewarded beyond what participants could expect outside the organization does not hold, the organization will either disintegrate or have to be propped up by force.
Nothing up to this point has limited what type of organization we’re talking about here. Simply put, we’ll find this type of pattern in any kind of organization: economic, religious, or political, flat or hierarchical, although we expect the kinds of rewards to differ.
Along with the three underlying sources of goal conflict (moving forward vs staying together, moving forward vs staying put, moving in which direction), this gives us a grid to think about the sources of conflict between the individual and the group, and later between ingroup and outgroup; to think about why organizations are shaped the way they are (flat or hierarchical, open or closed…); and why particular forms of organization are prevalent in particular environments: why firms are organized differently than political parties, sports clubs, or religious congregations.
So far the discussion has looked at shared beliefs from the vantage point of the (formally incorporated) organization. In my previous post, I looked at the question from the perspective of value chains. But what about starting with the individual?
Rather than asking how organizations fail due to a breakdown of shared beliefs, we might as well ask how organizations emerge as a consequence of shared beliefs.
Luckily, most people have heard of filter bubbles, the subgroups on social media channels created (and algorithmically amplified) by positive interaction with like-minded people, and negative interaction with other-minded people.
This is a fundamental economic and social mechanism (formalized in my dissertation before social media or filter bubbles were a thing): not only do we socialize with like-minded people, we especially filter incoming information based on whether the sender holds shared beliefs on prior issues, and we adjust the credibility we assign a sender based on how much the newly transmitted information matches our priors.
If you take this filtering-by-affinity mechanism and impose it on the network structure of the cybernetic economy, you get an information flow pattern known as a “gossip network”.
It is such a ubiquitous mechanism that we find it everywhere, on all levels of social and economic organization, but it also comes across as slightly unevolutionary. If we want to fully understand the hazards of our environment, we should take in all information from all sides and not just amplify the information that confirms what we think we already know. But we don’t. We filter — which makes sense — but we filter out what we don’t want to hear, not what we shouldn’t hear — which doesn’t make sense.
The interesting thing is that we know much more about these mechanisms now thanks to the internet, and especially thanks to recommender systems: Amazon uses them to guess what else we want to buy, Spotify offers new music to discover, and Tinder believes matching social preferences translate into a good romantic match.
The reason why recommender systems emerged in the early days of the internet is also tightly connected to Amazon, and later to the resurgence of machine learning thanks to the Netflix prize: the recognition that there is an unsurveyably large number of choice alternatives, and the inevitable corollary that our notion of consumers having well-informed preferences between them — an axiom that undergirds modern microeconomics — is no longer tenable.
Recommender systems have become one of the fundamental economic design templates, and in the process have reshaped economics (and in the case of matchmaking platforms, also our social lives), not only because they provide the raw data for much more fine-tuned consumer choice, but also because they give us a deeper insight into the evolution of preferences.
But there is also a deeper insight which comes from the information good that is being transferred. Preferences are in our economic understanding unimpeachable. They express individual tastes and are as such not rankable by social desirability, as much as we might want to impose our own, undoubtedly (in our minds) more sophisticated, tastes on others, or construct an argument why some tastes are more in tune with a more or less well-defined social good than others.
At the other extreme of the spectrum are mutually agreed-upon ground truths, better known as facts. In between these extremes of preferences (to which no truth value can be assigned) and facts (with an undisputed common truth value aligned with ground truth) lies the wide world of counterfactuals: states of the world to which we assign (and, as new knowledge emerges, adjust) truth values between zero and one. This world of counterfactuals inevitably involves the future.
This is a pattern we find everywhere, not only in the economic realm and its various subrealms like entrepreneurship (high individual belief in the success of a venture opportunity countering widespread low beliefs), but also in the political and social realms. We can connect this to Thomas Kuhn’s model of scientific inquiry and Ludwik Fleck’s harmony of illusions, or to the geographic expansion of religion, language, pottery design, agriculture, or any idea based on shared counterfactuals.
The search pattern: avoidance of negative surprise, I’ve already discussed in a previous newsletter. The canonical conflict resolution mechanisms: markets or hierarchies in the economic realm, wars or elections in the political realm, are typically tied to the fundamental pattern of exchange in that realm, as I will discuss in a future newsletter.

tl;dr - Advanced AI making economic decisions in supply chains and markets creates poorly-understood risks, especially by undermining the fundamental concept of individuality of agents. We propose to research these risks by building and simulating models.
For many years, AI has been routinely used for economic decision making. Two major roles it has traditionally played are high-frequency trading and algorithmic pricing. Traditionally these algorithms are quite simple, at the level of tabular Q-learning agents. Even these comparatively simple algorithms can behave in unexpected ways due to emergent interactions in an economic environment. Probably the most infamous of these events was the 2010 flash crash, for which algorithmic high-speed trading was a major contributing cause. Much less well known is the subtle issue of implicit collusion in pricing algorithms, which are ubiquitous in several markets such as airline tickets and Amazon: a widely cited 2020 paper found that even very simple tabular Q-learning will converge to prices higher than the Nash equilibrium price - but our research found that this depends sensitively on the exact method of training, and the effect vanishes when the algorithms are trained independently in simulated markets.
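For reference, “tabular” means the agent maintains an explicit table of action values Q(s, a), where the state s can encode the competitors’ last posted prices and the action a is the agent’s own next price; after each period the table is updated by the standard Q-learning rule

Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)

where r is the realised profit, \alpha the learning rate and \gamma the discount factor. (The pricing interpretation of s, a and r is our framing; the update rule itself is standard.)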
Besides markets, AI is also already used for making decisions in supply chains (see for example [1 2 3 4]), and surely will be more so in the future. Contemporary supply chains are extraordinarily complex. A typical modern technology product can have hundreds of thousands of components sourced from ten thousand suppliers across half a dozen tiers, which need to be shipped across the globe to the final assembly. A single five-dollar part can stop an assembly line, which in industries like automotive can cost millions per hour of downtime. The worst type of inventory a company can carry is a 99.9% finished product it cannot sell.

Over time, supply chains have been hyper-optimised at the expense of integrity, so that a metaphorical perfect storm in the shape of an Icelandic volcano named Eyjafjallajökull erupting, or a container ship named Ever Given getting stuck in the Suez Canal, causes massive disruption that inevitably leads to delayed goods, spoiled perishables, lawsuits and contested insurance claims easily in the ten digits. The COVID-19 pandemic was a business school case for all the types of havoc supply chain disruptions can wreak, oscillating wildly from not enough containers to too many containers in port, obstructing the handling of cargo, and from COVID-related work shutdowns in China to sudden shifts in consumer behavior in Western countries, leading to layoffs in hospitality industries and labour shortages in production and transportation.

Beyond these knock-on effects, which can explode planning horizons for procurement and shift the delicate power balance from buyer to supplier, another major problem in supply chains is the knock-off effect: fashion brands and pharmaceutical companies alike fight the problem of counterfeit products being introduced into the supply chain when no one is looking, leading to multi-million dollar losses along with reputational damage and, especially in pharmaceuticals, posing a hazard to health and life for many. Supply chain integrity crucially depends on transparency across a multitude of participants who are typically less than eager to share confidential data.
Moving forward from these events, the delicate tradeoff between efficiency and integrity is a perfect use-case for the integrated and inter-connected decision-making that is afforded by AI.
This brings us to the issue of economic decisions being deferred to large language models such as GPT-4. The well-known examples are not “natively economic”, but many people are adapting transformer architectures to operate on various types of data besides linguistic data, and it is only a matter of time before there are “economics LLMs”. In the meantime, GPT is entirely capable of making economic decisions with the right prompting - although virtually nothing is known about its performance on these types of tasks. We do not recommend using GPT to make investment decisions for you, but we expect it to become widespread anyway, if it isn’t already. Similarly, we expect large parts of complex supply chains to be almost entirely deferred to AI, extending the existing automation and its associated benefits and risks.
The traditional (tabular Q-learning) and contemporary (LLMs) situations are very different in many ways, but they have a subtle and crucial point in common. This is that decisions that look independent are secretly connected. There are two ways this could happen: one is that human decision-makers defer to off-the-shelf software that comes from the same upstream supplier - as is the case for algorithmic pricing in the airline industry for example. The other is that there really is a single instance of the AI system in the world and everybody is calling into it - as is the case with GPT.
For off-the-shelf implementations of tabular Q-learning for algorithmic pricing, there is some evidence that having a single upstream supplier has a significant impact on the behaviour of the market, and this is something that regulators are actively investigating. For LLMs virtually nothing is known, but we expect that the situation is worse. At the very least, the situation will certainly be more unpredictable, and we expect the compounding of implicit biases to be worse as these systems become ubiquitous and deeply embedded into decision-making. We plan to research this, by building economic simulations where decisions are made by advanced AIs and studying their behaviour.
A further possibility is more hypothetical, but we expect it to become a reality within the next few years. Right now the technology behind large language models - generative transformers - mainly operates on textual data, but it is actively being adapted for other types of data, and for other tasks besides text generation. Making economic decisions is very similar to playing games, and so there is an obvious analogy to the wildly successful application of deep reinforcement learning to strategically complex game playing tasks such as Go and StarCraft 2 by DeepMind. Combining this with generative transformer architectures could be immensely powerful, and it is not hard to believe such a system could surpass human performance on the task of economic decision-making.
Compositional game theory - a technology that we developed and implemented - is currently the state of the art for implementing complex meso-scale microeconomic models. The way things are traditionally done, models are written first in mathematics and later converted into computational models in general-purpose languages (traditionally Fortran, but increasingly in modern languages such as Python), a process that is very slow and very prone to introducing hard-to-detect errors. We use a model-is-code paradigm, where both the mathematical and computational languages are modified to bring them very close to each other - most commonly we build our models directly in code, with a clean separation of concerns between the economic and computational parts. Our models are not inherently more accurate, but they are 2 orders of magnitude faster and cheaper to build, and this unlocks our secret weapon: rapid prototyping of models. By iterating quickly, and continuously obtaining feedback from data and stakeholders, we reach a better model than could be built monolithically.
Why do we want to build these models? The bigger picture is, we want to inform the discussion about regulation of AI. This discussion is already widespread at the highest level of governments around the world, but is currently heavily lacking in evidence one way or the other. There’s a good reason for this: the domain of LLMs is language, and it is extremely difficult to make convincing predictions about the possible harms that can happen mediated by linguistic communication. More restricted domains, such as the behaviours of API bots, are easier to reason about. We have identified the general realm of economic decision-making as a critically under-explored part of the general AI safety question, which our tools are well-placed to explore through modelling and simulations.
Our implementation of compositional game theory allows modularly switching the algorithm that each player uses for making decisions. Normally when doing applied game theory we use a Monte Carlo optimiser for every player. But we also have a version that calls a Python implementation of Q-learning over a web socket. We could also easily switch it to calls to an open source LLM, or API calls to a GPT API bot or similar.
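A hedged sketch of what this pluggability can look like (hypothetical types, not the engine’s actual interface): each player carries a decision procedure, and switching from the built-in optimiser to an external learner or an LLM changes a single constructor.

type Payoff act = act -> IO Double

data Decider obs act
  = Optimiser ([act] -> Payoff act -> IO act)  -- e.g. a Monte Carlo search
  | External  (obs -> IO act)                  -- e.g. a Q-learner over a web socket, or an LLM API call

decide :: Decider obs act -> obs -> [act] -> Payoff act -> IO act
decide (Optimiser opt) _   acts payoff = opt acts payoff
decide (External ask)  obs _    _      = ask obs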
What’s more, this is emphatically not a mere hack that we bolt on top of game theory. At the core of our whole approach is our discovery, as seen in this paper, that the foundations of compositional game theory and several branches of machine learning are extremely closely related - this common foundation is what we call categorical cybernetics. It is what guides us and tells us that what we are doing is really meaningful. More than that, though, it opens a realistic possibility that we can know qualitative things about the behaviour of AIs making economic decisions, a much higher level of confidence than making inferences from simulation results. And when it comes to informing the discussion on regulation when the stakes are as high as they are, more certainty is always better.
So far we have focussed on the negative accidental impacts AI is likely to have on markets and supply chains, where AIs perform their intended purpose locally but interact in unforeseen ways. This is already concerning, but there is another side to the issue. What if decisions that should be independent are made by a single AI that has “gone rogue”, i.e. has a goal that is not the intended one? Depending on your personal assessment of the likelihood of this situation, you could read this section as a fun thought experiment or as a warning.
Being handed direct control of markets and supply chains gives perhaps the most powerful leverage over the physical world that an AI could have. Since it can collude with itself, it can easily create behaviours that would never be possible when decisions are made by agents that are independent and at least somewhat rational.
By far the most straightforward outcome of this situation is chaos. Markets and supply chains are so deeply interconnected that it would take very little intentional damage to create a recession deep enough to bring society to its knees. However, by virtually destroying the institutions that it controls, the AI makes this a one-time event which, while extremely bad, would be recoverable for humanity as a whole.
Much worse would be the ability of a rogue AI to subtly direct real-world resources towards a secret goal of its own over a long period of time. It isn’t hypothetical that complex supply chains can easily hide parts of themselves: consider how widespread modern slavery is in the supply chains of consumer electronics, or how the US government secretly procured the resources needed to build the first nuclear weapons at a time when supply chains were much simpler.
Exactly how extensive the risks of allowing AIs to interact with economic systems are is debatable, and the scenarios described in the previous section are hypothetical. However, it is undeniable that some serious risks do exist, including already-observed events such as flash crashes and implicit collusion. We have identified that the specific factor of decision-makers using the same upstream provider of decision-making software leads to poorly-understood emergent behaviours of supply chains and markets.
Our theoretical framework, compositional game theory, and our implementation of it, the open game engine, are the perfect tools for building and simulating models of economic situations with AI decision-makers. The goal of creating these models is to produce evidence leading to a better-informed debate on issues around the regulation of AI.

It’s been a busy few weeks in the world of category theory for deep learning. First of all came the preprint Categorical Deep Learning: An Algebraic Theory of Architectures from authors at Symbolica and DeepMind, including our friend Bruno. And then, hot on the heels of the paper, Symbolica raised a big investment round based largely on applications of the ideas in the paper.
The paper is about structured learning, and it proposes a big generalisation of geometric deep learning, which is itself a big generalisation of convolutional networks. The general idea is that the data processed by a neural network is not just random data but the vectorisation of data coming from some real-world domain. If your vectors encode an image, then there is implicit geometry inherited from the physical world. Geometric deep learning is all about designing architectures that encode geometric invariants of data, specifically in the form of invariant group actions a la Klein’s Erlangen programme.
What the paper points out is that the whole of geometric deep learning can be massively generalised from group actions to arbitrary (co)algebras of functors and (co)monads. From there you can easily re-specialise for specific applications. For example, if your training data is vectorisation of source code of a programming language, you can encode the structure of that language’s source grammar into your architecture in a virtually mechanical way.
Suffice it to say, I’m very excited about this idea. This could be a watershed moment for applied category theory in general, and it happens to be something that’s right next door to us - the paper heavily uses categories of parametrised morphisms, one of the two building blocks of categorical cybernetics.
The first thought I had when I read the paper was invariant preferences. A real AI system is not something that exists in isolation but is something that interacts in some way with the world around it. Even if it’s not a direct “intentional” action such as a robot actuator, the information flow from the AI to the outside world is some kind of action, making the AI an agent. For example, ChatGPT is an agent that acts by responding to user prompts.
Intelligent agents who act can have preferences, the most fundamental structure of decision theory and perhaps also microeconomics. In full generality, “having preferences” means selecting actions in order to bring about certain states of the world and avoid others. Philosophical intention is not strictly required: preferences could have been imposed by the system’s designer or user, one extreme case being a thermostat. AI systems that act on an external world are the general topic of reinforcement learning (although some definitions of RL are too strict for our purposes here).
This gave me a future vision of AI safety where neural network architectures have been designed upfront to statically guarantee (ie. in a way that can be mathematically proven) that the learned system will act in a way that conforms to preferences chosen by the system designer. This is in contrast to, and in practice complements, most approaches to AI safety that involve supervision, interpretation, or “dynamic constraint” of a deployed system - making it the very first line of an overall defense in depth strategy.
A system whose architecture has invariant preferences will act in a way to bring about or avoid certain states of the world, no matter what it learns. A lot of people have already put a lot of thought into the issue of “good and bad world-states”, including very gnarly issues of how to agree on what they should be - what I’m proposing is a technological missing link, how to bridge from that level of abstraction to low-level neural network architectures.
This post is essentially a pitch for this research project, which as of right now we don’t have funding to do. We would have to begin with a deep study of the relationship between preference (the thing that actions optimise) and loss (the thing that machine learning optimises). This is a crossover that already exists: for example in the connection between softmax and Boltzmann distributions, where thermodynamics and entropy enter the picture uninvited yet again. But going forward I expect that categorical cybernetics, which has already built multiple new bridges between all of the involved fields (see this picture that I sketched a year ago), is going to have a lot to say about this, and we’re going to listen carefully to it.
There are a few category-theoretic things I already have to say, but this post isn’t the best place for them. To give a hint: I suspect that preferences should be coalgebraic rather than algebraic according to the structural invariant learning machinery, because they describe the output of a neural network, as opposed to things like geometric invariants, which describe the input.
The thing that will stop this being easy is that in a world of incomplete information, such as the real world, agents with preferences can only act with respect to their internal model of the outside world. If we’re relying on invariant preferences for safety, they can only be as safe as the agent’s internal model is accurate. We would also have to worry about things like the agent systematically deceiving itself for long-term gain, as many humans do. The good news is that practitioners of RL have spent a long time working on the exact issue of accurately learning world-models, the first step being off-policy algorithms that decouple exploration (ie. world-model learning) from exploitation (ie. optimisation of rewards).
There is also an alternative possibility of manually imposing a human-engineered world-model rather than allowing the agent to learn it. This would be an absolutely monumental task of industrial-scale ontology, but it’s a big part of what Davidad’s project at the UK’s new ARIA agency aims to do. Personally I’m more bullish on learning world-models by provably-accurate RL at the required scale, but your mileage may vary, and in any case invariant preferences will be needed either way.
To wrap up: this is a project we’re thinking about and pursuing funding to actively work on. The “Algebraic Theory of Architecture” paper only dropped a few weeks ago as I’m writing this and opens up a whole world of new possibilities, of which invariant preferences is only one, and we want to strike while the iron is still hot.

# Archive of category 'functional programming'
- Apr 15, 2024
•
Building a Neural Network from First Principles using Free Categories and Para(Optic)

# Archive of category 'reinforcement learning'
- Feb 6, 2024
•
Passive Inference is Compositional, Active Inference is Emergent

# Archive of category 'open games'
- Apr 22, 2024
•
The Build Your Own Open Games Engine Bootcamp — Part I: Lenses

# Archive of category 'AI safety'
- Dec 11, 2023
•
AI Safety Meets Value Chain Integrity

# Institute for Categorical Cybernetics
Our mission is to develop theory and software for governing systems that learn and make decisions, for the benefit of their users and of humanity.
-
Dependent lenses are useful for general-purpose programming, but in which way exactly? This post demonstrates the use of dependent lenses as input/output-conversion processes, using parsing and error location reporting as a driving example.
-
In which we discuss how knowledge travels through the economy, and how, when and where it forms clusters.
-
In Towards Foundations of Categorical Cybernetics we built a category whose objects are selection functions and whose morphisms are lenses. It was a key step in how we justified open games in that paper: they're just parametrised lenses weighted by selection functions. In this post I'll show that by adding dependent types and stirring, we can get a nicer category that does the same job but has all colimits, and comes extremely close to having all limits. Fair warning: this post assumes quite a bit of category-theoretic background.
-
In which we describe organization and organizations as tectonic plates shaped by clashing beliefs.
-
A system whose architecture has invariant preferences will act in a way to bring about or avoid certain states of the world, no matter what it learns. A lot of people have already put a lot of thought into the issue of good and bad world-states, including very gnarly issues of how to agree on what they should be - what I'm proposing is a technological missing link, how to bridge from that level of abstraction to low-level neural network architectures.
-
In which we establish an underlying model for human behavior and claim that all economies are just a variation of the attention economy.
-
An Economic Pattern Language (@econpatterns for short) takes the economy and disassembles it into its constituent parts. But first, this blog post describes the economy as a whole.
-
In this post I'll describe the theory of how to add iteration to categories of optics. Iteration is required for almost all applications of categorical cybernetics beyond game theory, and is something we've been handling only semi-formally for some time. The only tool we need is already one we have inside the categorical cybernetics framework: parametrisation weighted by a lax monoidal functor. I'll end with a conjecture that this is an instance of a general procedure to force states in a symmetric monoidal category.
-
This post is a writeup of a talk I gave at the Causal Cognition in Humans and Machines workshop in Oxford, about some work in progress I have with Toby Smithe. To a large extent this is my take on the theoretical work in Toby's PhD thesis, with the emphasis shifted from category theory and neuroscience to numerical computation and AI. In the last section I will outline my proposal for how to build AGI.
-
Suppose your name is x and you have a very important state machine that you cherish with all your heart. Because you love this state machine so much, you don't want it to malfunction and you have a subset which you consider to be safe. If your state machine ever leaves this safe space you are in big trouble so you ask the following question.
-
Advanced AI making economic decisions in supply chains and markets creates poorly-understood risks, especially by undermining the fundamental concept of individuality of agents. We propose to research these risks by building and simulating models.
-
This is a short summary of the post. It is meant to explain how to write for our blog.
-
Some time ago, in a previous blog post, we introduced our software engine for game theoretic modelling. In this post, we expand more on how to apply the engine to use cases relevant for the Ethereum ecosystem. We will consider an analysis of a simplified staking protocol. Our focus will be on compositionality – what this means from the perspective of representing protocols and from the perspective of analyzing protocols.
-
Categorical cybernetics, or CyberCat to its friends, is – no surprise – the application of methods of (applied) category theory to cybernetics. The "category theory" part is clear enough, but the term "cybernetics" is notoriously fluid, and throughout history has meant more or less whatever the writer wanted it to mean. So, let’s lay down some boundaries.
In this post I’ll describe the theory of how to add iteration to categories of optics. Iteration is required for almost all applications of categorical cybernetics beyond game theory, and is something we’ve been handling only semi-formally for some time. The only tool we need is already one we have inside the categorical cybernetics framework: parametrisation weighted by a lax monoidal functor. I’ll end with a conjecture that this is an instance of a general procedure to force states in a symmetric monoidal category.
This post is strongly inspired by the account of Moore machines in David Jaz Myers’ book Categorical Systems Theory, and Matteo’s enthusiasm for it. There’s probably a big connection to things like Delayed trace categories, but I don’t understand it yet.
The diagrams in this post are made with Quiver and Tangle.
For the purposes of this post, we’ll be working with a symmetric monoidal category \mathcal C, and the category \mathbf{Optic} (\mathcal C) of monoidal optics over it. Objects of \mathbf{Optic} (\mathcal C) are pairs of objects of \mathcal C, and morphisms are given by the coend formula
\mathbf{Optic} (\mathcal C) \left( \binom{X}{X'}, \binom{Y}{Y'} \right) = \int_{M : \mathcal C} \mathcal C (X, M \otimes Y) \times \mathcal C (M \otimes Y', X')
which amounts to saying that an optic \binom{X}{X'} \to \binom{Y}{Y'} is an equivalence class of triples
(M : \mathcal C, f : X \to M \otimes Y, f' : M \otimes Y' \to X')
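For readers who prefer code, this has the familiar existential encoding in Haskell (a standard sketch, specialised to \mathcal C = \mathbf{Set} with the cartesian product as tensor):

{-# LANGUAGE ExistentialQuantification #-}

-- an optic is a residual m together with a forwards and a backwards pass,
-- considered up to sliding morphisms along m
data ConcreteOptic x x' y y' = forall m. ConcreteOptic (x -> (m, y)) ((m, y') -> x')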
I’m pretty sure everything in this post works for other categories of bidirectional processes such as mixed optics and dependent lenses; this is just a setting to write it down which is both convenient and not at all obvious.
The iteration functor is a functor \mathrm{Iter} : \mathbf{Optic} (\mathcal C) \to \mathbf{Set} defined on objects by
\mathrm{Iter} \binom{X}{X'} = \int_{M : \mathcal C} \mathcal C (I, M \otimes X) \times \mathcal C (M \otimes X', M \otimes X)
We refer to elements of \mathrm{Iter} \binom{X}{X'} as iteration data for \binom{X}{X'}. We call the object M the state space, the morphism x_0 : I \to M \otimes X the initial state and the morphism i : M \otimes X' \to M \otimes X the iterator.
Note that in the common case that \mathcal C is cartesian monoidal, we can eliminate the coend to obtain a simpler characterisation:
\mathrm{Iter} \binom{X}{X'} = \mathcal C (1, X) \times \mathcal C (X', X)
Given an optic f : \binom{X}{X'} \to \binom{Y}{Y'} given by f = (N, f : X \to N \otimes Y, f' : N \otimes Y' \to X'), we get a function
\mathrm{Iter} (f) : \mathrm{Iter} \binom{X}{X'} \to \mathrm{Iter} \binom{Y}{Y'}
Namely, the state space is M \otimes N, the initial state is
I \overset{x_0}\longrightarrow M \otimes X \xrightarrow{M \otimes f} M \otimes N \otimes Y
and the iterator is
M \otimes N \otimes Y' \xrightarrow{M \otimes f'} M \otimes X' \overset{i}\longrightarrow M \otimes X \xrightarrow{M \otimes f} M \otimes N \otimes Y
This is evidently functorial. Funnily enough, although the action of \mathrm{Iter} on objects when \mathcal C is cartesian is easier to understand, its action on morphisms is less obvious and is not evidently functorial, instead demanding a small proof.
We have an existing functor K : \mathbf{Optic} (\mathcal C)^{\mathrm{op}} \to \mathbf{Set}, given on objects by K \binom{X}{X'} = \mathcal C (X, X'). This is the continuation functor, and it is the contravariant functor represented by the monoidal unit \binom{I}{I}. (This functor first appeared in Morphisms of Open Games.)
For the remainder of this section I’ll specialise to the case \mathcal C = \mathbf{Set}, in which case an optic \binom{X}{X'} \to \binom{Y}{Y'} is determined by a pair of functions f : X \to Y and f' : X \times Y' \to X', and iteration data i : \mathrm{Iter} \binom{X}{X'} is determined by an initial value x_0 : X and a function i : X' \to X.
Given iteration data and a continuation that agree on their common boundary, we know enough to run the iteration and produce an infinite stream of values:
\left< - | - \right> : \mathrm{Iter} \binom{X}{X'} \times K \binom{X}{X'} \to X^\omega
Namely, this stream is defined corecursively by
\left< x_0, i | k \right> = x_0 : \left< i (k (x_0)), i | k \right>
This operation is natural (technically, dinatural): for any iteration data i : \mathrm{Iter} \binom{X}{X'}, optic f : \binom{X}{X'} \to \binom{Y}{Y'} and continuation k : K \binom{Y}{Y'}, we have
\left< i | K (f) (k) \right> = f^\omega \left( \left< \mathrm{Iter} (f) (i) | k \right> \right)
where f^\omega (-) : X^\omega \to Y^\omega means applying the forwards pass of f to every element of the stream.
Here’s a tiny implementation of the iteration functor and the pairing operator in Haskell:
{-# LANGUAGE RankNTypes #-}
import Control.Lens

-- iteration data in the cartesian case: an initial state and an iterator
data Iterator s t = Iterator {
  initialState :: s,
  updateState :: t -> s
}

-- the action of Iter on morphisms: push iteration data forwards along a lens
mapIterator :: Lens s t a b -> Iterator s t -> Iterator a b
mapIterator l (Iterator s f) = Iterator (s ^# l) (\b -> (f (s & l .~ b)) ^# l)

-- pair with a continuation to produce the infinite stream of states
runIterator :: Iterator s t -> Lens s t () () -> [s]
runIterator (Iterator s f) l = s : runIterator (Iterator (f (s & l .~ ())) f) l
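For instance (a hypothetical usage example, with the same imports in scope), a counter whose boundary lens simply discards the observation:

-- iteration data with state space Int: start at 0, increment each step
counter :: Iterator Int Int
counter = Iterator { initialState = 0, updateState = (+ 1) }

-- the trivial boundary lens: observe (), write back () without change
trivialK :: Lens Int Int () ()
trivialK k s = s <$ k ()

-- take 5 (runIterator counter trivialK) == [0,1,2,3,4]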
The next step is to form the category of elements \int \mathrm{Iter}, also known as the discrete Grothendieck construction. This is a category whose objects are tuples \left( \binom{X}{X'}, i \right) of an object \binom{X}{X'} of \mathbf{Optic} (\mathcal C) and a choice of iteration data i : \mathrm{Iter} \binom{X}{X'}. A morphism \left( \binom{X}{X'}, i \right) \to \left( \binom{Y}{Y'}, j \right) is an optic f : \binom{X}{X'} \to \binom{Y}{Y'} with the property that \mathrm{Iter} (f) (i) = j, that is to say, the iteration data on the left and right boundary have to agree.
The functor \mathrm{Iter} : \mathbf{Optic} (\mathcal C) \to \mathbf{Set} is lax monoidal: there is an evident natural way to combine pairs of iteration data into iteration data for pairs:
\nabla : \mathrm{Iter} \binom{X}{X'} \times \mathrm{Iter} \binom{Y}{Y'} \to \mathrm{Iter} \binom{X \otimes Y}{X' \otimes Y'}
This means that the tensor product of \mathbf{Optic} (\mathcal C) lifts to \int \mathrm{Iter}, by
\left( \binom{X}{X'}, i \right) \otimes \left( \binom{Y}{Y'}, j \right) = \left( \binom{X \otimes Y}{X' \otimes Y'}, i \nabla j \right)
The category \int \mathrm{Iter} can essentially already describe iteration with optics, although in a slightly awkward way. Suppose we draw a string diagram that not coincidentally resembles a control loop:
Here, f and f' denote some morphisms f : X \to Y and f' : Y \to X in our underlying category, and x_0 represents an initial state x_0 : I \to X.
Normally string diagrams denote morphisms of a monoidal category, but we make a cut just to the right of the backwards-to-forwards turning point, and consider that everything left of that is describing a boundary object. Namely in this case, we have the object \left( \binom{X}{X}, i \right) where the iteration data i : \mathrm{Iter} \binom{X}{X} is given by the state space I, the initial state x_0 : I \to I \otimes X and the iterator \mathrm{id} : I \otimes X \to I \otimes X.
The remainder of the string diagram to the right of the cut denotes an ordinary optic f : \binom{X}{X} \to \binom{I}{I}, namely the one given by f = (Y, f, f'), with forwards pass f : X \to Y \otimes I and backwards pass f' : Y \otimes I \to X. This boils down to describing the composite morphism f; f' : X \to X.
Overall, we can read this diagram as denoting a morphism f in \int \mathrm{Iter} of type f : \left( \binom{X}{X}, i \right) \to \left( \binom{I}{I}, \mathrm{Iter} (f) (i) \right). The iteration data on the right boundary is \mathrm{Iter} (f) (i) : \mathrm{Iter} \binom{I}{I}, which concretely has state space Y, the initial state x_0; f : I \to Y and iterator f'; f : Y \to Y.
This works in principle, but splitting the diagram between denoting an object and denoting a morphism is very non-standard. So far, this amounts to doing for the iteration functor what we did for the selection functions functor in section 6 of Towards Foundations of Categorical Cybernetics.
Now we take the final step to fix the slight clunkiness of using \int \mathrm{Iter} as a model of iteration. This continues the firmly established pattern that categorical cybernetics contains only two ideas that get combined in more and more intricate ways: optics and parametrisation.
There is a strong monoidal functor \pi : \int \mathrm{Iter} \to \mathbf{Optic} (\mathcal C) that forgets the iteration data, namely the discrete fibration \pi \left( \binom{X}{X'}, i \right) = \binom{X}{X'}. This functor generates an action of the monoidal category \int \mathrm{Iter} on \mathbf{Optic} (\mathcal C), namely
\left( \binom{X}{X'}, i \right) \bullet \binom{Y}{Y'} = \binom{X \otimes Y}{X' \otimes Y'}
See section 5.5 of Actegories for the Working Amthematician for far too much information about actegories of this form.
We now take the category \mathbf{Para}_{\int \mathrm{Iter}} (\mathbf{Optic} (\mathcal C)) of parametrised morphisms generated by this action. We also refer to this kind of thing (parametrisation for the action generated by a discrete fibration) as the Para construction weighted by \mathrm{Iter}, \mathbf{Para}^\mathrm{Iter} (\mathbf{Optic} (\mathcal C)) - the name comes from it being a kind of weighted limit and I think the reference for this is Bruno’s PhD thesis, which is sadly unreleased as I’m writing this.
Working things through: an object of \mathbf{Para}^\mathrm{Iter} (\mathbf{Optic} (\mathcal C)) is still a pair \binom{X}{X'}, but a morphism \binom{X}{X'} \to \binom{Y}{Y'} consists of three things: another pair of objects \binom{Z}{Z'}, iteration data i : \mathrm{Iter} \binom{Z}{Z'}, and an optic \binom{X \otimes Z}{X' \otimes Z'} \to \binom{Y}{Y'}.
Now suppose we have a diagram of an open control loop, that is to say, a control loop that is open-as-in-systems (not to be confused with an open loop controller, which is unrelated):
Here the primitive morphisms in the diagram are f : A \otimes X \to B \otimes Y, f' : B' \otimes Y \to A' \otimes X, and an initial state x_0 : I \to X. The idea is that f is the forwards pass, f' is the backwards pass, and after the backwards pass comes another forwards pass, but one time step in the future.
To make formal sense of this diagram, we imagine that we deform the backwards-to-forwards bend upwards, treating the state as a parameter, and then cut the diagram as we did before:
Now we can read this off as a morphism \binom{X}{X'} \to \binom{Y}{Y'} in \mathbf{Para}^\mathrm{Iter} (\mathbf{Optic} (\mathcal C)). The (weighted) Para construction makes everything go smoothly, so this is an entirely standard string diagram with no funny stuff.
Technically categories of parametrised morphisms are always bicategories (or better, double categories), and I think this is a rare case where we actually want to quotient out all morphisms in the vertical direction, i.e. identify \left( f : \binom{X \otimes Z}{X' \otimes Z'} \to \binom{Y}{Y'}, i : \mathrm{Iter} \binom{Z}{Z'} \right) with \left( g : \binom{X \otimes W}{X' \otimes W'} \to \binom{Y}{Y'}, j : \mathrm{Iter} \binom{W}{W'} \right) whenever there is any optic h : \binom{Z}{Z'} \to \binom{W}{W'} making \mathrm{Iter} (h) (i) = j and commuting with f and g. Coming back to our earlier picture of cutting a string diagram, this exactly says that we identify all of the different ways we could make the cut. In order to do this we change the base of enrichment along the functor \pi_0 : \mathbf{Cat} \to \mathbf{Set} taking each category to its set of connected components.
One final note: Almost everything in this post used nothing but the fact that \mathrm{Iter} is a lax monoidal functor \mathbf{Optic} (\mathcal C) \to \mathbf{Set}. With minimal translation, I think the entire thing works as a story about “forcing states in a symmetric monoidal category”: given any symmetric monoidal category \mathcal C and a lax monoidal functor F : \mathcal C \to \mathbf{Set}, the category \mathbf{Para}^F (\mathcal C) is equivalently described as \mathcal C freely extended with a morphism x : I \to X for every x : F (X). I’ll leave this as a conjecture for somebody else to prove.

# Institute for Categorical Cybernetics
Our mission is to develop theory and software for governing systems that learn and make decisions, for the benefit of their users and of humanity.
-
Recently we held a workshop in Edinburgh titled Mathematics for Governance Design, consisting of a roughly 50/50 split between social scientists and category theorists.
-
In which we connect the physics Nobel Prize to machine learning and economic design.
-
In this post we will make probably the single most important step from a generic type theory to one specialised to bidirectional programming.
-
In this post we'll begin designing a kernel language in which all programs are optics. What I mean by a "kernel language" is that it will serve as a compiler intermediate representation, with a surface language compiling down to it. I intend the surface language to be imperative style like the current Open Game Engine (with an approximately Python-like syntax), but the kernel language will reflect the category theory as closely as possible. I plan the kernel language to be well typed by construction, something that seems like overkill until I think about the problem of figuring out how pattern matching should work in a bidirectional language.
-
In which we learn why "flat earth" is a perfectly sound scientific proposition and why being wrong two thirds of the time can actually be quite lucrative.
-
This is the first post in a new series documenting my work developing a bidirectional programming language, in which all programs are interpreted as optics. This is something I've been thinking about for a long time, and eventually I became convinced that there were enough subtle issues that I should take things extremely slowly and actually learn some programming language theory. As a result, this post will not be about categorical cybernetics at all, but is a foundation to a huge tower of categorical cybernetics machinery that I will build later.
-
In which we try to capture all the ways how beliefs can shape social and economic interaction.
-
Are economic models useful for making decisions? One might expect that there is clear answer to this simple question. But in fact opinions on the usefulness or non-usefulness of models as well as what exactly makes models useful vary widely. In this post, I want to explore the question of usefulness. Even more so, I want to explore how the usefulness ties into the modelling process. The reason for doing so is simple: Part of our efforts at CyberCat is to build software tools to improve and accelerate the modelling process.
-
Suppose we have some category, whose morphisms are some kind of processes or systems that we care about. We would like to be able to talk about contexts (or environments) in which these processes or systems can be located.
-
This is an overview of the 'RL lens', a construction that we recently introduced to understand some reinforcement learning algorithms like Q-learning
-
In which we bring back together the estranged fraternal disciplines of economics and operations research and map out how we can combine them to design cybernetic economies.
-
I explore the effect of players following their best response dynamics in large random normal form games.
-
The first installment of a multi-part series demistifying the underlying mechanics of the open games engine in a simple manner.
-
In this post we will look at how dependent types can allow us to effortlessly implement the category theory of machine learning directly, opening up a path to new generalisations.
-
I'm going to record something that I think is known to everyone doing research on categorical cybernetics, but I don't think has been written down somewhere: an even more general version of mixed optics that replaces the backwards actegory with an enrichment. With it, I'll make sense of a curious definition appearing in The Compiler Forest.
Cross-posted from Oliver’s EconPatterns blog
Despite its bad reputation, “flat earth” is a perfectly scientific, mathematically grounded, and highly useful model of reality.
Indeed, it might be the perfect example to illustrate George Box’s famous aphorism about how all models are wrong, but some are useful.
We confirm its usefulness every time we open a map, on paper or on a screen, and manage to get from A to B thanks to its guidance, nevermind that it (and we) completely ignored the curvature of the earth all along the way.
Of course we could’ve consulted a globe, but for most everyday pairings of A and B, a travel-sized globe won’t help us much in navigating the route, and a globe that’s big enough to provide us with sufficient detail about our particular route would be much too big to carry around.
Indeed, we can push this further and consider a hierarchy of simplifying abstractions:
- earth is flat
- earth is spherical (a ball)
- earth is a rotational oblate ellipsoid (a pill)
- earth is a rotational oblate ellipsoid with a variance in surface elevation never exceeding 0.2% of its diameter
The last one — hills and valleys — matters much more in everyday life than the knowledge that earth is spherical or even an ellipsoid.
That doesn’t mean it’s entirely useless knowledge. Nobelist Ken Arrow’s very first academic publication, about optimizing flight paths just after World War 2, pushed the envelope, so to speak, from a planar world to a spherical world.
“All the literature assumed that the world was flat, that everything was on a plane, which may be germane if you’re flying a hundred miles. But we were already flying planes across the Atlantic, from Newfoundland to Scotland. It turned out to be an interesting mathematical problem to change these results to be applicable to the sphere — and that was my contribution.” — Kenneth Arrow, 1995.
Indeed today, Arrow’s contribution is used in any long-distance flight planning software, and its effect is visible every time we fly from Frankfurt to New York and are surprised looking out the window to find ourselves above Greenland.
But we shouldn’t be led into thinking that before Arrow scientists believed that the earth is flat. They just recognized that for their task it didn’t matter that it wasn’t, and at a time when “computers” were still human professionals rather than electronic devices, simplifying the calculations mattered.
One reason why “flat earth” is such a great example for proper modeling is that it gets the point about scope across, simply because modeling scope matches geographic scope: for short hops, flat earth is perfectly fine, but for transatlantic flights, you’re bound to run into trouble. Somewhere in between is a fuzzy boundary where the simple model gradually fails and complication becomes necessary.
Another reason is that it’s a perfect little ploy to expose a particular type of academic conceit, simply because it goes against the Pavlovian reflex by certain academics to roll out “flat earth” as the synecdoche for conclusively disproven pseudoscience.
But there is a critical difference between claiming earth is flat (an empirical hypothesis without support) and proposing a deliberate counterfactual of a flat earth (a modeling design choice), and it strikes at the heart of George Box’s aphorism, which we could amend to say that models are useful because they are wrong.
A map is useful because it’s not the territory.
This distinction is crucial, because so many people, including and maybe especially academics, get it wrong: a counterfactual is not the same as an unobservable.
Unobservables are hidden truths. Counterfactuals are openly expressed falsehoods.
In model design, counterfactuals — things we invoke even if we know they’re wrong — play an important role in the form of assumptions, which is why making assumptions explicit is a crucial but oft-ignored exercise in model design.
Graduate students in econometrics often get the advice (or at least used to) to pick up Peter Kennedy’s A Guide to Econometrics in addition to whichever weighty and forbidding tome is assigned as the official textbook (typically Greene’s doorstop), with the undertone that Kennedy’s book might provide a cheat code to unlock the arcane secrets encloistered in its more voluminous peer.
Kennedy succeeds in doing this not because he dilutes the mathematical heft of the official textbooks, but because he offers a very succinct exposé of how we should approach ordinary least squares (OLS), the workhorse of econometric modeling. Step by step:
- This is the original OLS setup
- Here are the five major assumptions that undergird it
- Because of these assumptions, the scope of OLS in its original form is quite limited
- But we can, one by one, test if each assumption is violated and implement a fix
- And if everything fails, here are a few alternatives…
This is so lucid that it’s surprising (and somewhat disheartening) to see it rarely ever expressed in this succinct — and compositional — form.
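For readers who want the formulas behind that list, the classical setup is (standard notation; this summary is mine, not Kennedy’s wording):
y = X \beta + \varepsilon, \qquad \mathbb{E} [\varepsilon] = 0, \qquad \mathrm{Var} (\varepsilon) = \sigma^2 I, \qquad X \text{ fixed with full column rank},
with the OLS estimator \hat{\beta} = (X^\top X)^{-1} X^\top y. Each clause bundles assumptions that can be tested and, if violated, relaxed in turn: linearity, zero-mean disturbances, homoskedasticity and no autocorrelation, and exogenous, non-collinear regressors.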
Assumptions are counterfactuals that limit the scope, and to expand the scope we have to investigate and potentially drop some of the assumptions, but this always involves a tradeoff between parsimony and universality.
Models are by design complexity-reducing conceits. But for this to be successful the modeler has to be willing to start by ruthlessly reducing complexity to expose the underlying mechanism, and academia isn’t always an environment where Occam’s razor is sharp.
Stereotypically speaking, academia is incentivized to err on the side of giving convoluted answers, including giving convoluted answers to simple questions, or even the worst of all worlds: convoluted wrong answers to simple questions.
Pretty much everyone in academia who laughed about large language models getting very simple questions horribly wrong (“9.11 > 9.9”, “how to ferry a goat”, “countries with ‘mania’”) should’ve felt a pang of recognition. Trying to get the big questions right often comes at the cost of getting the simple questions wrong. That might be an acceptable tradeoff in academia; in design it can be fatal.
“Since all models are wrong the scientist cannot obtain a “correct” one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.” — George Box, 1976.
The point Box tries to drive home is not only that there are decreasing returns to upping model complexity and that pursuing correctness is an elusive goal, but that the returns are indeed quite often negative.
The scene-setting assumption of EconPatterns, expressed in the very first post, is that we operate in a cybernetic economy where macroeconomic aggregates are often deceptive.
Economic engines — take for example a dynamic pricing model for an airline or an energy provider — are elaborate beasts by necessity. If we want to capture time, portfolio, geography, bundling, and other consumer preferences in price finding, we are quickly confronted with a staggering number of variables we have to juggle to produce anything coherent, nevermind accurate, which seems to run counter to Box’s remonstrances about beauty in simplicity.
But the seeming contradiction is easily resolved. Even the newsletter emphasized that “economics usually skips this operational layer for the sake of expositional expediency, and for the most part it does ok doing so.”
The skill in modeling rests first and foremost in the ability to ruthlessly pursue parsimony, but also in the ability to recognize when and where parsimony fails us.
Translated into a design strategy, this means both to have a mental model of the ultimate end product when starting with the first sketches, but also to recognize the underlying fundamental mechanisms — the primitives — in the end product.
The only way to resolve this is to modularize the design process, not only to make it composable (we can assemble the system from the modules), but also compositional (we can infer system behavior from module behavior).
Anyone who has ever spent any time in the engine rooms of capitalism knows how ubiquitous quantitative modeling is for predicting anything at all. Even smalltown bakeries predict sales to decide how much flour to buy and how many loaves of bread to bake. Insurers run massive predictive operations staffed by actuaries. Even the microchips in our smartphones use prediction to allocate tasks.
We are surrounded by model-based predictions in our everyday lives; indeed, one might claim our livelihoods depend on them. We just choose to ignore them because they mostly operate quite well, until we’re confronted with the consequences of them breaking down — the negative surprise that gets our attention.
Undergrad microeconomic classes at business schools teach the expected value of imperfect information (EVII) as a simple “managerial” framework to explain Bayesian updating from a decision-theoretic perspective.
If you as a decision maker have a 20% chance of being right in your predictions, how much would you pay someone who has a track record of being right 30% of the time for a particular prediction? Not much, you’d think.
Narrowly speaking, that’s a pretty good mental model of how the consulting industry works, but a bit more philosophically speaking, the idea that models have to be perfect to be useful is a (very common) fallacy — usually expressed as “we don’t have enough data to model anything” or “the model made a wrong prediction so the model must be wrong”. Indeed models can often be especially useful even if they are far from accurate.
Strictly economically speaking, models are useful if they shift the probability of being right about the future upward, even if it’s only by a small delta. We only have to compare the salaries of baseball players who hit a .200 batting average (the proverbial Mendoza line) with those who hit at a .300 clip. Getting it wrong 70 percent of the time can be a pretty lucrative skill under the right circumstances.
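A back-of-the-envelope version of that logic (the numbers are mine, purely illustrative): if a correct call is worth V and a wrong one worth nothing, then the most the expert’s prediction can be worth to you is
\mathrm{EVII} = 0.3 V - 0.2 V = 0.1 V,
one tenth of the value of being right: modest, but strictly positive, even though the expert is wrong 70 percent of the time.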
The purpose of modeling is to reduce the propensity of negative surprise, which is why we usually only notice models when they do the opposite.
To update Max Planck’s famous dictum that science progresses one funeral at a time, formal modeling helps us to speed up science so that it progresses one public embarrassment at a time — which happens every time a confidently made prediction is stumped by reality.1
To up the ante, managerially speaking, models are important decision tools even if they don’t improve chances of being right at all, simply because they act as tiebreaker tools to overcome decision paralysis, especially in scenarios where “we don’t have enough data to model anything”.
It’s a simple explanation why soothsaying and bird divining existed throughout history, and why they’re still around. Sometimes people need a device — any device — to prune the branches of their personal decision trees, to overcome the form of decision paralysis we know as “procrastination”.
Good models, beyond improving predictive accuracy, also help simply by providing a formal grid to map out the structure of the decision scenario. This is what makes game (and decision) theory relevant: it’s a formal tool to map out the interrelatedness of scenarios around the decisions the participants face.
Popular opinions about modeling range from “a modeled prediction establishes a scientific fact” to “models can’t predict anything at all since (prominent example where a model went wrong)”. Strangely enough, both mental models seem to be especially popular in the natural sciences, sometimes even proposed by the same person.
Neither of these extreme positions has any grounding in reality, and their popularity is likely more the result of ambiguity intolerance and conceptual problems with the idea that improvement can come in the form of an upshift in likelihoods.
As Milton Friedman put it in the context of positive economics as a scientific endeavor:
“Its task is to provide a system of generalizations that can be used to make correct predictions about the consequences of any change in circumstances. Its performance is to be judged by the precision, scope, and conformity with experience of the predictions it yields.” — Milton Friedman, 1953.
But this is only one side of a coin. We can devise models that are purely predictive (the internal causal mechanism remains opaque to us) or models that are purely causal (they make no claim about predictive accuracy or might even be deliberately wrong in the expectation that how and where they go wrong reveals something about the internal causal mechanics) — like we do with pretty much every financial plan ever.
Most models end up somewhere in-between on that spectrum. The important thing is to be upfront about what the design objective is.
I’ve written about the role of modeling in science, the social sciences, and economics before, but it remains a contested issue, so it felt like devoting a whole post to it might be worth the effort.
My take is ultimately shaped by my own experience in industry, and in turn shapes what I am trying to achieve with EconPatterns.
The short version is that formal modeling is a relevant part of economic practice, especially the unobserved part (the “engine room”) of economic practice, and that a sound understanding of formal modeling tools is necessary for anyone within economics (even if the need for mathematical rigor varies widely between fields).
The economy is also a data-rich environment, and we have enough experience to know that certain things in the economic realm are bound to follow discernible and generalizable patterns.
But formal modeling has to rest on a sound conceptual understanding, and economic endeavors, especially those that include economic design, should spend enough time on the conceptual architecture to not end up building complicated models that fail at the simple answers.
On the same topic, see also Philipp Zahn’s perspective.
- It should be noted here that Planck’s dictum itself is an astute observation about belief systems and the glacial progress of belief propagation in academia. This might be worth its own newsletter. ↩
Cross-posted from Oliver’s Substack blog, EconPatterns
On a certain level of abstraction, an economy can be described as a network of stocks, flows, and transformations. Let’s call this level the cybernetic economy.
Stocks and flows are two fundamental forms of displacement: in time and space respectively, and they are typically restricted by upper and lower capacity constraints: overstock vs stockout, overflow vs desiccation.
Transformation in the usual sense of industrial production means the recombination of inputs to produce new outputs, but we can also include creation and consumption as starting and endpoints of network flow. In the case of natural resources, creation often takes the form of extraction.
The stocks and flows usually come in the form of information, materials, effort, payments, equipment, and on a more abstract level, risks, beliefs, rights, and commitments. Risk is just as much an economic good as any physical material: it can be transformed, bundled, disassembled, and transported.
Most of these objects should sound familiar from economic textbooks, especially macroeconomic textbooks. The cybernetic economy differs from this textbook treatment mostly by explicitly highlighting the network of interactions, and by stressing the global ramifications of local interactions.
This network view of the economy on the other hand should be familiar to anyone with a background in industrial production, where orchestrating multi-step processes on shop floors densely packed with machines, pathways, buffers, and assembly stations is a major part of the job description, and where stockouts of five-dollar parts can stop ten-million-an-hour assembly lines — as can pathways congested by improvised material buffer overflows.
Economics, especially macroeconomics, usually skips this operational layer for the sake of expositional expediency, and for the most part it does ok doing so. As long as the operational friction stays within bounds, no stocks and flows pushing against their upper or lower capacity limits, no production schedules foiled by unobtainable five-dollar components, we can safely assume a frictionless world and focus on the established gears and levers central to macroeconomic inquiry.
In other words, as long as there is only a modicum of disorder in the economy, it’s perfectly fine to assume a well-ordered economy.
Which underlines a key principle: the right level of aggregation matters. A map is not the territory, but we might need different maps to do different things within the territory. In the same sense we can drop operational details and aggregate activity on a high level as long as we can be sure that the loss of realism — the loss of predictability — is inconsequential for the task at hand.
But we should have a more fine-grained map at the ready just in case our survey map fails to capture the finer points.
The economy we’re looking at is an economy that can be disaggregated and disassembled to the individual component, the individual participant, the individual activity, just as needed whenever it is needed.
I’m resurrecting the somewhat outmoded term “cybernetic” for it because it conveys the focus on flows, on routing, buffering, concatenating, on orchestrating activities and resources.
Routing, network flow, buffering, job shop scheduling, machine replacement models are all standard tools of the trade in operations research. They are no longer, or not yet again, standard tools in economics, but in order to describe the economic activities as intended, and to couch them in a wider social and political context, they should become economic tools again.
EconPatterns intends to bring them back together under the same motivation that it intends to bring mathematical, statistical and computational tools together: to build up a toolset which we can use to design economic objects.
But, and this is the conjurer’s trick, it’ll do so almost entirely without resorting to formal modeling or even mathematical notation. This is not out of nostalgia for an era where political economy was a branch of the philosophical faculties. The economy is as data rich as any field of inquiry and we seem to have just enough recognizable, repeating and generalizable patterns to give the scientific method a try.
But the point of the exercise is to develop an economic design language, to establish a conceptual foundation, rather than to rephrase current economic knowledge. This is why it invokes the famous Bauhaus Vorkurs, the foundational course that gave the Bauhaus students a starting point from which to branch out into their respective workshops.
The things for which economics, mathematics, statistics, operations research, computer science, and other fields have developed very intricate formal mechanisms will pop up mostly as pointers. The question which sorting, filtering, or separating algorithm to use is relevant and often decisive to the success of an economic activity, but it is secondary to the question when to sort, filter or separate — and what.
Instead it will take very close looks — some might think unreasonably close looks but my hope is the reasons for doing so will reveal themselves in due time — at existing economic artifices and their constituent parts. One of the motivations is to show that the Grand Bazaar in Istanbul and an online e-commerce platform have surprisingly many things in common, and there’s a reason for it.
To this end, EconPatterns — and I believe this is the defining novelty — will borrow liberally from design theory and practice, as well as from architecture. The chosen container for this endeavor is Christopher Alexander’s design pattern. There are many reasons for this choice, not the least of which is that design patterns have successfully been translated from architecture to software design.
The in-depth discussion of “why design patterns?” surely deserves its own article, but it also introduces an interesting tension. As design philosophies go, Alexander and the Bauhaus stalwarts are certainly at opposing ends of the spectrum, A to B, organic to geometric, habitable spaces to machines for living.
I’m hoping to put this tension to good use. Designing economic contraptions poses relevant questions beyond their productivity and efficiency. Which is a major reason why I am not trying to resolve that conflict or take sides.
Admittedly, the whole endeavor is open-ended, and the crucial question of whether the patterns sketched out so far will ultimately come together as a coherent whole is still unresolved. This is why the blog format is the right one at this juncture: to put the question out in the open while I present the first pieces of the puzzle.
EconPatterns will inevitably be shaped by my own background and my own particular interests, which is one reason why economic organization will be the initial focus. The fundamental model of the economy is different, as is the underlying concept of human behavior (as next week’s entry will show). I’m somewhat inclined to say that there are not that many people out there with a background both in design and economics, so I’m quite comfortable in claiming that the exercise should offer sufficient novelty.
I’m also very clear that I don’t hold exclusive rights to the very concept of design patterns — if anything I might be the first practitioner to apply them to economic design problems — but the ultimate defining characteristic of design patterns, the one that sets them apart from economic laws, is that they’re entirely voluntary. They are simply proposals of how to look at, structure, and solve a certain design problem, and the ultimate arbiter of their success is whether enough practitioners find them useful enough to apply them to express their ideas.
Which in itself should hopefully take much of the pedantry out of economic debates.
> “All of this will lead to theories [of computation] which are much less rigidly of an all-or-none nature than past and present logic. They will be of a much less combinatorial, and much more analytical, character. In fact, there are numerous indications to make us believe that this new system of formal logic will move closer to another discipline which has been little linked in the past with logic. This is thermodynamics, primarily in the form it was received from Boltzmann, and is in part theoretical physics which comes nearest in some of its aspects to manipulating and measuring information.”
– John Von Neumann, The General and Logical Theory of Automata, 1948.
Allow me a quick excursion from the regular programming to celebrate the 2024 physics Nobel Prize awarded to John Hopfield, inventor of the eponymous Hopfield network, and Geoffrey Hinton, co-inventor (with Terrence Sejnowski) of the Boltzmann machine.
Since this is an economic design series, the question why a physics Nobel, and especially a Nobel Prize awarded for a contribution to machine learning, should be of interest is a fair one.
The long answer is that, having spent a few long years translating the underlying mechanisms of both networks into economic game theory, and in turn into the emergence of consensus (or its opposite, partisanship) in social groups, I think I can offer a fairly unique perspective to discuss the impact of this prize on economics.
The short answer is that these two networks also shape the whole economic outlook presented in EconPatterns.
To recapitulate.
In the first post, I established the economy as a network of a small set of fundamental activities: stocks, flows, transformations, which have to be orchestrated to produce desirable outputs.
This orchestration requires agreement on beliefs among participants, first that these activities do indeed lead to these outcomes, and second that these outcomes are indeed desirable.
This framing mapped a network of economic activities onto a belief network, with the underlying assumption that unless all participants have perfectly homogenous beliefs, goal conflict within the network becomes inevitable as the network becomes larger, until ultimately the network has to crumble into smaller subnetworks (aka clusters) that can hold shared beliefs.
Expressed in the first law of organization: the objective of organization is to resolve the conflict between moving forward (orchestrate activities that produce a desirable output) and staying together (hold the shared belief that these activities do indeed lead to the proposed desirable output).
Where this conflict cannot be resolved within an organization, competition emerges.
Competition is the starting point of economic inquiry, which typically treats it as exogenous. In other words, competition has to happen, and by virtue of simply happening (and by drawing attention to surpluses and scarcity in the economic network via price signals) it helps steer the economy in the right direction.
What it skips is the question where exactly the beliefs diverge sufficiently that orchestration within the same organization is no longer possible so that rifts open up and competition emerges.1
This is a question that economics of organization tries to tackle in the form of the make-or-buy decision, but finding an appropriate formalization has been elusive. And this is where Hopfield, Hinton, and Sejnowski come in.
To make that leap we first have to divest ourselves of any expectation that our formalization expresses any kind of tangible economic activity, and accept that we go down to bare-bones expressions of individual beliefs, where the main activity is to be influenced by, and to try to influence, the beliefs of our peers.
In other words, for any proposition, participants express their beliefs in the simplest possible way as subjective expectations: in simple Boolean logic, zero for “I believe it’s false”, one for “I believe it’s true”, or in a stochastic setting, any value between zero and one expressing how probable they consider the proposition to be. (Alternatively we can consider a wider range from −1 to +1 to express opposing beliefs, which is especially useful in political settings.)
Hopfield’s first paper, published in the midst of the first “AI winter” in 1982, astounds in its brevity. It is only five pages long.
Up to this juncture, including the emerging connectionist revolution that led to Rumelhart & McClelland’s famous two-volume work in 1986 (which also included Hinton & Sejnowski’s paper), neural networks were almost exclusively conceived as feedforward networks (information flows from input to output) with backpropagation (feedback flows from output to input) as the learning mechanism.
Hopfield’s insight was to fold the network onto itself: all network nodes can send and receive signals to and from all others, and the designation as input or output nodes is arbitrary.
In isolation this wouldn’t be particularly interesting, but the marvel of neural networks in general, and Hopfield networks in particular, is that the behaviors of individual nodes are connected, and that this connectivity can be expressed in a weight (or covariate) matrix, where high positive weights translate as “shared beliefs” and high negative weights as “opposing beliefs”.
Neural networks function in two modes: training mode (weights are flexible) and execution mode (weights are fixed). Training in this case translates into finding out which nodes hold correlating beliefs, and setting the weights accordingly.
Hopfield’s question is what happens when a connected network with a given set of (symmetric) weights plus a vector of isolated beliefs (aka biases) per node is allowed to converge from a given starting state (the input) to a stable state (the output), when each node tries to agree with all connected neighboring nodes with shared beliefs and disagree with neighboring nodes with opposing beliefs.
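In symbols (standard textbook notation rather than the paper’s own): with states s_i \in \{0, 1\}, symmetric weights w_{ij}, and biases b_i, each node asynchronously applies the update
s_i \leftarrow 1 \text{ if } \sum_j w_{ij} s_j + b_i > 0, \text{ else } 0,
and every such update weakly decreases the energy E (s) = - \tfrac{1}{2} \sum_{i \neq j} w_{ij} s_i s_j - \sum_i b_i s_i, so the dynamics must converge to a local minimum: the stable states that play the role of stored patterns.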
Hopfield’s first paper from 1982 tackles this question with a Boolean choice of zero and one for all nodes, and a second paper from 1984, also five pages long, expands this to allow uncertainty in the form of probabilistic belief values between zero and one, plus a sigmoid function to connect inputs and outputs.
His conclusion, in the shortest possible form, is that the network exhibits memory, in a form that makes it a “content-addressable memory”.
In other words, the network converges from an input pattern to the nearest pattern it has been trained on — an important feature in pattern recognition with an obvious early application in detecting handwriting. If the input pattern is something that vaguely looks like a 7, the output ideally should identify this as a 7 and not a 3.
Under the right conditions, if the training data set contains a number of shapes that are vaguely 7-ish looking, the network should memorize this as a distinct pattern and when activated, recognize this.
In somewhat more technical language, the network should contain local optima representing the trained-on patterns and basins of attraction that capture all the trained variants (and their interpolations).
As a physics-inspired mathematical construct, this is extremely neat and its translation into belief-driven collective action expands beyond the metaphorical similarity. Implemented as a feedback network, it has quite a few drawbacks which curbed its widespread adoption in favor of less finicky backpropagation architectures.
One major drawback, in an analogy to what I like to call the “bicycle repair cooperative on Shattuck Ave”, is that it doesn’t scale particularly well.
Shattuck Avenue is in downtown Berkeley and the bicycle cooperative prided itself on its strong collectivist ethos, where all topics are discussed and decided together. This might work if the collective is small and beliefs are highly aligned, but runs into trouble when the collective gets bigger (adding one new member adds N new connections) and beliefs diverge.
Which is why “fully connected consensus” never translates as a template for large companies.
The other problem is that it produces a whole lot of local optima which don’t map to trained patterns, so the network is always at risk of producing meaningless output — a problem that also increases with network size.
One remedy for this comes from Hinton & Sejnowski’s Boltzmann machine, which introduces “vanishing noise” as a means to avoid local minima.
Vanishing noise just means that as a node is called upon to update its belief (aka state), we introduce a small likelihood that the node accepts a new state even if it is unfavorable, and that this likelihood becomes smaller over time.
This is very much an analogy for shaking up the system, implemented as “distributed simulated annealing” — annealing being the metallurgical procedure to add short spurts of heat in a cooling process to avoid getting trapped in imperfect lattice structures.
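Concretely (the standard formulation, in the same notation as above): at temperature T a node adopts the favorable state only probabilistically,
P (s_i = 1) = \frac{1}{1 + e^{- (\sum_j w_{ij} s_j + b_i) / T}},
so at high T the update is close to a coin flip, and as T is annealed towards zero it collapses to the deterministic Hopfield update: exactly the short spurts of heat that shake the system out of shallow, meaningless minima.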
The connection to thermal annealing not only creates a connection to physics proper, it also opens another batch of neat features.
For one, we suddenly have a system in which, even though behavior happens on the individual level — each node updates its belief individually and only according to its own interests — we can still express the behavior of the whole system in a single macro equation.
The economic equivalent of this is a potential game (introduced by Dov Monderer and another Nobelist, Lloyd Shapley) where changes in individual utilities can be captured in a single equation for the whole game.
The other intriguing feature is that under vanishing noise, we can characterize the equilibria the system reaches using the eponymous Boltzmann distribution, which tightly connects the model to statistical mechanics, and in turn to entropy, free energy, and — importantly for us — surprise.
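For reference, that equilibrium characterization is the Boltzmann distribution itself: in its standard form,
P (s) = \frac{e^{- E (s) / T}}{\sum_{s'} e^{- E (s') / T}},
so low-energy, high-consensus states are exponentially more probable, and quantities like entropy, free energy, and surprise (the surprisal - \log P (s)) can be read off directly.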
This might answer the question why the physics Nobel committee deemed their work worthy of recognition.
In the early 1980s, at a time when artificial intelligence in general and perceptrons in particular were seemingly going nowhere, a bunch of researchers centered around Princeton and Carnegie Mellon put computation on a physical footing, just as John Von Neumann had predicted.
But the goal of this post is not to resolve the befuddlement that befell some physicists at the news that the physics prize was seemingly awarded to a discovery in computer science, but to map out why this matters to economics, and in particular to economic design.
Starting out in this endeavor, I had at best a vague notion of statistical mechanics. I mostly considered this kind of feedback network a bare-bones metaphorical model of what would happen in a social group that faces a simple binary choice and whose members influence each other in their choices, couched in the then headline-making problem of technology standard competition. The major claim to novelty was that if you consider heterogeneous network effects, you get more interesting results — especially that you can get interesting partisan dynamics that go against the then agreed-upon paradigm that positive network effects inevitably lead to monopolization.
The findings were mostly met with indifference at the time, but economics has evolved significantly since then. Graph theory has become a recognized tool, in large part because of the emergence of social networks and the “network science” revolution in sociology. Machine learning arrived in economics some ten years ago and is currently in a state that can only be described as a feeding frenzy.
There is a recognition that the landscape of products is unsurveyable for the consumer, leading to attention dynamics and herding behavior (then of little interest outside finance), and to the introduction of two novel economic engines that found widespread adoption in the internet age: recommender engines and reputation engines.
There is an emerging understanding that preferences are not inscrutable and outside the scope of economic inquiry, but that they are tractable and that they evolve in predictable ways.
There is now far less discomfort dealing with scenarios that have more than one equilibrium, so existence proofs have gradually given way to convergence and evolutionary dynamics.
Bayesian inference, including Bayesian games of incomplete information, is slowly making inroads, giving us a richer toolset to think about belief propagation and the evolution of norms.
And most importantly, very rich interaction data has become available, pushing the “not very interesting” findings of ideological polarization as an outcome of network heterogeneity to the forefront of the academic (and non-academic) debate.
My own position also evolved over time.
For one, I no longer consider it merely a metaphorical model of human behavior, useful as an illustrative but empirically intractable shorthand for what happens in a social group when traditional preferences and peer influence intersect.
The evidence that statistical mechanics plays a role not only in human behavior, that thing we call rationality (or even bounded rationality), but also in the goal-directed behavior of organisms we don’t generally consider rational, keeps mounting, as I mapped out in the post on surprises for eyeballs as the fundamental exchange of (not only) the attention economy.
Mostly outside of economics, much progress has been made at the intersection of statistical mechanics, Bayesian inference, and information theory, and it slowly trickles into economics proper via the translation into (evolutionary) game theory.
Cognitive effort is a scarce resource which needs to be allocated towards the activities that promise the highest return. The mechanism by which this allocation happens is attention, a term that, other than a common acceptance that we now live in the attention economy, has gained little traction in economics, even if the administration of scarce resources is at the core of economic theory.
In economic design this is somewhat different, since it ultimately deals with conceiving structures that facilitate mutually beneficial exchange. So the orthodoxy that befits theory is of little help, as the primary goal is to make the engines work, to make sure they fulfill their designed function.
This inevitably requires a wider vocabulary, including learning agents, including agents that learn from each other, including agents that form belief clusters as the subset of peers they’re willing to learn from. That includes machine learning as a facsimile for learning agents.
This also includes investigating rationally erratic behavior, mutation, or “physiological” annealing, deviation from self-interested behavior that gets more frequent as the temperature increases.
It includes developing an understanding of the emergence of norms as decentralized constraints of individual behavior.
Once we start incorporating these tools into our design handbook, it quickly becomes apparent that they go a long way towards explaining common behaviors, including behavior we don’t necessarily consider “economic”.
It also fuels the prospect that we will conclude in due time that the mechanism Hopfield, Hinton, and their peers described is embedded in far more biological systems than we thought, and maybe humans are indeed lightning calculators of pleasure and pain, but not in the way we assumed they were.
- John J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities”, PNAS, 1982.
- John J. Hopfield, “Neurons with graded response have collective computational properties like those of two-state neurons”, PNAS, 1984.
- Geoffrey E. Hinton, Terrence J. Sejnowski, David H. Ackley, “Boltzmann machines: constraint satisfaction networks that learn”, Tech report, CMU, 1984.
- Geoffrey E. Hinton, Terrence J. Sejnowski, “Learning and relearning in Boltzmann machines”, in Rumelhart & McClelland, 1986.
- This is of course extremely truncated. The famous economic calculation debate centered on the question of central planning, where “central” was defined as the administrative state also orchestrating economic activity. The perspective EconPatterns takes is of course a different one, as laid out in the newsletters on organizations as tectonic plates and the blurry boundary between economics and operations research. ↩
See part I of this series
In this post we’ll begin designing a kernel language in which all programs are optics. What I mean by a “kernel language” is that it will serve as a compiler intermediate representation, with a surface language compiling down to it. I intend the surface language to be imperative style like the current Open Game Engine (with an approximately Python-like syntax), but the kernel language will reflect the category theory as closely as possible. I plan the kernel language to be well typed by construction, something that seems like overkill until I think about the problem of figuring out how pattern matching should work in a bidirectional language.
My first design choice is that object language types denote pairs of metalanguage types, with one denoting the forward part (sometimes I might call it the covariant denotation) and the other denoting the backward part (the contravariant denotation).
Cov : Ty -> Type
Con : Ty -> Type
The interpretation of a term will be a lens:
eval : Term xs y -> All Cov xs -> (Cov y, Con y -> All Con xs)
Here All is an Idris standard library function that combines a type-level map with cartesian product:
data All : (a -> Type) -> List a -> Type where
Nil : All p []
(::) : p x -> All p xs -> All p (x :: xs)
(Idris has what in Haskell would be called rebindable syntax switched on by default, so we can use the usual syntactic sugar for lists to refer to elements of All.)
eval is a well typed interpreter, and is a placeholder while we prototype: much later, this is where the backend of the compiler will begin.
My second design choice is that I want the basic product former to be interpreted as the tensor product of lenses, which is pairwise cartesian product on the covariant and contravariant parts. This is an uncontroversial choice, but importantly this product is symmetric monoidal but not cartesian, so it means that we are designing some kind of linear type theory.
My third design choice is that I want negation to be interpreted as swapping the covariant and contravariant parts. This sounds uncontroversial at first - many well known semantic categories do the same thing - until we notice that it is not functorial. That is to say, if we have a lens (X,X′)→(Y,Y′) then in general we can’t make a lens between (X′,X) and (Y′,Y) in either direction. Years ago I wrote a paper called Coherence for lenses and open games in which this non-functorial pair swapping featured heavily, and I still stake everything on the claim that it is the right way to go.
At this point we have enough to build a term language and its interpretation. I will add a single ground type which will be interpreted as purely covariant.
data Ty : Type where
Unit : Ty
Ground : Ty
Not : Ty -> Ty
Tensor : Ty -> Ty -> Ty
mutual
Cov : Ty -> Type
Cov Unit = Unit
Cov Ground = Nat
Cov (Not x) = Con x
Cov (Tensor x y) = (Cov x, Cov y)
Con : Ty -> Type
Con Unit = Unit
Con Ground = Unit
Con (Not x) = Cov x
Con (Tensor x y) = (Con x, Con y)
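For instance, unfolding these equations on a small type makes the behaviour of negation visible (this worked example is mine, but it follows directly from the definitions above):
Cov (Not Ground) = Con Ground = Unit
Con (Not Ground) = Cov Ground = Nat
Cov (Not (Not x)) = Con (Not x) = Cov x
Con (Not (Not x)) = Cov (Not x) = Con x
In particular Not (Not x) has exactly the same denotation as x, which is the involutivity we will lean on below.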
Let’s think through the consequences of these choices. We think of Tensor as linear conjunction, so its neutral element Unit is linear truth. The interpretation of Unit is the pair (1,1), and so Not Unit - which we would think of as linear falsity - has the same interpretation. So we have a linear logic where falsity and truth coincide semantically. Similarly, the de Morgan dual of Tensor, which we would call linear disjunction, coincides with it semantically. So we have an inconsistent interpretation of linear logic. This is nowhere near as bad as it sounds, since many reasonable semantic categories do the same, but we need to keep it in mind.
Since Tensor is a perfectly cromulent symmetric monoidal product, its introduction and elimination rules will be exactly the same as the ones in my previous post. But the negation rules are going to be quite a puzzle.
Our interpretation of negation is strictly involutive - swapping twice is a no-op - something we can call a classical linear negation. This means our semantics validates the principles of double negation introduction and double negation elimination: both of them are interpreted as an identity lens.
The principle of explosion says that p and ¬p together entail falsity, for every proposition p. Since falsity and truth coincide, for us the principle of explosion says that p and ¬p together entail truth. This is a valid principle in our semantics. Suppose p is interpreted as (X,X′), then p⊗¬p is interpreted as (X×X′,X′×X). There is indeed a canonical lens (X×X′,X′×X)→(1,1), namely the “turnaround” lens, which I normally call a counit.
In Idris, suppose we have
explosion : {a : Ty} -> Term [a, Not a] Unit
Then we must have
eval explosion : {a : Ty} -> All Cov [a, Not a] -> (Unit, Unit -> All Con [a, Not a])
which up to isomorphism reduces to
eval explosion : {a : Ty} -> (Cov a, Con a) -> (Con a, Cov a)
Of course, we want to implement eval so that this gives us the swap function.
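As a sanity check, the function we are aiming for is nothing more exotic than the following (a standalone sketch of the intended behaviour, not the actual interpreter clause):
swapExplosion : {a : Ty} -> (Cov a, Con a) -> (Con a, Cov a)
swapExplosion (x, x') = (x', x)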
The de Morgan dual of the principle of explosion is the principle of excluded middle, which says that truth entails p or ¬p. Remembering that our conjunction and disjunction coincide, if p has interpretation (X,X′) then excluded middle would denote a lens (1,1)→(X×X′,X′×X). In general there is no lens of this type, so our semantics does not validate excluded middle.
In Idris, suppose we had
lem : {a : Ty} -> Term [] (Tensor a (Not a))
Then we must have
eval lem : {a : Ty} -> All Cov [] -> ((Cov a, Con a), (Con a, Cov a) -> All Con [])
which up to isomorphism reduces to
eval lem : {a : Ty} -> (Cov a, Con a)
which is impossible as soon as we introduce any types that are not pointed.
This suggests a logic puzzle: can we design a proof system for negation that validates double negation introduction, double negation elimination and the principle of explosion, but does not validate excluded middle?
After some tinkering I did indeed invent a system with these properties. Sadly it turned out to be a red herring, since it ended up proving these principles that are valid for lenses in terms of more primitive principles that are not valid for lenses. But I still think it’s an interesting enough sideline to report here.
The system I designed was a 2-sided hybrid of a natural deduction calculus and a sequent calculus, with general right-elimination, and both left-elimination and right-introduction restricted to empty sequents on the right. In standard proof theory syntax I would write it like this:
\frac{\Gamma, \varphi \vdash}{\Gamma \vdash \neg \varphi} (\mathrm{RI}) \qquad \frac{\Gamma, \neg \varphi \vdash}{\Gamma \vdash \varphi} (\mathrm{LE}) \qquad \frac{\Gamma \vdash \varphi, \Delta \quad \Gamma' \vdash \neg \varphi, \Delta'}{\Gamma, \Gamma' \vdash \Delta, \Delta'} (\mathrm{RE})
In Idris:
data Term : List Ty -> List Ty -> Type where
Var : Term [x] [x]
LAct : Symmetric xs' xs -> Term xs ys -> Term xs' ys
RAct : Term xs ys -> Symmetric ys ys' -> Term xs ys'
NotIntroR : Term (x :: xs) [] -> Term xs [Not x]
NotElimL : Term (Not x :: xs) [] -> Term xs [x]
NotElimR : Simplex xs1 xs2 xs3 -> Simplex ys1 ys2 ys3
-> Term xs1 (y :: ys1) -> Term xs2 (Not y :: ys2) -> Term xs3 ys3
Symmetric is the structure for permutations that I introduced in the previous post.
Here are what our principles look like, together with some non-proofs that are ruled out by the restrictions on right-introduction and left-elimination:
dni : {a : Ty} -> Term [a] [Not (Not a)]
dni = NotIntroR (LAct (Insert (There Here) (Insert Here Empty))
(NotElimR (Left Right) Right Var Var))
dne : {a : Ty} -> Term [Not (Not a)] [a]
dne = NotElimL (NotElimR (Left Right) Right Var Var)
-- ruled out by NotIntroR restriction
-- dne = NotElimR Right (Left Right) (NotIntroR Var) Var
explosion : {a : Ty} -> Term [a, Not a] []
explosion = NotElimR (Left Right) Right Var Var
lem : {a : Ty} -> Term [] [Not a, a]
-- ruled out by NotIntroR restriction
-- lem = NotIntroR Var
-- ruled out by NotElimL restriction
-- lem = NotElimL (NotElimL (NotElimR (Left Right) Right Var Var))
Unfortunately, although my restricted left-elimination and right-introduction rules can be used to prove the semantically valid principles of double negation introduction and elimination, they are themselves not semantically valid. The problems start to appear once we add back in the rules for tensor, which in this 2-sided calculus are
TensorIntro : Simplex xs1 xs2 xs3 -> Simplex ys1 ys2 ys3
-> Term xs1 (y1 :: ys1) -> Term xs2 (y2 :: ys2)
-> Term xs3 (Tensor y1 y2 :: ys3)
TensorElim : Simplex xs1 xs2 xs3 -> Simplex ys1 ys2 ys3
-> Term xs1 (Tensor x y :: ys1) -> Term (x :: y :: xs2) ys2
-> Term xs3 ys3
Now we can write a bad term:
bad : {a : Ty} -> Term [] [Not (Tensor a (Not a))]
bad = NotIntroR (TensorElim (Left Right) Right Var explosion)
Although these rules don’t seem to be strong enough to prove the distributive law between tensor and negation, semantically this is the same shape as excluded middle. I think it would be possible to restrict left-elimination and right-introduction differently to rule out this kind of thing, but only at the expense of leaving us with unprovable instances of double negation introduction and elimination.
Although I would love to come up with a calculus that fulfills my requirements using pure logic, I currently believe that it’s impossible. So instead I will bring out the big guns, and use a Structure. The methodology I introduced in the previous post yields a clean conceptual separation into syntax and logic. If we want to say that two things are syntactically identical, for example permutations of contexts, we use a Structure to encode that. So what we are about to do is to encode a principle that p and ¬¬p are not merely logically equivalent but syntactically identical.
This is how we do it:
data Parity : Ty -> Ty -> Type where
Id : Parity x x
Raise : Parity x y -> Parity x (Not (Not y))
Lower : Parity x y -> Parity (Not (Not x)) y
data Involutive : Structure Ty where
Empty : Involutive [] []
Insert : Parity x y -> Insertion y ys zs -> Involutive xs ys -> Involutive (x :: xs) zs
An element of Involutive xs ys is a witness that ys is a permutation of xs but with an arbitrary number of double negatives inserted or removed.
With double negation introduction and elimination taken care of, all we have to do is to make a logic that validates the principle of explosion and not excluded middle, which is easy: it’s an ordinary 1-sided natural deduction calculus with the negation introduction rule omitted.
data Term : List Ty -> Ty -> Type where
Var : Term [x] x
Act : Involutive xs ys -> Term ys t -> Term xs t
NotElim : {xs, ys : List Ty} -> {default (simplex xs ys) prf : _}
-> Term xs t -> Term ys (Not t) -> Term prf.fst Unit
TensorIntro : {xs, ys : List Ty} -> {default (simplex xs ys) prf : _}
-> Term xs t -> Term ys t' -> Term prf.fst (Tensor t t')
TensorElim : {xs, ys : List Ty} -> {default (simplex xs ys) prf : _}
-> Term xs (Tensor x y) -> Term (x :: y :: ys) z -> Term prf.fst z
We can write exactly the terms that we want to:
dni : {a : Ty} -> Term [a] (Not (Not a))
dni = Act (Insert (Raise Id) Here Empty) Var
dne : {a : Ty} -> Term [Not (Not a)] a
dne = Act (Insert (Lower Id) Here Empty) Var
explosion : {a : Ty} -> Term [a, Not a] Unit
explosion = NotElim Var Var
lem : {a : Ty} -> Term [] (Tensor a (Not a))
-- impossible
And it’s now possible to write a well typed interpreter for this definition of terms, although I’ll skip it here because it involves several pages of mostly tedious boilerplate code. In the next post we’ll add the missing scoping rules to our language, so that by the time we come back to the well typed interpreter in post number 4, we’ll be able to use it to do a little bit of differentiable programming.
In Towards Foundations of Categorical Cybernetics we built a category whose objects are selection functions and whose morphisms are lenses. It was a key step in how we justified open games in that paper: they’re just parametrised lenses “weighted” by selection functions. In this post I’ll show that by adding dependent types and stirring, we can get a nicer category that does the same job but has all colimits, and comes extremely close to having all limits. Fair warning: this post assumes quite a bit of category-theoretic background.
Besides being a nice thing to do in itself, we have a very specific motivation for this. The recently released paper Categorical deep learning: An algebraic theory of architectures proposed using initial algebras and final coalgebras in categories of parametrised morphisms to build neural networks with learning invariants designed to operate on complex data structures, in a huge generalisation of geometric deep learning. This post is the first step to replicating the same structure in compositional game theory, and is probably the first case where a class of deep learning architectures has a game-theoretic analogue right from the beginning (ok, the first other than GANs) - something that is absolutely key to our vision of AI safety, as I described in this previous post.
In this post I’m going to work over the category of sets, to make my life easy. A container (also known as a polynomial functor) is a pair \binom{X}{X’} where X is a set and X’ is an X-indexed family of sets.
Given a pair of containers, a dependent lens f : \binom{X}{X’} \to \binom{Y}{Y’} is a pair of a function f : X \to Y and a function f’ : (x : X) \times Y’ (f (x)) \to X’ (x). There’s a category \mathbf{DLens} whose objects are containers and whose morphisms are dependent lenses (also known as the category of containers \mathbf{Cont} and the category of polynomial functors \mathbf{Poly} by different authors).
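As a sanity check (a standard observation, not specific to this post): when X’ is the constant family x \mapsto X’ and likewise for Y’, the dependency disappears and a dependent lens is exactly a simply-typed lens
f : X \to Y, \qquad f’ : X \times Y’ \to X’,
which is how the category of simply-typed lenses mentioned below sits inside \mathbf{DLens}.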
The category \mathbf{DLens} has all limits and colimits, distinguishing it from the category of simply-typed lenses which is missing many of both (see my old paper Morphisms of Open Games). In this post I want to just take that as a given fact, because calculating them is not always so easy. The slick way to prove it is by constructing \mathbf{DLens} as a fibration \int_{X : \mathbf{Set}} \left( \mathbf{Set} / X \right)^\mathrm{op}, and using the fact that a fibred category has all co/limits if every fibre does and reindexing preserves them (a fact that we’ll be seeing again later).
Write I for the tensor unit of dependent lenses: it’s made of the set 1 = \{ * \} and the 1-indexed set * \mapsto 1. A dependent lens I \to \binom{X}{X’} is an element of X, and a dependent lens \binom{X}{X’} \to I is a section of the container: a function k : (x : X) \to X’ (x). For shorthand I’ll write H = \mathbf{DLens} (I, -) : \mathbf{DLens} \to \mathbf{Set} and K = \mathbf{DLens} (-, I) : \mathbf{DLens}^\mathrm{op} \to \mathbf{Set} for these representable functors.
By analogy to what happens in the simply-typed case, a dependent selection function for a container \binom{X}{X’} should be a function \varepsilon : K \binom{X}{X’} \to H \binom{X}{X’} - that is, a thing that turns costates into states.
But I think we’re going to need things to be multi-valued in order to get all colimits (and we need it to do much game theory anyway), so let’s immediately forget that and define a dependent multi-valued selection function of type \binom{X}{X’} to be a binary relation \varepsilon \subseteq H \binom{X}{X’} \times K \binom{X}{X’}.
To be honest, I don’t really have any serious examples of these things to hand; I think they’ll arise from taking colimits of things that are simply-typed. For game theory the main one we care about is still \arg\max, which is a “dependent” multi-valued selection function but only in a boring way that doesn’t use the dependent types - it’s a binary relation \arg\max \subseteq H \binom{X}{\mathbb R} \times K \binom{X}{\mathbb R}, where \mathbb R here means the X-indexed set that is constantly the real numbers.
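To make this concrete, here is \arg\max written as a relation in a minimal Haskell sketch, restricted to the simply-typed finite case (the names are mine): a state x is related to a costate k exactly when x maximises k.

-- (x, k) is in the argmax relation iff x maximises the costate k over a
-- finite enumeration of the state space.
argMaxRel :: Ord r => [x] -> x -> (x -> r) -> Bool
argMaxRel xs x k = all (\x' -> k x' <= k x) xs

For example, argMaxRel [1..10] (10 :: Int) id evaluates to True.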
For each container \binom{X}{X’}, write E \binom{X}{X’} = \mathcal P \left( H \binom{X}{X’} \times K \binom{X}{X’} \right) for the set of multi-valued selection functions for it. Since it’s a powerset it inherits a posetal structure from subset inclusion, which is a boolean algebra. That means that as a thin category, it has all limits and colimits, something that will come in useful later.
Given \varepsilon \subseteq H \binom{X}{X’} \times K \binom{X}{X’} and a dependent lens f : \binom{X}{X’} \to \binom{Y}{Y’} we can define a “pushforward” selection function f_* (\varepsilon) \subseteq H \binom{Y}{Y’} \times K \binom{Y}{Y’} by f_* (\varepsilon) = \{ (hf, k) \mid (h, fk) \in \varepsilon \}. Defining it this way means we get functoriality for free, and it’s also monotone, so we have a functor E : \mathbf{DLens} \to \mathbf{Pos}.
The fact that we could just as easily have defined a contravariant action on dependent lenses means that the fibration we’re about to get is a bifibration, something that will definitely come in useful one day, but not today.
The next thing we do is take the category of elements of E. Objects of \int E are pairs \left( \binom{X}{X’}, \varepsilon \right) of a container and a selection function for it. A morphism f : \left( \binom{X}{X’}, \varepsilon \right) \to \left( \binom{Y}{Y’}, \delta \right) is a dependent lens f : \binom{X}{X’} \to \binom{Y}{Y’} with the property that f_* (\varepsilon) \leq \delta - which is to say, any h : H \binom{X}{X’} and k : K \binom{Y}{Y’} satisfying (h, fk) \in \varepsilon must also satisfy (hf, k) \in \delta.
So, \int E is a category whose objects are dependent multi-valued selection functions and whose morphisms are dependent lenses. The only difference from the original category of selection functions in Towards Foundations is that we replaced simply typed lenses with dependent lenses. This is enough to get all colimits, and I’d call \int E a “nice category of selection functions”.
The good way to prove that a fibred category has all co/limits (see this paper) is to show that (1) the base category has all co/limits, (2) every fibre has all co/limits, and (3) reindexing preserves co/limits. We already know (1) and (2) (remember the fibres are all boolean algebras), so we just need to prove (3). Since limits and colimits in the fibres are unions and intersections, this should not be too hard.
For some container \binom{X}{X’}, suppose we have some family \varepsilon_i \in E \binom{X}{X’} indexed by i : I. We can define the meet \bigwedge_{i : I} \varepsilon_i and join \bigvee_{i : I} \varepsilon_i : E \binom{X}{X’} by intersection and union. To get all colimits in \int E, what we need to prove is that for any dependent lens f : \binom{X}{X’} \to \binom{Y}{Y’}, f_* \left( \bigvee_{i : I} \varepsilon_i \right) = \bigvee_{i : I} f_* (\varepsilon_i). Let’s do it:
Going forwards, suppose (h, k) \in f_* \left( \bigvee_i \varepsilon_i \right), so by definition of f_* there must be h’ such that h = h’f and (h’, fk) \in \bigvee_i \varepsilon_i. So there is some i : I such that (h’, fk) \in \varepsilon_i, so (h’f, k) = (h, k) \in f_* (\varepsilon_i), therefore (h, k) \in \bigvee_i f_* (\varepsilon_i).
In the other direction, suppose (h, k) \in \bigvee_i f_* (\varepsilon_i), so (h, k) \in f_* (\varepsilon_i) for some i : I. So we must have h’ such that h = h’f and (h’, fk) \in \varepsilon_i. So (h’, fk) \in \bigvee_i \varepsilon_i, therefore (h’f, k) = (h, k) \in f_* \left( \bigvee_i \varepsilon_i \right).
Note, this is intentionally a pure existence proof. Actually calculating these things can be quite a pain, and I’m going to put it off until later, specifically until a paper we’re cooking up on branching open games.
If we also had f_* \left( \bigwedge_{i : I} \varepsilon_i \right) = \bigwedge_{i : I} f_* (\varepsilon_i) then \int E would also have all limits, but sadly in general the best we can do is f_* \left( \bigwedge_{i : I} \varepsilon_i \right) \subseteq \bigwedge_{i : I} f_* (\varepsilon_i). I’d guess this probably means that \int E has some kind of lax limits or something, but I’ll deal with that another day.
It’s instructive to look at what goes wrong. If (h, k) \in \bigwedge_i f_* (\varepsilon_i), then for all i : I we have (h, k) \in f_* (\varepsilon_i). So, for every i we have h’_i such that h = h’_i f and (h’_i, fk) \in \varepsilon_i. We can make progress if f is a monomorphism, in which case all of the h’_i are equal because h’_i f = h = h’_j f implies h’_i = h’_j. In fact, while I don’t know what general monomorphisms in \mathbf{DLens} look like, in this case it’s enough that the forwards pass of f is an injective function. This probably gives us a decent subcategory of \int E that has all limits as well as all colimits, but I don’t know whether that category will be useful for anything.
Cross-posted from Jade’s blog: parts 1, 2, 3
Suppose your name is x and you have a very important state machine N_x : S_x \times \Sigma \to \mathcal P(S_x) that you cherish with all your heart. Because you love this state machine so much, you don’t want it to malfunction and you have a subset P \subseteq S_x which you consider to be safe. If your state machine ever leaves this safe space you are in big trouble so you ask the following question. If you start in some subset I \subseteq P will your state machine N_x ever leave P? In math, you ask if
\mu(\blacksquare(-) \cup I) \subseteq P
where \mu is the least fixed point and \blacksquare(-) indicates the next-time operator of the cherished state machine. What is the next-time operator?
Definition: For a function N : X \times \Sigma \to \mathcal P(Y) there is a monotone function \blacksquare_N : \mathcal P(X) \to \mathcal P(Y) given by
\blacksquare_N(A) = \bigcup_{a \in A} \bigcup_{s \in \Sigma} N(a, s)
In layspeak, the next-time operator sends a set of states to the set of all possible successors of those states.
In a perfect world you could use these definitions to ensure safety using the formula
\mu(\blacksquare(-) \cup I) = \bigcup_{n=0}^{\infty} (\blacksquare(-) \cup I)^n
or at least check safety up to an arbitrary time-step n by computing this infinite union one step at a time.
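Before seeing why this breaks, it may help to see how small the single-machine story is in code. Here is a hedged Haskell sketch, assuming a finite state set and a finite alphabet; all the names (Machine, nextTime, reachable, safe) are mine.

import           Data.Set (Set)
import qualified Data.Set as Set

-- A nondeterministic state machine N : S x Sigma -> P(S).
type Machine s sigma = s -> sigma -> Set s

-- The next-time operator: all possible successors of a set of states.
nextTime :: Ord s => Machine s sigma -> [sigma] -> Set s -> Set s
nextTime n sigmas a = Set.unions [ n s t | s <- Set.toList a, t <- sigmas ]

-- Kleene iteration of A |-> nextTime(A) `union` I, starting from I;
-- terminates whenever the state space is finite.
reachable :: Ord s => Machine s sigma -> [sigma] -> Set s -> Set s
reachable n sigmas i = go i
  where
    go a =
      let a' = Set.union (nextTime n sigmas a) i
      in if a' == a then a else go a'

-- Safety: the least fixed point stays inside the safe set P.
safe :: Ord s => Machine s sigma -> [sigma] -> Set s -> Set s -> Bool
safe n sigmas i p = reachable n sigmas i `Set.isSubsetOf` p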
Unfortunately there is a big problem with this method! Your state machine does not exist in isolation. You have a friend whose name is y, with their own state machine N_y : S_y \times \Sigma \to \mathcal P(S_y). y has the personal freedom to run their state machine how they like, but there are functions
N_{xy} : S_x \times \Sigma \to \mathcal P(S_y)
and
N_{yx} : S_y \times \Sigma \to \mathcal P(S_x)
which allow states of your friend’s machine to change the states of your own and vice versa. Making matters worse, there is a whole graph G whose vertices are your friends and whose edges indicate that the corresponding state machines may affect each other. How can you be expected to ensure safety under these conditions?
But don’t worry, category theory comes to the rescue. In the next sections I will:
- State my model of the world and the local-to-global safety problem for this model (Part II)
- Propose a solution to the local-to-global safety problem based on an enriched version of the Grothendieck construction (Part III)
Suppose we have a directed graph G = (V(G), E(G)) representing our world. The vertices of this graph are the different agents in our world and an edge represents a connection between these agents. The semantics of this graph will be the following:
Definition: Let \mathbf{Mach} be the directed graph whose objects are sets and where there is an edge e : X \to Y for every function
e : X \times \Sigma \to \mathcal P(Y)
A world is a morphism of directed graphs W : G \to \mathbf{Mach}.
A world has a set S_x for each vertex x, called the local state over x, and for each edge e : x \to y a function W(e) : S_x \times \Sigma_e \to \mathcal P(S_y) representing the state machine connecting the local state over x to the local state over y. Note that self edges are ordinary state machines from a local state to itself. An example world may be drawn as follows:

Definition: Given a world W : G \to \mathbf{Mach}, the total machine of W is the state machine \int W : \sum_{x \in V(G)} S_x \times \sum_{e \in E(G)} \Sigma_e \to \mathcal P\left(\sum_{x \in V(G)} S_x\right)
given by
((s, x), (\tau, d)) \mapsto \bigcup_{e : x \to y} W(e)(s, \tau)
The notation ∫ is used based on the belief that this is some version of the Grothendieck construction. Exactly which flavor will be left to future work. The transition function of this state machine comes from unioning the transition functions of all the state machines associated to edges originating in a vertex.
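Here is a rough Haskell rendering of the total machine, with the simplifying (and unfaithful) assumption that every vertex carries the same type of local state; all names are mine.

import           Data.Set (Set)
import qualified Data.Set as Set

type Vertex = Int

-- One edge of a world: a state machine from the local state over src
-- to the local state over tgt.
data Edge s sigma = Edge
  { src   :: Vertex
  , tgt   :: Vertex
  , trans :: s -> sigma -> Set s
  }

-- The total machine unions the transitions of all edges leaving a vertex.
totalMachine :: Ord s => [Edge s sigma] -> (Vertex, s) -> sigma -> Set (Vertex, s)
totalMachine es (x, s) t =
  Set.unions [ Set.map (\s' -> (tgt e, s')) (trans e s t)
             | e <- es, src e == x ]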
Definition: Given a world W : G \to \mathbf{Mach}, a vertex x \in V(G), and subsets I, P \subseteq S_x, we say that I is locally safe in a global context if
\mu(\blacksquare_{\int W}(-) \cup I) \subseteq P
where \blacksquare_{\int W} is the next-time operator of the state machine \int W.
The state machine \int W may be large enough to make computing this least fixed point by brute force impractical. Therefore, we must leverage the compositional structure of W. We will see how to do this in the next post.
In this section we will give a compositional solution to the local safety problem in a global context in two steps:
- First by turning the world into a functor \hat W : FG \to \mathbf{Poset}
- Then by gluing this functor into a single poset \int \hat W whose inequalities solve the problem of interest.
First we define the functor.
Given a world W : G \to \mathbf{Mach}, there is a functor
\hat W : FG \to \mathbf{Poset}
where
- FG is the free category on the graph G,
- \mathbf{Poset} is the category whose objects are posets and whose morphisms are monotone functions.
Functors from a free category are uniquely defined by their image on vertices and generating edges.
- For a vertex x \in V(G), \hat W(x) = \mathcal P(S_x),
- for an edge e : x \to y, we define \hat W(e) : \mathcal P(S_x) \to \mathcal P(S_y) by A \mapsto \blacksquare_{W(e)}(A)
Now for step two.
Given a functor \hat W : FG \to \mathbf{Poset} defined from a world W, the global safety poset is a poset \int \hat W where
- elements are pairs (x \in V(G), A \subseteq S_x),
- (x, A) \leq (y, B) \iff \bigwedge_{f : x \to y \in FG} \hat W(f)(A) \subseteq B
Given a world W : G \to \mathbf{Mach}, a vertex x \in V(G), and subsets I, P \subseteq S_x, then I is locally safe in a global context if and only if there is an inequality (x, I) \leq (x, P) in the global safety poset \int \hat W.
My half-completed proof of this theorem involves a square of functors

Going right and then down, the first path uses a Grothendieck construction to turn a world into a total state machine, and then turns that state machine into its global safety poset. Going down and then right follows the construction detailed in the last two sections. The commutativity of this diagram should verify correctness. I will explain all of this in more detail later. Thanks for tuning in today!
This post is a writeup of a talk I gave at the Causal Cognition in Humans and Machines workshop in Oxford, about some work in progress I have with Toby Smithe. To a large extent this is my take on the theoretical work in Toby’s PhD thesis, with the emphasis shifted from category theory and neuroscience to numerical computation and AI. In the last section I will outline my proposal for how to build AGI.
The starting point is the concept of a Markov kernel, which is a synonym for conditional probability distribution that sounds unnecessarily fancy but, crucially, contains only 30% as many syllables. If X and Y are some sets then a Markov kernel \varphi from X to Y is a conditional probability distribution P_\varphi[y \mid x]. Most of this post will be agnostic to what exactly “probability distribution” can mean, but in practice it will probably eventually mean “Gaussian”, in order to go brrr, by which I mean effective in practice at the expense of theoretical compromise. (I blatantly stole this usage of that meme from Bruno.)
There are two different perspectives on how Markov kernels can be implemented. They could be exact, for example, they could be represented as a stochastic matrix (in the finite support case) or as a tensor containing a mean vector and covariance matrix for each input (in the Gaussian case). Alternatively they could be Monte Carlo, that is, implemented as a function from X to Y that may call a pseudorandom number generator. If we send the same input repeatedly then the outputs are samples from the distribution we want. Importantly these functions satisfy the Markov property: the distribution on the output depends only on the current input and not on any internal state.
An important fact about Markov kernels is that they can be composed. Given a Markov kernel P_\varphi[y \mid x] and another P_\psi[z \mid y], there is a composite kernel P_{\varphi;\psi}[z \mid x] obtained by integrating out y:
P_{\varphi;\psi}[z \mid x] = \int P_\varphi[y \mid x] \cdot P_\psi[z \mid y] \, dy
This formula is sometimes given the unnecessarily fancy name Chapman-Kolmogorov equation. If we represent kernels by stochastic matrices, then this is exactly matrix multiplication; if they are Gaussian tensors, then it’s a similar but slightly more complicated operation. Doing exact probability for anything more complicated is extremely hard in practice because of the curse of dimensionality.
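To make this concrete, here is a hedged Haskell sketch of the finite-support case, where the integral becomes a finite sum; Dist and Kernel are my own throwaway names rather than any real library.

-- Finite-support distributions as weighted lists.
newtype Dist a = Dist { runDist :: [(a, Double)] }

-- A Markov kernel is a function into distributions.
type Kernel x y = x -> Dist y

-- Composition sums out the intermediate variable (Chapman-Kolmogorov).
composeK :: Kernel x y -> Kernel y z -> Kernel x z
composeK phi psi x =
  Dist [ (z, py * pz)
       | (y, py) <- runDist (phi x)
       , (z, pz) <- runDist (psi y) ]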
If we represent kernels by Monte Carlo functions, then composition is literally just function composition, which is extremely convenient. That is, we can just send particles through a chain of functions and they’ll come out with the right distribution - this fact is basically what the term “Monte Carlo” actually means.
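The Monte Carlo version of the same sketch makes the point directly: a kernel is a sampler, and composition is Kleisli composition.

-- A Monte Carlo kernel: a sampler that may consult a random number
-- generator hidden inside IO. The Markov property holds as long as the
-- sampler keeps no hidden state between calls.
type MC x y = x -> IO y

composeMC :: MC x y -> MC y z -> MC x z
composeMC phi psi x = phi x >>= psi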
A special case of this is an ordinary (non-conditional) probability distribution, which can be usefully thought of as a Markov kernel whose domain is a single point. Given a distribution P_\pi[x] and a kernel P_\varphi[y \mid x] we can obtain a distribution \pi;\varphi on y, known as the pushforward distribution, by integrating out x:
P_{\pi;\varphi}[y] = \int P_\pi[x] \cdot P_\varphi[y \mid x] \, dx
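In the finite-support sketch from above, the pushforward is one comprehension:

-- Pushforward of a prior along a kernel (Dist and Kernel as above).
push :: Dist x -> Kernel x y -> Dist y
push prior phi =
  Dist [ (y, px * p)
       | (x, px) <- runDist prior
       , (y, p)  <- runDist (phi x) ]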
Suppose we have a Markov kernel P_\varphi[y \mid x] and we are shown a sample of its output, but we can’t see what the input was. What can we say about the input? To do this, we must start from some initial belief about how the input was distributed: a prior P_\pi[x]. After observing y, Bayes’ law tells us how we should modify our belief to a posterior distribution that accounts for the new evidence. The formula is
P[x \mid y] = \frac{P_\varphi[y \mid x] \cdot P_\pi[x]}{P_{\pi;\varphi}[y]}
The problem of computing posterior distributions in practice is called Bayesian inference, and is very hard and very well studied.
If we fix \pi, it turns out that the previous formula for P[x \mid y] defines a Markov kernel from Y to X, giving the posterior distribution for each possible observation. We call this the Bayesian inverse of \varphi with respect to \pi, and write P_{\varphi^\dagger_\pi}[x \mid y].
The reason we can have y as the input of the kernel but we had to pull out \pi as a parameter is that the formula for Bayes’ law is linear in y but nonlinear in \pi. This nonlinearity is really the thing that makes Bayesian inference hard.
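Continuing the finite-support sketch, exact Bayesian inversion is short to write down, though note it is partial: it divides by zero on observations of probability zero (an issue that comes up again below).

-- Bayesian inversion in the finite-support case: weight each x by prior
-- times likelihood of the observed y, then renormalise.
-- Partial: undefined if y has probability zero under the pushforward.
invert :: Eq y => Dist x -> Kernel x y -> Kernel y x
invert prior phi y =
  let weighted = [ (x, px * p)
                 | (x, px) <- runDist prior
                 , (y', p) <- runDist (phi x)
                 , y' == y ]
      total = sum (map snd weighted)
  in Dist [ (x, w / total) | (x, w) <- weighted ]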
Technically, Bayes’ law only considers sharp evidence, that is, we observe a particular point y. Considering inverse Markov kernels also gives us a way of handling noisy evidence, such as stochastic uncertainty in a measurement, by pushing forward a distribution P_\rho[y] to obtain P_{\rho;\varphi^\dagger_\pi}[x]. This way of handling noisy evidence is sometimes called a Jeffreys update, and contrasted with a different formula called a Pearl update - see this paper by Bart Jacobs. Pearl updates have very different properties and I don’t know how they fit into this story, if at all. Provisionally, I consider the story of this post as evidence that Jeffreys updates are “right” in some sense.
So far we’ve introduced two operations on Markov kernels: composition and Bayesian inversion. Are they related to each other? The answer is a resounding yes: they are related by the formula
(\varphi;\psi)^\dagger_\pi = \psi^\dagger_{\pi;\varphi} ; \varphi^\dagger_\pi
We call this the chain rule for Bayesian inversion, because of its extremely close resemblance to the chain rule for transpose Jacobians that underlies backpropagation in neural networks and differentiable programming:
J^\top_x(f;g) = J^\top_{f(x)}(g) \cdot J^\top_x(f)
The Bayesian chain rule is extremely folkloric. I conjectured it in 2019 while talking to Toby, and he proved it a few months later, writing it down in his unpublished preprint Bayesian Updates Compose Optically. It’s definitely not new - some people already know this fact - but extremely few, and we failed to find it written down in a single place. (I feel like it should have been known by the 1950s at the latest, when things like dynamic programming were being worked out. Perhaps it’s one of the things that was well known in the Soviet Union but wasn’t discovered in the West until much later.) The first place Toby published this fact was in The Compositional Structure of Bayesian Inference with Dylan Braithwaite and me, which fixed a minor problem to do with zero-probability observations in a nice way.
What this formula tells us is that if we have a Markov kernel with a known factorisation, we can compute Bayesian posteriors efficiently if we already know the Bayesian inverse of each factor. Since this is exactly the same form as differentiable programming, we have good evidence that it can go brrr. At first I thought it was completely obvious that this must be how compilers for probabilistic programming languages work, but it turns out this is not the case at all: probabilistic programming languages are monolithic. I’ve given this general methodology for computing posteriors compositionally the catchy name deep inference, because of its very close structural resemblance to deep learning.
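In the running finite-support sketch the chain rule is directly executable: we invert a composite by composing the factor inverses, with the pushforward prior in the middle.

-- (phi ; psi)†_pi  =  psi†_{pi;phi} ; phi†_pi
-- (composeK, invert and push as in the earlier sketches)
invertComposite :: (Eq y, Eq z)
                => Dist x -> Kernel x y -> Kernel y z -> Kernel z x
invertComposite prior phi psi =
  composeK (invert (push prior phi) psi) (invert prior phi)

Of course the point of the formula is precisely that the right-hand side can be much cheaper than inverting the composite directly.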
I wrote “we can compute Bayesian posteriors efficiently if we already know the Bayesian inverse of each factor”, but this is still a big if: computing posteriors even of simple functions is still hard if the dimensionality is high. Numerical methods are used in practice to approximate the posterior, and we would like to make use of these while still exploiting compositional structure.
The usual way of approximating a Bayesian inverse \varphi^\dagger_\pi is to cook up a functional form \varphi'_\pi(p) that depends on some parameters p \in \mathbb R^N. Then we find a loss function on the parameters with the property that minimising it causes the approximate inverse to converge to the exact inverse, ie. \varphi'_\pi(p) \longrightarrow \varphi^\dagger_\pi. This is called variational inference.
There are many ways to do this. Probably the most common loss function in practice is KL divergence (aka relative entropy),
\mathrm{KL}\left(\varphi^\dagger_\pi, \varphi'_\pi(p)\right) = \int P_{\varphi^\dagger_\pi}[x \mid y] \log \frac{P_{\varphi^\dagger_\pi}[x \mid y]}{P_{\varphi'_\pi(p)}[x \mid y]} \, dx
This expression is a function of y, which can optionally also be integrated over (but the next paragraph reveals a better way to use it). A closely related alternative is variational free energy, which despite being more complicated to define is more computationally tractable.
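For what it’s worth, here is the KL loss in the same finite-support sketch, under the simplifying assumption that the two distributions share a support (a real implementation would be more careful):

-- KL divergence between finite-support distributions with matching
-- supports; terms with p = 0 contribute nothing.
kl :: Eq a => Dist a -> Dist a -> Double
kl (Dist ps) (Dist qs) =
  sum [ p * log (p / q)
      | (x, p)  <- ps
      , p > 0
      , (x', q) <- qs
      , x == x' ]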
Ideally, we would like to use a functional form for which we can derive an analytic formula that tells us exactly how we should update our parameters to decrease the loss, given (possibly batched) Monte Carlo samples that are assumed to be drawn from a distribution in a certain class, such as Gaussians.
Of course in 2024 if you are serious then the functional form you use is a deep neural network, and you replace your favourite loss function by its derivative. I refer to this version as deep variational inference. There is no fundamental difference in theory, but in practice deep variational inference is necessary in order to go brrr.
Now, suppose we have two Markov kernels P_\varphi[y \mid x] and P_\psi[z \mid y] which we compose. Suppose we have a prior P_\pi[x] for \varphi, which pushes forward to a prior P_{\pi;\varphi}[y] for \psi. We pick a functional form for approximating each Bayesian inverse, which we call P_{\varphi'_\pi(p)}[x \mid y] and P_{\psi'_{\pi;\varphi}(q)}[y \mid z].
Doing this requires a major generalisation of our loss function. This was found by Toby Smithe in Compositional active inference 1. The method he developed comes straight from compositional game theory, and this appearance of virtually identical structure in game theory and Bayesian inference is absolutely one of the most core ideas of categorical cybernetics as I envision it.
The idea is to define the loss of an approximate inverse to a kernel \varphi : X \to Y in a context that includes not only a prior distribution on X, but also a (generally nonlinear) function k called the continuation, that transforms probability distributions on Y. The continuation is a black box that describes how predictions transform into observations. Then when y appears free in the expressions for KL divergence and variational free energy, we integrate it over the distribution k(\pi;\varphi).
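In types, a context for approximating the inverse of a kernel X \to Y is then a prior together with a continuation; a minimal sketch in the same Haskell notation as above (again, my names):

-- The continuation is a black box transforming distributions on the
-- output: it describes how predictions turn into observations.
type Continuation y = Dist y -> Dist y

data Context x y = Context
  { prior        :: Dist x
  , continuation :: Continuation y
  }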
So for our composite kernel \varphi;\psi, as well as the prior \pi on X we also have a continuation k that transforms distributions on Z. In order to optimise the parameters (p, q) in this context, we divide the problem into two sub-problems:
- Optimise the parameters p for \varphi in the context given by the prior \pi on X and the continuation k' on Y given by k'(\sigma) = k(\sigma;\psi) ; \psi'_\sigma(q)
- Optimise the parameters q for \psi in the context given by the prior \pi;\varphi on Y and the continuation k on Z
Notice that the optimisation step for p involves the current value of q, but not vice versa. It is easy to prove that this method correctly converges to the total Bayesian inverse by a dynamic programming argument, if we first optimise q to convergence and then optimise p. However, Toby and I conjecture that this procedure also converges if p and q are optimised asynchronously, which means the procedure can be parallelised.
The convergence conjecture in the previous section crucially relies on the fact that the prediction kernels φ and ψ are fixed, and we are only trying to approximate their Bayesian inverses. That is why I referred to it as passive inference. The term active inference means several different things (more on this in the next section) but one thing it should mean is that we simultaneously learn to do both prediction and inference.
Toby and I think that if we do this, the compositionality result breaks. In particular, if we also have a parametrised family of prediction kernels \varphi(p) which converge to our original kernel \varphi, it is not the case that
\psi'_{\pi;\varphi(p)}(q) ; \varphi'_\pi(p) \longrightarrow (\varphi;\psi)^\dagger_\pi
Specifically, we think that the nonlinear dependency of \psi'_{\pi;\varphi(p)}(q) on \varphi(p) causes things to go wrong.
One way of saying this negative conjecture is: compositional active inference can fail to converge to true beliefs, even in a stationary environment. The main reason you’d want to do this anyway, even at the expense of getting the wrong answer, is that it might go brrr - but whether this is really true remains to be seen.
We can, however, put a positive spin on this negative result. I am known for the idea that the opposite of compositionality is emergence, from this blog post. A compositional active inference system does not behave like the sum of its parts. The interaction between components can prevent them from learning true beliefs, but can it do anything positive for us? So far we know nothing about how this emergent learning dynamics behaves, but our optimistic hope is that it could be responsible for what is normally called things like intelligence and creativity - on the basis that there aren’t many other places that they could be hiding.
Boosted by the last paragraph, we now fully depart the realm of mathematical conjecture and enter the outer wilds of hot takes, increasing in temperature towards the end.
So far I’ve talked about active inference but not mentioned what is probably the most important thing in the cloud of ideas around the term: conflating prediction and control. Ordinarily, we would think of P_{\pi;\varphi}[y] as prediction and P_{\varphi^\dagger_\pi}[x \mid y] as inference. However it has been proposed (I believe the idea is due to Karl Friston) that in the end P_{\pi;\varphi}[y] is interpreted as a command: at the end of a chain of prediction-inference devices comes an actuator designed to act on the external environment in order to (try to) make the prediction true. That is, a prediction like “my arm will rise” is the same thing as the command “lift my arm” when connected to my arm muscles.
This lets us add one more piece to the puzzle, namely reinforcement learning. A deep active inference system can interact with an environment (either the real world or a simulated environment), by interpreting its ultimate predictions as commands, effecting those commands into the environment, and responding with fresh observations. Over time, the system should learn to predict the response of the environment, that is to say, it will learn an internal model of its environment. If several different active inference systems interact with the same environment, then we should consider the environment of each to contain the others, and expect each to learn a model of the others, recursively.
I am not a neuroscientist, but I understand it is at least plausible that the compositional structure of the mammalian cortex exactly reflects the compositional structure of deep active inference. The cortex is shaped (in the sense of connectivity) approximately like a pyramid, with both sensory and motor areas at the bottom. In particular, the brain is not a series of tubes with sensory signals going in at one end and motor signals coming out at the other end. Obviously the basic pyramid shape must be modified with endless ad-hoc modifications at every scale developed by evolution for various tasks. So following Hofstadter’s Ant Fugue, I claim the cortex is shaped like an anthill.
The idea is that the hierarchical structure is roughly an abstraction hierarchy. Predictions (aka commands) P_\varphi[y \mid x] travel down the hierarchy (towards sensorimotor areas), transforming predictions at a higher level of abstraction P_\pi[x] into predictions at a lower level of abstraction P_{\pi;\varphi}[y]. Inferences P_{\varphi^\dagger_\pi}[x \mid y] travel up the hierarchy (away from sensorimotor areas), transforming observations at a lower level of abstraction P_\rho[y] into observations at a higher level of abstraction P_{\rho;\varphi^\dagger_\pi}[x].
Given this circularity, with observations depending on predictions recursively through many layers, I expect that the system will learn to predict sequences of inputs (as any recurrent neural network does, and notably transformers do extremely successfully) - and also sequences of sequences and so on. I predict that stability will increase up the hierarchy - that is, updates will usually be smaller at higher levels - so that at least conceptually, higher levels run on a slower timescale than lower levels. This comes back to ideas I first read almost 15 years ago in the book On Intelligence by Jeff Hawkins and Sandra Blakeslee.
Conceptually, this is exactly the same idea I wrote about in chapter 9 of The Road to General Intelligence - the main difference is that now I think I have a good idea how to actually compute commands and observations in practice, whereas back then I hand-crafted a toy proof of concept.
If both sensory and motor areas are at the bottom of the hierarchy, this raises the obvious question of what is at the top. It probably has something to do with long term memory formation, but it is almost impossible to not be thinking about consciousness at this point. I’m going to step back from this so that the hot takes in this post don’t reach their ignition temperature before the next paragraph.
The single hottest take that I genuinely believe is that deep variational reinforcement learning is all you need, and is the only conceptually plausible route to what is sometimes sloppily called “AGI” and what I refer to in private as “true intelligence”.
I should mention that none of my collaborators is as optimistic as me that deep variational reinforcement sequence learning is all you need. Uniquely among my collaborators, I am a hardcore connectionist and I believe good old fashioned symbolic methods have no essential role to play. Time will tell.
My long term goal is obviously to build this, if it works. My short term goal is to build some baby prototypes starting with passive inference, to verify and demonstrate that what works in theory also works in practice. So watch this space, because the future might be wild…

A big part of programming language design is in feedback delivery. One aspect of feedback is parse errors. Parsing is a very large area of research, and there are new developments from industry that make it easier and faster than ever to parse files. This post is about an application of dependent lenses that facilitates the job of reporting error locations from a parsing pipeline.
A simple parser could be seen as a function with the signature
parse : String -> Maybe output
where output is a parsed value. In that context, an error is represented with a value of Nothing, and a successful value is represented with Just. However, in the error case, we don’t have enough information to create a helpful diagnostic: we can only say “parse failed”, but we cannot say why or where the error came from. One way to help with that is to make the type aware of its context and carry the error location in the type:
parseLoc : String -> Either Loc output
where Loc holds the file, line, and column of the state of the parser.
This is a very successful implementation of a parser with locations and many languages deployed today use a similar architecture where the parser, and its error-reporting mechanism, keep track of the context in which they are parsing files and use it to produce helpful diagnostics.
I believe that there is a better way, one that does not require a tight integration between the error-generating process (here parsing) and the error-reporting process (here, location tracking). For this, I will be using container morphisms, or dependent lenses, to represent parsing and error reporting.
Dependent lenses are a generalisation of lenses where the backward part makes use of dependent types to keep track of the origin and destination of arguments. For reference, the type of a lens Lens a a' b b' is given by the two functions:
get : a -> b
set : a -> b' -> a'
Dependent lenses follow the same pattern, but their types are indexed:
record DLens : (a : Type) -> (a' : a -> Type) -> (b : Type) -> (b' : b -> Type) where
  get : a -> b
  set : (x : a) -> b' (get x) -> a' x
The biggest difference with lenses is the second argument of set: b' (get x). It means that we always get a b' that is indexed over the result of get; for this to typecheck, we must know the result of get.
This change in types allows a change in perspective. Instead of treating lenses as ways to convert between data types, we use lenses to convert between query/response APIs.
On each side, A and B are queries and A' and B' are corresponding responses. The two functions defining the lens have type get : A -> B and set : (x : A) -> B' (get x) -> A' x, that is, a way to translate queries, and a way to rebuild responses given a query. A lens is therefore a mechanism to map from one API to another.
If the goal is to find on what line an error occurs, then what the get function can do is split our string into multiple lines, each of which will be parsed separately.
splitLines : String -> List String
Once we have a list of strings, we can call a parser on each line; this will be a function like the one above, parseLine : String -> Maybe output. By composing those two functions we have the signature String -> List (Maybe output). This gives us a hint as to what the response for splitLines should be: it should be a list of potential outputs. If we draw our lens again we have the following types:
We are using (String, String) on the left to represent “files as inputs” and “messages as outputs”, both of which are plain strings.
There is a slight problem with this: given a List (Maybe output) we actually have no way to know which of the values refers to which line. For example, if the outputs are numbers and we know the input is the file
23

24
3
and we are given the output [Nothing, Nothing, Just 3], we have no clue how to interpret the Nothing values and how they relate to the result of splitting the lines; they’re not even the same size. We can “guess” some behaviors but that’s really flimsy reasoning; ideally the API translation system should keep track of that so that we don’t have to guess what the correct behavior is. And really, it should be telling us what the relationship is; we shouldn’t even be thinking about this.
So instead of using plain lists, we are going to keep the information in the type by using dependent types. The following type keeps track of an “origin” list and its constructors store values that fulfill a predicate in the origin list along with their position in the list:
data Some : (a -> Type) -> List a -> Type where
  None : Some p xs                            -- stop: no witnesses from here on
  This : p x -> Some p xs -> Some p (x :: xs) -- a witness for the head, plus the rest
  Skip : Some p xs -> Some p (x :: xs)        -- no witness for the head
We can now write the above situation with the type Some (const Unit) ["23", "", "24", "3"], which is inhabited by the value Skip $ Skip $ Skip $ This () None to represent the fact that only the last element is relevant to us. This ensures that the response always matches the query. Once we are given a value like the above, we can convert our response into a string that says "only 3 parsed correctly".
Equipped with dependent lenses, and a type to keep track of partial errors, we can start writing a parsing pipeline that keeps track of locations without interfering with the actual parsing. For this, we start with a simple parsing function:
containsEven : String -> Maybe Int
containsEven str = parseInteger str >>= (\i : Int => toMaybe (even i) i)
This will return a number if it’s even, otherwise it will fail. From this we want to write a parser that will parse an entire file, and return errors where the file does not parse. We do this by writing a lens that will split a file into lines and then rebuild responses into a string such that the string contains the line number.
splitFile : (String :- String) =%> SomeC (String :- output)
splitFile = MkMorphism lines printErrors
  where
    printError : (orig : List String) -> (i : Fin (length orig)) -> String
    printError orig i = "At line \{show (cast {to = Nat} i)}: Could not parse \"\{index' orig i}\""

    printErrors : (input : String) -> Some (const error) (lines input) -> String
    printErrors input x = unlines (map (printError (lines input)) (getMissing x))
Some notation: =%> is the binary operator for dependent lenses, and :- is the binary operator for non-dependent boundaries. Later, !> will be used for dependent boundaries.
printErrors builds an error message by collecting the line numbers that failed. We use the missing values from Some as failed parses. Equipped with this program, we should be able to generate an error message that looks like this:
At line 3: could not parse "test"
At line 10: could not parse "-0.012"
At line 12: could not parse ""
The only thing left is to put together the parser and the line splitter. We do this by composing them into a larger lens via lens composition and then extracting the procedure from the larger lens. First we need to convert our parser into a lens.
Any function a -> b can also be written as a -> () -> b, and any function of that type can be embedded in a lens (a :- b) =%> (() :- ()). That’s what we do with our parser, and we end up with this lens:
parserLens : (String :- Maybe Int) =%> CUnit -- this is the unit boundary () :- ()
parserLens = embed parser
We can lift any lens with a failable result into one that keeps track of the origin of the failure:
lineParser : SomeC (String :- Int) =%> CUnit
lineParser = someToAll |> AllListMap parserLens |> close
We can now compose this lens with the one above that adjusts the error message using the line number:
composedParser : (String :- String) =%> CUnit
composedParser = splitFile |> lineParser
Knowing that a function a -> b can be converted into a lens (a :- b) =%> CUnit, we can do the opposite: we can convert any lens with a unit codomain into a simple function, which gives us a very simple String -> String program:
mainProgram : String -> String
mainProgram = extract composedParser
We can run it as part of a command-line program:
main : IO ()
main = do putStrLn "give me a file name"
          fn <- getLine
          Right fileContent <- readFile fn
            | Left err => printLn err
          let output = mainProgram fileContent
          putStrLn output
          main
And given the file:
0
2

-3
20
04
1.2
We see:
At line 2: Could not parse ""
At line 3: Could not parse "-3"
At line 6: Could not parse "1.2"
The program we’ve seen is great, but it’s not super clear why we would bother with such a level of complexity if we just want to keep track of line numbers. That is why I will now show how to use the same approach to keep track of file origin without touching the existing program.
To achieve that, we need a lens that will take a list of files, and their content, and keep track of where errors emerged using the same infrastructure as above.
First, we define a filesystem as a mapping of file names to a file content:
Filename = String
Content = String
Filesystem = List (Filename * Content)
A lens that splits problems into files and rebuilds errors from them will have the following type:
handleFiles : Interpolation error =>
              (Filesystem :- String) =%> SomeC (String :- error)
handleFiles = MkMorphism (map π2) matchErrors
  where
    matchErrors : (files : List (String * String)) ->
                  Some (const error) (map π2 files) ->
                  String
    matchErrors files x = unlines (map (\(path && err) => "In file \{path}:\n\{err}") (zipWithPath files x))
This time I’m representing failures with the presence of a value in Some rather than its absence. The rest of the logic is similar: we reconstruct the data from the values we get back in the backward part and return a flat String as our error message.
Combining this lens with the previous parser is as easy as before:
filesystemParser : (Filesystem :- String) =%> CUnit
filesystemParser = handleFiles |> map splitFile |> join {a = String :- Int} |> lineParser
fsProgram : Filesystem -> String
fsProgram = extract filesystemParser
We can now write a new main function that will take a list of files and return the errors for each file:
main2 : IO ()
main2 = do files <- askList []
           filesAndContent <- traverse (\fn => map (fn &&) <$> readFile fn) (reverse files)
           let Right contents = sequence filesAndContent
             | Left err => printLn err
           let result = fsProgram contents
           putStrLn result
We can now write two files. file1:
0
2

-3
20
04
1.2
file2:
7
77
8
And obtain the error message:
In file 'file1':
At line 2: Could not parse ""
At line 3: Could not parse "-3"
At line 6: Could not parse "1.2"
In file 'file2':
At line 0: Could not parse "7"
At line 1: Could not parse "77"
All that without touching our original parser, or our line tracking system.
We’ve only touched the surface of what dependent lenses can do for software engineering with this toy example. Yet the example is simple enough to be introduced and resolved in one post, while also showing a solution to a complex problem that affects parsers and compilers across the spectrum of programming languages. In truth, dependent lenses can do much more than what is presented here: they can deal with effects, non-deterministic systems, machine learning, and more. One of the biggest barriers to mainstream adoption is the availability of dependent types in programming languages. The above was written in Idris, a language with dependent types, but if your language of choice adopts dependent types one day, then you should be able to write the same program as we did just now, but for large-scale production software.
The program is available on gitlab.
Some time ago, in a previous blog post, we introduced our software engine for game theoretic modelling. In this post, we expand more on how to apply the engine to use cases relevant for the Ethereum ecosystem. We will consider an analysis of a simplified staking protocol. Our focus will be on compositionality – what this means from the perspective of representing protocols and from the perspective of analyzing protocols.
We end with an outlook on the further development of the engine, what its current limitations are and how we work on overcoming them.
The codebase of the example discussed can be found here. If you have never seen the engine before, we advise you to go back to our earlier post. Also note that there exists a basic tutorial that explains how the engine works. Lastly, here is a recent presentation Philipp gave at the Ethconomics workshop at DevConnect Amsterdam.
Consider a simplified model of a staking protocol. The staking protocol is motivated by Ethereum proof of stake. The model we introduce is relevant because, even though it is simple, it shines a light on how a previous version of the staking protocol was subject to reorg attacks, as discussed in this paper. We thank Barnabé Monnot for pointing us to the problem in the first place and helping us with the specification and modelling.
In what follows, we give a short verbal summary of the protocol.
To begin with, we model a chain as a (compositional) relation. The chain contains blocks with unique identifiers as well as voting weights. The weights correspond to votes by validators on the specific blocks contained in the chain. Here is an example of such a chain in the case of two validators:
The staking protocol consists of episodes. Within each episode, which lasts for several time steps, a proposer decides whether to extend the chain by a further block. If the proposer extends the chain, he chooses on which block to build. Consider the following example, in which the proposer extends the above chain:
The new block he generates will initially have no votes attesting to this block being the legitimate successor. This assessment is conducted by two validators.
These two validators observe the last stage of the chain before their episode starts and they observe a possible change to the chain made by the proposer within their episode. The validators can then vote on the block which they view as the legitimate successor. Here is the continued example from above:
Both the proposer’s as well as the validators’ choices will be evaluated in the next episode. If the decisions they made, i.e. the building on a specific block by the proposer as well as the voting by the validators, are on the path to the longest weighted chain, they will receive a reward.
From a modelling perspective, this is an important feature. The agents’ remuneration in episode t will be determined in episode (t+1). We will come back to this feature.
So far, the setup seems simple enough. However, the picture is complicated by possible network issues. Messages may be delayed. For instance, the two validators might not observe a message by the proposer in their episode simply due to the network being partitioned.
Hence, in this specific case, when a message does not reach the validators, they cannot be sure whether the message was actually not sent or whether they just have not received it yet.
Real world network issues like delay complicate the incentives. They also open avenues for malicious agents. Modelling the arising incentive problems in game-theoretic terms is a formidable challenge, as the timing of moves and information is itself affected by the moves of players. For instance, in the reorg attack mentioned in the beginning, a malicious proposer might want to delay sending information until the next episode has started. In that way he might draw validators away from the honest proposer of that episode and instead have them vote on the block that he created late.
The practical modelling of such interactions is not obvious (and in fact motivated a new research project on our end). Here, we dramatically simplify the problem. We get rid of time completely. Instead, we leverage a key feature of our approach: games are defined as open systems, open to their environment and waiting for information.
Through the environment we can feed in specific information we want. Concretely, we can expose the proposer and validators in a given episode exactly to the kind of reorg scenario mentioned above: Proposer and validators are facing differing information regarding the state of the chain.
Besides simplifying the model, proceeding in this way has a further advantage. The analysis of optimal moves is static and only relative to the context. It thereby becomes much simpler.
In order to construct a game-theoretic model of the protocol, we will build up the protocol from the bottom up using building blocks.
We begin with the boring but necessary parts that describe the mechanics of the protocol. These components are mostly functions lifted into games as computations. In order not to introduce too much clutter in this post, we focus on the open games representations and hide the details of the auxiliary function implementations. These functions are straightforward, and it should hopefully be clear from the context what they do.
Given a chain, determineHeadOfChain produces the head of the current chain:
determineHeadOfChain = [opengame|
    inputs    : chain ;
    feedback  : ;

    :-----:
    inputs    : chain ;
    feedback  : ;
    operation : forwardFunction $ determineHead ;
    outputs   : head ;
    returns   : ;
    :-----:

    outputs   : head ;
    returns   : ;
  |]
Given the old chain from (t−1) and the head of the chain from (t−2), oldProposerAddedBlock determines whether the proposer actually did send a new block in (t−1). It also outputs the head of the chain for period (t−1) - as this is needed in the next period.
oldProposerAddedBlock = [opengame|

    inputs    : chainOld, headOfChainIdT2 ;
    feedback  : ;

    :-----:
    inputs    : chainOld, headOfChainIdT2 ;
    feedback  : ;
    operation : forwardFunction $ uncurry wasBlockSent ;
    outputs   : correctSent, headOfChainIdT1 ;
    returns   : ;
    :-----:

    outputs   : correctSent, headOfChainIdT1 ;
    returns   : ;
  |]
Given the decision by the proposer to either wait or to send a head, addBlock creates a new chain: either the old chain is copied as before, or the chain is extended by a new block.
addBlock = [opengame|

    inputs    : chainOld, chosenIdOrWait ;
    feedback  : ;

    :-----:
    inputs    : chainOld, chosenIdOrWait ;
    feedback  : ;
    operation : forwardFunction $ uncurry addToChainWait ;
    outputs   : chainNew ;
    returns   : ;

    :-----:

    outputs   : chainNew ;
    returns   : ;
  |]
The following diagram summarizes the information flow in these building blocks.
Given the old chain from (t−1), the proposer decides whether to append a new block to a node or not. Conditional on that decision, a new chain is created.
proposer name = [opengame|
    inputs    : chainOld ;
    feedback  : ;

    :-----:
    inputs    : chainOld ;
    feedback  : ;
    operation : dependentDecision name alternativesProposer ;
    outputs   : decisionProposer ;
    returns   : 0 ;

    inputs    : chainOld, decisionProposer ;
    feedback  : ;
    operation : addBlock ;
    outputs   : chainNew ;
    returns   : ;

    :-----:

    outputs   : chainNew ;
    returns   : ;
  |]
Given a newly proposed chain and the old chain from (t−1), the validator then decides which node to attest as the head.
validator name = [opengame|

    inputs    : chainNew, chainOld ;
    feedback  : ;

    :-----:
    inputs    : chainNew, chainOld ;
    feedback  : ;
    operation : dependentDecision name (\(chainNew, chainOld) -> [1, vertexCount chainNew]) ;
    outputs   : attestedIndex ;
    returns   : 0 ;
    // ^ NOTE the payoff for the validator comes from the next period

    :-----:

    outputs   : attestedIndex ;
    returns   : ;
  |]
This open game is parameterized by a specific player (name). The information flow of the decision open games is depicted in the next diagram:
The central aspect of the protocol is how the payoffs of the different players are determined. For both proposers and validators we split the payoff components into two parts. First, we create open games which are mere accounting devices, i.e. they just update a player’s payoff.
updatePayoffValidator:
- determines the value that a validator should receive conditional on his action being assessed as correct, and
- updates the value for a specific validator. This open game is parameterized by a specific player (name).
updatePayoffValidator name fee = [opengame|
    inputs    : bool ;
    feedback  : ;

    :-----:
    inputs    : bool ;
    feedback  : ;
    operation : forwardFunction $ validatorPayoff fee ;
    outputs   : value ;
    returns   : ;
    // ^ Determines the value

    inputs    : value ;
    feedback  : ;
    operation : addPayoffs name ;
    outputs   : ;
    returns   : ;
    :-----:

    outputs   : ;
    returns   : ;
  |]
updatePayoffProposer works analogously to the validators’: first, determine the value the proposer should receive depending on his action; second, do the book-keeping and add the payoff to name’s account.
updatePayoffProposer name reward = [opengame|
    inputs    : bool ;
    feedback  : ;

    :-----:
    inputs    : bool ;
    feedback  : ;
    operation : forwardFunction $ proposerPayoff reward ;
    outputs   : value ;
    returns   : ;
    // ^ Determines the value

    inputs    : value ;
    feedback  : ;
    operation : addPayoffs name ;
    outputs   : ;
    returns   : ;
    :-----:

    outputs   : ;
    returns   : ;
  |]
proposerPayment embeds updatePayoffProposer into a larger game where the first stage includes a function, proposedCorrect, lifted into the open game. That function does what its name suggests: given the latest chain and a Boolean value indicating whether the proposer actually added a block, it determines whether the proposer proposed correctly, according to the protocol.
proposerPayment name reward = [opengame|

    inputs    : blockAddedInT1, chainNew ;
    feedback  : ;

    :-----:
    inputs    : blockAddedInT1, chainNew ;
    feedback  : ;
    operation : forwardFunction $ uncurry proposedCorrect ;
    outputs   : correctSent ;
    returns   : ;
    // ^ This determines whether the proposer was correct in period (t-1)

    inputs    : correctSent ;
    feedback  : ;
    operation : updatePayoffProposer name reward ;
    outputs   : ;
    returns   : ;
    // ^ Updates the payoff of the proposer given decision in period (t-1)

    :-----:

    outputs   : ;
    returns   : ;
  |]
This last game already showcases a pattern that we will now see repeatedly: using the primitive components, we go on to build up larger games. All the needed “molding” parts have been put on the table; all that follows will be about composing those elements.
Let us consider another example for composition.
validatorsPayment groups the payments for the validators included, here two, into one game.
validatorsPayment name1 name2 fee = [opengame|

    inputs    : validatorHashMap, chainNew, headId ;
    feedback  : ;

    :-----:
    inputs    : validatorHashMap, chainNew, headId ;
    feedback  : ;
    operation : forwardFunction $ uncurry3 $ attestedCorrect name1 ;
    outputs   : correctAttested1 ;
    returns   : ;
    // ^ This determines whether validator 1 was correct in period (t-1) using the latest hash and the old information

    inputs    : validatorHashMap, chainNew, headId ;
    feedback  : ;
    operation : forwardFunction $ uncurry3 $ attestedCorrect name2 ;
    outputs   : correctAttested2 ;
    returns   : ;
    // ^ This determines whether validator 2 was correct in period (t-1)

    inputs    : correctAttested1 ;
    feedback  : ;
    operation : updatePayoffValidator name1 fee ;
    outputs   : ;
    returns   : ;
    // ^ Updates the payoff of validator 1 given decision in period (t-1)

    inputs    : correctAttested2 ;
    feedback  : ;
    operation : updatePayoffValidator name2 fee ;
    outputs   : ;
    returns   : ;
    // ^ Updates the payoff of validator 2 given decision in period (t-1)

    :-----:

    outputs   : ;
    returns   : ;
  |]
This concludes the blocks for generating payments. The information flow of these components is depicted in the following diagram:
validatorsGroupDecision groups all the validators’ decisions considered into one game. The output of this game is a map (in the programming sense) connecting the name of each validator with her/his decision.
validatorsGroupDecision name1 name2 = [opengame|

    inputs    : chainNew, chainOld, validatorsHashMapOld ;
    feedback  : ;

    :-----:

    inputs    : chainNew, chainOld ;
    feedback  : ;
    operation : validator name1 ;
    outputs   : attested1 ;
    returns   : ;
    // ^ Validator1 makes a decision

    inputs    : chainNew, chainOld ;
    feedback  : ;
    operation : validator name2 ;
    outputs   : attested2 ;
    returns   : ;
    // ^ Validator2 makes a decision

    inputs    : [(name1,attested1),(name2,attested2)], validatorsHashMapOld ;
    feedback  : ;
    operation : forwardFunction $ uncurry newValidatorMap ;
    outputs   : validatorHashMap ;
    returns   : ;
    // ^ Creates a map of which validator voted for which index

    inputs    : chainNew, [attested1,attested2] ;
    feedback  : ;
    operation : forwardFunction $ uncurry updateVotes ;
    outputs   : chainNewUpdated ;
    returns   : ;
    // ^ Updates the chain with the relevant votes

    :-----:

    outputs   : validatorHashMap, chainNewUpdated ;
    returns   : ;
  |]
The group of validators together is not really anything exciting but it serves to illustrate a general point. The nesting of smaller games into larger games is mostly about establishing clear interfaces. As long as we do not change the interfaces, we can change the internal behavior. When we build our model in several steps and refine it over time, this is very helpful. Here, for instance, the payment for an individual validator might change. But such a change is only required in one place - assuming the interaction with the outside world does not change - and will not affect the wider construction of the game. In other words, it reduces efforts in rewriting games.
Similarly, we chose the output type of the grouped validators with the intention that it would be easy to add more validators while keeping the interface, the mapping of validators to their decisions, intact.
The next diagram illustrates the composition of components and the information flow.
Having assembled all the necessary components, we can now turn to a model of an episode of the complete protocol.
Given the previous chain (t−1), the block which was the head of the chain in (t−2), and the voting decisions of the previous validators, this game puts all the decisions together.
oneEpisode p0 p1 a10 a20 a11 a21 reward fee = [opengame|

    inputs    : chainOld, headOfChainIdT2, validatorsHashMapOld ;
    // ^ chainOld is the old hash
    feedback  : ;

    :-----:
    inputs    : chainOld ;
    feedback  : ;
    operation : proposer p1 ;
    outputs   : chainNew ;
    returns   : ;
    // ^ Proposer makes a decision, a new hash is proposed

    inputs    : chainNew, chainOld, validatorsHashMapOld ;
    feedback  : ;
    operation : validatorsGroupDecision a11 a21 ;
    outputs   : validatorHashMapNew, chainNewUpdated ;
    returns   : ;
    // ^ Validators make a decision

    inputs    : chainNewUpdated ;
    feedback  : ;
    operation : determineHeadOfChain ;
    outputs   : headOfChainId ;
    returns   : ;
    // ^ Determines the head of the chain

    inputs    : validatorsHashMapOld, chainNewUpdated, headOfChainId ;
    feedback  : ;
    operation : validatorsPayment a10 a20 fee ;
    outputs   : ;
    returns   : ;
    // ^ Determines whether validators from period (t-1) were correct and get rewarded

    inputs    : chainOld, headOfChainIdT2 ;
    feedback  : ;
    operation : oldProposerAddedBlock ;
    outputs   : blockAddedInT1, headOfChainIdT1 ;
    returns   : ;
    // ^ This determines whether the proposer from period (t-1) did actually add a block or not

    inputs    : blockAddedInT1, chainNewUpdated ;
    feedback  : ;
    operation : proposerPayment p0 reward ;
    outputs   : ;
    returns   : ;
    // ^ This determines whether the proposer from period (t-1) was correct and triggers payments accordingly

    :-----:

    outputs   : chainNewUpdated, headOfChainIdT1, validatorHashMapNew ;
    returns   : ;
  |]
For clarity, the diagram below illustrates the interacting of the different components and their information flow.
One important thing to note is that this game representation has no inherent dynamics. This is due to a general principle behind the theory of open games, which does not have a notion of time.
This is a limitation in the sense that we cannot see the dynamics unfold. It also has advantages though: the incentive analysis has no side-effects; in functional programming terms, it acts like a pure function and is referentially transparent relative to some state of the game.
Once we have represented the one-episode model, we have choices. We can directly work with that model. And we will do that in the next section. But we can also construct “larger models”: Either by manually combining several episodes into a new multi-episode model or by embedding the single episode into a Markov game structure.
We do not cover the construction proper or the analysis of the Markov game in this post. But the idea is simple: the stage game is a state in a Markov game where the state is fully captured by the inputs to the stage game. A Markov strategy then determines the move in the stage game. This, in turn, allows us to derive the next state of the Markov game. To analyze such a game, we can approximate the future payoff from the point of view of a single player under the assumption that the other players keep playing their strategy. In that way we can also assess unilateral deviations for the player in focus.
After having established a model of the protocol, let us turn to its analysis. The whole point is of course not to represent the games but to learn something about the incentives of the agents involved.
It is important to note here that the model we arrived at above is just one possible way to represent the situation. Obviously, the engine cannot guarantee that you end up with a useful model. What it should guarantee is that you can adapt the model quickly and iterate through a range of models. The “one true model” rarely exists; being able to adapt and consider many different scenarios is the default.
We will illustrate two analyses. The first shows that the protocol works as intended: agents who follow it truthfully end up in an equilibrium. The second shows that, in its current form, the protocol runs into problems if a proposer chooses to delay his message strategically, which makes it susceptible to attacks.
We will first illustrate that the protocol works as intended if all agents involved are honest. They all observe the current head of the chain; proposers build a new block on top of that head, and validators attest that head. The analysis can be found in HonestBehavior.hs.
We use the oneEpisode model. That is, we slice the protocol into one period and supply both the initial information at which that round begins and a continuation for how the game proceeds in the next round. Recall that the rewards for proposer and validators in period t are determined in period (t+1). This information, the initialization and the continuation, is fed in through initialContextLinear, where “linear” signifies that we consider a non-forked chain.
initialContextLinear p a1 a2 reward successFee =
StochasticStatefulContext
(pure ((),(initialChainLinear, 3, initialMap)))
(\_ x -> feedPayoffs p a1 a2 reward successFee x)
This expression looks more complicated than it actually is. The first part, pure ((),(initialChainLinear, 3, initialMap)), determines the starting conditions of the situation we consider. That is, we provide the input parameters which oneEpisode expects from us. Among other things, this contains the initial chain we start with, replicated from above as a reminder:
The second part, (\_ x -> feedPayoffs p a1 a2 reward successFee x), describes a function which computes the payoff from the current action in the next period. Details of how this payoff is determined can be found in the implementation of feedPayoffs.
Again, the way we approach this problem is by exploiting the key feature of open games: the one-episode model is like a pipe expecting some inflows and outflows. Once we have them defined, we can analyze what is going on inside of that “pipe”.
The last elements needed for our analysis are the strategies. We define “honest” strategies for both proposer and validators: strategyProposer and strategyValidator.
Both types of agents observe previous information, for instance the past chain; the proposer then builds on the head of the chain, and the validators attest it.
Note that the strategies include a condition dealing with the scenario in which there is no unique head of the chain. In the analysis we focus on here, where everyone behaves honestly, we never reach this case. However, once not all agents are honest, there may be scenarios in which the head is not unique. This will be important in the second case we analyze below.
Once we have defined the strategies, there is only one thing left to do: initialize the game with some parameters, specifically the reward for the proposer and the fee for the validators.
In the file HonestBehavior.hs you can find one such parameterization, analyzeScenario:
analyzeScenario = eqOneEpisodeGame "p0" "p1" "a10" "a20" "a11" "a21" 2 2 strategyOneEpisode (initialContextLinear "p1" "a11" "a21" 2 2)
This game employs the honest strategies. If we query it, we see that neither the proposer nor the validators have an incentive to deviate. These strategies form an equilibrium, as intended in the design of the protocol.
Let us turn to a second analysis, which can be found in Attacker.hs.
In that episode we continue to consider the behavior of honest agents. However, these agents start out on a chain that we assume has been intentionally delayed by the proposer in the episode before. This is achieved by adding an additional input, chainManipulated, to oneEpisodeAttack, which is otherwise equivalent to oneEpisode: we as analysts can manipulate the chain that the proposer and the validators see.
oneEpisodeAttack p0 p1 a10 a20 a11 a21 reward fee = [opengame|

    inputs    : chainOld, headOfChainIdT2, validatorsHashMapOld, chainManipulated ;
    // ^ chainOld is the old hash
    feedback  : ;

    :-----:
    inputs    : chainOld ;
    feedback  : ;
    operation : proposer p1 ;
    outputs   : chainNew ;
    returns   : ;
    // ^ Proposer makes a decision, a new hash is proposed

    inputs    : chainNew, chainManipulated ;
    feedback  : ;
    operation : mergeChain ;
    outputs   : mergedChain ;
    returns   : ;
    // ^ Merges the two chains into a new chain for the validators

    inputs    : mergedChain, chainOld, validatorsHashMapOld ;
    feedback  : ;
    operation : validatorsGroupDecision a11 a21 ;
    outputs   : validatorHashMapNew, chainNewUpdated ;
    returns   : ;
    // ^ Validators make a decision

    inputs    : chainNewUpdated ;
    feedback  : ;
    operation : determineHeadOfChain ;
    outputs   : headOfChainId ;
    returns   : ;
    // ^ Determines the head of the chain

    inputs    : validatorsHashMapOld, chainNewUpdated, headOfChainId ;
    feedback  : ;
    operation : validatorsPayment a10 a20 fee ;
    outputs   : ;
    returns   : ;
    // ^ Determines whether validators from period (t-1) were correct and get rewarded

    inputs    : chainOld, headOfChainIdT2 ;
    feedback  : ;
    operation : oldProposerAddedBlock ;
    outputs   : blockAddedInT1, headOfChainIdT1 ;
    returns   : ;
    // ^ This determines whether the proposer from period (t-1) did actually add a block or not

    inputs    : blockAddedInT1, chainNewUpdated ;
    feedback  : ;
    operation : proposerPayment p0 reward ;
    outputs   : ;
    returns   : ;
    // ^ This determines whether the proposer from period (t-1) was correct and triggers payments accordingly

    :-----:

    outputs   : chainNewUpdated, headOfChainIdT1, validatorHashMapNew ;
    returns   : ;
  |]
This simulates the situation where the malicious proposer from the previous episode sends a block after the honest proposer from this episode has added his own block. As a result there are now two nodes in the chain with 0 votes on them; in other words, there are two contenders for the head of the chain. The chain at this point in time looks like this:
The next steps are analogous to the previous analysis: we define the inputs and how the game continues. Lastly, we need to define strategies.
We consider two strategies for the validators, adapted to this specific scenario: either they vote with the honest proposer, i.e. vote for node 4 (strategyValidator4), or they vote with the attacker’s block, i.e. vote for node 5 (strategyValidator5). We assume the proposer behaves honestly as before.
If we run the equilibrium check on these two scenarios, analyzeScenario4 and analyzeScenario5, we see that both constitute an equilibrium. That is, in both cases none of the players has an incentive to deviate. Obviously, the scenario where the validators vote for the malicious proposer is not an equilibrium we want from the design perspective of the protocol.
We can shed further light on what is going on here. So far we assumed that the validators coordinate on one node: they either both choose node 4 or both choose node 5. The key issue is that they observe two candidate nodes for the new head of the chain. We can also consider the case where the validators randomize when facing a tie (analyzeScenarioRandom). In that case, the result is not an equilibrium: both validators would profit from voting for another block. The reason is simple: they are not coordinated. When each randomly draws one of the two heads, the validators may output mutually contradictory information, in which case they are not rewarded.
The development of the engine is ongoing. Protocols which involve a timing choice, for instance a proposer waiting to send information and thereby potentially learning something about the validators’ behavior in the meantime, pose a challenge for the current implementation. One should add that they also pose a challenge for classical game representations such as the extensive form. As we have shown, it is still entirely possible to represent such games in the engine, but this modelling puts the burden on the modeller to make reasonable choices. It would be nicer to start with an actual protocol and extract a game-theoretic model from it. Extending the underlying theory and the engine to better accommodate such scenarios is at the top of our to-do list.
-
This is not the only way to model the protocol in the current implementation. It is also possible to consider a timer explicitly as a state variable. This branch contains such a model. ↩
-
We should be more precise: In the current theory of open games there is always a clear notion of causality - who moves when, and what is observed when and by whom. The relevant “events” can be organized in a relation. This follows the overall categorical structure in which open games are embedded. We are working on a version of the theory where time - or other underlying structures like networks - are what open games are based on. ↩
Cross-posted from Riu’s blog.
In modelling disciplines, one often faces the challenge of balancing three often conflicting aspects: representational elegance, the breadth of examples to capture, and the depth or specificity in capturing those examples of interest. In the context of reinforcement learning theory, this raises the question: what is an adequate ontology for the techniques involved in agents learning from interaction with an environment?
Here we take a structural approach to the above dilemma, both in the sense of structural realism and of stuff, structure, property. The characteristics of RL algorithms that we capture are their modularity and their specification via typed interfaces.
To keep this exposition grounded in something practical, we will follow an example, Q-learning, which from this point of view captures the essence of reinforcement learning. It is an algorithm that finds an optimal policy in an MDP by keeping an estimate of the value of taking a certain action in a certain state, encoded as a table Q : S×A → ℝ, and updating it both from previous estimates (bootstrapping) and from samples obtained by interacting with an environment. This is the content of the following equation (we’ll give its precise type later):
Q(s,a) \gets (1-\alpha)\,Q(s,a) + \alpha\left[r + \gamma \max_{a':A} Q(s',a')\right] \tag{1}
One also has a policy derived from the Q-table, usually an ε-greedy policy that, in state s, selects with probability 1−ε the action maximizing the estimated value, \arg\max_{a:A} Q(s,a), and a uniformly sampled action with probability ε. This choice helps to balance exploration against exploitation.
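As a minimal, generic sketch of these two ingredients in Python (the tabular representation and the names rng, Q, s are our assumptions for illustration, not code from the framework discussed here):
import numpy as np

def epsilon_greedy(Q, s, epsilon, rng):
    # explore uniformly with probability epsilon, otherwise exploit
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def q_update(Q, s, a, r, s_next, alpha, gamma):
    # Q(s,a) <- (1-alpha) Q(s,a) + alpha [r + gamma max_a' Q(s',a')]
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    return Q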
Ablating either component produces other existing algorithms, which is reassuring:
- If we remove the bootstrapping component, one recovers a (model-free) one-step Monte Carlo algorithm.
- If we remove the samples, one recovers classical Dynamic Programming methods¹ such as Value Iteration. We’ll come back to these sample-free algorithms later.
Q-learning as we’ve just described it, and other major RL algorithms, can be captured as lenses; the forward map is the policy deployment from the model’s parameters, and the backward map is the update function.
The interface types vary from algorithm to algorithm. In the case of Q-learning, the forward map P has type \mathbb{R}^{S\times A} \to (DA)^S (where D is the distribution monad): it takes the current Q-table Q : S×A → ℝ and outputs a policy S → DA, namely the ε-greedy policy defined earlier. The backward map G has the following type (we define \tilde{Q} in (2)):
\begin{align*} G : \mathbb{R}^{S\times A} \times (S\times A\times R\times S) &\to T^*_{(s,a)}(S\times A) \newline (Q, (s,a,r,s')) &\mapsto \tilde{Q} \end{align*}
The type of model parameter change \Delta(\mathbb{R}^{S\times A}) = T^*_{(s,a)}(S\times A) has as elements cotangent vectors to the base space S\times A (not to \mathbb{R}^{S\times A}). This technicality allows us to define the pointwise update of equation (1) as ((s,a), g), where g = r + \gamma\max_{a':A} Q(s',a') \in \mathbb{R} is the update target of our model. The new Q-function is then defined as:
\tilde{Q}(\tilde{s},\tilde{a}) = \begin{cases} (1-\alpha)\,Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a')\right] & (\tilde{s},\tilde{a}) = (s,a) \newline Q(\tilde{s},\tilde{a}) & \text{o/w} \end{cases} \tag{2}
The quotes in the diagram above reflect the fact that explicitly showing the S and A wires below loses the dependency of the type \mathbb{R}^{S\times A} on them. This is why in the paper we prefer to write the backward map as a single box G with inputs \mathbb{R}^{S\times A} \times (S\times A\times R) and output T^*_{(s,a)}(S\times A).
Writing the change type as a cotangent space allows us to bridge the gap to Deep Learning methods. In our running example, we can do the standard transformation of the Bellman update to a Bellman error to decompose G into two consecutive steps:
-
Backward map:
\begin{align*} G : \mathbb{R}^{S\times A} \times (S\times A\times R\times S') &\to S\times A\times\mathbb{R} \newline (Q, (s,a,r,s')) &\mapsto (s,a,L) \end{align*}
The loss L is defined as the MSE between the current Q-value and the update target g:
L = (Q(s,a) - g)^2 = \left(Q(s,a) - (r + \gamma \max_{a'} \bar{Q}(s',a'))\right)^2
We treat \bar{Q}(s',a') (“Q bar”) as a constant value, so that the (semi-)gradient of L with respect to the Q-matrix is the Bellman Q-update, as we show next.
-
Feedback unit (Bellman update):
\begin{align*} (1 + S\times A\times\mathbb{R}) \times \mathbb{R}^{S\times A} &\to \mathbb{R}^{S\times A} \newline (*, Q) &\mapsto Q \newline ((s,a,L), Q) &\mapsto \tilde{Q} = Q - \frac{\alpha}{2}\frac{\partial L}{\partial Q} \end{align*}
where
\tilde{Q} = \forall(\tilde{s},\tilde{a}).\begin{cases} Q(s,a) - \alpha(Q(s,a) - g) & (\tilde{s},\tilde{a}) = (s,a) \newline Q(\tilde{s},\tilde{a}) & \text{o/w} \end{cases} = \forall(\tilde{s},\tilde{a}).\begin{cases} (1-\alpha)\,Q(s,a) + \alpha g & (\tilde{s},\tilde{a}) = (s,a) \newline Q(\tilde{s},\tilde{a}) & \text{o/w} \end{cases}
This recovers (2), so we can say that the backward map is doing pointwise gradient descent, updating only the (s,a)-indexed Q-value (a small numeric check follows this list).
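A tiny numeric check of that last claim (our own sketch, independent of the paper): one semi-gradient step on the squared Bellman error, with the target g held constant, coincides with the pointwise Bellman update.
import numpy as np

rng = np.random.default_rng(0)
Q = rng.random((5, 3))
s, a, g, alpha = 2, 1, 0.7, 0.1
# dL/dQ(s,a) = 2 (Q(s,a) - g), so Q - (alpha/2) dL/dQ ...
grad_step = Q[s, a] - (alpha / 2) * 2 * (Q[s, a] - g)
# ... equals the Bellman update (1-alpha) Q(s,a) + alpha g
bellman = (1 - alpha) * Q[s, a] + alpha * g
assert np.isclose(grad_step, bellman)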
Focusing now on sample-free algorithms, the model’s parameter update is an operator (X→R)→(X→R) between function spaces. State-value methods, for example, update value functions S→R, whereas state-action value methods update functions S×A→R (the Q-functions). More concretely, the updates of function spaces that appear in RL are known as Bellman operators. It turns out that a certain subclass, which we call linear Bellman operators, can be obtained functorially from lenses as well!
The idea is to employ the continuation functor², which is the following representable functor:
\begin{align*} \mathbb{K} =\mathbf{Lens}(-,1) : \mathbf{Lens}^\mathrm{op} &\to \mathbf{Set} \newline {X\choose R} &\mapsto R^X \newline {X\choose R}\rightleftarrows {X'\choose R'} &\mapsto R'^{X'} \to R^X \end{align*}
The contravariance already hints at the corecursive nature of these operators: they take as input a value function on states in the future, and return a value function on states in the present. The subclass of Bellman operators that we obtain this way is linear in the sense that it uses the domain function in R'^{X'} only once.
An example of this is the value improvement operator from dynamic programming. This operator improves the value function V:S\to R to give a better approximation of the long-term value of a policy \pi:S\to A, and is given by
V(s) \gets \mathbb{E}_{\mkern-14mu\substack{a\sim \pi(s)\newline (s',r)\sim t(s,a)}}[r+\gamma V(s')] = \sum _{a\in A}\pi(a\mid s) \sum _{\substack{s'\in S\newline r\in R}}t(s',r\mid s, a) (r + \gamma V(s'))
This is the image under \mathbb{K} of a lens³ whose forward and backward maps are the transition function \mathrm{pr}_1(t(-,\pi(-))):S \to S under a fixed policy \pi:S\to A, and the update target computation (-)+\gamma\cdot(=):\mathbb{R}\times \mathbb{R}\to \mathbb{R} respectively, as shown below.
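For concreteness, here is a minimal Python sketch of one sweep of this operator in the deterministic setting of the footnote below (the arrays transition and rew are placeholder assumptions: the next state and the reward under the fixed policy):
import numpy as np

num_states, gamma = 4, 0.9
transition = np.array([1, 2, 3, 0])  # s' reached from s under the fixed policy
rew = np.ones(num_states)            # reward collected at s under the fixed policy
V = np.zeros(num_states)
# backward map of the lens: update target r + gamma * V(s'), applied statewise
V = rew + gamma * V[transition]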
If you want to read more about this “optics perspective” on Value Iteration and its relation with problems like the control of an inverted pendulum, the savings problem in economics and more, check out our previous ACT2022 paper.
Once we have transformed the Bellman operator into a function using \mathbb{K}, this embeds into the backward map of the RL lens.
It is then natural to ask what a backward map that does not ignore the sample input might look like; these are what we call parametrised Bellman operators. They are obtained by lifting \mathbb{K} to the (externally parametrised) functor \mathrm{Para}(\mathbb{K}):\mathrm{Para}(\mathrm{Lens}^\mathrm{op})\to\mathrm{Set}, and capture exactly what algorithms like SARSA are doing in terms of usage of both bootstrapping and sampling.
We talked about learning from bootstrapping and from sampling as two distinct processes that fit into the lens structure. While the difference between these two is usually not emphasized enough, we believe that it is useful for understanding the structure of novel algorithms by making the information flow explicit. You can find more details, along with a discussion on Bellman operators, the iteration functor used to model stateful environments, prediction and bandit problems as nice corner cases of our framework, and more on our recent submission to ACT2024.
Moreover, this opens up the study of stateful models: multi-step methods like n-step temporal difference or Monte Carlo Tree Search (used e.g. in AlphaZero), which we will leave for a future post, so stay tuned!
-
This is sometimes called “offline” RL, but one should note that offline methods also include learning from a constant dataset and learning by updating one’s estimates only at the end of episodes. ↩
-
In general, the continuation functor is defined for any optic as \mathbb{K}=\mathbf{Optic}(\mathcal{C})(-,I):\mathbf{Optic}(\mathcal{C})^\mathrm{op}\to\mathbf{Set}, represented by the monoidal unit I. ↩
-
Ok I lied here a little: To be precise, the equation shown arises as the continuation of a mixed optic, where the forwards category is \mathrm{Kl}(D) for a probability monad D, and the backwards category is \mathrm{EM}(D). The value improvement operator that arises from the continuation of a lens is a deterministic version of this, where there’s no expectation taken in the backwards pass because we fix the policy and the transition functions to be deterministic. ↩
Suppose we have some category C, whose morphisms are some kind of processes or systems that we care about. We would like to be able to talk about contexts (or environments) in which these processes or systems can be located.
This post finally writes down part of the lore of categorical cybernetics that I’ve been working out on the back burner for a few years, and that I’ve talked about in front of various audiences a few times. I never thought it was quite compelling enough to write a paper about, but it’s been part of my bag of tricks for a while, for example playing a central role in my lecture series on compositional game theory. In the meantime, similar ideas have been invented a few times in applied category theory, most notably being taken further for talking about quantum supermaps.
Topologically, we draw morphisms of our category as nodes, which have a hole outside but no hole inside (that is to say they are really point-like, despite how we conventionally draw them) - and dually, we draw contexts as diagram elements that have a hole inside but no hole outside.
Being good category theorists, we choose not to say what a context is but how it transforms, which will lead to being able to define them via additional structure we can equip our categories with. If we have a context for morphisms X→Y, and we have morphisms f:X→X′ and g:Y′→Y, we should be able to demote these morphisms into being part of an extended environment for morphisms X′→Y′:
By asking that demoting twice gives the same result as demoting a composite, and that the order of demoting on the domain and codomain doesn’t matter, we end up inventing the following definition: a system of contexts for a category C is a functor ¯C : C×C^op → Set, and a context for morphisms X→Y is an element of ¯C(X,Y).
Things get much more interesting when C is not just a category but a symmetric monoidal category, as is virtually always the case in any applied domain. Our first guess might be to replace the functor ¯C with some kind of monoidal functor. Lax monoidal (for the cartesian monoidal product on Set) turns out to be probably what we want - this says that if we have a context for morphisms X→Y and one for morphisms X′→Y′ we can compose them to get a context for morphisms X⊗X′→Y⊗Y′, but this operation is not necessarily reversible. Topologically this is a bit subtle, and says we can bridge 2 holes with a single morphism:
We probably get away with this because we are assuming everything is symmetric monoidal. I sometimes think of holes as anti-nodes that we can slide around as though they are nodes. This part of the definition has an odd status right now: it seems that we can virtually always get it in practice, and it plays a role in the theory, but I have never actually deployed the lax monoidal structure of contexts while doing any applied work.
In any case, this is not enough to describe contexts in a symmetric monoidal category, so we need to go back to first principles.
Suppose we have a symmetric monoidal category and we have a context for morphisms X⊗X′→Y⊗Y′, and suppose we have a morphism f:X→Y. Similarly to before, we should be able to demote f into the context, obtaining a context for morphisms X′→Y′:
I wrote this definition in section 9 of The Game Semantics of Game Theory. But it turns out this isn’t the best way to write it: it’s enough to be able to demote an identity morphism, with an operation ¯C(Z⊗X,Z⊗Y)→¯C(X,Y):
A category theorist would call this a (monoidal) costrength for ¯C, although I find it useful to think of it as a kind of tensor contraction.
But there’s another way to think about this whole thing. Given a symmetric monoidal category C, a comb in C is a diagram element with 1 hole on the inside and 1 hole on the outside:
(Note, drawing them with this “comb” shape is enough because our ambient category is symmetric. In a planar setting, we would actually have to puncture a box with a hole.)
Concretely, a comb consists of a pair of morphisms coupled through a “residual” wire - but by drawing a box around it, we lose the ability to distinguish combs that differ by sliding a morphism between the front and back along the residual wire:
This turns out to be exactly the definition of an optic in C - I think of combs as one syntactic presentation (among several others) of the semantic concept of an optic in a category. There is a category Optic(C) whose objects are pairs of objects of C, and whose morphisms are combs. Whereas string diagrams in C compose left-to-right, these “comb diagrams” in C compose outside-in, like an operad:
We also get a symmetric monoidal product on Optic(C) that encompasses what I said earlier about sliding holes around. Now we get an alternative definition of context: it’s a generalised state of optics. That is to say, it’s an ultimate outside, which can be transformed by attaching a comb to the inside of the hole:
If we do this, the properties we had to demand of the co-strength map get absorbed into the quotient defining optics.
What is a “generalised state”? A state in a monoidal category C is a morphism from the monoidal unit, and a generalised state is something that transforms like a state: an element of some lax monoidal functor C→Set. That is to say: if we have a generalised state x of X and a morphism f:X→Y, we get a pushforward state f∗(x); and if we have generalised states x of X and y of Y, we get a state x⊗y of X⊗Y.
So now we have 2 different definitions of a system of contexts: as a lax monoidal functor C×C^op→Set equipped with a co-strength map, or as a lax monoidal functor Optic(C)→Set. Fortunately, these definitions turn out to be equivalent: it’s a dual of the profunctor representation theorem. The normal version of this theorem says that Tambara modules - endo-profunctors on C equipped with a strength map - are equivalent to functors Optic(C)^op→Set. It turns out that a Tambara module on C^op is the same thing as a Tambara module, which conveniently frees up the name Tambara co-module to be used for this thing.
(A word of warning: the paper I linked defines “Optic(C)” to be Optic(C)^op, which means they say Optic(C)→Set when they mean Optic(C)^op→Set and vice versa.)
As a personal anecdote, at different points I’ve convinced myself that both of these definitions were the correct definition of “system of contexts”, before realising that they were equivalent by the profunctor representation theorem - this led to me getting some quite good, graphical intuition for this otherwise notoriously abstract theorem.
Some time after working out the last part of this, I learned about the existence of this paper by Hermida and Tennent, which finally backed up my intuition behind my definition of generalised states by formulating a universal construction forcing them to become actual states. Incredibly this construction itself also falls squarely in the small cluster of methods we call categorical cybernetics, which caps off the whole thing very nicely. I touched on this construction in this blog post, and perhaps I’ll have more to say about it later too.
Often we don’t need generalised states, and ordinary states are enough: that’s when we take the representable functor C(I,−):C→Set, which is indeed lax monoidal. (General representable functors on a monoidal category are not lax monoidal in general!)
This leads to what I call the “representable system of contexts” for a symmetric monoidal category C: it’s the one described by Optic(C)(I,−), where the monoidal unit of Optic(C) is (I,I). What this ends up saying is that a context for morphisms X→Y in C is an equivalence class of pairs of a state and a costate in C, coupled through a residual:
This turns out (in a non-trivial way) to be equivalent to the definition of context used for both deterministic and Bayesian open games. In those cases, C is itself a category of optics, making systems of contexts examples of double optics. Iterating the Optic(−) construction can be usefully depicted in 2 different ways: as 1-hole combs in a bidirectional category:
or as 3-hole combs:
Moving back and forth between these equivalent views of the iterated optic construction is a key part of the yoga of contexts as it applies to categorical cybernetics.
An example of a non-representable system of contexts is the “iteration functor” I talked about in this post. It’s closely related to the algebra of Moore machines which plays a major role in David Jaz Myers’ book on categorical systems theory.
But, the actual reason this is a blog post and not a paper is that I don’t have any really compelling examples outside of categorical cybernetics. But I’ll talk more about my struggles with that in part II, where I’ll build a category of “behaviours in context” given a system of contexts, generalising the construction of open games.
I have been playing around with Nash equilibrium search in random normal form games by following best response dynamics - partly because I was curious, and partly as a learning exercise in NumPy.
Here are my preliminary conclusions:
- The key to success is replacing argmax with softmax for a sufficiently high temperature. For low temperatures, nothing I could do would make it converge.
- Provided the temperature is high enough, the learning rate is irrelevant, with a learning rate of 1 (ie. literally overriding each strategy with its softmax best response every stage) converging in just a few steps.
- Normal form games are big. Like, really big. For 9 players and 9 moves, the payoff matrix no longer fits in my VRAM. For 10 players and 10 moves (which I still consider absolutely tiny) the payoff matrix contains exactly 10^11 = 100 billion parameters, making it large by the standards of an LLM at the time of writing.
My understanding of the theory is that this type of iterative method will not in general converge to a Nash equilibrium, but it will converge to an ε-equilibrium for some ε. What I don’t know is how the error ε can depend on the game and the learning algorithm. That’s something I’ll look into in some follow-up work, presumably by comparing my results to what NashPy finds.
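For reference, here is a sketch of how one might measure that error directly (my own sketch, reusing the deviation-payoff incantation that appears later in this post): ε is the most any single player can gain by a unilateral best response.
import numpy as np

def nash_error(payoffMatrix, strategies, numPlayers):
    eps = 0.0
    for player in range(numPlayers):
        # payoff of each own move against the opponents' current strategies
        deviations = payoffMatrix[..., player]
        for opponent in range(player):
            deviations = np.tensordot(strategies[opponent], deviations, (0, 0))
        for opponent in range(player + 1, numPlayers):
            deviations = np.tensordot(strategies[opponent], deviations, (0, 1))
        current = np.dot(strategies[player], deviations)
        eps = max(eps, np.max(deviations) - current)
    return eps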
For a normal form game with N players and M moves, the payoff matrix is an M^N × N tensor, of rank N+1. Each player gets one tensor rank of dimension M for their move, and then there is one more rank of dimension N to assign a payoff to each player.
Here is my incantation for initialising a random payoff tensor, with payoffs drawn uniformly between 0 and 1:
gen = np.random.default_rng()
shape = tuple(numMoves for i in range(numPlayers)) + (numPlayers,)
payoffMatrix = gen.random(shape)
It turns out that with this distribution of payoffs, the law of large numbers kicks in and payoffs for any mixed strategy profile are extremely close to 0.5, and the more players there are the closer to 0.5 they are. Normal form games are defined up to arbitrary positive affine transformations of the payoffs, so I ended up going with a sort-of exponential distribution of payoffs, so that much higher payoffs could sometimes happen. This made very little difference but made me feel happier:
payoffMatrix = gen.exponential(1, shape) * gen.choice((-1, 1), shape)
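As a quick check of the concentration claim (my own sanity check, not from the original experiments): under the uniform payoffs, the expected payoff of a full-support mixed profile is a weighted average of M^N i.i.d. draws, so it should hug 0.5 more tightly as the number of players grows.
import numpy as np

gen = np.random.default_rng(0)
numMoves = 4
for numPlayers in (2, 4, 6):
    shape = tuple(numMoves for i in range(numPlayers)) + (numPlayers,)
    payoffMatrix = gen.random(shape)
    strategies = gen.random((numPlayers, numMoves))
    strategies /= strategies.sum(axis=1, keepdims=True)
    payoffs = payoffMatrix
    for player in range(numPlayers):
        payoffs = np.tensordot(strategies[player], payoffs, (0, 0))
    print(numPlayers, payoffs)  # entries drift towards 0.5 as numPlayers grows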
A mixed strategy profile is an N×M stochastic matrix:
strategies = gen.random((numPlayers, numMoves))
for i in range(numPlayers):
strategies[i] = strategies[i] / sum(strategies[i])
Here is the best incantation I could come up with for computing the resulting payoffs:
payoffs = payoffMatrix
for player in range(numPlayers):
payoffs = np.tensordot(strategies[player], payoffs, (0, 0))
(For 9 players this is already too much for my poor laptop.)
I wish I could find a more declarative way to do this. For a small, fixed number of players this kind of thing works, but I didn’t want to mess with the stringly-typed incantation that would be necessary to do it for N players:
payoffs = np.einsum('abcu,a,b,c', payoffMatrix, strategies[0], strategies[1], strategies[2])
For best response dynamics, the key step is: for each player, compute the payoffs that player can obtain by a unilateral deviation to each of their moves.
Here is the very unpleasant incantation I came up with to do this:
deviations = payoffMatrix[..., player]
for opponent in range(player):
deviations = np.tensordot(strategies[opponent], deviations, (0, 0))
for opponent in range(player + 1, numPlayers):
deviations = np.tensordot(strategies[opponent], deviations, (0, 1))
First we slice the payoff tensor so we only have the payoffs for the player in question. Then for opponents of index lower than the player, we contract their strategy against the lowest tensor dimension. After that loop, the lowest tensor dimension corresponds to the current player’s move. Then for opponents of index higher than the player, we contract their strategy but skipping the first tensor dimension. At the end we’re left with just a vector, giving the payoff of each of the player’s moves when each opponent plays their current strategy.
As a functional programmer, I find all of this in very bad taste.
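One reassuring check (my own, with hypothetical small sizes) is that for a fixed number of players the loop agrees with the einsum formulation from earlier:
import numpy as np

gen = np.random.default_rng(0)
numPlayers, numMoves = 3, 4
shape = tuple(numMoves for i in range(numPlayers)) + (numPlayers,)
payoffMatrix = gen.random(shape)
strategies = gen.random((numPlayers, numMoves))
strategies /= strategies.sum(axis=1, keepdims=True)

player = 1
deviations = payoffMatrix[..., player]
for opponent in range(player):
    deviations = np.tensordot(strategies[opponent], deviations, (0, 0))
for opponent in range(player + 1, numPlayers):
    deviations = np.tensordot(strategies[opponent], deviations, (0, 1))

# same contraction, written declaratively for the 3-player case
expected = np.einsum('abc,a,c->b', payoffMatrix[..., player], strategies[0], strategies[2])
assert np.allclose(deviations, expected)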
The rest is the easy part. We can use softmax:
def softmax(x, temperature):
exps = np.exp(x / temperature)
return exps / sum(exps)
newStrategy = softmax(deviations, temperature)
or, in the zero-temperature limit, take a Dirac distribution at the argmax using this little incantation:
newStrategy = np.identity(numMoves)[np.argmax(deviations)]  # one-hot over the player's moves
Then apply the learning rate:
delta = newStrategy - strategies[player]
strategies[player] = strategies[player] + learningRate*delta
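One numerical caveat that may be worth ruling out in the low-temperature regime (a guess on my part, not a diagnosis of the runs below): np.exp(x / temperature) overflows once x / temperature exceeds roughly 709, silently producing inf and then nan probabilities. The standard max-shift, which leaves the softmax value unchanged, avoids this:
def softmax_stable(x, temperature):
    z = x / temperature
    exps = np.exp(z - np.max(z))  # shifting by the max changes nothing mathematically
    return exps / sum(exps)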
All of this is looped over each player, and then over a number of learning stages, plus logging each player’s payoff and the maximum delta:
for stage in range(numStages):
# Compute expected payoffs
tempPayoffs = payoffMatrix
for player in range(numPlayers):
tempPayoffs = np.tensordot(strategies[player], tempPayoffs, (0, 0))
payoffs[stage] = tempPayoffs
for player in range(numPlayers):
# Compute deviation payoffs
deviations = payoffMatrix[..., player]
for opponent in range(player):
deviations = np.tensordot(strategies[opponent], deviations, (0, 0))
for opponent in range(player + 1, numPlayers):
deviations = np.tensordot(strategies[opponent], deviations, (0, 1))
# Update strategy
newStrategy = softmax(deviations, temperature)
delta = newStrategy - strategies[player]
strategies[player] = strategies[player] + learningRate*delta
# Log errors
errors[stage] = max(errors[stage], max(delta))
Everything in this section is for 8 players and 8 moves, which is the largest that my laptop can handle.
Here is a typical plot of each player’s payoff over 100 stages of learning, with a temperature of 0.01 and a learning rate of 0.1:
With this temperature, the learning rate can be increased all the way to 1, and the dynamics visibly converges in just a few stages:
In fact, this is so robust that it makes me wonder whether there could be a good proof of the constructive Brouwer theorem using statistical physics methods.
If the temperature is decreased further to 0.001, we lose convergence:
Although I haven’t confirmed it, my assumption is that lower temperature will converge to an ε-equilibrium for smaller ε, so we want it to be as low as possible while still converging.
Worst of all, if we decrease the learning rate to compensate we can get a sudden destabilisation after hundreds of stages:
That’s all for now. I’ll come back to this once I’ve figured out how to calculate the Nash error, which is the next thing I’m interested in finding out.
I’m going to record something that I think is known to everyone doing research on categorical cybernetics, but I don’t think has been written down somewhere: an even more general version of mixed optics that replaces the backwards actegory with an enrichment. With it, I’ll make sense of a curious definition appearing in The Compiler Forest.
An actegory consists of a monoidal category M, a category C and a functor ∙:M×C→C that behaves like an “external product”: namely that it’s equipped with coherent isomorphisms I∙X≅X and (M⊗N)∙X≅M∙(N∙X).
An enriched category consists of a category C, a monoidal category M and a functor [−,−]:Cop×C→M that behaves like an “external hom” (I’m not going to write down what this means because it’s more complicated).
There’s a very close relationship between actegories and enrichments, to the point that I consider them different perspectives on the same idea. This is the final form of the famous tensor-hom adjunction, aka. currying. (I learned this incredible fact from Matteo Capucci, and I have no idea where it’s written down, although it’s definitely written down somewhere.)
A tensored enrichment is one where every [Z,−]:C→M has a left adjoint −∙Z:M→C. Allowing Z to vary results in a functor ∙ which (nontrivial theorem) is always an actegory.
A closed actegory is one where every −∙Z:M→C has a right adjoint [Z,−]:C→M. Allowing Z to vary results in a functor [−,−] which (nontrivial theorem) is always an enrichment.
So, closed actegories and tensored enrichments are equivalent ways of defining the same thing, namely a monoidal category M and category C equipped with ∙ and [−,−] related by a tensor-hom adjunction C(Z∙X,Y)≅M(Z,[X,Y]).
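The primordial example, spelled out for concreteness (this instance is standard, not specific to any of the papers discussed here): take M = C = Set, with the action M∙X = M×X and the enrichment [X,Y] = Y^X. The tensor-hom adjunction is then ordinary currying:
\mathbf{Set}(Z \times X, Y) \cong \mathbf{Set}(Z, Y^X)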
Given an actegory, we can define a bicategory ParaM(C), whose objects are objects of C and 1-cells are pairs of M:M and f:C(M∙X,Y). We can also define a bicategory CoparaM(C), whose objects are objects of C and 1-cells are pairs of M:M and f:C(X,M∙Y).
Given an enriched category, we can define a bicategory ParaM(C), whose objects are objects of C and morphisms are pairs of M:M and f:M(M,[X,Y]). If this is a tensored enrichment then the two definitions of ParaM(C) are equivalent.
In all of these cases we are locally fibred over M, and I will write ParaM(C)(X,Y)(M), CoparaM(C)(X,Y)(M) for the set of co/parametrised morphisms with a fixed parameter type.
It’s not possible to define CoparaM(C) for an enrichment. There’s a very slick common generalisation of actegories and enrichments called a locally graded category, which is a category enriched in presheaves with Day convolution. There’s also a very slick definition of Para for a locally graded category. I’d like to know: for exactly which locally graded categories is it possible to define Copara?
If we have two actegories C,D that share the same acting category M then we can define mixed optics, which first appeared in Profunctor Optics: A Categorical Update. This is a 1-category OpticM(C,D) whose objects are pairs \binom{X}{X'} of an object of C and an object of D, and a morphism \binom{X}{X'} \to \binom{Y}{Y'} is an element of the coend
\int^{M:\mathcal M} \mathrm{Copara}_{\mathcal M}(\mathcal C)(X,Y)(M) \times \mathrm{Para}_{\mathcal M}(\mathcal D)(Y',X')(M)
There’s a slightly more general definition called “weighted optics” that appears in Bruno’s thesis and was used very productively there, which replaces M with two monoidal categories related by a Tambara module. I think that it’s an orthogonal generalisation to the one I’m about to do here.
Putting together everything I’ve just said, the next step is clear. If we have categories C,D and a monoidal category M, with M acting on C and D enriched in M, then we can still define OpticM(C,D) in exactly the same way, replacing ParaM(D) with its enriched version. But now, unlike before, we can use the ninja Yoneda lemma to eliminate the coend and get
\mathrm{Optic}_{\mathcal M}(\mathcal C,\mathcal D)\left(\binom{X}{X'},\binom{Y}{Y'}\right) \cong \mathcal C(X, [Y',X'] \bullet Y)
In general I refer to optics that can be defined without type quantification as lenses, and so this is an enriched closed lens. It’s the final form of “linear lenses”, the version of lenses that is defined like Lens s t a b = s -> (a, b -> t).
Section 5 of The Compiler Forest by Budiu, Galenson and Plotkin has a very interesting definition in it. They have a cartesian closed category C (whose internal hom I’ll write as →) and a strong monad T on it, and they define a category whose objects are pairs of objects of C and whose morphisms f : \binom{X}{X'} \to \binom{Y}{Y'} are morphisms f : X → T(Y×(Y′→TX′)) of C.
They also nail an intuition for lenses that I use constantly and I haven’t seen written down anywhere else: problems go forwards, solutions go backwards.
Me and this definition have quite a history. It came to my attention while polishing Bayesian Open Games for submission. For a while, I thought that it was equivalent to optics in the kleisli category of T, and that we’d wasted years of our lives trying to understand optics (this being around 2018, when optics were still a niche idea). Then, for a while I thought that the paper made a mistake and these things don’t compose associatively. Now I’ve made peace: I think their definition is conceptually subtly wrong in a way that makes no difference in practice, and I can say very precisely how it relates to kleisli optics.
There is an action of C on Kl(T) given by M∙X=M⊗X, where ⊗ is the tensor product of Kl(T) which on objects is given by the product × of C. That’s the actegory generated by the strong monoidal embedding C↪Kl(T). There is also an enrichment of Kl(T) in C, given by [X,Y]=X→TY. This action and enrichment are adjoint to each other: Kl(T)(M⊗X,Y)≅C(X,M→TY).
The category defined in Compiler Forest turns out to be equivalent to
\mathrm{Optic}_{\mathcal C}(\mathrm{Kl}(T), \mathrm{Kl}(T))
whose forwards pass is given by the action of C on Kl(T) and whose backwards pass is given by the enrichment of Kl(T) in C. Its hom-sets are given by
\mathrm{Optic}_{\mathcal C}(\mathrm{Kl}(T), \mathrm{Kl}(T))\left(\binom{X}{X'},\binom{Y}{Y'}\right) = \int^{M:\mathcal C} \mathcal C(X, T(M\times Y)) \times \mathcal C(M, Y' \to TX')
which Yoneda-reduces to the definition in the paper.
Even though the action and enrichment are adjoint, this is not the same as optics in the kleisli category:
\mathrm{Optic}_{\mathcal C}(\mathrm{Kl}(T), \mathrm{Kl}(T)) \ncong \mathrm{Optic}_{\mathrm{Kl}(T)}(\mathrm{Kl}(T), \mathrm{Kl}(T))
where the hom-sets of the latter are defined by
\mathrm{Optic}_{\mathrm{Kl} (T)} (\mathrm{Kl} (T), \mathrm{Kl} (T)) \left( \binom{X}{X'}, \binom{Y}{Y'} \right)= \int^{M : \mathrm{Kl} (T)} \mathcal C (X, T (M \times Y)) \times \mathcal C (M \times Y', T X')
This equivalence, between optics whose backwards passes are an adjoint action or enrichment, would be a completely reasonable-looking lemma but it just isn’t true!
The difference between them is extremely subtle, though. The “proper” definition of kleisli optics identifies morphisms that agree up to sliding any kleisli morphism, whereas the definition in Compiler Forest only identifies morphisms that agree up to sliding pure morphisms of \mathcal C. So hom-sets of coend optics are a quotient of the hom-sets defined in Compiler Forest. While writing this up, I realised that most of this conclusion actually appears in section 4.9 of Riley’s original paper.
As long as you don’t care about equality of morphisms - which in practice is never, because they are made of functions - the difference between them can be safely ignored. The only genuine reason to prefer kleisli optics is their better runtime performance.
Cross-posted from Zanzi’s blog
This is the first post in a new series documenting my work developing a bidirectional programming language, in which all programs are interpreted as optics. This is something I’ve been thinking about for a long time, and eventually I became convinced that there were enough subtle issues that I should take things extremely slowly and actually learn some programming language theory. As a result, this post will not be about categorical cybernetics at all, but is a foundation to a huge tower of categorical cybernetics machinery that I will build later.
The first issue is that because the tensor product of optics is not cartesian, a language for optics is necessarily substructural. So this post is about a way to implement substructural languages in a way that avoids hidden pitfalls. The code in this post is Idris2, a dependently typed language.
We begin with a tiny language for types, with monoidal products (a neutral name because later we will be making it behave like different kinds of product), a unit type to be the neutral element of the monoidal product, and a “ground” type that is intended to contain some nontrivial values.
data Ty : Type where
Unit : Ty
Ground : Ty
Tensor : Ty -> Ty -> Ty
Although we have used the name “tensor”, suppose we want to make an ordinary cartesian language where variables can be implicitly copied and discarded. Here is a standard way to do it: it is an intuitionistic natural deduction calculus.
data Term : List Ty -> Ty -> Type where
-- A variable is a term if we can point to it in scope
Var : Elem x xs -> Term xs x
-- Unit is always a term in every scope
UnitIntro : Term xs Unit
-- Pattern matching on Unit, redundant here but kept for comparison to later
UnitElim : Term xs Unit -> Term xs x -> Term xs x
-- A pair is a term if each side is a term
TensorIntro : Term xs x -> Term xs y -> Term xs (Tensor x y)
-- Pattern matching on a pair, adding both sides to the scope
TensorElim : Term xs (Tensor x y) -> Term (x :: y :: xs) z -> Term xs z
The constructor for Var uses Elem, a standard library type that defines pointers into a list:
data Elem : a -> List a -> Type where
Here : Elem x (x :: xs)
There : Elem x xs -> Elem x (x' :: xs)
Here are some examples of programs we can write in this language:
-- \x => ()
delete : CartesianTerm [a] Unit
delete = UnitIntro
-- \(x, y) => x
prjl : CartesianTerm [a, b] a
prjl = Var Here
-- \(x, y) => y
prjr : CartesianTerm [a, b] b
prjr = Var (There Here)
-- \x => (x, x)
copy : CartesianTerm [a] (Tensor a a)
copy = TensorIntro (Var Here) (Var Here)
-- \(x, y) => (y, x)
swap : CartesianTerm [a, b] (Tensor b a)
swap = TensorIntro (Var (There Here)) (Var Here)
The thing that makes this language cartesian and allows us to write these 3 terms is the way that the context xs gets shared by the inputs of the different term constructors. In the next section we will define terms a different way, and then none of these examples will typecheck.
Next let’s go to the opposite extreme and build a fully substructural language, in which we cannot delete or copy or swap. I learned how to do this from Conor McBride and Zanzi. Here is the idea:
data Term : List Ty -> Ty -> Type where
-- A variable is a term only if it is the only thing in scope
Var : Term [x] x
-- Unit is a term only in the empty scope
UnitIntro : Term [] Unit
-- Pattern matching on Unit consumes its scope
UnitElim : Term xs Unit -> Term ys y -> Term (xs ++ ys) y
-- Constructing a pair consumes the scopes of both sides
TensorIntro : Term xs x -> Term ys y -> Term (xs ++ ys) (Tensor x y)
-- Pattern matching on a pair consumes its scope
TensorElim : Term xs (Tensor x y) -> Term (x :: y :: ys) z -> Term (xs ++ ys) z
This is a semantically correct definition of planar terms and it would work if we had a sufficiently smart typechecker, but for the current generation of dependent typecheckers we can’t use this definition because it suffers from what’s called green slime. The problem is that we have types containing terms that involve the recursive function ++, and the typechecker will get stuck when this function tries to pattern match on a free variable. (I have no idea how you learn this if you don’t happen to drink in the same pubs as Conor. Dependently typed programming has a catastrophic lack of books that teach it.)
The fix is that we need to define a datatype that witnesses that the concatenation of two lists is equal to a third list - a witness that the composition of two things is equal to a third thing is called a simplex. The key idea is that this datatype exactly reflects the recursive structure of ++, but as a relation rather than a function:
data Simplex : List a -> List a -> List a -> Type where
Right : Simplex [] ys ys
Left : Simplex xs ys zs -> Simplex (x :: xs) ys (x :: zs)
Now we can write a definition of planar terms that we can work with:
data Term : List Ty -> Ty -> Type where
Var : Term [x] x
UnitIntro : Term [] Unit
UnitElim : Simplex xs ys zs
-> Term xs Unit -> Term ys y -> Term zs y
TensorIntro : Simplex xs ys zs
-> Term xs x -> Term ys y -> Term zs (Tensor x y)
TensorElim : Simplex xs ys zs
-> Term xs (Tensor x y) -> Term (x :: y :: ys) z -> Term zs z
This language is so restricted that it’s hard to show it doing anything, but here is one example of a term we can write:
-- \(x, y) => (x, y)
foo : Term [a, b] (Tensor a b)
foo = TensorIntro (Left Right) Var Var
Manually defining simplices, which cut a context into two halves, is very good as a learning exercise but eventually gets irritating. We can direct Idris to search for the simplex automatically:
data Term : List Ty -> Ty -> Type where
Var : Term [x] x
UnitIntro : Term [] Unit
UnitElim : {auto prf : Simplex xs ys zs}
-> Term xs Unit -> Term ys y -> Term zs y
TensorIntro : {auto prf : Simplex xs ys zs}
-> Term xs x -> Term ys y -> Term zs (Tensor x y)
TensorElim : {auto prf : Simplex xs ys zs}
-> Term xs (Tensor x y) -> Term (x :: y :: ys) z -> Term zs z
foo : Term [a, b] (Tensor a b)
foo = TensorIntro Var Var
This works, but I find that the proof search gets confused easily (although it works fine for the baby examples in this post), so let’s pull out the big guns and write a tactic:
concat : (xs, ys : List a) -> (zs : List a ** Simplex xs ys zs)
concat [] ys = (ys ** Right)
concat (x :: xs) ys = let (zs ** s) = concat xs ys in (x :: zs ** Left s)
This function takes two lists and returns their concatenation together with a simplex that witnesses this fact. Here ** is Idris syntax for both the type former and the term former of dependent pair (aka. Sigma) types.
data Term : List Ty -> Ty -> Type where
Var : Term [x] x
UnitIntro : Term [] Unit
UnitElim : {xs, ys : List Ty}
-> {default (concat xs ys) prf : (zs : List Ty ** Simplex xs ys zs)}
-> Term xs Unit -> Term ys y -> Term prf.fst y
TensorIntro : {xs, ys : List Ty}
-> {default (concat xs ys) prf : (zs : List Ty ** Simplex xs ys zs)}
-> Term xs x -> Term ys y -> Term prf.fst (Tensor x y)
TensorElim : {xs, ys : List Ty}
-> {default (concat xs ys) prf : (zs : List Ty ** Simplex xs ys zs)}
-> Term xs (Tensor x y) -> Term (x :: y :: ys) z -> Term prf.fst z
foo : {a, b : Ty} -> Term [a, b] (Tensor a b)
foo = TensorIntro Var Var
I find Idris’ default syntax to be a bit awkward, but it feels to me like a potentially very powerful tool, and something I wish Haskell had for scripting instance search.
Unfortunately, going from a planar language to a linear one - that is, ruling out copy and delete but allowing swaps - is much harder. I figured out a technique for doing this that turns out to be very powerful and give very fine control over the scoping rules of a language.
The idea is to isolate a category of context morphisms (technically a coloured pro, that is, a strict monoidal category whose monoid of objects is free). Then we will parametrise a planar language by an action of this category. The good news is that this is the final iteration of the definition of Term, and we’ll be working with it for the rest of this blog post.
Structure : Type -> Type
Structure a = List a -> List a -> Type
data Term : Structure Ty -> List Ty -> Ty -> Type where
Var : Term hom [x] x
Act : hom xs ys -> Term hom ys x -> Term hom xs x
UnitIntro : Term hom [] Unit
UnitElim : {xs, ys : List Ty}
-> {default (concat xs ys) prf : (zs : List Ty ** Simplex xs ys zs)}
-> Term hom xs Unit -> Term hom ys y -> Term hom prf.fst y
TensorIntro : {xs, ys : List Ty}
-> {default (concat xs ys) prf : (zs : List Ty ** Simplex xs ys zs)}
-> Term hom xs x -> Term hom ys y -> Term hom prf.fst (Tensor x y)
TensorElim : {xs, ys : List Ty}
-> {default (concat xs ys) prf : (zs : List Ty ** Simplex xs ys zs)}
-> Term hom xs (Tensor x y) -> Term hom (x :: y :: ys) z
-> Term hom prf.fst z
First, let’s recover planar terms. To do this, we want to define a Structure where hom xs ys is a proof that xs = ys:
data Planar : Structure a where
Empty : Planar [] []
Whisker : Planar xs ys -> Planar (x :: xs) (x :: ys)
Now let’s deal with linear terms. For that, we want to define a Structure where hom xs ys is a proof that ys is a permutation of xs. We can do this in two steps:
-- Insertion x xs ys is a witness that ys consists of xs with x inserted somewhere
data Insertion : a -> List a -> List a -> Type where
-- The insertion is at the head of the list
Here : Insertion x xs (x :: xs)
-- The insertion is somewhere in the tail of the list
There : Insertion x xs ys -> Insertion x (y :: xs) (y :: ys)
data Symmetric : Structure a where
-- The empty list has a unique permutation to itself
Empty : Symmetric [] []
-- Extend a permutation by inserting the head element into the permuted tail
Insert : Insertion x ys zs -> Symmetric xs ys -> Symmetric (x :: xs) zs
(Incidentally, this is the point where I realised that although Idris looks like Haskell, programming in it feels a lot closer to programming in Prolog.)
Now we write swap as term:
swap : {a, b : Ty} -> Term Symmetric [a, b] (Tensor b a)
swap = Act (Insert (There Here) (Insert Here Empty)) (TensorIntro Var Var)
Now we can come full circle and redefine cartesian terms in a way that uniformly matches the way we do substructural terms.
data Cartesian : Structure a where
-- Delete everything in scope
Delete : Cartesian xs []
-- Point to a variable in scope and make a copy on top
Copy : Elem y xs -> Cartesian xs ys -> Cartesian xs (y :: ys)
With this, we can rewrite all the terms we started with:
delete : {a : Ty} -> Term Cartesian [a] Unit
delete = Act Delete UnitIntro
prjl : {a, b : Ty} -> Term Cartesian [a, b] a
prjl = Act (Copy Here Delete) Var
prjr : {a, b : Ty} -> Term Cartesian [a, b] b
prjr = Act (Copy (There Here) Delete) Var
copy : {a : Ty} -> Term Cartesian [a] (Tensor a a)
copy = Act (Copy Here (Copy Here Delete)) (TensorIntro Var Var)
swap : {a, b : Ty} -> Term Cartesian [a, b] (Tensor b a)
swap = Act (Copy (There Here) (Copy Here Delete)) (TensorIntro Var Var)
Let’s end with a party trick. What would a cocartesian language look like - one where we can’t delete or copy, but we can spawn and merge?
Co : Structure a -> Structure a
Co hom xs ys = hom ys xs
-- spawn : Void -> a
-- spawn = \case {}
spawn : {a : Ty} -> Term (Co Cartesian) [] a
spawn = Act Delete Var
-- merge : Either a a -> a
-- merge = \case {Left x => x; Right x => x}
merge : {a : Ty} -> Term (Co Cartesian) [a, a] a
merge = Act (Copy Here (Copy Here Delete)) Var
-- injl : a -> Either a b
-- injl = \x => Left x
injl : {a, b : Ty} -> Term (Co Cartesian) [a] (Tensor a b)
injl = Act (Copy Here Delete) (TensorIntro Var Var)
-- injr : b -> Either a b
-- injr = \y => Right y
injr : {a, b : Ty} -> Term (Co Cartesian) [b] (Tensor a b)
injr = Act (Copy (There Here) Delete) (TensorIntro Var Var)
Since at the very beginning we added a single generating type Ground, and the category generated by one object and finite coproducts is finite sets and functions, this language can define exactly the functions between finite sets. For example, there are exactly 4 functions from booleans to booleans:
id, false, true, not : Term (Co Cartesian) [Ground, Ground] (Tensor Ground Ground)
id = Act (Copy Here (Copy (There Here) Delete)) (TensorIntro Var Var)
false = Act (Copy Here (Copy Here Delete)) (TensorIntro Var Var)
true = Act (Copy (There Here) (Copy (There Here) Delete)) (TensorIntro Var Var)
not = Act (Copy (There Here) (Copy Here Delete)) (TensorIntro Var Var)
That’s enough for today, but next time I will continue using this style of term language to start dealing with the difficult issues of building a programming language for optics.Loading [MathJax]/jax/output/HTML-CSS/jax.js
Category theory for machine learning has been a big topic recently, both with Bruno’s thesis dropping, and the paper on using the Para construction for deep learning.
In this post we will look at how dependent types can allow us to almost effortlessly implement the category theory directly, opening up a path to new generalisations.
I will be making heavy use of Tatsuya Hirose’s code that implements the Para(Optic) construction in Haskell. Our goal here is to show that when we make the category theory in the code explicit, it becomes a powerful scaffolding that lets us structure our program.
All in all, our goal is to formulate this: A simple neural network with static types enforcing the parameters and input and output dimensions.
import Data.Fin
import Data.Vect
model : GPath ParaLensTensor [< [4, 2], [4], [0], [2, 4], [2], [0]] [2] [2]
model = [< linear, bias, relu, linear, bias, relu]
The crucial part is the Para construction, which lets us accumulate parameters along the composition of edges. This lets us state the parameters of each edge separately, and then compose them into a larger whole as we go along.
Para forms a graded category, and in order to understand what this is we will start with a graded monoid first.
namespace Monoid
data Env : (par -> Type) -> List par -> Type where
-- Empty list
Nil : Env f []
-- Add an element to the list, and accumulate its parameter
(::) : {f : par -> Type} -> f n -> Env f ns -> Env f (n::ns)
-- Compare this to the standard free monoid
-- data List : Type -> Type where
-- Nil : List a
-- (::) : a -> List a -> List a
I used this datatype in a previous blog post where it is used to represent variable environments.
We can use it for much more, though. For instance, let’s say that we want to aggregate a series of vectors, and later perform some computation on them.
Our free graded monoid lets us accumulate a list of vectors, while keeping their sizes in a type-level list.
Vec : Nat -> Type
Vec n = Fin n -> Double
f1 : Vec 1
f2 : Vec 2
f3 : Vec 3
fins : Env Vec [1, 2, 3]
fins = [f1, f2, f3]
As we will soon see, Para works the same way, but instead of forming a graded monoid, it forms a graded category.
Before we look at free graded categories, let’s first look at how to work with a plain free category. I’ve used them in another previous blog post. A nice trick that I’ve learned from André Videla is that we can use Idris notation for lists with free categories too, we just need to name the constructors appropriately.
Graph : Type -> Type
Graph obj = obj -> obj -> Type
-- The category of types and functions
Set : Graph Type
Set a b = a -> b
namespace Cat
data Path : Graph obj -> Graph obj where
-- Empty path
Nil : Path g a a
-- Add an edge to the path
(::) : g a b -> Path g b c -> Path g a c
While vectors form graded monoids, matrices form categories.
Matrix : Graph Nat
Matrix n m = Fin n -> Fin m -> Double
mat1 : Matrix 2 3
mat2 : Matrix 3 1
matrixPath : Path Matrix 2 1
matrixPath = [mat1, mat2]
-- matrixPath = mat1 :: mat2 :: Nil
Just as we did at the start of the blog post, we are using the inbuilt syntactic sugar to represent a list of edges. We will now generalise from free paths to their parameterised variant!
A free graded category looks not unlike a free category, except now we are accumulating an additional parameter, just as we did with graded monoids:
ParGraph : Type -> Type -> Type
ParGraph par obj = par -> obj -> obj -> Type
-- A free graded path over a parameterised graph
data GPath : ParGraph par obj -> ParGraph (List par) obj where
-- Empty path, with an empty list of grades
Nil : GPath g [] a a
-- Add an edge to the path, and accumulate its parameter
(::) : {g : par -> obj -> obj -> Type} -> {a, b, c : obj}
-> g p a b -> GPath g ps b c -> GPath g (p :: ps) a c
So a graded path takes in a parameterised graph, and gives back a path of edges with an accumulated parameter. Where could we find such parameterised graphs? This is where the Para construction comes in. Para takes a category C together with an action M×C→C of a monoidal category M, and gives us a parameterised category over C.
-- Para over a monoidal category C
Para : (c : Graph obj) -> (act : par -> obj -> obj) -> ParGraph par obj
Para c act p x y = (p `act` x) `c` y
In other words, we have morphisms and an accumulating parameter. A simple example is the graded co-reader comonad, also known as the pair comonad.
ParaSet : ParGraph Type Type
ParaSet p a b = Para Set Pair p a b
-- A function Nat -> Double, parameterised by Nat
pair1 : ParaSet Nat Nat Double
-- A function Double -> Int, parameterised by String
pair2 : ParaSet String Double Int
-- A function Nat -> Int, parameterised by [Nat, String]
ex : GPath ParaSet [Nat, String] Nat Int
ex = [pair1, pair2]
It works a lot like the standard co-reader comonad, but it now accumulates parameters as we compose functions.
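To make the “accumulates parameters” point concrete, here is a small Haskell sketch (mine, under the convention that Para over functions with the pair action is just a function out of a pair):

```haskell
-- Para over ordinary functions, with the pair action: a morphism from a
-- to b with parameter p is a function out of (p, a).
type Para p a b = (p, a) -> b

-- Composition runs the first function, then the second, and accumulates
-- both parameters into a pair.
composePara :: Para p a b -> Para q b c -> Para (p, q) a c
composePara f g ((p, q), a) = g (q, f (p, a))
```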
Functional programmers tend to be familiar with lenses. They are often presented as coalgebras of the costate comonad, and their links to automatic differentiation are now well known.
Monomorphic lenses correspond to the plain costate comonad, and polymorphic lenses correspond to the indexed version.
-- Monomorphic Lens
MLens : Type -> Type -> Type
MLens s a = (s -> a, (s, a) -> s)
-- Polymorphic Lens, Haskell-style
Lens : Type -> Type -> Type -> Type -> Type
Lens s t a b = (s -> a, (s, b) -> t)
Idris allows us to bundle up the arguments for a polymorphic lens into a pair, sometimes called a boundary. This will help us form the category of parametric lenses more cleanly, as well as cut down on the number of types that we need to wrangle.
Boundary : Type
Boundary = (Type, Type)
-- Polymorphic lenses are morphisms of boundaries
Lens : Boundary -> Boundary -> Type
Lens (s, t) (a, b) = (s -> a, (s, b) -> t)
Both monomorphic and polymorphic lenses form categories. But before we look at them, let’s generalise our notion of lens away from Set and towards arbitrary (cartesian) monoidal categories.
In other words, given a cartesian monoidal category C, we want to form the category Lens(C) of lenses over C.
-- take a category C, and a cartesian monoidal product, to give back the category Lens(C)
LensC : (c : Graph obj) -> (ten: obj -> obj -> obj) -> Graph obj
LensC c ten s a = (s `c` a, (s `ten` a) `c` s)
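As a sanity check that lenses really do compose, here is a hedged Haskell sketch of composition in Lens(Set) (my code, not from the project): the forward passes chain left to right, and the backward passes chain right to left.

```haskell
-- Monomorphic lenses over Set: a forward pass and a backward pass.
type MLens s a = (s -> a, (s, a) -> s)

-- Identity lens: read the state, write it back unchanged.
idLens :: MLens s s
idLens = (id, \(_, s') -> s')

-- Composition: get chains forwards; put first reads the midpoint with
-- get1, updates it with put2, then pushes the result back with put1.
composeLens :: MLens s a -> MLens a b -> MLens s b
composeLens (get1, put1) (get2, put2) =
  ( get2 . get1
  , \(s, b) -> put1 (s, put2 (get1 s, b)) )
```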
We then take Para of this construction, giving us the category Para(Lens(C)).
ParaLensSet : ParGraph Type Type
ParaLensSet p s t = Para (LensC Set Pair) Pair p s t
We now have all the theoretical pieces together. At this point, we could simply implement Para(Lens(Set)), which would give us the morphisms of our neural network. However, there is one more trick up our sleeve - rather than working in the category of sets, we would like to work in the category of vector spaces.
This means that we will parameterise the above construction to work over some monoidal functor C→Set.
ParaLensF : (f : k -> Type) -> ParGraph k k
ParaLensF f p m n = ParaLensSet (f p) (f m) (f n)
And now, let us proceed to do machine learning.
First we will introduce the type of tensors of arbitrary rank. Our first instinct would be to do this with a function
Tensor' : List Nat -> Type
Tensor' [] = Double
Tensor' (n :: ns) = Fin n -> Tensor' ns
But unfortunately this will mess up type inference down the line. Dependent types tend to struggle when it comes to inferring types whose codomain contains arbitrary computation. This is what Conor McBride calls “green slime”, and it is one of the major pitfalls that functional programmers encounter when they try to make the jump to dependent types.
For this reason, we will represent our rank-n tensors using a datatype, which will allow Idris to infer the types much more easily. Luckily, tensors are easily represented using an alternative datatype that’s popular in Haskell.
data Tensor : List Nat -> Type where
Scalar : Double -> Tensor Nil
Dim : Vect n (Tensor ns) -> Tensor (n :: ns)
This is essentially a nesting of vectors, which accumulates their sizes.
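The same trick carries over to Haskell almost verbatim; a hedged sketch (mine), with Peano naturals standing in for Idris’ Nat:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}

data N = Z | S N

-- Length-indexed vectors.
data Vec (n :: N) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- Shape-indexed tensors as nested vectors: a Tensor '[n, m] is an
-- n-vector of m-vectors of scalars.
data Tensor (ns :: [N]) where
  Scalar :: Double -> Tensor '[]
  Dim    :: Vec n (Tensor ns) -> Tensor (n ': ns)
```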
All together, our datatype of parameterised lenses over tensors becomes
ParaLensTensor : ParGraph (List Nat) (List Nat)
ParaLensTensor pars ms ns = ParaLensF Tensor pars ms ns
We can now start writing neural networks. I’ll be mostly adapting Tatsuya’s code in the following section. The full code for our project can be found here, and I’ll only include the most interesting bits.
Unlike the original code, we will be using a heterogeneous list - rather than nested tuples - to keep track of all of our parameters, which makes the resulting dimensions much easier to track.
linear : {n, m : Nat} -> ParaLensTensor [m, n] [n] [m]
linear = (getter, setter) where
getter : (Tensor [m, n], Tensor [n]) -> Tensor [m]
getter (w, x) = joinM w x
setter : ((Tensor [m, n], Tensor [n]), Tensor [m]) -> (Tensor [m, n], Tensor [n])
setter ((w, x), y) = (outer y x, joinM (dist w) y)
bias : {n : Nat} -> ParaLensTensor [n] [n] [n]
bias = (getter, setter) where
getter : (Tensor [n], Tensor [n]) -> Tensor [n]
getter (b, x) = pointwise (+) x b
setter : ((Tensor [n], Tensor [n]), Tensor [n]) -> (Tensor [n], Tensor [n])
setter ((b, x), y) = (y, y)
relu : ParaLensTensor [0] [n] [n]
relu = (getter, setter) where
getter : (Tensor [0], Tensor [n]) -> Tensor [n]
getter (_, x) = dvmap (max 0.0) x
setter : ((Tensor [0], Tensor [n]), Tensor [n]) -> (Tensor [0], Tensor [n])
setter ((Dim [], x), y) = (Dim [], pointwise (*) y (dvmap step x)) where
step : Double -> Double
step x = if x > 0 then 1 else 0
learningRate : ParaLensTensor [0] [] [0]
learningRate = (const (Dim []), setter) where
setter : ((Tensor [0], Tensor []), Tensor [0]) -> (Tensor [0], Tensor [])
setter ((_, (Scalar loss)), _) = (Dim [], Scalar (-0.2 * loss))
crossEntropyLoss : ParaLensTensor [n] [n] []
crossEntropyLoss = (getter, setter) where
getter : (Tensor [n], Tensor [n]) -> Tensor []
getter (y', y) =
let Scalar dot' = dot y' y in
Scalar (log (sumElem (dvmap exp y)) - dot')
setter : ((Tensor [n], Tensor [n]), Tensor []) -> (Tensor [n], Tensor [n])
setter ((y', y), (Scalar z)) = let
expY = dvmap exp y
sumExpY = sumElem expY in
(dvmap (* (-z)) y,
dvmap (* z) (
((pointwise (-) (dvmap (/sumExpY) expY) y'))))
-- Our final model: parameters source target
model : GPath ParaLensTensor [< [4, 2], [4], [0], [2, 4], [2], [0]] [2] [2]
model = [< linear, bias, relu, linear, bias, relu]
All that remains is to implement an algebra for this structure. Normally we would use the generic recursion schemes machinery to do this, but for now we will implement a one-off fold specialized to graded paths.
-- Evaluate the free graded category over ParaLensTensor
eval : GPath ParaLensTensor ps s t -> ParaLensTensorEnvS ps s t
eval [<] = (\(_, s) => s, \((l, s'), s) => ([<], s))
eval (es :< (fw, bw)) = let (fw', bw') = eval es in
(\((ps :< p), s) => let b = fw' (ps, s) in fw (p, b),
(\(((ps :< p), s), dt) => let
b = fw' (ps, s)
(p', b') = bw ((p, b), dt)
(ps', s') = bw' ((ps, s), b')
in (ps' :< p', s')))
It would actually be possible to write individual algebras for Lens(C) and Para(C) and then compose them into an algebra for Para(Lens(C)), but we can leave that for a future blog post.
Running a neural network in Idris is obviously going to be slow compared to NumPy. However, since we’re working entirely with free categories, we don’t have to actually evaluate our functions in Idris!
What we can do is organise all of our functions into a signature, where each constructor corresponds to a primitive function in the target language. We could then use the FFI to interpret them, allowing us to get both the static guarantees of Idris and the performance of NumPy.
data TensorSig : ParGraph (List Nat) (List Nat) where
Linear : TensorSig [m, n] [n] [m]
Bias : TensorSig [n] [n] [n]
Relu : TensorSig [0] [n] [n]
LearningRate : TensorSig [0] [] [0]
CrossEntropyLoss : TensorSig [n] [n] []
SoftMax : TensorSig [0] [n] [n]
model' : GPath TensorSig [< [4, 2], [4], [0], [2, 4], [2], [0]] [2] [2]
model' = [< Linear, Bias, Relu, Linear, Bias, Relu]
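To sketch the idea (all names here are hypothetical, not from the project): an interpreter can simply fold the path into the sequence of primitive operations, here rendered as the NumPy functions that could back each constructor.

```haskell
-- Hypothetical compilation target: the name of the NumPy primitive that
-- would implement each constructor over an FFI bridge.
data TensorOp = Linear | Bias | Relu | LearningRate | CrossEntropyLoss

numpyName :: TensorOp -> String
numpyName Linear           = "np.matmul"
numpyName Bias             = "np.add"
numpyName Relu             = "np.maximum"    -- against a zero tensor
numpyName LearningRate     = "scale"         -- custom primitive
numpyName CrossEntropyLoss = "cross_entropy" -- custom primitive

-- Compiling a path is then just a fold over its constructors.
compile :: [TensorOp] -> [String]
compile = map numpyName
```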
We’ve also only sketched out the tensor operations, but we could take this a step further and develop a proper tensor library in Idris.
In a future post, we will see how to enhance the above with auto-diff, meaning that the user only needs to supply the getter, and the setter will be derived automatically.
Cross-posted from the 20[ ] blog
Welcome to part I of the Build Your Own Open Games Engine Bootcamp, where we’ll be learning the inner workings of the Open Games Engine and Compositional Game Theory in general, while implementing a super-simple Haskell version of the engine along the way.
In this episode we will learn about Lenses, how to compose them and how they can be implemented in Haskell. But first, let’s set the context for this whole series.
In classical Game Theory, the definitions for (deterministic) Normal-form and Extensive-form games have undoubtedly proved successful as mathematical tools for studying strategic interactions between rational agents. Despite this, the monolithic nature of these definitions becomes apparent over time, eventually leading to a complexity wall in one’s game-theoretic modelling career. This limitation arises as games become more intricate, and the rigid structure of these definitions gets in the way of modelling, similar to how maintaining a large codebase written in x86 assembly quickly becomes a superhuman feat.
Compositional Game Theory solves this exact problem: By turning games into composable open processes, one can build up a library of reusable components and approach the problem compositionally™, in a divide-et-impera fashion. To keep the programming language analogy going: Programming in a high-level language like Haskell or Rust is way easier than programming in straight assembly. The ability to modularize code by breaking it up into modules and functions, which are predictably1 composable and reusable, helps tame the mental overhead of complex programs. It also saves the programmer tons of time and keystrokes that would otherwise be spent re-writing the same chunk of boilerplate code with minor modifications over and over.
The primary goal of this series is to introduce Compositional Game Theory and provide readers with a practical understanding of Open Games. This includes a very simple Haskell implementation of Open Games for readers to play with and test their intuitions against. By the end of this series, you will have the knowledge and tools to start modelling simple deterministic games. Additionally, you’ll be equipped to start exploring the Open Game Engine codebase and see how Open Games are applied in real-world modeling.
In the following posts, we’re going to break down and understand the following definition:
An Open Game is a pair (A,ε), where A is a Parametrized Lens with co/parameters P and Q and ε is a Selection Function on P→Q.
Moreover, we will learn about how Open Games can be composed both sequentially and in parallel, and hopefully some extra cool stuff along the way.
The first and most important component of an Open Game is the arena, i.e. the “playing field” where all the dynamics happen and with which the players interface. The arena is a parametrized lens, a composable typed bidirectional process.
A Parametrized Lens from a pair of sets $(X, S)$ to a pair of sets $(Y, R)$ with parameters $(P, Q)$ is a pair of functions $\mathrm{get} : P \times X \to Y$ and $\mathrm{put} : P \times X \times R \to S \times Q$.
Which can be implemented in the following manner in Haskell by making use of currying:
data ParaLens p q x s y r where
-- get put
MkLens :: (p -> x -> y) -> (p -> x -> r -> (s, q)) -> ParaLens p q x s y r
Diagrammatically speaking, a parametrized lens can be represented as a box with 6 typed wires, which under the lens (pun intended) of compositional game theory are interpreted as the following:
- x is the type of game states that can be observed by the player prior to making a move.
- p is the type of strategies a player can adopt.
- y is the type of game states that can be observed after the player made its move.
- r is the type of utilities/payoffs the player can receive after making its move.
- s is the type of back-propagated utilities a player can send to players that moved before it.
- q is the type of rewards representing the player’s intrinsic utility.
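As a toy example of this reading (my own, ahead of the real constructions below): a single decision point, where the strategy is a function from observed states to moves and the received payoff is passed both backwards and out as the player’s reward.

```haskell
-- A single decision point: p = strategies (functions from states to
-- moves), q = the player's intrinsic reward, s = the back-propagated
-- utility. Both s and q are just the payoff r received from the future.
decision :: ParaLens (x -> y) r x r y r
decision = MkLens
  (\strat x -> strat x)   -- get: play according to the strategy
  (\_ _ r -> (r, r))      -- put: propagate the payoff back and out
```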
With this in mind, we can open the box in the previous diagram and have a look at the internals of a parametrized lens2:
By looking at the internals of a lens and the direction of the arrows, it becomes clear that data flows in two different directions:
- The forward pass, i.e. the `get` function, happens at the time a player observes the state before interacting with the game.
- The backward pass, i.e. the `put` function, happens “in the future”, after all players have made their moves, and represents the stage in which payoffs are computed and passed around.
To limit mental overload, the following definition of a non-parametrized lens will also come in useful later:
A (non-parametrized) Lens is a parametrized lens with parameters $(1, 1)$, where $1$ is the singleton set.
-- A (non-parametrized) `Lens` is a `ParaLens` with trivial parameters
type Lens = ParaLens () ()
-- Non-parametrized Lens constructor
nonPara :: (x -> y) -> (x -> r -> s) -> Lens x s y r
nonPara get put = MkLens (\_ x -> get x) (\_ x r -> (put x r, ()))
Diagrammatically we will represent wires of type `()` (the singleton type) as no wires at all. This will also come in useful later to simplify some definitions and diagrams. For example, here’s a representation of the flow of data in a non-parametrized lens, courtesy of Bruno Gavranović:
What makes Compositional Game Theory compositional is (unsurprisingly) the fact that parametrized lenses are closed under two different kinds of composition operators, one behaving like sequential composition of pure functions and one behaving like parallel execution of programs, or more or less like a tensor product3.
Let’s start with sequential composition: when the right boundary types of $A : (X, S) \to (Y, R)$ match the left boundary types of $B : (Y, R) \to (Z, T)$, we should be able to build another lens out of them, one that amounts to running what happens in $A$ first, and then what happens in $B$, while taking into account the parameters of both lenses:
By trying to code this up in a type-directed way in Haskell, the only sensible definition that can possibly come out is the following:
infixr 4 >>>>
(>>>>) :: ParaLens p q x s y r -> ParaLens p' q' y r z t -> ParaLens (p, p') (q, q') x s z t
(MkLens get put) >>>> (MkLens get' put') =
MkLens
(\(p, p') x -> get' p' (get p x))
(\(p, p') x t ->
let (r, q') = put' p' (get p x) t
(s, q) = put p x r
in (s, (q, q'))
)
From the Haskell implementation we can see that composing two lenses, parametrized or not, isn’t as simple as plugging one end into another, merging the parameter wires and calling it a day4. Something a bit more involved is happening:
Mathematically, this amounts to the following compositions:
- For the `get` part: $P' \times P \times X \xrightarrow{\mathrm{id} \times \mathrm{get}} P' \times Y \xrightarrow{\mathrm{get}'} Z$
- For the `put` part: $P' \times P \times X \times T \xrightarrow{\mathrm{id} \times \Delta_P \times \Delta_X \times \mathrm{id}} P' \times P \times P \times X \times X \times T \xrightarrow{\mathrm{sym} \times \mathrm{get} \times \mathrm{sym}} P \times P' \times Y \times T \times X \xrightarrow{\mathrm{id} \times \mathrm{put}' \times \mathrm{id}} P \times R \times Q' \times X \xrightarrow{\mathrm{rearrange}} P \times X \times R \times Q' \xrightarrow{\mathrm{put} \times \mathrm{id}} S \times Q \times Q'$

Where $\Delta(x) = (x, x)$, $\mathrm{sym}(x, y) = (y, x)$, and $\mathrm{rearrange}$ is a suitable composition of $\mathrm{sym}$s.
Luckily, parallel composition is way easier than the sequential one: in fact, the parallel composition of $A : (X, S) \to (Y, R)$ with parameters $(P, Q)$ and $B : (X', S') \to (Y', R')$ with parameters $(P', Q')$ amounts to a lens $A \times B : (X \times X', S \times S') \to (Y \times Y', R \times R')$ with parameters $(P \times P', Q \times Q')$, such that $\mathrm{put}_{A \times B}$ and $\mathrm{get}_{A \times B}$ are respectively the cartesian products of the `put` and `get` functions from $A$ and $B$, modulo some rearrangement of inputs and outputs.
This is even clearer from the Haskell implementation:
infixr 4 ####
(####) :: ParaLens p q x s y r -> ParaLens p' q' x' s' y' r' -> ParaLens (p, p') (q, q') (x, x') (s, s') (y, y') (r, r')
(MkLens get put) #### (MkLens get' put') =
MkLens
(\(p, p') (x, x') -> (get p x, get' p' x'))
(\(p, p') (x, x') (r, r') ->
let (s, q) = put p x r
(s', q') = put' p' x' r'
in ((s, s'), (q, q'))
)
Diagrammatically, this amounts to just putting the two lenses near each other.
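For instance, reusing the toy `decision` lens I sketched earlier, two independent decision points played side by side are just:

```haskell
-- Two players observing separate states, choosing separate moves, and
-- receiving separate payoffs, with strategies and rewards paired up.
twoPlayers :: ParaLens (x -> y, x' -> y') (r, r')
                       (x, x') (r, r') (y, y') (r, r')
twoPlayers = decision #### decision
```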
Now that we have laid all the groundwork, let’s have a look at a couple of concrete examples of lenses.
Our first source of lenses will be functions: for each function $f : X \to S$ there is a non-parametrized lens $F : (X, S) \to (1, 1)$ such that $\mathrm{get}(\ast, x) = \ast$ and $\mathrm{put}(\ast, x, \ast) = (f(x), \ast)$. Vice versa, we can always extract a unique function from non-parametrized lenses of this kind.
funToCostate :: (x -> s) -> Lens x s () ()
funToCostate f = nonPara (const ()) (\x _ -> f x)
costateToFun :: Lens x s () () -> (x -> s)
costateToFun (MkLens _ f) x = fst $ f () x ()
Similarly, for each function $f : P \to Q$ there is a parametrized lens $\bar{F} : (1, 1) \to (1, 1)$ with parameters $(P, Q)$, such that $\mathrm{get}(\ast, \ast) = \ast$ and $\mathrm{put}(p, \ast, \ast) = (f(p), \ast)$. Likewise, we can always extract a unique function from this kind of parametrized lens.
funToParaState :: (p -> q) -> ParaLens p q () () () ()
funToParaState f = MkLens (\_ _ -> ()) (\p _ _ -> ((), f p))
paraStateTofun :: ParaLens p q () () () () -> (p -> q)
paraStateTofun (MkLens _ coplay) p = snd $ coplay p () ()
For each value $\bar{y} \in Y$ and for any set $R$ we can build a non-parametrized lens $S_{\bar{y}} : (1, 1) \to (Y, R)$ such that $\mathrm{get}(\ast, \ast) = \bar{y}$ and $\mathrm{put}(\ast, \ast, r) = (\ast, \ast)$.
scalarToState :: y -> Lens () () y r
scalarToState y = nonPara (const y) const
stateToScalar :: Lens () () y r -> y
stateToScalar (MkLens get _) = get () ()
The Identity Lens is a non-parametrized lens of type $(X, S) \to (X, S)$ that serves as the identity morphism for parametrized lenses, i.e. pre-/post-composing a lens $A$ with the identity lens gives you back $A$ modulo readjusting the parameters (we will see how to do that in the next post). In Haskell:
idLens :: Lens x s x s
idLens = nonPara id (\_ x -> x)
(Right) Corners are parametrized lenses of type $(1, 1) \to (Y, R)$ with parameters $(Y, R)$ that bend parameter wires into right wires, such that $\mathrm{get}(y, \ast) = y$ and $\mathrm{put}(y, \ast, r) = (\ast, r)$.
corner :: ParaLens y r () () y r
corner = MkLens const (\_ _ r -> ((), r))
As we will see in later posts, corners are an important component of bimatrix games.
Parametrized lenses are not only useful for reasoning about Open Games, but also serve as the basis of a whole categorical framework for reasoning about complex multi-agent systems, which has been applied to gradient-based learning, dynamic programming, reinforcement learning, Bayesian inference and servers, on top of various flavors of game theory (e.g. [2105.06763]). Indeed, this categorical framework is so general and promising that we spawned an entire research institute dedicated to it.
Phew! That’s all for today. I hope that this introduction to the world of parametrized lenses has left you wanting more! I’ll see you in the next post, where we will explore how to handle spurious parameters with reparametrizations and model players and their agency with selection functions.
-
Without side-effects and/or emergent behavior. ↩
-
Sometimes it will be useful to represent certain lenses in their unboxed form and with product-type wires decoupled when reasoning pictorially, luckily this approach to reasoning with lenses is still completely formal. ↩
-
In mathematical lingo, one would say that parametrized lenses can be organized as the morphisms of a somewhat complicated monoidal-category-like structure called a symmetric monoidal bicategory. This is not a 1-category on-the-nose, since there are some issues with the bracketing of parameters after sequential composition that make associativity hold only up to isomorphism. ↩
-
Actually there’s a useful generalization of the (parametrized) lens definition, called (parametrized) optics, which allows this, on top of other operational advantages over the lens definition, and which allows expanding the “classical” definition of Open Games to Bayesian Game Theory and more. ↩

See parts I and II of this series. There is also now a source repo!
In this post we will make probably the single most important step from a generic type theory to one specialised to bidirectional programming.
In a bidirectional language there are 2 kinds of variables: ones travelling forwards in time, and ones travelling backwards in time. I will refer to the types of these variables as covariant and contravariant for short (or at least shorter). Variables of covariant type are the ones we are used to: they behave according to ordinary cartesian scoping rules, which means they can be implicitly deleted (referred to zero times) and implicitly copied (referred to more than once).
Variables of contravariant type are the weird ones. They too can be implicitly copied and deleted, but because they travel backwards, from the forwards perspective this looks like implicit merging and spawning. By spawned I mean they can be bound zero times and still referred to; and by merged I mean they can be bound twice without the later binding shadowing the earlier one. On the other hand, from the forwards perspective they cannot be implicitly copied or deleted. Their scoping rules are cocartesian.
I can now reveal that my party trick in the last section of my first post where I built a cocartesian language was secretly not in fact a party trick after all, but was all along in preparation for this exact moment.
Now suppose we take a tensor product of a covariant type and a contravariant type. We then have a variable referring to a bundle of a cartesian value, which can be deleted and copied but not spawned or merged, and a cocartesian value, which can be spawned and merged but not deleted or copied. Since doing any of these operations to a pair means doing it to both, our variable cannot be deleted or copied or spawned or merged, which means it is a linear variable. I will refer to such a tensor product type as invariant. Secretly, all of the types in my second post were invariant, since that language was linear.
There is a secret fourth thing, which is variables with bicartesian scoping rules, so they can be deleted and copied and spawned and merged. I will refer to these types as bivariant. It would be entirely reasonable to not include these, but I have a secret plan for them that will be revealed in the next post. For now, the only bivariant type will be the monoidal unit.
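To summarise the four kinds and the structural rules their variables admit (my tabulation of the above):

| Kind | Scoping rules | Delete / copy | Spawn / merge |
| --- | --- | --- | --- |
| covariant | cartesian | yes | no |
| contravariant | cocartesian | no | yes |
| invariant | linear | no | no |
| bivariant | bicartesian | yes | yes |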
Just as terms are classified by types, types are classified by kinds, which means I have been talking about kinds the whole time. But these are not kinds as we know them from languages like Haskell. We have the added complication that kinds control the scoping rules of variables of the types they classify; but on the other hand we have the added simplification that there are exactly 4 kinds, rather than an infinite supply of them. I have to thank Michael Arntzenius, who visited Zanzi and me a few weeks ago, for planting the idea of a 4-element lattice of kinds.
Let’s start with the easy parts:
Kind : Type
Kind = (Bool, Bool)
data Ty : Kind -> Type where
Unit : Ty (True, True)
Ground : Ty (True, False)
Not : Ty (cov, con) -> Ty (con, cov)
I implement the 4-element lattice as the product (Bool, Bool)
, where the first flag tracks whether the type is covariant and the second whether it is contravariant. I anticipate that much later, probably when I add type polymorphism, it will become necessary to take seriously that there are subkind relationships here, but I will cross that particular bridge when it is in front of me. Unit
is a bivariant type, Ground
is a covariant type (it is still a placeholder, and in the next post will be replaced with a proper system for base types). The Not
operator acts on the underlying kind by interchanging the covariant and contravariant capabilities; so strictly covariant types become strictly contravariant and vice versa, whereas the bivariant and invariant kinds are both stable under negation.
There is an alternative way of implementing it, but it is much more tedious: instead of representing kinds explicitly, have 4 different versions of Ty
for each of the 4 kinds. That leads to a proliferation of type operators - 4 operators for Not
and 16 for Tensor
(which we’re coming to next), 1 for each combination of kinds of the 2 inputs. I really hope that won’t be necessary, but it’s something I’m keeping in my back pocket just in case I run into an insurmountable problem with this encoding.
Now we come to the hard part: tensor products. A tensor product is covariant if both parts are covariant, and is contravariant if both parts are contravariant. In a world with a sufficiently smart typechecker we would simply write this:
Tensor : Ty (covx, conx) -> Ty (covy, cony)
-> Ty (covx && covy, conx && cony)
But we already know how to handle this situation, it’s the same idea as for list concatenation in the first post. We define the relation corresponding to the boolean conjunction function, and a section of it:
data And : Bool -> Bool -> Bool -> Type where
True : And True b b
False : And False b False
(&&) : (a : Bool) -> (b : Bool) -> (c : Bool ** And a b c)
True && b = (b ** True)
False && b = (False ** False)
(Here I’m having fun with Idris’ ability to disambiguate names based on typing information, something I really wish I could do in Haskell.)
Now we can write the actual definition of tensor products, which is much more complicated looking than it should be, but this is just what we have to deal with until the day when somebody builds a typechecker that can handle trivial equational reasoning:
Tensor : {covx, covy, conx, cony : Bool}
-> {default (covx && covy) cov : _} -> {default (conx && cony) con : _}
-> Ty (covx, conx) -> Ty (covy, cony) -> Ty (cov.fst, con.fst)
Now we come to the main topic of this post: how kinds influence scoping rules. In the first post we saw how to implement context morphisms for planar, linear, cartesian and cocartesian languages. Those definitions were all polymorphic over arbitrary lists. Then in the second post we defined context morphisms that could introduce and eliminate double negations, which was no longer polymorphic but specialised to a particular language of types, and this will continue.
Now that types are indexed by kinds, everything else becomes more complicated: everything we do from now on becomes additionally indexed by kinds, starting with this.
(Note, I decided to reuse the name Structure
, since this should be the last one we ever need.)
Let’s start with the linear rules:
data Structure : All Ty kas -> All Ty kbs -> Type where
Empty : Structure [] []
Insert : {a, b : Ty (cov, con)} -> {as : All Ty kas} -> {bs : All Ty kbs}
-> Parity a b -> IxInsertion a as as' -> Structure as bs -> Structure as' (b :: bs)
Here IxInsertion
is an indexed version of the Insertion
datatype from the previous post, relationally defining insertions into an All
type indexed by an underlying list:
data IxInsertion : {0 x : a} -> p x
-> {0 xs : List a} -> All p xs
-> {0 ys : List a} -> All p ys -> Type where
Z : IxInsertion {x} a {xs} as {ys = x :: xs} (a :: as)
S : IxInsertion {x} a {xs} as {ys} bs
-> IxInsertion {x = x} a {xs = y :: xs} (b :: as) {ys = y :: ys} (b :: bs)
I’m not going to explain all of the syntax and semantics of Idris happening here because it would take us too far afield, and despite appearances this is not intended to be a tutorial series on Idris programming. Suffice to say, defining indexed versions of standard datatypes is significantly harder than defining the originals.
The definition of Parity
, which handles double negations, is unchanged from the previous post besides adding the kind indexes:
data Parity : Ty (cov, con) -> Ty (cov, con) -> Type where
Id : Parity a a
Raise : Parity a b -> Parity a (Not (Not b))
Lower : Parity a b -> Parity (Not (Not a)) b
The Idris typechecker is, at least, smart enough to figure out that double negation leaves the kind unchanged. This is one reason that I write out kinds as pairs of booleans everywhere rather than defining a function swap : Kind -> Kind
, since that causes Idris to get stuck here.
One thing to note is that I reversed the polarity of the Insert
constructor from the previous post, so that it now conses on the codomain and inserts on the domain rather than the other way round. The previous post was “supply driven”, saying where everything in the domain goes in the codomain, whereas this version is “demand driven”, saying where everything in the codomain came from in the domain. This was a late change after I experienced manually defining terms in this core syntax and found that this way makes manual proof search much easier. If it later turns out that the original supply driven version makes writing elaborators easier, it’s an easy change to turn it back.
Next we have the rules for delete and copy, which can be applied only to covariant types. We don’t care whether the type is also contravariant, i.e. we can use it for both strictly covariant types and bivariant types. This is the other reason I define kinds to be pairs of booleans rather than their own dedicated language, because otherwise we would have twice as many constructors here.
Delete : {a : Ty (True, con)} -> Structure as bs -> Structure (a :: as) bs
Copy : {a : Ty (True, con)} -> {as : All Ty xs}
-> IxElem a as -> Structure as bs -> Structure as (a :: bs)
Here IxElem
is, of course, an indexed version of the Elem
datatype, which I won’t bother to write here but was also painful to define. (If you want to you can find it in the IxUtils module of the source repo.)
With that, defining the constructors for spawn and merge is easy, and completes our definition of Structure
once and for all.
Spawn : {b : Ty (cov, True)} -> Structure as bs -> Structure as (b :: bs)
Merge : {b : Ty (cov, True)} -> {bs : All Ty ys}
-> IxElem b bs -> Structure as bs -> Structure (b :: as) bs
We can now put the pieces together and make the first real definition of our kernel language. I hope that nothing here will change, only be added to in the future.
Of course everything is kind indexed:
data Term : All Ty ks -> Ty k -> Type where
Var : Term [x] x
Rename : Structure as bs -> Term bs x -> Term as x
I renamed Act
to Rename
since the last post, since somebody pointed out that’s what it is.
Unit introduction and elimination rules work as they did before:
UnitIntro : Term [] Unit
UnitElim : {as : All Ty kas} -> {bs : All Ty kbs} -> {default (ixSimplex as bs) cs : _}
-> Term as Unit -> Term bs x -> Term (cs.snd.fst) x
Here ixSimplex
is the tactic corresponding to a datatype IxSimplex
which is the relational version of the indexed version of ++
, operating on All
rather than List
. This one is worth writing down because it returns a twice-iterated Sigma type, so we have to extract the part we need with .snd.fst
:
data IxSimplex : {0 xs : List a} -> All p xs
-> {0 ys : List a} -> All p ys
-> {0 zs : List a} -> All p zs -> Type where
Z : IxSimplex [] bs bs
S : IxSimplex {xs} as {ys} bs {zs} cs
-> IxSimplex {xs = x :: xs} (a :: as) {ys = ys} bs {zs = x :: zs} (a :: cs)
ixSimplex : {xs : List a} -> (as : All p xs)
-> {ys : List a} -> (bs : All p ys)
-> (zs : List a ** cs : All p zs ** IxSimplex as bs cs)
ixSimplex {xs = []} [] {ys} bs = (ys ** bs ** Z)
ixSimplex {xs = x :: xs} (a :: as) {ys} bs
= let (zs ** cs ** n) = ixSimplex {xs = xs} as {ys = ys} bs
in (x :: zs ** a :: cs ** S n)
Now we come to the negation rules, and the final twist of this post. In the language we reached at the end of the previous post, which amounts to the fragment of this language for only invariant types, there was no not-introduction rule - correctly so. As I speculated there, there are indeed some valid instances of not-introduction, but they are not sufficient to prove general double negation introduction or elimination - and they can’t even be expressed without the kind system we developed in this post.
It turns out that we have two not-introduction rules, one that is valid for covariant types and one that is valid for contravariant types. This introduces a sort of incompatibility between negation and tensor, since the tensor product of two variables with a valid not-introduction rule can fail to have one.
NotIntroCov : {a : Ty (True, con)} -> Term (a :: as) Unit -> Term as (Not a)
NotIntroCon : {a : Ty (cov, True)} -> Term (a :: as) Unit -> Term as (Not a)
NotElim : {as : All Ty kas} -> {bs : All Ty kbs} -> {default (ixSimplex as bs) cs : _}
-> Term as (Not x) -> Term bs x -> Term (cs.snd.fst) Unit
I would say that this is the first case where my well typed by construction methodology enabled me to do something that I think I would have failed at otherwise. I figured out these rules at the same time as the corresponding cases of the well typed evaluator (which will be the topic of the next post). Although they look like they are among the simplest rules in the language, they are the ones I understand by far the least.
If we have a bivariant type then both of these rules can be applied to produce the same result. It seems tempting to try to roll these two rules into one, which can be applied when either cov
or con
is True
. This is a bit tricky to do, but it turns out to also be a bad idea: although these rules are logically equivalent, they are not type-theoretically equivalent: they have completely different operational semantics! And by that I don’t mean something like different evaluation strategies, I mean that they actually output different results.
Now there’s only the tensor rules, and for once there is nothing to say about them, they are the indexed versions of the tensor rules from the last 2 posts.
TensorIntro : {as : All Ty kas} -> {bs : All Ty kbs} -> {default (ixSimplex as bs) cs : _}
-> Term as x -> Term bs y -> Term (cs.snd.fst) (Tensor x y)
TensorElim : {as : All Ty kas} -> {bs : All Ty kbs} -> {default (ixSimplex as bs) cs : _}
-> Term as (Tensor x y) -> Term (x :: y :: bs) z -> Term (cs.snd.fst) z
And that’s it!
Like I said, in the next post we will build a well typed evaluator, which means we will also write and run our first programs - and we can already do some interesting things, like basic automatic differentiation. The only small thing that will need to be added to the language from this post is a mechanism for adding basic types and basic terms, such as arithmetic operators between doubles.

The space at the beginning of each post is used for acknowledgements.
The Cybercat blog website is based on the Whiteglass theme.
Standard GitHub workflow:
- Clone this repo
- Create a branch
- Write your post
- Make a PR
- Wait for approval
The blog will be automatically rebuilt once your PR is merged.
Since the blog uses Jekyll, you will need to install it, or use the included nix flake devshell (just run `nix develop` with flakes-enabled nix installed), to be able to preview your content. Once the installation is complete, just navigate to the repo folder and run `bundle exec jekyll serve`. Jekyll will spawn a local server (usually at `127.0.0.1:4000`) that will allow you to preview the blog locally.
Posts must be placed in the `_posts` folder. Post filenames follow the convention `yyyy-mm-dd-title.md`. Post assets (such as images) go in the folder `assetsPosts`, where you should create a folder with the same name as the post.
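For example, a hypothetical post and its assets would be laid out as:

```
_posts/2024-01-31-my-new-post.md
assetsPosts/2024-01-31-my-new-post/figure1.png
```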
Each post should start with the following preamble:
---
layout: post
title: the title of your post
author: your name
categories: keyword or a list of keywords [keyword1, keyword2, keyword3]
excerpt: A short summary of your post
image: assetsPosts/yourPostFolder/imageToBeUsedAsThumbnails.png This is optional, but useful if e.g. you share the post on Twitter.
usemathjax: true (omit this line if you don't need to typeset math)
thanks: A short acknowledgement message. It will be shown immediately above the content of your post.
---
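For instance, a (hypothetical) post might start with:

```
---
layout: post
title: My new post
author: Jane Doe
categories: [haskell, model]
excerpt: A short summary of my new post.
usemathjax: true
---
```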
As for the content of the post, it should be typeset in markdown.
- Inline math is shown by using `$ ... $`. Notice that some expressions such as `a_b` typeset correctly, while expressions like `a_{b}` or `a_\command` sometimes do not. I guess this is because MathJax expects `_` to be followed by a literal.
- Display math is shown by using `$$ ... $$`. The problem above doesn’t show up in this case, but you gotta be careful:
text
$$ ... $$
text
does not typeset correctly, whereas:
text
$$
...
$$
text
does. You can also use environments, as in:
$$
\begin{align*}
...
\end{align*}
$$
We provide the following theorem environments: Definition, Proposition, Lemma, Theorem and Corollary. Numbering is automatic. If you need others, just ask. The way these work is as follows:
{% def %}
A *definition* is a blabla, such that: $...$. Furthermore, it is:
$$
...
$$
{% enddef %}
This gets rendered as follows:
A definition is a blabla, such that: ……. Furthermore, it is:
......
Numbering is automatic. Use the tags:
{% def %}
For your definitions
{% enddef %}
{% not %}
For your notations
{% endnot %}
{% ex %}
For your examples
{% endex %}
{% diag %}
For your diagrams
{% enddiag %}
{% prop %}
For your propositions
{% endprop %}
{% lem %}
For your lemmas
{% endlem %}
{% thm %}
For your theorems
{% endthm %}
{% cor %}
For your corollaries
{% endcor %}
If you need to reference results, just append a `{"id":"your_reference_tag"}` after the tag, where `your_reference_tag` works like a LaTeX label. For example:
{% def {"id":"your_reference_tag"} %}
A *definition* is a blabla, such that: $...$. Furthermore, it is:
$$
...
$$
{% enddef %}
Then you can reference this by doing:
As we remarked in [Reference description](#your_reference_tag), we are awesome...
We support two types of diagrams: quiver and TikZ.
You can render quiver diagrams by enclosing quiver-exported iframes between `quiver` tags:
- On quiver, click on
Export: Embed code
- Copy the code
- In the blog, put it between delimiters as follows:
{% quiver %}
<!-- https://q.uiver.app/codecodecode-->
<iframe codecodecode></iframe>
{% endquiver %}
They get rendered as follows:
(The exported iframe renders here as an interactive, embedded quiver diagram.)
Should the picture come out cropped, select `fixed size` when exporting the quiver diagram, and choose some suitable parameters.
You can render tikz diagrams by enclosing tikz code between tikz
tags, as follows:
{% tikz %}
\begin{tikzpicture}
\draw (0,0) circle (1in);
\end{tikzpicture}
{% endtikz %}
Tikz renders as follows:
Notice that at the moment tikz rendering:
- Supports any option you put after `\begin{document}` in a `.tex` file. So you can use this to include any stuff you’d typeset with LaTeX (but we STRONGLY advise against it).
- Does not support anything that should go in the LaTeX preamble, that is, before `\begin{document}`. This includes external tikz libraries such as `calc`, `arrows`, etc., and packages such as `tikz-cd`. Should you need `tikz-cd`, use quiver as explained above. If you need fancier stuff, you’ll have to render the tikz diagrams by yourself and import them as images (see below).
Referencing works also for the quiver and tikz tags, as in:
{% tikz {"id":"your_reference_tag"} %}
...
{% endtikz %}
This automatically creates a numbered ‘Figure’ caption under the figure, as in:
(A quiver diagram renders here, with an automatically numbered ‘Figure’ caption underneath.)
Whenever possible, we encourage you to enclose diagrams into definitions/propositions/etc should you need to reference them.
Images are included via standard markdown syntax:

image_path
can be a remote link. Should you need to upload images to this blog post, do as follows:
- Create a folder in `assetsPosts` with the same title as the blog post file. So if the blog post file is `yyyy-mm-dd-title.md`, create the folder `assetsPosts/yyyy-mm-dd-title`
- Place your images there
- Reference the images by doing:

Whenever possible, we recommend images to be in the `.png` format, `800` pixels in width, with transparent background. Ideally, these should be easily readable on the light gray background of the blog website. You can stray from these guidelines if you have no alternative, but our definition and your definition of ‘I had no alternative’ may be different, and we may complain.
Referencing works exactly as for diagrams:
{% figure {"id":"your_reference_tag"} %}

{% endfigure %}
CyberCat blog offers support for code snippets:
def print_hi(name)
puts "Hi, #{name}"
end
print_hi('Tom')
#=> prints 'Hi, Tom' to STDOUT.
To include a code snippet, just give:
```language the snippet is written in
your code
```
Check out the Jekyll docs for more info on how to get the most out of Jekyll. File all bugs/feature requests at Jekyll’s GitHub repo. If you have questions, you can ask them on Jekyll Talk.