title	Why Adding Ontologies to LLMs Won't Yield Machine Intelligence
author	Jobst Landgrebe & Barry Smith
language	en-US
rights	Creative Commons

Why adding ontologies to LLMs won't yield machine intelligence

Channel: Barry Smith

Speakers: Barry Smith & Jobst Landgrebe

Introduction & Definitions of Intelligence

[00:00] Barry Smith: Jobst Landgrebe is the co-author of the book Why Machines Will Never Rule the World, which was published by Routledge first in 2022 and then in a second, enlarged edition in 2025. Jobst has an interesting background, to say the least. He has worked at high levels in various scientific and mathematical disciplines, including most recently what we might call onco-mathematics, roughly speaking the mathematical study of cancer tumors, in which he used AI methods. And so he is, as am I, very much a believer in the powers of AI. We think AI is wonderful, but we also think that these powers are limited, quite drastically limited from the perspective of many people, including those who believe that AI will be conscious and will have ethical rights, so that we can protect them from being switched off, and so forth. All of which is, of course, nonsense. Today, Jobst is going to present a survey of his reasoning when it comes to our understanding of the limits of large language models in particular.

[01:18] Jobst Landgrebe: Okay. When we ask whether we can expand AI by using ontologies, we have to start at the beginning. The motivation here is that many people say LLMs alone cannot create AI, but if we combine them with ontologies, then we will get true artificial intelligence. What I'm going to do today is show that this claim is wrong. To start, we have to understand what intelligence really is. There is a definition of intelligence, elaborated by Scheler in the 1920s or so, 100 years ago, which says that intelligence is realized in humans and higher animals as the ability to adapt to new situations. This adaptation must be sudden, not primed by prior experience. It must be meaningful, that is, appropriate for the animal or the human being in the situation. It must be untrained and not the product of repeated attempts of trial and error. It must be novel from the perspective of the acting organism: it need not be novel for mankind or for the species of animal we are talking about, but it must be novel for the individual that experiences the situation. If all these conditions are given and the reaction is appropriate, then we speak of intelligence.

The Mathematics of AI: Deterministic vs. Implicit Models

[02:47] Jobst Landgrebe: Now, AI always consists of two types of models, and these are always mathematical models. Either they are deterministic models, such as the one shown here on the left, which is the momentum equation of the Navier-Stokes equations. This is a deterministic equation which has no error term, and one can calculate exactly with it. On the other hand, we have implicit mathematical models. These are not created by a physicist or mathematician thinking about data and phenomena and writing down an equation like the one on the left-hand side; they are obtained by an optimization procedure. The one that you see here is a multiple linear regression model, a method invented by Boscovich, Legendre, and Gauss around 1800. It is an equation whose parameters are obtained by what is called configuration or training.

So you see that the main difference between the equation on the left-hand side and the one on the right-hand side is that the one on the right has an epsilon ($\epsilon$), which is an error term. It describes the sum of the deviations from the model of all the points that are not directly on the regression line or regression plane. The DNNs [Deep Neural Networks] and LLMs which we have are merely very, very big and complicated regression equations, and this we have to take into account. They always have an error term because they are just ways of modeling relationships between data.
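The slide with the two equations is not reproduced in this transcript. As a hedged reconstruction, the two kinds of model contrasted here are commonly written as follows. The exact forms shown in the talk are not known, and the symbols used here ($\mathbf{u}$ velocity, $p$ pressure, $\rho$ density, $\mu$ viscosity, $\mathbf{f}$ body force, $\beta_j$ regression parameters, $x_j$ predictors) are assumptions. The first line is the momentum equation of the incompressible Navier-Stokes equations, which has no error term; the second is a multiple linear regression with the error term $\epsilon$.

$$\rho\left(\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}\right) = -\nabla p + \mu\nabla^{2}\mathbf{u} + \mathbf{f}$$

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \epsilon$$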

Now both methods, and that's very important, require regularities in the data and can only model or estimate regular relationships. The differential equations of physics describe the regularities we find in nature or that we can tease from nature by doing experiments. They are valid because there are these regularities in nature, and physics is the science of eliciting these regularities from nature and identifying them. On the other hand, we have these regression models: if you apply them to data, they will only pick out the regularities of the data and will not be able to model irregularities. So these models are partial. In addition, we will hear at the end of the talk that intelligence is characterized by irregularity, and therefore it cannot be mapped by these methods to mathematical models. Intelligence is also a dominant characteristic of a complex system, which makes it even worse, and we'll hear about this later too. What is important here is that these are the two fundamental types of mathematical models used in AI.

Stochastic Learning, Hallucinations, and the Curse of Dimensionality

[05:50] Jobst Landgrebe: Now, the AI that is so fashionable at present is mostly stochastic or statistical learning, which is used for automation and pattern identification. It works like this: you take an input set of data and an output set, and the AI, the equation shown here on the right-hand side, models a relation between these two sets, for example between email and spam. When you train such a model, you obtain the parameterization of this function. If you now give it a new observation, a data point outside the training distribution that we see here, then the likelihood that the model will miscalculate is very high, and this leads to large prediction errors. That is what we see happening all the time, also with large language models, which cannot cope well with data points that are outside the distribution; then they hallucinate, for example.

The relation is, by the way, obtained by applying numerical optimization procedures to the data. I cannot talk about these today, but they are used to create the parameterization that we saw here. The parameters are created automatically by the optimization procedure, but the optimization procedure itself is defined by humans. And that is very important: machines cannot do this. You have to define the whole setup: what the input data is, what the output data is, which types of relations you allow, how you optimize. All this is done by humans.
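The out-of-distribution failure described above can be illustrated with a minimal sketch. It is not a model from the talk: the target function `math.sin`, the training interval [0, 1], and the test point 6.0 are all assumptions chosen for illustration. The parameters are obtained by closed-form least squares, a very simple stand-in for the numerical optimization mentioned here, and the setup (inputs, outputs, model family) is fixed by hand.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def target(x):
    # Hypothetical "true" relation, chosen only for illustration.
    return math.sin(x)

# Training inputs cover only the interval [0, 1].
train_x = [i / 50 for i in range(51)]
train_y = [target(x) for x in train_x]
b0, b1 = fit_line(train_x, train_y)

def predict(x):
    return b0 + b1 * x

def abs_error(x):
    return abs(predict(x) - target(x))

in_dist_error = max(abs_error(x) for x in train_x)  # residuals on seen inputs
out_dist_error = abs_error(6.0)                     # far outside [0, 1]

print(f"largest error inside [0, 1]: {in_dist_error:.3f}")
print(f"error at x = 6.0:            {out_dist_error:.3f}")
```

Inside the training interval the residuals stay small; far outside it the same parameters give a much larger error, because nothing in the training data constrained the model there.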

[07:35] Jobst Landgrebe: Now, there are many stochastic models that are very useful, but they are scope-restricted and error-prone. We wrote a paper in 2019, Making AI Meaningful Again, which lists all these errors. I just want to highlight some funny examples here from the first Google Gemini picture generation algorithm, which was misconfigured according to the principles of post-modern collectivism and therefore created female popes, Black U.S. presidents, and Black Wehrmacht soldiers. It had to be taken offline because it was, of course, ridiculous. The models also pick up certain regularities and then perpetuate them. Like here: AI-generated clocks always show the position that clocks have in advertising pictures found on the web. So the models depend heavily on the selected sample of training data and its specific annotation.

They fail with heterogeneous annotations of identical inputs. They fail with sparsely populated sample spaces, which is a massive contrast to what humans and animals can do, because real intelligence can do just that: it can find solutions for previously unseen problems. Exactly this the models cannot do. They are heavily trained, so they are not intelligent at all, and there is no guarantee that the result will be corrected as soon as an error is detected. That's why Google couldn't correct this model anymore and had to throw it away. The algorithm doesn't understand anything. It cannot predict its own failure. The models cannot model far-reaching relationships. The model cannot understand The Brothers Karamazov, which has one and a half thousand pages. It can only summarize a Wikipedia page about it which was written by a human, but it cannot itself identify the complicated relations inside this novel. The models also cannot model the semantics of mental categories, and they have completely insufficient accuracy for critical situations. They also tend to create false generalizations. So overall the models have many, many problems, and therefore people now want to add ontologies to the models or combine them with ontologies. But before we look at this, I would like to explain a bit how the LLM really works.

[10:11] Jobst Landgrebe: Essentially, LLMs are huge deep neural networks which are preconfigured without an outcome and can then generate sequences of tokens from prompts. Here you see an example sentence, "The cat sat on the mat," to which beginning-of-sentence and end-of-sentence tags have been added. The configuration, the training, of the LLM now yields a model of language in which each token is modeled by its probability given the preceding tokens. Here you have a series of preceding tokens like "The cat sat on the," and the model would probably predict "mat" if it has seen this sentence very often in the training data. But if it had seen "A cat sat on the floor" more often, it would predict "floor" upon getting the words "A cat sat on the." You can do this iteratively, again and again, to create a sequence, until the most likely token given out by the deep neural network is the stop token, at which point it ends its output. Mathematically speaking, that means that the LLM simulates an operator via iterative functionals.
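The iterative next-token procedure described here can be sketched with a toy model that simply counts which token followed which context in the training sentences. This is an illustration only: a real LLM estimates these probabilities with a deep neural network rather than a lookup table, and the names `train`, `generate`, `BOS`, `EOS` and the tiny corpus are assumptions made for the example.

```python
from collections import Counter, defaultdict

BOS, EOS = "<bos>", "<eos>"  # beginning- and end-of-sentence tags

def train(sentences, context_size=5):
    """Count which token follows each context of up to context_size tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        for i in range(1, len(tokens)):
            context = tuple(tokens[max(0, i - context_size):i])
            counts[context][tokens[i]] += 1
    return counts

def generate(counts, prompt, context_size=5, max_tokens=20):
    """Append the most frequent next token until the stop token is predicted."""
    tokens = [BOS] + prompt.split()
    for _ in range(max_tokens):
        context = tuple(tokens[-context_size:])
        if context not in counts:
            break  # context never seen in training: the toy model has no answer
        next_token = counts[context].most_common(1)[0][0]
        if next_token == EOS:
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])

# Hypothetical training corpus built from the sentences in the talk.
corpus = (
    ["The cat sat on the mat"] * 3
    + ["A cat sat on the floor"] * 2
    + ["A cat sat on the mat"]
)
model = train(corpus)

print(generate(model, "The cat sat on the"))
print(generate(model, "A cat sat on the"))
```

The prediction depends only on how often each continuation was seen after the given context, which is why the two prompts end differently.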

[11:25] Jobst Landgrebe: Now one big problem of the deep neural networks is the following. Input text is taken, and the input is mapped to a vocabulary of approximately 100,000 tokens. These tokens are usually syllables, or other items such as beginning-of-sentence and end-of-sentence indicators. In the standard approach this is done using a context window of 8,192 tokens, but the window can also be thinned down and then blown up so that you can have 128,000 tokens, though of course with less information in the window. Then the tokens are embedded into a vector of length 1536, which is, of course, a dimension reduction. After this dimension reduction, GPT-4 has a data matrix with 1536 columns.

Now comes the important point: this is a very big matrix with many dimensions. Take 100 data points in one dimension, which is, for example, sufficient to test height differences between males and females; there, 100 samples are high density. To get the same density in 1536 dimensions, you would need $100^{1536}$ data points. So you would need a tremendous amount of data points if you wanted the same data density in the LLM that you have for a test in one dimension. That is called the curse of dimensionality. And this is an incredibly large number, because the observable universe that we can see with telescopes and so on contains only about $10^{80}$ protons; that is the Eddington number. This means that the high density that we have in low-dimensional problems can never be achieved in our universe.
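The size comparison made here can be checked with a few lines of arithmetic. This is only a sketch of the order-of-magnitude argument, using the figures quoted in the talk (100 points per dimension, 1536 dimensions, about $10^{80}$ protons); the variable names are chosen for illustration.

```python
points_per_axis = 100   # samples treated as dense in one dimension
dimensions = 1536       # embedding length quoted in the talk
protons_exponent = 80   # Eddington number: roughly 10**80 protons

# Python integers have arbitrary precision, so this power is computed exactly.
required_points = points_per_axis ** dimensions

# Order of magnitude = number of decimal digits minus one.
required_exponent = len(str(required_points)) - 1

print(f"required data points ~ 10^{required_exponent}")
print(f"protons in universe  ~ 10^{protons_exponent}")
print(f"shortfall factor     ~ 10^{required_exponent - protons_exponent}")
```

Since $100 = 10^2$, the required number of points is $10^{2 \cdot 1536}$, which exceeds the proton count by thousands of orders of magnitude.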

To compensate for the curse of dimensionality, the LLMs have to undergo massive reinforcement learning with emulated output evaluation. It's called reinforcement learning from human feedback (RLHF). The problem is that in GPT-4 this mechanism has only 16 million input tokens over thousands of training steps. Therefore the distribution matrix for the model is still massively underdetermined, and this creates a very thinly populated sample space. For a $p$-dimensional unit ball with $n$ uniformly distributed data points, this equation gives you the median distance from the ball center to the closest data point, and if you plug in 1536 dimensions and 16 million data points, you obtain 0.98. That means that 90% of the data points are closer to the boundary of the data sphere than to the center. This means that the LLM's regression hyperplane is massively underdetermined, especially for edge cases. And this is what produces the errors in the end.
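The equation on the slide is not reproduced in the transcript. A standard textbook result for exactly this quantity, the median distance from the center of a $p$-dimensional unit ball to the closest of $n$ uniformly distributed points, is $d(p, n) = \left(1 - (1/2)^{1/n}\right)^{1/p}$. Assuming this is the formula meant, the following sketch evaluates it for the figures given in the talk; the function name and the low-dimensional comparison case are illustrative assumptions.

```python
import math

def median_nearest_distance(p, n):
    """Median distance from the center of a p-dimensional unit ball to the
    closest of n uniformly distributed points: (1 - 0.5**(1/n))**(1/p)."""
    # expm1 keeps precision when 0.5**(1/n) is extremely close to 1.
    inner = -math.expm1(math.log(0.5) / n)
    return inner ** (1.0 / p)

d_llm = median_nearest_distance(p=1536, n=16_000_000)
d_low = median_nearest_distance(p=1, n=100)

print(f"p = 1536, n = 16,000,000: {d_llm:.3f}")
print(f"p = 1,    n = 100:        {d_low:.4f}")
```

Under this assumption the high-dimensional case comes out just under 0.99, close to the figure of about 0.98 quoted in the talk: even the nearest point typically lies almost at the boundary of the ball. In one dimension with 100 points, by contrast, the nearest point is typically within about 0.01 of the center.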

What this says is that if you try to put in the knowledge of the world, which is in a way reflected by the texts of the internet used to train the LLM in the way I described here, you are creating a space with 1,536 dimensions, which leads to a totally underdetermined distribution model. This is a fundamental reason for the errors: there is just no way to cope with such a high-dimensional vector. But if you give the vector a lower dimensionality, you end up with other problems, because you lose a lot of the information in your input text. This embedding is already a massive compression of the data, and if you made it smaller, this would be even worse. So they came up with: 'okay, we keep this high-dimensional matrix so as not to compress the input data too much,' but on the other hand this leads to a totally underdetermined data space. This is something that is very often not taken into account when thinking about LLMs. Certainly the Pope, in his new encyclical about artificial intelligence, did not take this into account.

[15:56] Jobst Landgrebe: Now, here we see how an LLM is basically trained and used. On the left-hand side we have the training of the LLM; on the right-hand side we have its usage. And here I'm highlighting again that every LLM is a regression model with an error term. It doesn't have this simple structure, of course, but a much more complicated one. I don't want to go into this too much now, but here you can see that all the steps of configuration, and also of usage, contribute to the overall error-proneness of the LLMs. All the assertions made here show how the errors are generated upon configuration and usage of the models: that the training data are always incomplete by necessity; that the refinement is massively biased toward non-empty answers and toward answers aligned, very often, with post-modern collectivism; that simulated data lead to sub-parity with humans. All these assertions can be proven by reduction to the Turing halting problem. Because the halting problem cannot be computed, there are necessarily always errors in the way the models work. This is very important, and it has been proven. It is interesting to see that in our culture these proofs, which were published in 2022, with more proofs published later on, are just not taken into account. People still talk about these models as if they were something special, as if they had human personality-like properties and all of this. No, these are just regression models which by necessity make a lot of mistakes, and this has been mathematically proven. And we have a public opinion and a culture that just refuse to take the proofs into account, which is quite remarkable.

Understanding the Context Window

[17:56] Jobst Landgrebe: So, anyhow, because of the many problems of the models we saw here (the problems we listed, the problems of dimensionality, and the problems that come from the way the models are trained and used), people now want to combine the stochastic AI with so-called symbolic AI, which ultimately very often means ontologies. The goals of this combination are the following. One wants to lower the need for configuration data by using prior knowledge, and this is very well feasible. You saw the curse of dimensionality, but you can reduce the number of data points needed by introducing prior knowledge. You can also obtain certifiable AI, which is something I wrote about a couple of years ago: it means that you can create test batteries for an AI system and certify that the AI system passes these tests. You can to some extent obtain explainable AI, but not very well. I cannot talk about this today; it has to do with the black-box nature of the models. With millions or billions of parameters, you cannot explain why the model created a certain output. You can also obtain much better reliability, quality, and control over the models by introducing ontologies. But we cannot generate machine intelligence by combining ontologies with neural networks. So far, are there any questions?

[19:48] Barry Smith: So, one topic I would like you to say a bit more about is the idea of the context window.

[19:55] Jobst Landgrebe: Yes. Going back to this here: the context window is basically the size of the input material within which the transformer architecture of the large language model is able to take into account relationships between the tokens. So 8,192 tokens, that is maybe 2,000 words or something like this. And this is, at least in the default configuration of the LLM, the largest piece of what we would call a text-pragmatic unit in which the LLM can hope to identify robust relationships. For example, with a phrase that negates an entire paragraph, it could still have a chance of, not understanding, but basically modeling that the paragraph was negated.

But if you make the context window bigger, more and more relations may be lost. There is a very interesting experiment about the context window, published last year, in which somebody created, I think, something like 30,000 sentences of the form: "Donald Trump owns three blue cars. Marilyn Monroe has two red balloons, three yellow balloons, and five green balloons. James Dean has two white horses and two black horses," and so on. Simple sentences with a person, objects possessed by the person, and the colors of the objects. If you give the LLM 50 such sentences and ask it to count how many red objects there are, or how many white horses, or how many persons, or even how many persons own more than three horses, it can do it. But if you increase the number of sentences above 50, the ability of the model to count the objects in the text drops drastically, and above 200 sentences it only creates garbage.

By contrast, every school child of seven or eight years could painstakingly go through 500 sentences and easily keep track of all the objects in a table. It would be boring, but it is feasible. The machine's ability to do that already degrades with more than 50 sentences. This shows you how poor the whole context window thing is. Various configurations with different context sizes were tested, and it always failed. That shows you that the model cannot think. It is just identifying patterns within a quite small range. Whenever you ask ChatGPT to summarize something for you, as I did today when I asked it (and it hallucinated, by the way) to tell me what a Wahrheitssatz is, or Wahrheitssätze are, and what they mean, it of course finds web pages where this is explained and then summarizes those web pages. But it cannot go through the works of Frees and do this by itself, because the context window is much too small. So the information retrieval step here is extremely important, and only since it was introduced have the LLMs become to some extent useful. Still, the story I just told about this paper with the object counting is very sobering, because it shows you how little they can really do. I hope this helps.
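The kind of counting test described above can be sketched as a sentence generator with a known ground truth, so that an answer from any model could be checked against the exact count. This is not the setup of the paper mentioned in the talk, whose details are not given here; the word lists, the sentence template, and the function names are assumptions, and no LLM is called.

```python
import random

PEOPLE = ["Donald Trump", "Marilyn Monroe", "James Dean"]  # names from the talk
COLORS = ["red", "blue", "white", "black"]
OBJECTS = ["cars", "balloons", "horses"]
NUMBER_WORDS = {2: "two", 3: "three", 4: "four", 5: "five"}

def make_sentences(n, seed=0):
    """Return (facts, sentences): structured facts plus their text form."""
    rng = random.Random(seed)  # seeded so the test set is reproducible
    facts, sentences = [], []
    for _ in range(n):
        person = rng.choice(PEOPLE)
        count = rng.choice(list(NUMBER_WORDS))
        color = rng.choice(COLORS)
        obj = rng.choice(OBJECTS)
        facts.append((person, count, color, obj))
        sentences.append(f"{person} owns {NUMBER_WORDS[count]} {color} {obj}.")
    return facts, sentences

def count_objects(facts, color=None, obj=None):
    """Ground truth: total number of objects matching the given filters."""
    return sum(
        count
        for _, count, c, o in facts
        if (color is None or c == color) and (obj is None or o == obj)
    )

facts, sentences = make_sentences(200)
question = "How many white horses are there in total?"
prompt = " ".join(sentences) + " " + question
expected = count_objects(facts, color="white", obj="horses")

print(sentences[0])
print(f"{len(sentences)} sentences, expected answer: {expected}")
```

Keeping a table of facts alongside the text is the same bookkeeping a school child would do by hand; varying the number of sentences would then show at what length a model's answers stop matching the ground truth.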

[23:44] Barry Smith: Yes, good, good.

Neuro-Symbolic AI Architectures

[23:44] Jobst Landgrebe: Now, we move on. Any other questions, or shall I move on? Move on for the moment. Here, different approaches to combining ontologies and AI are shown. Gary Marcus believes that by doing this we can create artificial intelligence. These are the methods that have been proposed and analyzed in the last 20 years.

Sequential Approach: This was done very early on. I have been using this method for 25 years. You basically have something like an ontology, then you use a DNN or regression algorithm, and then the output of this is put again into an ontology or rules, and then you get an output. That's called the sequential approach. It's the only approach that can guarantee accurate results. And this is built into all the AI weapon systems: the last component is usually symbolic, to guarantee that the weapon system does its job. The Patriot system, which is a missile interception system, is set up like this, but attack systems are also set up like this when they have to act autonomously.
Nested Model: Then you have a nested model, where you have a symbolic function that takes as input some data, some symbolic context, and the output of a neural network acting on the input of the model, so that you basically nest the two into each other. AlphaGo is the most prominent and well-known example of a nested neuro-symbolic system.
Cooperative Models: Then you have cooperative models, where the symbolic component, here called the reasoning engine, and the DNN take each other's output and update each other until a certain threshold is reached. They can also work very well. Shown here is the case of image classification, for example, as a cooperative form of neuro-symbolic AI.
Injection or Compilation: Then there is the most popular approach; roughly 90% of the papers do injection or compilation, where the three fundamental aspects of DNNs are updated or upgraded with symbolic components.
In Loss: In loss injection, you don't take a normal loss function alone. You have your classical loss function, but now you add a symbolic loss function to it, which brings in prior knowledge about the outcome beyond what you would have with the naked outcome alone. I am currently doing exactly such a project in the context of using DNNs in drug discovery. We have several measurement outcomes, but we don't use the loss on its own; we add prior knowledge at the molecular level to the loss function so that we get a better outcome and a better prediction. So this is when the loss function is injected with prior knowledge.
In Activation Function: You can also inject into the activation function. In the nodes of the neural networks you have activation functions, and you can put a symbolic reasoning component into such a node, so that the component at this node is not a stochastic one but an ontological one.
In Training Data: Or you can change the training data set by adding symbolic input, which I am also doing in the context of this drug discovery project right now, where you manipulate the training data by adding symbolic knowledge to it.
Fibring Functions: Then there are also approaches which connect multiple neural networks via so-called fibring functions, which are also symbolic and ontology-based, to enable the networks to collaborate. It's like creating interoperability between the neural networks by having an ontology that forces them to store the information in a certain way so that they can reuse it, which is a bit similar to the way ontologies get used in various domains of computer science. And these are the methods that are available.
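Of the approaches listed above, injection into the loss is the easiest to show in a few lines. The following is a minimal sketch, not the loss used in the drug discovery project mentioned in the talk: the rule (predictions must lie in a known valid range), the weight, and all names are hypothetical stand-ins for prior knowledge taken from an ontology.

```python
def data_loss(predictions, targets):
    """Classical loss: mean squared error against the measured outcomes."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def symbolic_penalty(predictions, lower=0.0, upper=1.0):
    """Prior-knowledge term: penalize predictions outside [lower, upper].
    The bounds stand in for a rule supplied by an ontology (hypothetical)."""
    total = 0.0
    for p in predictions:
        if p < lower:
            total += (lower - p) ** 2
        elif p > upper:
            total += (p - upper) ** 2
    return total / len(predictions)

def combined_loss(predictions, targets, weight=10.0):
    """Classical loss plus the weighted symbolic term."""
    return data_loss(predictions, targets) + weight * symbolic_penalty(predictions)

targets = [0.05, 0.95, 0.5]
plausible = [0.15, 0.85, 0.5]      # same distance from targets, inside the range
implausible = [-0.05, 1.05, 0.5]   # same distance from targets, outside the range

for name, preds in [("plausible", plausible), ("implausible", implausible)]:
    print(f"{name}: data loss = {data_loss(preds, targets):.4f}, "
          f"combined loss = {combined_loss(preds, targets):.4f}")
```

Both sets of predictions are equally far from the measured targets, so the classical loss cannot tell them apart; only the added symbolic term prefers the set that respects the rule. An optimizer minimizing the combined loss is therefore steered by the prior knowledge as well as by the data.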

[28:08] Jobst Landgrebe: Here is an example I will skip for today, which you can look up for yourself. It's by Cunning (2023). It's a very impressive example of a neuro-symbolic hybrid, in which a neural network is used to create a database ontology on the fly using inductive logic programming. It's a superb paper. What it does is classify images, and it is highly superior to a purely regression-based system. It leads to a very strong performance improvement in image classification compared to purely stochastic models, but only for a restricted problem space in which this ontological approach can be made to work. The restrictions come from the data-to-knowledge generator, which is, of course, programmed by hand and which transforms the DNN output into logic output. This is then suitable for the inductive logic programming system, which can indeed create a very restricted ontology on the fly based on the data that are fed into the system, and then it gives a classification output.

The Limitations of Ontologies and Complex Systems

[29:20] Jobst Landgrebe: So we have looked at methods for adding ontologies to neural networks, which are very impressive, but we now need to ask: why doesn't this lead us to a true artificial intelligence or machine intelligence?

First of all, I believe that ontologies by their very nature depend on human perception, abstraction, reasoning, and also intersubjective planning and consensus. That's true because almost all ontology projects are collaborative projects, and they need planning, consensus, and intersubjectivity. That is, of course, very demanding, and it also makes the ontologies dependent upon the collective of people who worked on them. This means that the world knowledge embedded in ontologies is restricted and highly context- and perspective-dependent. It depends on the context, the purpose, and the perspective of those who made the ontology. Therefore the scope of an ontology is always local; there is no global ontology. I mean, there are global upper ontologies like BFO [Basic Formal Ontology], but as soon as you make a domain ontology, it is always local and restricted to the perspective of those who made it. And that, of course, also restricts the usage of the ontology to local problems, to special problems. The ontological part of AlphaGo, for example, has to do with the game of Go and the constraints of the game of Go.

Also, ontologies contain static knowledge and cannot be dynamically altered, in contrast to the active perception of which humans and animals are capable. So they are very inflexible, and because of the way they are created they also have quite long lead times for revision: it takes a long time to revise an ontology because it has to go through the consensus effort again. They can also only capture regularities, but complex systems produce irregularities all the time, and they do this in non-predictable patterns. Also, neuro-symbolic AI systems need to be carefully prepared and configured for the task at hand. So they are not intelligent, but they are useful automata. None of this means that I think ontologies aren't useful. They are highly useful, they are great, but they have their limitations. They improve AI models but cannot create machine intelligence, mainly for the simple reason that intelligence is the ability to react without prior training and preparation, whereas an ontology is, of course, a highly prepared artifact that is then inserted into the model.

[32:03] Jobst Landgrebe: At the end, I want to show you a tiny part of our book, Why Machines Will Never Rule the World, which is a table of the thermodynamic properties of complex systems. What this table shows, though I will not elaborate on it too much today, is that complex systems have properties that make their complete, or holistic, mathematical modeling impossible.

For example, they can change not only the elements that are in them but also the element types. An element type corresponds to a coordinate in a Cartesian coordinate system. That means that the coordinate system may change at any time, and if something changes that forces you to change the coordinate system, the mathematical model breaks down. You cannot have a mathematical model with a flexible vector space. The vector space needs to be defined before the model is created. But this property leads to alterations of the vector space.

You also have the problem that complex systems have non-ergodic phase spaces, which means that there are no regularities with regard to certain aspects of their behavior. What is worse, if you have this, not only can you not induce mathematical equations explicitly from them, because they don't create the pattern, but you can also never draw a representative sample from such a distribution, because there is no representative sample: there is no regularity from which to sample. There are many other details of this that I will not elaborate on today, but essentially all these properties of complex systems ensure that mathematics can only create very partial models of them.

Dominant vs. Non-Dominant Characteristics

[33:51] Jobst Landgrebe: To wrap up: one could now object that complex systems have many properties that lead to regular patterns. In humans, these are the vital processes like breathing, heartbeat, sleep, drinking, eating, digesting, or sexuality. These vital processes can be temporarily interrupted, because the higher functions of the human being are very erratic and arbitrary. You can even stop breathing for a while. You can have coitus interruptus. Some people can manipulate their heart rate. You can stop eating for many weeks, and so on. But basically these processes are regular, and because of this regularity you can have mathematical models that are adequate. For example, the ECG models that are now available are absolutely perfect. This is because the way the electric signal is conducted to create the mechanical action of the heart is so regular, both in the healthy state and in the various disease states of the heart, that you can have a perfect mathematical model of it to interpret the ECG. This shows you that for complex systems, too, you can have partial regularities that you can master with LLMs or other stochastic systems.

However, the dominant characteristics of complex systems are those that distinguish their bearer from other systems, and these dominant characteristics have no patterns. In humans, for example, consciousness, self-awareness, feelings, decision-making, intelligence, and language are dominant characteristics, and they are patternless. If you think of it, you can never foresee your own stream of consciousness, not even how it will be on the next day. You can never foresee how a conversation will go. You cannot predict any decisions that humans make, and so on. And so these are patternless dominant characteristics of complex systems. They do again have some patterns: language, for instance, has syntax as a pattern. Why does it have syntax as a pattern, and also vocabulary as a pattern? Because otherwise communication wouldn't be possible. But beyond syntax and vocabulary, languages have no regularity. As long as you keep to the rules of the syntax, without which you cannot be understood, you can express whatever you want, and it's completely patternless.

[36:23] Jobst Landgrebe: Let's look at a simpler example than human language: the sun. If you think of the sun, its basic property is gravitational mass, and this property it shares with the Earth and its other satellites, and this leads to a very regular behavior. That was the first model that was used to induce modern physics by Galileo. He observed regularities in the movement of the planets, and then he made the Renaissance model, or the modern-age model, of mathematical physics. And this was possible because this non-dominant property of the sun as a gravitational mass is highly regular. However, being a nuclear fusion emitter of radiation is its dominant property, because this it doesn't share with its satellites. The satellites are no suns; they cannot emit nuclear fusion radiation. And so basically, this dominant property is extremely irregular, and we see this, for example, from climate change, which is mainly driven by the changes of activity of the sun. It's unpredictable when it will occur and how much it will occur. Astronomers who have observed the sun see that there's very little regularity in the phenomena of radiation that the sun is emitting. So this is a dominant property, and it's again not amenable to mathematical modeling.

So basically, what I want to say with this is that dominant properties of complex systems can never be mathematically mapped, due to the limitation of mathematics to regular patterns, and since ontologies also represent regularities, ontologies won't help. There is, of course, a lot of knowledge about the radiation that results from the nuclear fusion that happens in the sun. However, if you were to add an ontology of this to a neural network-based model of the sun, it would not help very much. You still would not be able to predict this dominant property of the sun as a nuclear fusion emitter. And so with this example, I'd like to close, and now you're welcome to ask further questions. Thank you very much.

Q&A: Thermodynamics and Final Summary

[38:53] Barry Smith: So you were introducing the idea of a complex system, and you mentioned thermodynamics as being the science in terms of which we are to understand complex systems. Can you explain how complex systems and thermodynamics are connected?

[39:15] Jobst Landgrebe: Yes. So basically, thermodynamics is just a part of physics that was invented to understand patterns of microstates. For example, in a bottle of water, how are the molecules of water distributed, and what is going on with them? And Boltzmann recognized that there is this notion of ergodicity. Take, let's say, a glass of water in thermodynamic equilibrium, so that means at constant temperature, constant pressure, and constant volume. Now you give every molecule of water in the glass an identifier. You make a big table in which you list all of them, and in the second, third, and fourth columns you have the $x$, $y$, and $z$ coordinates in space. For this you would need to have a Laplacian demon who can count all of them and determine all their places instantly, before anything changes. Let's assume you have this. Now you have this table. If you wait long enough, the molecules will move and start changing their positions, but in an ergodic system you have a 100% chance that they will all come back, over time, to the positions they have in your original table. And that's because ergodicity means that there's an identical probability for each molecule to be somewhere in the glass, and this is a thermodynamic property.

And so thermodynamics describes such properties of systems. And the systems to which physics applies, where we can have nice physics equations, are usually ergodic. With the non-ergodic systems it is different: when I shake the water or heat it up, it becomes non-ergodic, and I cannot say what is going to happen to the molecules. Especially when I heat it up, the molecules are going to turn into steam, into gaseous form, and they're going to leave the compound of the liquid, and then they are going to go somewhere in the universe, we don't know where. And so basically, when I put energy into the system, it becomes complex, and it loses its predictability. That's called drivenness, which is the behavior of a system into which energy is inserted. And now you see how I can use the fundamental notions of thermodynamics to characterize more and more aspects of the system, and this is the relationship between thermodynamics and the system. So I have a certain system, and now I can check what properties it has, and by this I can understand whether it is amenable to mathematical predictability or not. And this is what the science of thermodynamics has found out since it was developed, since the 1880s.
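[Editor's illustration, not part of the talk: the notion of ergodicity in the answer above is usually stated in textbooks in two parts, the equality of time averages and ensemble averages, and Poincaré recurrence. The symbols below (the state $x(t)$, the flow $\varphi_t$, the observable $f$, the invariant probability measure $\mu$ on the phase space $\Omega$, and the set $A$) are standard notation assumed here, not the speaker's.]

```latex
% Ergodicity (Birkhoff): for an ergodic, measure-preserving flow and integrable f,
% the long-run time average along one trajectory equals the ensemble average.
\[
  \lim_{T \to \infty} \frac{1}{T} \int_0^T f\bigl(x(t)\bigr)\, dt
  \;=\; \int_{\Omega} f(x)\, d\mu(x)
  \qquad \text{for $\mu$-almost every initial state } x(0).
\]

% Poincare recurrence: for a measure-preserving flow on a space of finite measure,
% almost every state that starts in a set A of positive measure returns to A
% at arbitrarily late times.
\[
  \mu\bigl(\{\, x \in A : \varphi_t(x) \in A \text{ for arbitrarily large } t \,\}\bigr)
  \;=\; \mu(A).
\]
```

[In this formulation the return is to any neighborhood of the starting configuration rather than to the exact positions in the table, which is the textbook counterpart of the "they will all come back" picture.]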

[42:21] Barry Smith: Good. So are you happy? Do you have anything that you would like to add as a final word?

[42:39] Jobst Landgrebe: Yeah, I think the big take-home message is that ontologies are absolutely great: you can have much better models, also with much less data, so that you can realize many of the requirements shown here by adding ontologies to stochastic AI. But they will still not create general artificial intelligence or machine intelligence.

[43:00] Barry Smith: Good.
