
@alreadydone
Last active March 19, 2023 05:45
Translated from Chinese https://archive.is/M5sDx by ChatGPT-3.5

Go: A Human Counterattack?

Lu Changhai

Last month, a news story titled "Man beats machine at Go in human victory over AI" caught my interest. It first appeared in the Financial Times on February 17th, but I was traveling in London at the time and missed it by a few days. I decided to wait for follow-up reports before commenting; none appeared, so I am now returning to the topic belatedly.

The story concerned an American amateur 6-dan Go player, Kellin Pelrine, who exploited vulnerabilities in the open-source Go programs KataGo and Leela Zero to defeat them decisively.

Before 2015, this kind of news would not have made headlines, because Go programs were then far inferior to human players. But after DeepMind's Go system AlphaGo defeated the Chinese-born French professional 2-dan player Fan Hui 5-0 in October 2015, and then the 14-time world champion, South Korean professional 9-dan player Lee Sedol, 4-1 in March 2016, Go fell under the shadow of artificial intelligence. I recounted this interesting history in my article "From Deep Blue to DeepMind".

Having written about that history, I was naturally drawn to this story, and despite the delay I want to examine whether Pelrine's victory over KataGo and Leela Zero can be regarded as a "human counterattack" that turns the tide in Go.

Let me start with the two questions that came to mind when I first read the news:

How does the strength of the defeated Go programs, KataGo and Leela Zero, compare to those Go systems introduced in my previous article, such as AlphaGo and AlphaZero?

How quickly can KataGo and Leela Zero patch these vulnerabilities? Would the fix come through direct programming or through "self-learning", perhaps even using the vulnerability itself to accelerate that learning?

I had wanted to wait for further reports before discussing these questions. Unfortunately, as of today there is still no follow-up. It seems that Pelrine's victory over KataGo and Leela Zero is nowhere near as sensational as AlphaGo's victory over Lee Sedol.

Since there is no follow-up news, let's revisit the original news and supplement it with some technical information to discuss these questions.

First, it should be noted that the vulnerabilities Pelrine exploited in KataGo and Leela Zero were discovered by computers, and Pelrine himself was a member of the research group that made the discovery. Specifically, the vulnerabilities were found through machine-versus-machine "adversarial machine learning", with KataGo as the "target". The paper reporting the discovery, "Adversarial Policies Beat Superhuman Go AIs", was published in November 2022. That Go programs have vulnerabilities, and that computers can find them, is not particularly surprising: no current Go program comes close to exhaustively exploring the game's variations or strictly implementing the theoretically optimal strategy, so vulnerabilities are inevitable; and since they exist, computers may discover them even where human players cannot. What makes this vulnerability special is that its underlying principle is simple, and the strategies derived from it require no complex calculation and are even quite intuitive for humans. Once discovered, therefore, even human players can use it (although precisely because it is so intuitive, the strategy is easy for human players to spot and is useless in high-level human games). 1

Perhaps because the vulnerabilities were discovered by computers, the role of the human player was greatly diminished in the coverage, which inevitably blunted the story's impact. Moreover, AlphaGo, which first defeated humans, and its successors such as AlphaZero have long since "retired"; the programs defeated this time, KataGo and Leela Zero, are new programs built on AlphaZero's design philosophy. Although both had surpassed human players before the vulnerability was found, defeating them does not avenge AlphaGo's victory over humans; it is more like defeating the "enemy's" disciples. This may have further weakened the story's impact.

Even though the "enemy" has withdrawn from the field, it is still worth asking how KataGo and Leela Zero compare to AlphaGo and its successors, especially AlphaZero (the first question that came to my mind on reading the news). I searched online and found little, so a direct answer apparently does not exist. The aforementioned paper, however, contains clues that allow us to guess at one.

According to that paper, even before Pelrine played KataGo and Leela Zero, the researchers had already used the vulnerabilities to play thousands of computer games against KataGo. The results: when KataGo used no tree search, the attacking program's winning rate was as high as 100%; when KataGo searched with 4,096 visits per move, the winning rate was about 97.3%; and with 10,000,000 visits per move, about 72%. 2

The paper also mentions a related study conducted at the same time, whose "target" was AlphaZero (to be exact, a replica of AlphaZero trained to a significantly lower level than the "official" one). That study likewise found vulnerabilities and used them to play computer games against AlphaZero. The results: when AlphaZero used no tree search, the attacker's winning rate was about 90%; when AlphaZero searched with 800 visits per move, about 65%. The paper does not explicitly say whether the two studies found the same vulnerabilities, but it argues that their similar conclusions indicate that the vulnerabilities of self-taught board-game programs like AlphaZero stem not merely from programming errors but from flaws in the design itself. 3 This claim suggests that the vulnerabilities found by the two studies are highly correlated or at least comparable, for otherwise two unrelated, incomparable vulnerabilities in two separate systems would support no such conclusion.

If the vulnerabilities found by the two studies really are highly correlated or comparable, we can use them as a clue to indirectly compare, or at least speculate about, the relative strength of KataGo and AlphaZero.

Specifically, the win rates quoted above show that AlphaZero's performance at 800 visits (opponent win rate about 65%, i.e., its own win rate about 35%) surpasses KataGo's at 10,000,000 visits (opponent win rate about 72%, own win rate about 28%). Granted, the attacking programs differ, and even the definition of a "visit" may differ between the two searches, but such differences are unlikely to outweigh the enormous gap between 800 and 10,000,000. Indeed, the paper notes that at 10,000,000 visits KataGo takes over an hour on average per move, clearly at its limit (and well beyond the time allowed in regular games), whereas at 800 visits AlphaZero only plays at the level of strong human players and is nowhere near its limit. At least in its ability to withstand this kind of attack (which is the substance of the comparison in this article; the statements below are limited to this sense and are not a comprehensive comparison of playing strength), AlphaZero is clearly far more capable than KataGo. And considering that AlphaZero cut the opponent's win rate from 90% to 65% with a mere 800 visits, and that this replica was trained to a significantly lower level than the "official" AlphaZero, a full-strength AlphaZero would probably be beyond the attacker's reach. 4
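The only arithmetic this comparison needs is converting each quoted opponent (adversary) win rate into the defender's own win rate. A minimal sketch, using exactly the figures quoted above and nothing else:

```python
# Convert the adversary win rates quoted from the paper into each
# defender's own win rate. The figures are the ones cited in the text;
# the (program, visits) labels are just for readability.
results = {
    ("KataGo", 10_000_000): 0.72,  # adversary win rate at 10M visits/move
    ("AlphaZero", 800): 0.65,      # adversary win rate at 800 visits/move
}
for (program, visits), adv_rate in results.items():
    own_rate = 1 - adv_rate
    print(f"{program} at {visits:,} visits: own win rate about {own_rate:.0%}")
# AlphaZero defends better (35% vs 28%) despite using far fewer visits.
```

The point of the article's argument is precisely that the 35% vs. 28% gap favors AlphaZero even though its visit budget is roughly 12,500 times smaller.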

KataGo is generally considered the strongest open-source Go program today, or at least one of the strongest, so its level relative to AlphaZero can largely stand in for Leela Zero's as well, or at least bound it from above. Therefore, although it cannot be proven conclusively, in my view KataGo and Leela Zero are likely far from AlphaZero's strength (this, speculative as it is, is my answer to the first question). And if so, their defeat cannot be extrapolated to AlphaZero's defeat. Correspondingly, even if we loosely call Pelrine's victory over KataGo and Leela Zero a "human counterattack", it is at best a tactical, local victory, not a strategic one that reverses the situation. More importantly, artificial intelligence develops far faster than human intelligence evolves, especially in board games, so however much we celebrate this "human counterattack", in the long run it is destined to be short-lived.

While waiting for further news, one thing surprised me: apart from Pelrine, no other Go player seems to have used the discovered loophole to "rise from obscurity" and leave a personal record of defeating a top artificial-intelligence Go program, the way one takes a photo at a tourist attraction, or even to pick up some rating points, if games against KataGo or Leela Zero can affect one's rating.

Considering that Pelrine not only plays at amateur 6-dan level but also happens to belong to the research team that discovered the loophole, if he remains the only person to have defeated KataGo and Leela Zero with this "dual identity", one cannot help suspecting that the bar for exploiting the loophole is quite high, and may even depend to some extent on the second identity, which ordinary players lack. Of course, it is also possible that other players with sufficient skill, or with their own dual identity, simply do not want to play such an ugly and even somewhat dishonest game. Whatever the reason, if no other player achieves a victory like Pelrine's (strictly speaking, they must not only win, but win against a search with enough "visits" for the result to count as a record of "human counterattack"), it will, in my view, significantly undercut how Pelrine's victory over KataGo and Leela Zero should be positioned, for it turns the matter into a case of "isolated evidence". And isolated evidence calls for a more critical eye: seen in that light, frankly, Pelrine's "dual identity" undermines not only the result's representativeness but also his credibility as a purely human player who relied on no computer during play. The absence of follow-up news thus fits, in a way, the principle that "isolated evidence does not stand."

Before closing, let me turn to the second question that came to mind when I read the news: how quickly can KataGo and Leela Zero fix the vulnerabilities? Unfortunately, this question, which ought to concern players and developers alike, also lacks follow-up information. The only relevant material I found online was a thread titled "Adversarial Policies Beat Professional-Level Go AIs" in the KataGo community discussions on GitHub (#705). It began on November 27, 2022, shortly after the paper appeared, and discussed the vulnerabilities it reported (even the thread's title was obviously inspired by the paper's). But perhaps because, as noted earlier, the existence of vulnerabilities in Go programs, and their discovery by computers, is not especially striking, and because at that point the exploit was still machine-only (not yet wielded by human players), the discussion petered out in mid-December 2022 without any sense of urgency and without mentioning concrete measures or timelines for a fix. As for whether KataGo and Leela Zero, as programs capable of "self-learning", have evolved to patch the vulnerabilities automatically, I am, at least for now, not aware of it.

The above is my belated commentary on the news, part of it guesswork based on the information available to me. I hope I have not missed anything important; if I have, please feel free to correct me. 5

A careful reader may ask whether the KataGo and Leela Zero that Pelrine defeated used tree search, and if so with how many "visits". Unfortunately, I could not find the answer in the available materials.

Footnotes

  1. After all this discussion, what exactly is this vulnerability whose "principle is not difficult", and what is the "quite intuitive" strategy derived from it? I'm sorry, I'm not a Go player, so let me summarize in the words of a Weibo friend (@万精油微博): "The tactic is to scatter stones around the board to create several groups of dead stones. Once the Go AI has judged these stones dead, it assigns them little weight and turns its attention elsewhere. The tactic's main device is then to use these 'dead' stones to capture the opponent's stones in return."

  2. The paper does not define "visit" (perhaps it is common knowledge for readers in the field), but I take it to mean the number of times a node is visited during tree search. The precise definition does not affect the analysis in this article, however, since only relative numbers of "visits" are used.

  3. As mentioned earlier, KataGo and Leela Zero were developed from AlphaZero's design philosophy, and therefore count as "programs like AlphaZero". As for what it means for a program to be "self-taught" at Go, see my article "From Deep Blue to DeepMind".

  4. This conclusion rests on many assumptions and is therefore only speculation. If researchers could pit a program that defeats KataGo against AlphaZero, especially a full-strength AlphaZero, the question could be settled directly. Unfortunately, AlphaZero (replicas included) is neither open source nor freely available, so as far as I know no research group currently has such a challenge on its agenda.

  5. A Twitter user (@Junyan_Xu) provided further information, including discussions on Go websites. It shows that shortly after the paper appeared, a 5-kyu amateur Go player found the loophole and defeated a KataGo searching with 50 "playouts" per move (which typically corresponds to more than 50 "visits", since repeated visits are not counted). That was on December 1st, 2022, about two and a half months before Kellin Pelrine defeated KataGo and Leela Zero. Some players also said they would try to see how long it would take to reach 9 dan using the loophole, but posted no updates. In addition, some mentioned that KataGo has since evolved against the loophole: certain formerly difficult moves can now be found with only about 5,000 "visits" rather than millions. This information corrects and supplements the article; added March 11, 2023.
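The "visit" concept that footnote 2 guesses at can be illustrated with a toy sketch of Monte Carlo tree search, the search family that KataGo, Leela Zero, and AlphaZero all build on. Everything below (the two-node game tree, the random rollout, the UCB1 constant) is an illustrative assumption, not code from any of these programs; the point is only that each search iteration walks from the root to a leaf and increments a visit counter on every node it touches.

```python
import math
import random

class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.visits = 0      # the per-node "visit" count the footnote refers to
        self.value = 0.0     # accumulated rollout reward

def ucb1(parent, child, c=1.4):
    # Standard UCB1 selection score; unvisited children are tried first.
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def search(root, iterations, rollout):
    for _ in range(iterations):
        # Selection: descend by UCB1 until reaching a leaf.
        path, node = [root], root
        while node.children:
            node = max(node.children, key=lambda ch: ucb1(node, ch))
            path.append(node)
        # Simulation + backpropagation: every node on the path is "visited".
        reward = rollout(node)
        for n in path:
            n.visits += 1
            n.value += reward

random.seed(0)
root = Node("root", [Node("a"), Node("b")])
search(root, 100, rollout=lambda leaf: random.random())
print(root.visits)  # the root is visited once per iteration -> 100
```

On this reading, "4,096 visits per move" would mean running enough iterations that the root (or the chosen child, depending on convention) accumulates 4,096 visits, which is why only relative visit counts matter for the article's comparison.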
