OpenAI's GPT-4 is repeatedly said to "beat the Turing Test" in the news. But it is clear that the "predict the next word" paradigm GPT relies on has obvious limitations: the strengths and weaknesses of GPT-4 are qualitatively the same as its predecessors'. Hallucination is not solved; reliability is not solved; planning on complex tasks is (as the authors themselves acknowledge) not solved. I think it's time to explore new forms of the Turing test.
One of the better ways I can think of to determine whether an AI has human-like thinking skills is to ask it questions about unsolved problems in mathematics. By posing questions that have no known answer yet, in other words, by putting the AI and humans in the same situation, facing complex, unsolved problems, we can probe whether it is capable of original thinking and complex reasoning.
e.g.
[The Collatz Problem](https://mathworld.wolfram.com/CollatzProblem.html)
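For readers unfamiliar with the problem: the conjecture states that repeatedly applying n → n/2 (for even n) or n → 3n+1 (for odd n) eventually reaches 1 from every positive starting integer. Checking individual cases is trivial to program, yet no one has proved it holds for all n. A minimal Python sketch of the iteration (the function name `collatz_steps` is my own illustration, not from any source):

```python
# Sketch of the Collatz (3n+1) iteration: the unproven conjecture is
# that this loop terminates (reaches 1) for every positive integer n.
def collatz_steps(n: int) -> int:
    """Count the steps needed for n to reach 1 under the Collatz map."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

if __name__ == "__main__":
    # 27 is a famous case: it takes 111 steps to reach 1.
    for start in (6, 7, 27):
        print(start, "->", collatz_steps(start), "steps")
```

The gap between how easy this is to state and verify case by case, and how hard it is to resolve in general, is exactly what makes such problems a plausible probe for original reasoning rather than recall.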