Time, Information, and AI: Future of Large Models from Information Dynamics Perspective


Introduction

In recent years, artificial intelligence (AI) large language models have advanced rapidly, and the impact of AI on human society has expanded to an unprecedented extent. This article offers some preliminary thoughts on the AI revolution brought about by large language models from two physics-related perspectives: information and time scales. It first reviews the basic principles and recent developments of large language models, then discusses the significance of large language models from the angles of information dynamics and complexity. Based on a comparison between AI models and human cognitive systems, it also explores the next directions for AI development, including the exploration and development of AI agents.
Research Areas: Large Language Models, Artificial Intelligence, Information Dynamics, Complexity, System 1 and System 2, AI Agents
Qi Xiaoliang | Author
Physics, 2024, Issue 6 | Source

1. Introduction to Large Language Models

As the background for this discussion, I will briefly introduce the basic principles of large language models. The goal of a language model can be summarized as “learning to speak like humans.” For instance, when asked, “Where does the sun rise?” humans would answer, “It rises in the east.” Therefore, for the model to learn to speak like humans, it also needs to learn to answer, “It rises in the east.” Essentially, a language model is a function:
y = f_w(x)
Here, w represents the model’s parameters (weights), x is the input sentence, and y is the output sentence. Training a language model involves adjusting a large number of parameters w so that the output y is as close as possible to human responses for various potential inputs x.
But how do we define “close to human response”? Clearly, the same question can yield different answers in different contexts, and it is impossible to require AI to match every human answer exactly. This imitation of humans can only be probabilistic: a large corpus serves as training data, and these training data define a conditional probability p(y|x), which specifies how many different outputs are possible given input x and what their probability distribution looks like. The task of the language model is to simulate this probability distribution. Language models so defined have a long history. For example, Claude Shannon, the founding father of information theory, wrote a famous paper[1] that established the limits of information compression and defined the well-known information entropy; the same paper discusses how to generate strings resembling human language based on the probabilities with which letters and words appear (Figure 1).

Figure 1: The language model studied by Claude Shannon in his paper on the source coding theorem.

More specifically, current language models use a method called “next token prediction” to generate sentences. Language is divided into minimal units called tokens (which are smaller than words in English and correspond to individual Chinese characters in Chinese), so the input text can be seen as a series of tokens x1, x2, x3,…, xn, and the output is the next token xn+1. The sentence generated by the language model is achieved by repeatedly calling the same function (Figure 2):
xn+1 = f_w(x1, x2, …, xn)
When the model judges that the answer is complete, it outputs an end symbol, and the answer is returned to the user.
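To make this loop concrete, here is a minimal sketch of autoregressive generation in Python. The function next_token_distribution and its toy probabilities are hypothetical stand-ins for the trained model f_w, not any real model’s API:

```python
import random

END = "<eos>"  # hypothetical end-of-sequence token

def next_token_distribution(tokens):
    """Stand-in for the trained model f_w: returns p(x_{n+1} | x_1, ..., x_n).
    A toy rule here; a real LLM computes this distribution with a transformer."""
    if tokens[-1] == "the":
        return {"east.": 0.9, "west.": 0.1}
    return {END: 1.0}

def generate(prompt_tokens, max_len=50):
    tokens = list(prompt_tokens)
    for _ in range(max_len):
        dist = next_token_distribution(tokens)
        # Sample the next token from the predicted distribution
        next_tok = random.choices(list(dist), weights=list(dist.values()))[0]
        if next_tok == END:   # the model signals that the answer is complete
            break
        tokens.append(next_tok)
    return tokens

print(" ".join(generate(["The", "sun", "rises", "in", "the"])))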
Time, Information, and AI: Future of Large Models from Information Dynamics Perspective

Figure 2: Schematic diagram of large language models. The input content (pink) undergoes calculations to predict the next word (green), iterating this process.

The most powerful large language models currently adopt a model architecture called transformer[2]. In this architecture, the text is first mapped into high-dimensional vectors. For example, if each token is mapped to a 100-dimensional vector, then the input of 10 tokens would be a 100×10 matrix. Through multiple layers of nonlinear operations, the output is a vector of the same dimension, which is then mapped back to the output text xn+1 (Figure 2). The details of this nonlinear operation will not be elaborated here. Compared to earlier machine learning models, the transformer model has two core advantages: one is non-locality—any two input tokens may have either strong or weak relationships, theoretically allowing the model to handle associations between words that are far apart; the second is that the transformer architecture is particularly suitable for parallel computation on GPUs, enabling the model to have a very large number of parameters, reaching hundreds of billions.
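As an illustration of the core operation, the following is a minimal numpy sketch of single-head scaled dot-product attention using the toy dimensions above. The random matrices stand in for learned weights; real transformers add multiple heads, feed-forward layers, and many stacked layers:

```python
import numpy as np

d, n = 100, 10                        # embedding dimension, number of tokens
X = np.random.randn(n, d)             # 10 input tokens, each a 100-dim vector

# Random stand-ins for the learned projection matrices (scaled for stability)
Wq, Wk, Wv = (np.random.randn(d, d) / np.sqrt(d) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv      # queries, keys, values

# Every token attends to every other token: this is the "non-locality"
# that lets the model relate words that are far apart in the input
scores = Q @ K.T / np.sqrt(d)                   # (n, n) pairwise affinities
scores -= scores.max(axis=1, keepdims=True)     # numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row
output = weights @ V                            # (n, d): same shape as input

print(output.shape)                   # (10, 100)
```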
Since the transformer was proposed in 2017, companies like Alphabet and OpenAI have developed increasingly advanced transformer models. OpenAI launched the GPT-3 model in 2020 and then released the GPT-3.5-based ChatGPT in November 2022. GPT-3.5 and the subsequent GPT-4 allowed a wide range of individual users to experience the model directly through a dialogue interface, creating a huge and widespread impact and marking the beginning of an accelerated period in the development of large language models: hundreds of models have since been developed, their capabilities rapidly increasing, and many have chosen to be open-sourced. Figure 3 shows recent evaluation results, indicating that models such as Claude, GPT, and Gemini perform remarkably well on undergraduate-level knowledge, graduate-level mathematics, and programming.

Figure 3: Evaluation results for the Claude 3 model released by the American AI company Anthropic in March 2024. The three models in the red box, Opus, Sonnet, and Haiku, are versions of Claude 3 in decreasing order of capability (image source: https://www.anthropic.com/news/claude-3-family)

It is quite astonishing that large language models have achieved such capabilities based on the simple goal of “predicting the next word.” Of course, the description of model training above is overly simplified; in reality, to train a truly usable model, in addition to the massive data training process described above (called pretraining), it must also undergo fine-tuning and reinforcement learning from human feedback (RLHF). Roughly speaking, the pretraining process equips the model with foundational capabilities, while the main goal of fine-tuning and RLHF is to make it more focused on conversational scenarios, understand human intent, and adhere to social norms (for example, not providing harmful answers or information).

As the number of parameters in large models continues to increase, people have noticed the “emergence” of new capabilities: for instance, logical reasoning abilities that were not specifically targeted during training appear spontaneously as parameters and data increase. Another manifestation of emergence is “transfer learning” across different abilities; for example, after extensive training on programming, a model’s logical reasoning in other contexts is also found to improve significantly. In a sense, the emergence of capabilities in large models is not a new phenomenon but the continuation of a trend visible since the deep learning revolution ignited in 2012 by results on Fei-Fei Li’s ImageNet dataset: more data and more parameters yield higher levels of intelligence than manual design. In Chinese, this is often summarized as “great effort brings miraculous results.” OpenAI’s overtaking of the much larger Alphabet (Google) is largely attributed to their earlier and more determined advancement along this path.
So does this mean that the problems of artificial intelligence have been solved, and that achieving human-level or superhuman intelligence only requires more data and more computation? Is the revolution we currently see in language models merely one of many models in AI development, or does it hold special significance? This article attempts to explore these questions based on the author’s preliminary thoughts. (Some viewpoints in this article are based on the author’s article from last year[3].)

2. The Critical Point of Information Complexity

The rapid development of large language models has excited many people and has been compared to significant historical moments such as the invention of the iPhone, the invention of the internet, and the industrial revolution. Such comparisons primarily consider functionality. From a physics perspective, I hope to find an intrinsic criterion. It is akin to studying phase transitions in condensed matter physics, where we usually first identify an order parameter and then determine whether this order parameter undergoes some qualitative change. For AI, if we target a specific task, such as those listed in the evaluation results in Figure 3, a simple critical-point criterion would be whether the AI’s score reaches or exceeds human levels; but this is clearly not the goal of today’s language models. The greatest feature of language models, compared to earlier AI models, is their generality. Although the levels they achieve on different tasks vary, their goal is clearly to encompass human capabilities in all fields, especially after recent significant advancements in multimodal models. (It should be clarified that the language models referred to in this article are broadly defined, including multimodal models built on similar principles. “Language” here is a means of communication, just as it is for humans, and can take various forms such as video, audio, and text.) In such a broad field, if we seek a universally applicable criterion, I believe we should choose the perspective of information.
Let us first review what information is. Essentially, information is a measure of reduced uncertainty. For two sentences of the same length, the information content of “It snowed in Sanya in summer” is much greater than that of “It snowed in Liaoning in winter,” because the latter event is far more probable. Thus the information content of a message i is a function of the probability pi of that event: I(pi). If an event has i = 1, 2, …, n different possibilities, the average information content is ∑i piI(pi). If we require the information content of two independent messages i and a to equal the sum of their individual contents, I(piqa) = I(pi) + I(qa), it follows that I(pi) = −log pi, so the average information content is −∑i pi log pi, which is exactly the information entropy defined by Shannon. The amount of information contained in a message depends only on this probability; it does not matter whether the message is conveyed by phone, by text, or in person. This reflects the remarkable universality of the concept of information. All human behaviors, and indeed all physical processes, are accompanied by the propagation and evolution of information; to use a more precise term, we can call them information dynamics processes.

For instance, the cosmic microwave background radiation observed today provides us with information about the early universe. The microwave background radiation originates from the moment the universe became transparent. Before this moment, the universe was opaque, photons were continuously scattered, and we could not receive information from that time directly. From the perspective of information, we can say that at the moment the universe became transparent, a qualitative change occurred in information dynamics: the information carried by photons went from fleeting to capable of traversing billions of years.

A similar qualitative change occurred at the “moment” of the emergence of human language (of course, this was not a specific moment but likely a long evolutionary process). Before language, humans and other animals could communicate information, but its content was too limited, and its use was confined to the present. Over the long term, information could only be passed down across generations through genetic inheritance and variation, so an organism could adapt to a new environment only through natural selection over long time scales. The emergence of human language, or more precisely, the attainment of a universal level of language capable of describing various complex scenarios and thoughts, fundamentally changed this situation. Even before written language, humans could accumulate valuable experience through oral transmission, developing complex skills such as agriculture. If one person invented the wheel, everyone else no longer needed to invent it again; they only needed to pass down the technology for making wheels. Humans today differ negligibly from humans ten thousand years ago in genes and innate intelligence, yet they can build far more complex social structures and create brilliant science, technology, and culture. From the perspective of information dynamics, this is attributed to a new information carrier, language, and a new information dynamics process, human thought and communication.
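As a concrete illustration of the entropy formula introduced above, here is a short, self-contained Python snippet (a minimal sketch, not tied to any particular library) that computes −∑i pi log pi for a given distribution:

```python
import math

def shannon_entropy(probs):
    """Average information content -sum_i p_i log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A near-certain event carries almost no information; maximal uncertainty
# (a fair coin) carries the most per outcome:
print(shannon_entropy([0.5, 0.5]))    # 1.0 bit
print(shannon_entropy([0.99, 0.01]))  # ~0.08 bits
```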
To summarize, the period from the emergence of life to the emergence of language can be called the “DNA era,” during which the primary carrier of long-term effective information was DNA, and the decisive information dynamics process was genetic variation and natural selection. Since the emergence of language (roughly a hundred thousand years ago), we have lived in the “human language era,” in which the decisive information carrier is human language, and the key information dynamics process is the processing of language (through human thought and communication), along with its recording and transmission.
Based on the above discussion, let us consider the significance of the language model revolution from the perspective of information. Since the invention of computers and the internet, information has propagated and been processed much faster than before; especially since the advent of the mobile internet, many aspects of our lives have been profoundly changed by these technologies. However, if we look more closely at how machines process information, we find that before large language models, machine information processing was still quite different from human information processing.

The key difference lies in complexity. Roughly speaking, the computational complexity of a task measures how many operations are needed to complete it, given a basic unit of computation (such as a logic gate), while the information complexity of a piece of information measures how many operations are needed to generate it from a given initial condition. For example, a search engine must perform complex calculations over a huge number of web-page links and user data to produce recommendations, and this computational complexity far exceeds what a human brain can handle. But when measuring complexity, we must also consider the complexity of the input and output information. Although the search engine’s computation is highly complex, the information it outputs is strictly limited: the recommended content is all human-created, and the machine is only responsible for ranking. Considering other functions we use daily (such as sending emails, hailing rides, and navigation), we find that smartphones and computers essentially act as information transporters: they improve our efficiency but do not perform complex information processing themselves. Another example is AlphaGo: its information-processing complexity evidently exceeds that of humans, but it is limited to the specific task of Go. In both examples there are information bottlenecks: among the three stages of input, processing, and output, at least one stage’s complexity is limited, so the overall tasks machines can complete remain limited, and a machine can only finish one task at a time before passing the information back to humans.
The emergence of large language models has revolutionized this situation: the input, processing, and output complexities of large language models have reached levels comparable to those of humans (Figure 4). As mentioned earlier, language is the carrier of human civilization, and everything humans do can be described in language. Although the natural language processing capabilities of large language models have not yet reached human intelligence levels, their complexities in processing language and text have reached levels comparable to those of humans, at least in conversational contexts. It can be said that large language models mark a crossing of the critical point in the information processing complexity of machines. Compared to earlier computers, large language models have eliminated information bottlenecks. If this judgment is accepted, its impact is immeasurable. With sufficiently complex input-output capabilities, the output of one model can directly become the input of another, allowing models to build complex collaborative networks, just as human individuals construct social organizations. Once the collaboration between models exhibits a synergistic effect, the development of intelligence will enter a new phase of exponential growth. This is akin to phase transitions in physics: the behavior of each electron spin in a magnetic material does not differ much above and below the critical point, but what determines the qualitative change in the macroscopic properties of the entire system is whether the order increases or decreases as the spatial scale expands and the degrees of freedom increase.

Figure 4: Comparison of the information input, processing, and output complexities between large language models (LLM) and previous machines (such as AlphaGo, Google). The dashed line represents human levels.

AI that crosses the critical point will quickly become an information processor on par with humans. Today’s language models, including multimodal models, process information with vectors as their basic units. Human language and multimodal data are translated into vectors for computation through a mapping called embedding. It can be said that vectors are the language of AI. Today’s AI revolution signifies that the information carrier has partially shifted from human language to vectors, and the decisive information dynamics process has shifted from human thought to computations in GPUs. In this sense, the revolution of language models is of the same level of significance as the emergence of human language (Figure 5).
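As a toy illustration of “vectors as the language of AI,” the snippet below maps words to vectors and compares them geometrically. The four-dimensional vectors are invented purely for illustration; real embeddings have hundreds to thousands of dimensions and are learned from data:

```python
import numpy as np

# Hypothetical embeddings, invented for illustration only
embedding = {
    "sun":  np.array([0.9, 0.1, 0.3, 0.0]),
    "moon": np.array([0.8, 0.2, 0.4, 0.1]),
    "tax":  np.array([0.0, 0.9, 0.1, 0.8]),
}

def cosine(u, v):
    """Closeness of meaning is measured by the angle between vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(embedding["sun"], embedding["moon"]))  # high: related concepts
print(cosine(embedding["sun"], embedding["tax"]))   # low: unrelated concepts
```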
Figure 5: Phases of Earth’s history according to the decisive information dynamics processes.

3. The Fast and Slow of AI

3.1 Human Cognitive Systems

The next question is what differences exist between today’s AI and humans, and whether these differences are fundamental. To understand this question, we first need to understand the human cognitive system.
The human brain has different solutions for different tasks. At the shortest time scales (a fraction of a second), humans mainly rely on instinct: emergency evasive actions, subconsciously completing familiar tasks, answering questions without thinking. This intuitive system was named System 1 by Daniel Kahneman in Thinking, Fast and Slow[4]. System 1 is characterized by fast responses but slow change. For example, the habits formed by riding a bicycle cannot be immediately adjusted when switching to a tricycle; often, even when a person realizes what the correct approach is, they still cannot execute it immediately and need new training to establish new habits and master the new skill. When we encounter more complex problems that cannot be solved intuitively, we activate another system that solves the problem through conscious thought, usually referred to as System 2. Compared to System 1, System 2 has several main characteristics:
(1) Thinking in language. System 1 may also involve language, but there language is used only for output, not for an inner world. System 2 uses language for reasoning, which is crucial for working through different tasks step by step.
(2) Calling on memory. System 1 also retains some information in memory, but memory is not essential to it; many instinctive reactions leave no trace in memory. For System 2, memory is essential, because the thinking process must be preserved in memory, and one often needs to recall past experiences and reflect on whether one’s actions achieved the expected outcomes. The ability to call on memory is crucial because past actions accumulate into experience, making the same task easier the next time.
(3) Completing a task is slower than with System 1, but methods can change quickly. For example, when solving a math problem, if I am accustomed to one method and someone teaches me a new approach that makes sense, I can switch to it immediately, without needing a large amount of training data.
From this analysis, we can see that the distinction between System 1 and System 2 is based on time scales. System 2 exists to handle more complex problems over longer time scales than System 1 (for example, several minutes, days, or years). The distinction between System 1 and System 2 is a rough binary classification; more accurately, System 2 encompasses various cognitive activities across time scales ranging from minutes to decades.
If we further ask why humans need two different systems, it is essentially because intelligence at the human level must solve problems in a complex world, and the complexity of the world inevitably means that phenomena occur across many different time scales. In a simple video-game world where we only need to react to the current state without long-term planning, completing the game does not require a division of labor between System 1 and System 2. A complex world is like a physical system at a critical point, with non-trivial correlations across all time scales, and a general intelligent system needs to understand and exploit these correlations at every scale. A typical feature of such a complex world is the power law: when correlations decay as a power law in time, there is no maximal time scale such that predicting phenomena below that scale would suffice. Interestingly, human language also exhibits a power-law distribution, known as Zipf’s law[5]: the nth most common word in a language occurs with a frequency proportional to 1/n. This power-law distribution reflects the complexity of language and of the world it describes: although most words are uncommon, collectively they account for a significant proportion of usage, and there is no simple cutoff, so a limited set of common words cannot describe everything. Precisely because a complex world inevitably involves a division of time scales, humans, and future general artificial intelligence, must have different cognitive systems for different time scales; this is the distinction between System 1 and System 2.
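Zipf’s law is easy to check empirically. The sketch below counts word frequencies in any sizable English text file (the file path is a placeholder) and prints rank times frequency, which should stay roughly constant if the law holds:

```python
import re
from collections import Counter

# Placeholder path: point this at any sizable plain-text corpus
text = open("corpus.txt", encoding="utf-8").read().lower()
counts = Counter(re.findall(r"[a-z']+", text))

# Zipf's law: the n-th most common word appears with frequency ~ 1/n,
# so rank * frequency should be roughly constant across ranks
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(f"{rank:2d}  {word:12s} freq={freq:7d}  rank*freq={rank * freq}")
```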
3.2 The Relationship Between System 1 and System 2

So, is System 2 a completely independent cognitive system from System 1? Not necessarily. For example, if we want to calculate 9 times 9, we can directly recall the answer 81 from memory without thinking, which is a task for System 1. However, if we want to calculate 999 times 999, we cannot rely solely on memory; we must begin invoking System 2 to think. We might break it down into the following steps:

(1) Use 999=1000-1 to transform the problem into calculating (1000−1)×(1000−1);

(2) Expand this expression using the distributive property;

(3) Calculate 1000×1000, 1000×1, and 1×1;

(4) Perform the addition to arrive at the result.
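Written out in full, the decomposition described in these steps runs:

999 × 999 = (1000 − 1) × (1000 − 1)
= 1000×1000 − 1000×1 − 1×1000 + 1×1
= 1000000 − 1000 − 1000 + 1 = 998001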
In this process, what we are doing is breaking the problem down into steps until each step (for example, calculating addition, applying the distributive property) becomes a task that System 1 can complete. From this example, we can see that System 2 operates by decomposing problems into a flowchart, where each node in the flowchart corresponds to an existing capability of System 1. In other words, System 2 is a network composed of System 1.
The relationship between System 2 and System 1 has another aspect: capabilities acquired by System 2 provide training data for System 1 through repeated application, enabling System 1 to gain new abilities. For instance, the distributive property learned in elementary school becomes something System 1 can handle after sufficient practice. Likewise, calculating 2^10 initially requires System 2, computing step by step 2×2×2… until reaching 2^10 = 1024; but because this number appears frequently in my work, after using it many times it becomes something System 1 handles directly. Similar examples occur in more complex scenarios. In scientific research, an experienced researcher might intuitively choose a particular approach to a problem, possibly without realizing why, only later reflecting on the reason for the choice: past experience has trained System 1’s intuition. Such training occurs across various time scales. Specific methods for solving a problem sediment into System 1 as what we call “experience” or “intuition,” while over longer periods these experiences collectively form our “habits” and “character,” many aspects of which may remain stable throughout life but can also change under significant internal or external influences. The relationship between System 1 and System 2 is summarized in Figure 6.

Figure 6: The relationship between human System 1 and System 2. System 2 is a network of System 1, and the data (experience) generated by using System 2 will, in turn, train System 1.

From this analysis, we can see that human cognitive processes can be divided into a continuous spectrum based on time scales, with the fastest, “instinctive” part termed System 1, and other parts referred to as System 2. The process of System 2 acquiring new abilities involves combining existing capabilities into a network. The data accumulated from applying System 2 further optimizes System 1. Humans rapidly learn and progress by iteratively optimizing the interplay between capabilities across different time scales to handle the complexities of the world. In Figure 7, we list some examples of tasks completed by humans across different time scales.

Figure 7: Comparison of time scales for humans and artificial intelligence. Human System 2 spans time scales from about 1 second to several decades, allowing the cognitive time scale to be adjusted to the task. In contrast, there is a gap between AI’s fast behavior (inference) and its slow behaviors (fine-tuning and pretraining), and fine-tuning and pretraining require human intervention.

This multi-scale system is somewhat analogous to a city’s road system. If all roads are grid-like with the same speed limit, it would result in a very inefficient traffic system. The most efficient road system has speed hierarchies: vehicles traveling short distances use slower local roads, those going further use faster roads, and those going even further use highways. This planning approach is universally applicable to every city because it addresses the problem (traffic demand) based on the scale (travel distance). In physics, renormalization group theory, which is crucial for understanding the states of matter, also analyzes the relationships between dynamics across different scales to exclude unimportant details and predict under what conditions phase transitions occur (for example, boiling water).
3.3 Time Scale Division in Artificial Intelligence Cognition
Now let us apply the same time-scale perspective to large language models. We find that the way large language models work closely resembles human System 1: past experience (training data) directly shapes the model’s preferences. If an output is incorrect, the model does not automatically reflect and correct itself; it simply outputs, without deliberation, the answer it predicts to be most likely. Whether the problem is simple or complex, the model’s output speed does not vary. Although large models can complete complex tasks, such as programming, their working mode remains “instinctive”; for instance, when faced with an unfamiliar task (one with little training data), they easily confuse it with a familiar one. A typical example: when I screenshot a physics formula about a three-dimensional black hole and asked GPT-4 to convert it into LaTeX format, a very easy task, GPT-4 consistently output the wrong formula because it is more familiar with four-dimensional black hole formulas. Comparing this with human cognition, we see it follows the System 1 pattern: changing the input-output relationship requires feeding in large amounts of training data. Besides the extensive data required for pretraining, large models can also be optimized for a particular area through fine-tuning, which requires less data and changes model behavior more quickly, but the changes it can produce are also more limited; fine-tuning may even degrade the model’s capabilities in other areas. According to the division of time scales, we can place the inference, fine-tuning, and pretraining of large models on a time axis (Figure 7). Compared with human cognitive patterns, we see two main differences:
(1) Both fine-tuning and pretraining require human intervention. If the companies training large models do not run fine-tuning or pretraining, the parameters of a large model will not adjust automatically over time through interaction with users. In other words, for a large model to learn anything new, the fine-tuning or pretraining process must be initiated manually. During inference alone, the large model is a stateless machine: apart from the content saved in chat records, no state changes over time.
(2) There is a gap between the fast behavior of inference (System 1) and the slow behaviors of fine-tuning and even slower pretraining. Human System 2 can operate on any time scale longer than System 1’s, whereas AI currently lacks the ability to flexibly adjust the time scale on which it learns and applies new skills.
Compared with human cognition, what AI lacks is precisely System 2. Existing large language models (LLMs) resemble a city road system in which all streets have the same speed limit; to improve traffic, one can only renovate the roads wholesale or in part (pretraining or fine-tuning), which is far less efficient than introducing a hierarchy of speeds through arterial roads and highways. Based on our analysis of human cognition, System 2 is realized through a network of System 1. Building System 2 means enabling AI to organize its own System 1 network to create new tools and solve new problems.

4. Towards System 2: AI Agents

To summarize the discussion so far: today’s large language models have crossed the critical point of information complexity, yielding a powerful System 1, which paves the way for the next step: building System 2. From the examples of human cognition, we saw that System 1 serves as the fundamental unit from which System 2 is constructed. AI’s System 2 is therefore essentially a network composed of System 1 (large models), realizing more complex functions by calling the large model multiple times to complete different sub-tasks. Research in this direction has grown rapidly over the past year under the name AI agents. By having multiple LLMs collaborate and by equipping them with long-term memory, it is in principle possible to expand from System 1 to System 2. Below, I explain the basic concepts of AI agents through a few examples.
The first example is the well-known “chain-of-thought” prompting strategy (Figure 8)[6]. For a given problem, such as a math question, instead of having the AI output the answer directly, one can improve reasoning accuracy by having it output the intermediate steps one by one. In the simplest implementation, the AI outputs intermediate steps z1, z2, …, zn based on the input x and then reaches the conclusion y, so this can still be regarded as a single call to the LLM. For more complex problems, the AI can first write out the chain of intermediate steps and then refine the content of each step, which involves multiple calls to the LLM and can be considered the simplest form of an agent. In a 2023 study[7], the authors extended this strategy to a “tree of thoughts”: after each reasoning step, the AI generates several possible next steps, forming a tree structure, and then evaluates which branch is more feasible. This approach further improves the accuracy of AI problem-solving. Following this direction, subsequent work has extended the tree of thoughts into a more general “graph of thoughts”[8].
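The sketch below contrasts these invocation patterns in Python. The function llm is a hypothetical stand-in for a single call to any large model (no real API is implied), and the prompts are illustrative only:

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for one call to a large language model.
    Wire this to a real model; here it returns a canned placeholder."""
    return "placeholder model output"

def direct_answer(question: str) -> str:
    # (a) Single call: the model answers "instinctively" (System 1)
    return llm(f"Question: {question}\nAnswer:")

def chain_of_thought(question: str) -> str:
    # (b) Elicit intermediate steps z1..zn first, then conclude
    steps = llm(f"Question: {question}\nThink step by step:")
    return llm(f"Question: {question}\nReasoning:\n{steps}\nThe answer is:")

def plan_then_refine(question: str) -> str:
    # Simplest agent: write a plan, then make one LLM call per sub-step
    plan = llm(f"List the steps needed to solve: {question}")
    context = ""
    for step in plan.splitlines():
        if step.strip():
            context = llm(f"So far:\n{context}\nCarry out this step: {step}")
    return context
```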

Figure 8: Different ways of invoking large models: (a) directly outputting the answer to a given problem; (b) chain-of-thought prompting; (c) majority voting over multiple chains of thought; (d) tree of thoughts[7]

The second example is an AI virtual town designed by a research group at Stanford University (Figure 9)[9]. This work built a virtual game environment in which 25 AI agents live in a small town. Each agent has its own character settings (student, teacher, etc.) and memories (daily experiences, people encountered). Agents decide what to do next based on their memories and settings, and they must reflect on their experiences, storing important information in memory. The social interactions between agents exhibit complex behaviors, such as organizing a birthday party. In this example, each agent needs a System 2, using long-term memory, planning, and reflection to achieve complex social behavior.
Figure 9: AI Virtual Town[9]
The third example is completing a complex task through multiple LLM calls and dialogues among several agents. There are many works in this area; typical examples are the early AutoGPT[10] and Microsoft’s AutoGen (Figure 10)[11]. Given a task posed by a human user, the AI first makes a plan, then executes it, troubleshooting problems as they arise and iterating through this process. LLMs can communicate through dialogue to solve problems; for instance, one LLM writes code while another runs the code and returns results or errors.
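In the same hypothetical setting as the previous sketch (the stand-in llm function), a minimal writer-executor loop of this kind might look as follows; AutoGen’s actual API is richer and different:

```python
def two_agent_coding(task: str, max_rounds: int = 5) -> str:
    """One 'writer' LLM proposes code; an 'executor' runs it and
    feeds any error message back to the writer for repair."""
    code = llm(f"Write Python code for this task:\n{task}")
    for _ in range(max_rounds):
        try:
            exec(code, {})        # executor agent: run the proposed code
            return code           # success: return the working code
        except Exception as err:  # failure: report the error back
            code = llm(f"Task: {task}\nCode:\n{code}\n"
                       f"Error: {err}\nPlease fix the code:")
    return code
```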

Figure 10: AutoGen demonstration[11]. (a) AutoGen’s agents can include large models, other tools, and human input; (b) agents in AutoGen can solve problems through dialogue.

Lastly, a physics example: in a 2024 study, a research group at Cornell University used GPT to perform Hartree-Fock calculations step by step (Figure 11)[12]. In scientific research, many mature derivations and calculations can be automated in similar ways. Most such tasks cannot be completed by a single direct LLM call; they require designing multi-step processes, which is where AI agents come in.
Figure 11: LLM performing Hartree-Fock calculations step by step[12]
The importance of AI agents is increasingly recognized[13], but research in this area is still in its early stages, and current applications remain experimental. Comparing with human System 2, we can see that for AI to develop a truly general System 2, it needs to overcome the following challenges:
(1) Self-organization. Current agent applications still rely on workflows designed by humans. For AI agents to become AI’s System 2, they must be able to plan for themselves, design the workflows needed to accomplish a task, and continuously improve that design based on feedback. To self-organize in this way, AI must have a good grasp of the abilities its System 1 provides, accurately searching for and invoking the right components among its basic capabilities to realize more complex functions.
(2) Sedimentation of System 2 into System 1. As discussed earlier, humans can, through practice, sediment abilities that initially require System 2 into System 1. For AI to keep expanding its capabilities, it must possess this ability as well, gradually reducing the inference cost of common tasks instead of repeating the same computation every time.
(3) Computational cost. The computational cost of AI is still much higher than that of humans. Humans can entertain many thoughts simultaneously and make rapid judgments, whereas current AI requires many iterations to do so, making agents hard to apply to real-world problems at acceptable speed and accuracy. It is worth noting, however, that the computational cost of AI is dropping rapidly: major models keep cutting prices, and infrastructure production is keeping pace with growing computational demand, so costs should continue to fall significantly in the coming years.
Addressing these challenges and building a general System 2 on the basis of large models is a crucial step toward achieving general artificial intelligence (AGI), and it is also a key focus of my current work.

5. Summary and Outlook

In summary, this article reviews the basic principles and recent developments of large language models, and analyzes the significance of large language models for AI development from the perspective of information dynamics. Based on the comparison between large language models and human cognitive systems, this article proposes that the next step for artificial intelligence is System 2, and the direction of AI agents is closely related to the development of System 2. This article provides an overview of some developments in the direction of AI agents and discusses the main challenges that need to be addressed in the next steps.

In the next 5-10 years, the development of artificial intelligence will profoundly impact various aspects of human society, potentially leading to transformative changes. Among these impacts, the influence on scientific research and other innovative work may be one of the most profound changes. How to apply artificial intelligence to assist scientific research is a question worthy of deep thought and exploration.

Author Biography

Qi Xiaoliang, Professor of Physics at Stanford University. Co-founder of Path Integral Technology Co., Ltd. Professor Qi’s early research focused on condensed matter theory, and he is one of the pioneers of topological insulator theory and topological superconductor theory. Recently, he has concentrated on the relationships between quantum information, quantum many-body systems, and quantum gravity. He has received several international awards, including the New Horizons in Physics Prize and the Sloan Fellowship.
Personal Homepage: https://profiles.stanford.edu/xiaoliang-qi
Professor Qi Xiaoliang previously gave a talk on the theme “Why ChatGPT is a Critical Point: The AI Revolution from the Perspective of Information Dynamics” at the AI By Complexity reading group. Friends interested in deep learning are welcome to study further.

Registration link: https://pattern.swarma.org/study_group_issue/480

References

[1] Shannon C E. A Mathematical Theory of Communication. Bell System Technical Journal, 1948, 27: 379. https://web.archive.org/web/20090216231139/http://plan9.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

[2] Vaswani A, Shazeer N, Parmar N et al. Attention Is All You Need. 2017, arXiv:1706.03762

[3] Qi Xiaoliang. The Dawn of Artificial Intelligence: Looking at ChatGPT from the Perspective of Information Dynamics. https://mp.weixin.qq.com/s/DJRSqwo0cWGOAgZM4As-OQ

[4] Kahneman D. Thinking, Fast and Slow. Macmillan, 2011

[5] Piantadosi S T. Zipf’s Word Frequency Law in Natural Language: A Critical Review and Future Directions. Psychon. Bull. Rev., 2014, 21(5): 1112

[6] Wei J et al. Chain-of-thought Prompting Elicits Reasoning in Large Language Models. In: Advances in Neural Information Processing Systems 35, 2022

[7] Yao SY et al. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. In: Advances in Neural Information Processing Systems 36, 2024

[8] Besta M et al. Graph of Thoughts: Solving Elaborate Problems with Large Language Models. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(16): 17682

[9] Park JS et al. Generative Agents: Interactive Simulacra of Human Behavior. 2023, arXiv:2304.03442

[10] Yang H, Yue SF, He YZ. Auto-gpt for Online Decision Making: Benchmarks and Additional Opinions. 2023, arXiv:2306.02224

[11] Wu QY et al. AutoGen: Enabling Next-gen LLM Applications via Multiagent Conversation Framework. 2023, arXiv:2308.08155

[12] Pan HN et al. Quantum Many-Body Physics Calculations with Large Language Models. 2024, arXiv:2403.03154

[13] Andrew Ng. What’s next for AI agentic workflows. https://www.youtube.com/watch?v=sal78ACtGTc


AI By Complexity Reading Group Recruitment

With large models, multimodal systems, and multi-agent systems emerging one after another, various neural network variants are showing their strengths on the AI stage. Meanwhile, the field of complex systems continues to explore emergence, hierarchy, robustness, nonlinearity, evolution, and related questions. Excellent AI systems and innovative neural networks often share, to some extent, the characteristics of excellent complex systems. How the theoretical methods being developed for complex systems can guide the design of future AI is therefore becoming a topic of great interest.

The AI By Complexity reading group was jointly initiated by Assistant Professor You Yizhuang (University of California, San Diego), Associate Professor Liu Yu (Beijing Normal University), PhD students Zhang Zhang and Mu Yun and Master’s student Yang Mingzhe (School of Systems Science, Beijing Normal University), and PhD student Tian Yang (Tsinghua University), to explore questions such as: How do we measure how “good” a complex system is? How do we understand the mechanisms of complex systems from perspectives such as complex networks, statistical physics, algorithmic information theory, causal emergence, the free energy principle, and self-organized criticality? Can these understandings inspire us to design better AI models and, ultimately, better AI systems? The reading group started on June 10 and meets every Monday evening from 20:00 to 22:00. Friends working in related fields or interested in AI+Complexity are welcome to sign up and join the exchange!


For more details, please see: AI by Complexity Reading Group Launch: How to Quantify and Drive the Next Generation of AI Systems

AI+Science Reading Group

AI+Science is a trend that has emerged in recent years to combine artificial intelligence and science. On one hand, AI for Science refers to using machine learning and other AI technologies to solve problems in scientific research, from predicting weather and protein structures to simulating galaxy collisions, optimizing nuclear fusion reactors, and even making scientific discoveries like a scientist, known as the “fifth paradigm” of scientific discovery. On the other hand, Science for AI refers to how scientific principles, especially in physics, inspire machine learning theories, providing new perspectives and methods for the development of artificial intelligence.
The AI+Science reading group was jointly initiated by postdoctoral researcher Wu Tailin from Stanford University’s Computer Science Department (under the guidance of Professor Jure Leskovec), researcher He Hongye from the Harvard Quantum Initiative, and PhD student Liu Ziming from MIT’s Physics Department (under the guidance of Professor Max Tegmark), to explore important issues in this field and study relevant literature together. The reading group has concluded, but registration is still open to join the community and unlock replay video privileges.


For more details, please see:
The New Paradigm of Inter-empowerment between Artificial Intelligence and Scientific Discovery: AI+Science Reading Group Launch

Post-ChatGPT Reading Group

On November 30, 2022, a phenomenal application was born on the internet: ChatGPT, developed by OpenAI. From Q&A to programming, from summarizing to writing papers, ChatGPT has demonstrated versatile general intelligence, and tech giants such as Microsoft, Google, Baidu, Alibaba, and iFlytek are rushing into the field. But it is worth pausing to reflect: is it truly appropriate to go all in on large language models now? The success of ChatGPT rests on deep learning, big data, and large models, which have been gathering momentum since the AlphaGo era five years ago. Those who missed the opportunity then may ask why they should be able to board the train of large language models now.
The Post-ChatGPT reading group is specially organized by Professor Zhang Jiang from Beijing Normal University, founder of the AI Club, along with several other teachers including Xiao Da, Li Yanran, Cui Peng, Hou Yueyuan, Zhong Hanting, and Lu Yi, aiming to systematically analyze the technology behind ChatGPT and discover its weaknesses and shortcomings. The reading group has concluded, but registration is still open to join the community and unlock replay video privileges.


For more details, please see:
The Launch of the Post-ChatGPT Reading Group: From General Artificial Intelligence to Conscious Machines
Recommended Reading
1. Artificial Intelligence’s Dawn: Looking at ChatGPT from the Perspective of Information Dynamics
2. Information Dynamics on the Edge of Chaos: The Final Chapter of Information Theory in Complex Systems Research
3. Melanie Mitchell’s Article in Science: Debate on the Nature of General Artificial Intelligence
4. Toward Embodied General Intelligence: How Can Machines Learn World Models from Natural Modalities?
5. Zhang Jiang: The Foundation of Third Generation AI Technology—From Differentiable Programming to Causal Reasoning | New Course at the AI Club
6. With the Year of the Dragon Rising, It’s Time to Learn! Unlock All Content on the AI Club Website and Kickstart Your New Year Learning Plan
7. Join the AI Club, Let’s Explore Complexity Together!

