Transformer Attention’s High School Math Mistake
Every data point in a neural network (input activations, weights, biases) has coordinates; the numbers alone, without the coordinate system they are expressed in, are almost meaningless. When attention computes Q, K, and V, each undergoes its own linear transformation, so each is projected into a different space with its own new coordinates. The attention score is then computed as the dot product of Q and K, that is, a dot product between vectors that live in two different coordinate systems.
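Here is a minimal sketch of standard single-head scaled dot-product attention to make this concrete. The module and dimension names (`VanillaAttention`, `d_model`, `d_head`) are illustrative, not taken from any particular implementation:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class VanillaAttention(nn.Module):
    """Single-head scaled dot-product attention with separate Q/K/V projections."""
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        # Three independent linear maps: each projects the input into
        # its own learned coordinate system.
        self.w_q = nn.Linear(d_model, d_head, bias=False)
        self.w_k = nn.Linear(d_model, d_head, bias=False)
        self.w_v = nn.Linear(d_model, d_head, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = self.w_q(x)  # coordinates in Q-space
        k = self.w_k(x)  # coordinates in K-space, a different basis
        v = self.w_v(x)
        # The score dots Q-space vectors against K-space vectors.
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return F.softmax(scores, dim=-1) @ v
```

Because `w_q` and `w_k` are learned independently, nothing forces their output bases to agree; the dot product in `scores` is where the two coordinate systems meet.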
DeepSeek V3 mitigated this mistake unknowingly. In their MLA (Multi-head Latent Attention), K and V share the same nn.Linear.
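For context: in MLA the hidden state is first compressed by a single shared down-projection into a latent, and K and V are both decompressed out of that one latent. Below is a heavily simplified sketch of that structure, omitting multi-head splitting, the decoupled RoPE key, and query compression; the class and parameter names are illustrative, loosely following the paper's W^DKV, W^UK, W^UV notation:

```python
import torch
import torch.nn as nn

class SharedKVLatent(nn.Module):
    """Simplified sketch: K and V decompressed from one shared latent."""
    def __init__(self, d_model: int, d_latent: int, d_head: int):
        super().__init__()
        # One shared down-projection feeds both K and V
        # (the shared nn.Linear referred to above).
        self.w_dkv = nn.Linear(d_model, d_latent, bias=False)
        # Separate up-projections read out of the shared latent space.
        self.w_uk = nn.Linear(d_latent, d_head, bias=False)
        self.w_uv = nn.Linear(d_latent, d_head, bias=False)

    def forward(self, x: torch.Tensor):
        c_kv = self.w_dkv(x)  # shared latent: one coordinate system
        k = self.w_uk(c_kv)   # K derived from that shared space
        v = self.w_uv(c_kv)   # V derived from the same shared space
        return k, v
```

Worth noting: DeepSeek's stated motivation for the shared latent is shrinking the KV cache (only `c_kv` needs caching at inference time); K and V ending up anchored to a common coordinate system is a side effect of that design.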