Really cool read! I always perk up when I see you've got something new out.
Thank you for reading! An above-average amount of work went into this post, so I appreciate it.
Copernicus is an interesting example. In some ways he actually took a step backward, because he was most interested in preserving uniform speeds for the planets in their orbits, which he thought more elegant than Ptolemy’s system of equants. That preference turned out to be wrong, and it meant that in the end Copernicus’s model wasn’t much more accurate than the alternatives, other than being right about the one big thing. I suppose LLMs fail because they can’t take that kind of step backward to simplify the system.
Right, that’s essentially the claim I’m trying to make: NNs have a bias towards complexity rather than simplicity. Sometimes people claim the opposite, that something about huge data sets or gradient descent biases the model towards simplicity, but I’ve never found such claims convincing. Besides, simplicity is not always the right bias either. As you point out, the Copernican model, the simplest one, was not more accurate than epicycles. The “right” level of simplicity or complexity is contextual.
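To make that concrete, here’s a toy sketch (my own construction, not from the post): treat epicycles as terms of a Fourier series and fit a noisy orbit by least squares. Each added epicycle lowers the training error, but past the true structure the extra terms just memorize the noise, so held-out error gets worse.

```python
# Toy illustration: "epicycles" as a Fourier series fit to a noisy orbit.
# More epicycles always shrink training error, but buy fit, not generalization.
import numpy as np

rng = np.random.default_rng(0)
t_train = np.linspace(0, 2 * np.pi, 30)
t_test = np.linspace(0, 2 * np.pi, 101)
true = lambda t: np.cos(t) + 0.3 * np.cos(2 * t)  # the "real" orbit signal
y_train = true(t_train) + 0.2 * rng.standard_normal(t_train.size)
y_test = true(t_test)

def design(t, k):
    # Columns: 1, cos(t), sin(t), ..., cos(kt), sin(kt) -- k "epicycles".
    cols = [np.ones_like(t)]
    for n in range(1, k + 1):
        cols += [np.cos(n * t), np.sin(n * t)]
    return np.column_stack(cols)

for k in (1, 3, 8, 14):
    w, *_ = np.linalg.lstsq(design(t_train, k), y_train, rcond=None)
    train_err = np.mean((design(t_train, k) @ w - y_train) ** 2)
    test_err = np.mean((design(t_test, k) @ w - y_test) ** 2)
    print(f"epicycles={k:2d}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

With 30 training points, 14 epicycles give 29 parameters, so the fit is near interpolation: training MSE collapses toward zero while test MSE grows. Nothing in the fitting procedure pushes back toward the two-term truth; the pressure toward simplicity has to come from somewhere else.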
great post, and so lucid! thank you for writing it up. i will be pondering it, it’s actually related to something i’m researching atm
Thanks Ulkar! Would be curious to hear more about what you're researching.
for one, i’m looking into what is considered an elegant experiment, and it seems like economy of means and simplicity are among the key features (as opposed to brute-force experiments). so it’s more about how we probe reality than about what our model of reality ends up being, though the two are related.
second, have you seen this paper? https://arxiv.org/abs/2505.24832 it’s about the memorization vs. generalization trade-off in LLMs, which seems related to the epicycles vs. compression/generalization contrast you wrote about here. i’m interested in this in light of using LLMs for literature-based discovery