
Monday, March 23, 2026

A viewer reached out with a paper that they had written with an LLM. When I looked closer, I got worried.

A few weeks ago I posted the results of a rather simple experiment designed to test some of the claims being made about LLMs. The response of the community was amazing--we got a ton of great feedback and ideas for how to continue exploring these ideas, and there was clear interest. 

As a physicist, I am pretty constantly bombarded by emails from people effectively saying, "AI helped me write this paper about my huge discovery, can you endorse it for arXiv/tell me what you think?" I usually ignore these--the vast majority are wild, grandiose claims that are, at a glance, unlikely to be meaningful. This week, however, I received a paper from a viewer that did not seem ridiculous. In fact, at first glance it seemed quite reasonable: it made a restrained, testable claim about a real observation and had no super obvious red flags beyond the usual LLM deficiencies (bad at citations, etc.). I decided to give this one a shot and proposed a challenge to the viewer: I'd review the paper on camera, and if it was good, I'd endorse him for arXiv. If not, I'd explain how the paper could be improved.

A very fair reaction you might be having now is, "this is a waste of time!" Certainly, I can't do this for every paper I get, nor do I want to fill my time reading AI slop. However, I think there's a valuable exercise here, one where a little effort can go a long way and perhaps reach some people who really need to hear this. A few comments criticized the original video for deconstructing an argument they felt nobody was making (effectively, "nobody actually thinks these things can do science!"), but vixra submissions and my own email inbox suggest otherwise. My intent for this discussion is to help crystallize the issues with LLM-driven science by taking one of the best attempts I've seen yet and showing problems that are common to this method. Hopefully, I can point future emailers to this video so that they can re-assess their own work without me needing to break down every LLM paper I receive.

I break down the paper in the video (including the science behind the claim), but the key issues are these: 

  1. Lots of inaccuracies. The paper contains many incorrect statements. The formula that the key result revolves around is a possibly invalid simplification of a significantly more complex calculation, and this simplification is never addressed or justified anywhere in the paper. At worst, the methodology is wrong; at best, it is unjustified.
  2. The paper is completely underwritten (a common problem with LLM-driven papers). There's zero literature review (more on this later). Choices of methods and figures are left completely unjustified. The paper draws on a sample of 175 galaxies but includes only 10 in the analysis, without explaining why or how the selection was made. There is no quantitative discussion of, or attempt to compare with, past results. The primary result is stated hand-wavingly, without deeper exploration or motivation.
  3. The primary result is simply uninteresting, bordering on tautological. The study takes a statistical correlation that has been very well established across many galaxies in a sample, then looks at a few of those same galaxies and finds that the correlation holds when you look at each galaxy individually. This is obviously true and not a discovery at all, but it is presented as if it were completely novel. The analogy I draw is: imagine it is well known that tall people tend to weigh more. A new paper then comes along, measures someone's weight once a year, finds that as they get taller they weigh more, and claims this as a new discovery. (The toy sketch after this list makes the same point numerically.)
  4. There is complete disengagement with the literature. As I mentioned earlier, there are basically no citations in the paper. This is a problem from an ethical and procedural perspective, and it makes it impossible to verify where certain statements are coming from. But the lack of literature review is very problematic for another reason: as I was catching up on the literature of this field to review the paper, I immediately came across several other papers that did exactly what this paper is claiming to do, but better and in a more interesting way. See for example, Li et al. (2018), published in A&A, called "Fitting the Radial Acceleration Relation to Individual SPARC Galaxies". Or Lelli et al. (2017), which literally made a movie showing how each individual SPARC galaxy adds to the RAR. The LLM paper's Figure 1 is essentially a static version of this animation, presented as a novel finding. 
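
To make point 3 concrete beyond the height/weight analogy, here is a tiny, purely synthetic sketch (my own toy numbers, not the paper's data or methods; it assumes the standard McGaugh-style RAR fitting function). If every point in a pooled sample is generated around a tight relation, then any subset of those points--including the points belonging to a single galaxy--necessarily scatters around the same relation, so "the relation holds galaxy by galaxy" is built in from the start:

    # Toy illustration only: synthetic "galaxies" whose points are generated
    # around a tight radial-acceleration relation (RAR) with log-normal scatter.
    import numpy as np

    rng = np.random.default_rng(0)

    def rar(g_bar, g_dagger=1.2e-10):
        # McGaugh-style RAR fitting function (accelerations in m/s^2).
        return g_bar / (1.0 - np.exp(-np.sqrt(g_bar / g_dagger)))

    galaxies = []
    for _ in range(175):                                      # 175 fake galaxies
        g_bar = 10 ** rng.uniform(-12.0, -9.0, size=30)       # baryonic accelerations
        g_obs = rar(g_bar) * 10 ** rng.normal(0.0, 0.1, 30)   # ~0.1 dex scatter
        galaxies.append((g_bar, g_obs))

    # Scatter about the relation: pooled sample vs. each of the first 10 galaxies.
    pooled = np.concatenate([np.log10(go / rar(gb)) for gb, go in galaxies])
    per_galaxy = [np.std(np.log10(go / rar(gb))) for gb, go in galaxies[:10]]
    print(f"pooled rms scatter: {pooled.std():.2f} dex")
    print("per-galaxy scatter:", [f"{s:.2f}" for s in per_galaxy])

Both numbers come out near the 0.1 dex scatter that was put in by hand, which is the whole point: the per-galaxy agreement is a consequence of the construction, not a new result.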

I go into this in more detail in the video, but this is the gist. I also present general advice to the viewer on how they can have more success doing a science project such as this. But the paper worried me significantly. LLM capabilities have not improved at all in terms of producing meaningful science in the last year or two, but their ability to produce meaningless science that looks meaningful has wildly improved. I am concerned that this will present serious problems for the future of science as it becomes impossible to find the actual science in a sea of AI slop being submitted to journals.

LLMs are painted as democratizing science, but I'm actually worried that soon journals won't even allow you to submit unless you have senior faculty at a major institution vouching for you, because they simply cannot keep up with the tide of garbage that is cheap to produce and submit at scale. If you were a journal trying to maintain a standard of quality, while also making sure that the good papers get through, how would you do this without an army of reviewers working around the clock? I seriously worry that this will lead to academia becoming more closed, not less.

 

Previously on this blog:

Saturday, March 14, 2026

Where is the LLLine?

My personal uses and struggles with large language models (LLMs) cannot be fully expounded upon in one post and I will make no such attempt here. However, I have been mulling over the genuine utility of LLMs versus the undeniable impact they have on my personal ability to process and understand information. There are generally two diametrically opposed camps: 

  1.  The AI-haters: these people believe that there is absolutely no place for LLMs, regardless of utility. They point not only to the impact LLMs have on people, but also to the disastrous moral, ethical, environmental, and economic impact they could be argued to have. I won't evaluate that impact here; but this is their rationale. These people use generative LLMs for absolutely nothing and absolutely despise them in all forms. 
  2. The AI die-hards: these people believe that true AGI is right around the corner, and that these tools will completely reshape the way we use technology, and perhaps our role as human beings. They point to the impressive capabilities of LLMs and argue that a massive upheaval is imminent, as companies lay off workers left and right, seemingly to replace them with LLMs. I won't evaluate these claims here, but this is their rationale. These people use LLMs for everything they can think of, from genuinely useful work to helping choose their clothes to writing text messages to their moms. 

I do not find either of these camps truly appealing. While I think I lean more hater than die-hard--as an artist I am extremely alarmed by the rise of slop and the willingness with which LLM companies disregard notions of intellectual property--I also do use LLMs myself, on a near-daily basis. They are objectively immensely useful, even after just a few years of development, and have made certain things about my work more enjoyable and efficient. 

However, as with all things that seem too good to be true, there is a tradeoff. With each task I use LLMs for, I feel a token of practice escape, a chance to become better evaporate. The time I saved on the task is replaced by the weight of the knowledge that I have just made myself easier to replace, or robbed myself of an opportunity to learn. This feeling is not so strong on tasks I am already very good at--for example, making plots in Matplotlib--but it dramatically waxes when I am using LLMs to save time on the learning process itself; i.e., using an LLM to automate a task that I am not fully capable of myself. My wonder at the technology contorts into an intense, epistemic discomfort. Like a student cheating on an exam or a person skipping a workout, I know deep down that in the long run, I may be saving time and energy, but I am only hurting myself.

I have been struggling with these two extremes, knowing these tools can help me, but not knowing when I am taking it too far. Today, I had a discussion with my very close friend and colleague, the brilliant and handsome Joaquin. He presented an analogy for LLM usage that clicked so well I genuinely stopped in my tracks. 

Joaquin compares using an LLM to using a chess engine (e.g., Stockfish, or Komodo). This comparison has an immediate appeal: while both are breathtaking examples of modern technology and innovation, they are just about equally close to being true artificial intelligence (that is to say, not at all). However, you might be fooled into thinking otherwise, whether you are seemingly "chatting" with Claude or watching Magnus Carlsen get flattened by AlphaZero. Upon reflection and discussion with Joaquin, I realized this analogy is remarkably one-to-one: questions about LLMs that seem impossible to answer become almost trivial when applied to a chess engine, and the answers transfer almost directly back. 

 

A picture of me learning chess with my father in the early 2000s. Chess has always been a huge part of my life from a very young age, and thinking about LLMs in terms of something that comes to me very intuitively has been a game-changer. 

The particular question I am interested in answering is: in my own work as an academic, where does the line lie between "using LLMs to help me" and "abusing LLMs in a way that hurts me"? I consider the following litmus tests, based on how I have used chess engines myself as a player to reach a 1700 Elo rating. 

  1. You must use these tools to learn, and not to cheat. The analogy here is extremely obvious. Every serious chess player in the world uses engines to analyze their games, understand their weaknesses, and hone their strengths. But there is a clear and intuitive point where this becomes cheating--when they take the engine onto the battlefield. Why is this so hard to transfer to LLMs? Because it is not clear where, as academics, our battlefield is. Is it at our desk? In the conference room? In a presentation? Within the journal submission? The line is vague and unclear. I believe answering this question--where is the battlefield of academia?--is the key to understanding where the line is. My best answer is that the battlefield is in our ability as academics to communicate with our peers--in any format. Every successful academic has an intuitive understanding of understanding--a sense, a click, when you really understand something, when a concept goes from something you are regurgitating to something you have internalized. When you are reading a textbook, struggling to understand a concept, an LLM can be an extremely useful personal tutor, just as a chess engine can help you analyze a game (this also plays to an LLM's strengths, since textbook knowledge tends to be over-represented in training data compared to more niche, bleeding-edge concepts). But when it's time to take what you've learned to the battlefield, if you cannot leave the engine behind, then you are likely relying on it too much. When you stand up and talk to your colleagues, can you defend the ideas you've learned with LLM assistance? Can you teach them to someone yourself? When a colleague asks questions about your idea, can you answer with an innate sense of the boundary of your own knowledge? Or are you left stalling as you make a mental note to forward the question to your LLM of choice later? Framing the question in this way has helped me draw the line: your goal in using an LLM to learn should be to eventually not need it. These machines are the most powerful learning tools ever made. But when you ask one to teach you a concept, the goal must be true understanding, not automation or substitution. Every time, no exceptions.
  2.  LLMs can do some things better and faster, and that's okay. I struggled the most with this concept, as anything I offloaded onto LLMs felt like a skill I was giving up, or an opportunity to improve myself that I was missing. But the reality is, there are just some things at which LLMs are fantastic. The chess engine analogy has helped me realize this isn't the end of the world. There are some things in chess we have offloaded to engines, likely permanently, such as the study of openings. It is simply not feasible to expect players to calculate the outcomes of common openings from scratch, every single time. So we use powerful tools like Chess.com's Opening Explorer to walk through every opening sequence imaginable, learning the right moves to make and the wrong moves to avoid. The engine helps us experiment--what happens if we try this? Make that move? Take that piece? Every question, answered immediately, with the cold confidence of a calculating machine vastly beyond our own capabilities. And yes, sometimes we just memorize some of these openings, like a cuber memorizing a solution algorithm for a particular pattern. In the end, it doesn't change the fun of the game; it just helps you get a better start more often than not. In the same way, LLMs can genuinely improve your efficiency and productivity when used carefully and in the right situations. And when you find those situations, it's okay to use them freely. However, I will finish this point with an interesting word of caution. While engine-explored openings are useful in 99% of situations, you will be vulnerable against a true master. Magnus Carlsen is well known for deliberately playing offbeat openings to force engine-reliant opponents out of their preparation and into unfamiliar positions, where he has the upper hand thanks to his unmatched calculation ability. In whatever area you allow LLMs to stand in for you more fully, you will carry this same vulnerability. A real expert will be able to tell. Proceed with caution. 
  3. Sometimes, human intuition IS better. The reality is, LLMs have their limitations. This cuts both ways: in one scenario, you may not want to use an LLM because it is simply incapable of performing a task (such as making a genuinely new discovery outside its training data in the way humans can), and in another, you may not want to let an LLM teach you something a particular way because its enormous dataset-based approach is simply not ideal compared to the way humans think about things. There is a really direct parallel with chess engines here as well. Despite chess engines being many, many times stronger than any human opponent could ever hope to be, they can still fail or be useless in several key ways. First, their computations boil down to brute force (this is a drastic oversimplification, but it's largely what they do better than us). We can't do what they do--and that means we can't always rely on their solutions! If I'm trying to understand a position, the most useful move for me isn't always the mathematically best one, which the computer had to check 12 moves into the future to find. Sometimes the better choice is the move that is maybe 0.05 points less optimal, but that I understand deeply and intuitively, and that I know I will be able to find right away the next time I reach this position. This reality is an accommodation of our own failings, but the limitations of engines go beyond this. There are real scenarios where humans are just better at assessing the situation than machines. Sometimes the human and the machine disagree, and the human ends up being right, even though the machine should be able to calculate far more extensively than our soft, wrinkly brains. I've drawn out this analogy a bit too far, but I do find it fascinating--and the conclusions are directly transferable to LLMs. Turning to an LLM will not always be the best decision, especially at the elite level of anything. Knowing where those gray areas and blind spots are requires developing your own expertise and understanding--but it is effort that will pay dividends in the short and long term. 

I have more thoughts, but I will stop here for now. Reflecting on the question posed above in this way has given me significantly more confidence in how I choose to use LLMs and significantly more peace with the emotions and uncertainty this struggle has created within me. Thanks again to Joaquin for suggesting this amazing analogy! 

Tuesday, April 4, 2023

Global Geomagnetic Perturbation Forecasting Using Deep Learning

AGU:

Geomagnetically Induced Currents (GICs) arise from spatio-temporal changes to Earth's magnetic field, which arise from the interaction of the solar wind with Earth's magnetosphere, and drive catastrophic destruction to our technologically dependent society. Hence, computational models to forecast GICs globally with large forecast horizon, high spatial resolution and temporal cadence are of increasing importance to perform prompt necessary mitigation.

Our model outperforms, or has consistent performance with state-of-the-practice high time cadence local and low time cadence global models, while also outperforming/having comparable performance with the benchmark models. Such quick inferences at high temporal cadence and arbitrary spatial resolutions may ultimately enable accurate forewarning of dB/dt for any place on Earth, resulting in precautionary measures to be taken in an informed manner.

Phys.org

Like a tornado siren for life-threatening storms in America's heartland, a new computer model that combines artificial intelligence (AI) and NASA satellite data could sound the alarm for dangerous space weather.

The model uses AI to analyze spacecraft measurements of the solar wind (an unrelenting stream of material from the sun) and predict where an impending solar storm will strike, anywhere on Earth, with 30 minutes of advance warning. This could provide just enough time to prepare for these storms and prevent severe impacts on power grids and other critical infrastructure.

To help prepare, an international team of researchers at the Frontier Development Lab—a public-private partnership that includes NASA, the U.S. Geological Survey, and the U.S. Department of Energy—have been using artificial intelligence (AI) to look for connections between the solar wind and geomagnetic disruptions, or perturbations, that cause havoc on our technology. The researchers applied an AI method called "deep learning," which trains computers to recognize patterns based on previous examples. They used this type of AI to identify relationships between solar wind measurements from heliophysics missions (including ACE, Wind, IMP-8, and Geotail) and geomagnetic perturbations observed at ground stations across the planet. 
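
For readers curious what "applying deep learning" looks like mechanically here, below is a purely illustrative sketch (not the published architecture, whose details aren't given in these excerpts): a small PyTorch model that maps a window of upstream solar-wind measurements, plus an encoded ground-station location, to a dB/dt estimate some tens of minutes ahead. The feature set, layer sizes, and the 30-minute horizon are all assumptions made for illustration.

    # Illustrative sketch only: a generic sequence model from solar-wind inputs
    # to a ground-station dB/dt forecast. Not the published model.
    import torch
    import torch.nn as nn

    class SolarWindToDbdt(nn.Module):
        def __init__(self, n_features=5, hidden=64):
            super().__init__()
            # Encode the last hour of solar-wind data (e.g. speed, density, Bx, By, Bz).
            self.encoder = nn.GRU(input_size=n_features, hidden_size=hidden, batch_first=True)
            # Combine the solar-wind summary with an encoding of the station's
            # geomagnetic position and local time, then regress dB/dt.
            self.head = nn.Sequential(
                nn.Linear(hidden + 6, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, solar_wind, station):
            # solar_wind: (batch, time_steps, n_features); station: (batch, 6)
            _, h = self.encoder(solar_wind)
            return self.head(torch.cat([h[-1], station], dim=-1)).squeeze(-1)

    # Toy forward pass with random numbers standing in for real measurements.
    model = SolarWindToDbdt()
    sw = torch.randn(8, 60, 5)      # 60 one-minute samples of 5 solar-wind features
    loc = torch.randn(8, 6)         # encoded station position / local time
    print(model(sw, loc).shape)     # torch.Size([8])

A real pipeline would train something like this on years of spacecraft solar-wind data paired with ground magnetometer records, but the sketch captures the basic input/output structure the excerpts describe.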

