Showing posts with label ai. Show all posts

Monday, March 23, 2026

A viewer reached out with a paper that they had written with an LLM. When I looked closer, I got worried.

A few weeks ago I posted the results of a rather simple experiment designed to test some of the claims being made about LLMs. The response of the community was amazing--we got a ton of great feedback and ideas for how to continue exploring these ideas, and there was clear interest. 

As a physicist, I am pretty constantly bombarded by emails from people effectively saying, "AI helped me write this paper about my huge discovery, can you endorse it for arXiv/tell me what you think?" I usually ignore these--the vast majority are wild, grandiose claims that at a glance are unlikely to be meaningful. However, this week I received a paper from a viewer that did not seem ridiculous. In fact, at first glance, it seemed quite reasonable, made a restrained, testable claim about a reasonable observation, and didn't have any super obvious red flags besides the usual LLM deficiencies (bad at citations, etc.). I decided to give this one a shot and proposed a challenge to the viewer: I'd review the paper on camera, and if it was good, I'd endorse him for arXiv. If not, I'd explain how the paper could be improved.

A very fair reaction you might be having now is, "this is a waste of time!" Certainly, I can't do this for every paper I get, nor do I want to fill my time reading AI slop. However, I think there's a valuable exercise here, one where a little effort can go a long way, and perhaps reach some people that really need to hear this. A few comments criticized the original video for deconstructing an argument they felt nobody was making (effectively, "nobody actually thinks these things can do science!"), but viXra submissions and my own email inbox would suggest otherwise. My intent for this discussion is to help crystallize the issues with LLM-driven science by taking one of the best attempts I've seen yet and showing problems that are common to this method. Hopefully, I can point future emailers to this video, so that they can re-assess their own work without me needing to break down every LLM paper I receive.

I break down the paper in the video (including the science behind the claim), but the key issues are these:

  1. Lots of inaccuracies. There are many wrong statements in the paper. The primary formula that the key result revolves around is a possibly incorrect simplification of a significantly more complex calculation, which is not addressed anywhere in the result. At worst, the methodology of the paper is incorrect; at best it is unjustified.
  2. The paper is completely underwritten (a common LLM-driven paper problem). There's zero literature review (more on this later). Choices in methods and figures are left completely unjustified. The paper analyzes a sample of 175 galaxies but only includes 10 in the analysis without explaining why or how the selection was made. There is no quantitative discussion and no attempt to compare with past results. The primary result is hand-wavingly stated without deeper exploration or motivation.
  3. The primary result is simply uninteresting, bordering on tautological. The study takes a statistical correlation that has been very well-established on many galaxies in a sample, then looks at a few of the galaxies in the sample and finds that the statistical correlation holds if you look at each galaxy individually. This is very obviously true and not a discovery at all, but it is presented like it is completely novel. The analogy I draw is: imagine it is well known that tall people tend to weigh more. Then a new paper comes along and measures someone's weight once a year, finds that as they get taller they weigh more, and then claims it as a new discovery.
  4. There is complete disengagement with the literature. As I mentioned earlier, there are basically no citations in the paper. This is a problem from an ethical and procedural perspective, and it makes it impossible to verify where certain statements are coming from. But the lack of literature review is very problematic for another reason: as I was catching up on the literature of this field to review the paper, I immediately came across several other papers that did exactly what this paper is claiming to do, but better and in a more interesting way. See, for example, Li et al. (2018), published in A&A, titled "Fitting the Radial Acceleration Relation to Individual SPARC Galaxies". Or Lelli et al. (2017), which literally made a movie showing how each individual SPARC galaxy adds to the RAR. The LLM paper's Figure 1 is essentially a static version of this animation, presented as a novel finding.
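
To make the tautology in point 3 concrete, here is a toy sketch in Python. It is not the paper's analysis and it uses no real SPARC data: the shared relation `y = x`, the scatter of 0.1, and the 20 points per galaxy are all made-up assumptions, and only the sample size of 175 comes from the paper. The idea it illustrates is that the pooled mean-square residual is just the size-weighted average of the per-galaxy mean-square residuals, so a tight relation for the whole sample already implies a tight relation for a typical individual galaxy.

```python
import math
import random

def simulate(n_galaxies=175, points_per_galaxy=20, scatter=0.1, seed=0):
    """Synthetic sample: every 'galaxy' draws points from ONE shared relation.

    The relation y = x and all numbers except n_galaxies are hypothetical.
    """
    rng = random.Random(seed)
    sample = {}
    for g in range(n_galaxies):
        points = []
        for _ in range(points_per_galaxy):
            x = rng.uniform(0.0, 1.0)
            y = x + rng.gauss(0.0, scatter)  # shared relation plus noise
            points.append((x, y))
        sample[g] = points
    return sample

def rms_residual(points):
    """RMS distance of the points from the shared relation y = x."""
    return math.sqrt(sum((y - x) ** 2 for x, y in points) / len(points))

sample = simulate()
pooled = [p for points in sample.values() for p in points]

# Scatter of the whole sample about the relation.
pooled_rms = rms_residual(pooled)

# Scatter of each galaxy, taken on its own, about the same relation.
per_galaxy_rms = {g: rms_residual(points) for g, points in sample.items()}

# Size-weighted average of the per-galaxy mean-square residuals.
weighted_rms = math.sqrt(
    sum(len(points) * per_galaxy_rms[g] ** 2 for g, points in sample.items())
    / len(pooled)
)

print(f"pooled RMS scatter:            {pooled_rms:.4f}")
print(f"weighted per-galaxy RMS:       {weighted_rms:.4f}")
print(f"largest single-galaxy scatter: {max(per_galaxy_rms.values()):.4f}")
```

The first two printed numbers agree up to floating-point rounding, because they are the same quantity computed two ways. Checking the relation galaxy by galaxy therefore cannot reveal anything that the pooled fit had not already constrained, unless the paper goes further and quantifies how individual galaxies deviate, which is what the prior literature does.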

I go into this in more detail in the video, but this is the gist. I also present general advice to the viewer on how they can have more success doing a science project such as this. But the paper worried me significantly. LLM capabilities have not improved at all in terms of producing meaningful science in the last year or two, but their ability to produce meaningless science that looks meaningful has wildly improved. I am concerned that this will present serious problems for the future of science as it becomes impossible to find the actual science in a sea of AI slop being submitted to journals.

LLMs are painted as democratizing science, but I'm actually worried that soon journals won't even allow you to submit unless you have senior faculty at a major institution vouching for you, because they can't keep up with the tide of garbage that will be expedient to produce and submit at scale. If you were a journal, trying to maintain a standard of quality, while also making sure that the good papers get through, how would you do this without an army of reviewers working around the clock? I seriously worry that this will lead to academia becoming more closed, not less.

 

Previously on this blog:

Tuesday, April 4, 2023

Global Geomagnetic Perturbation Forecasting Using Deep Learning

AGU:

Geomagnetically Induced Currents (GICs) arise from spatio-temporal changes to Earth's magnetic field, which arise from the interaction of the solar wind with Earth's magnetosphere, and drive catastrophic destruction to our technologically dependent society. Hence, computational models to forecast GICs globally with large forecast horizon, high spatial resolution and temporal cadence are of increasing importance to perform prompt necessary mitigation.

Our model outperforms, or has consistent performance with state-of-the-practice high time cadence local and low time cadence global models, while also outperforming/having comparable performance with the benchmark models. Such quick inferences at high temporal cadence and arbitrary spatial resolutions may ultimately enable accurate forewarning of dB/dt for any place on Earth, resulting in precautionary measures to be taken in an informed manner.

Phys.org

Like a tornado siren for life-threatening storms in America's heartland, a new computer model that combines artificial intelligence (AI) and NASA satellite data could sound the alarm for dangerous space weather.

The model uses AI to analyze spacecraft measurements of the solar wind (an unrelenting stream of material from the sun) and predict where an impending solar storm will strike, anywhere on Earth, with 30 minutes of advance warning. This could provide just enough time to prepare for these storms and prevent severe impacts on power grids and other critical infrastructure.

To help prepare, an international team of researchers at the Frontier Development Lab—a public-private partnership that includes NASA, the U.S. Geological Survey, and the U.S. Department of Energy—have been using artificial intelligence (AI) to look for connections between the solar wind and geomagnetic disruptions, or perturbations, that cause havoc on our technology. The researchers applied an AI method called "deep learning," which trains computers to recognize patterns based on previous examples. They used this type of AI to identify relationships between solar wind measurements from heliophysics missions (including ACE, Wind, IMP-8, and Geotail) and geomagnetic perturbations observed at ground stations across the planet. 


Previously on this blog:

Sunday, February 19, 2023

Man beats machine at Go in human victory over AI

Ars Technica:

A human player has comprehensively defeated a top-ranked AI system at the board game Go, in a surprise reversal of the 2016 computer victory that was seen as a milestone in the rise of artificial intelligence.

The triumph, which has not previously been reported, highlighted a weakness in the best Go computer programs that is shared by most of today’s widely used AI systems, including the ChatGPT chatbot created by San Francisco-based OpenAI.

The tactics that put a human back on top on the Go board were suggested by a computer program that had probed the AI systems looking for weaknesses. The suggested plan was then ruthlessly delivered by Pelrine.

This approach to identifying flaws in LeelaZero and similar networks strongly reminds me of a generative adversarial network (GAN), where networks are pitted against each other in a zero-sum game. I really believe this approach will represent the next big step forward for AI: neural networks that only require an objective function and initial conditions, while all the normal fine-tuning that typically has to be done by humans will be handled by automated GANs.

Bojan Tunguz (via Twitter):

Very interesting story. It seems that we can still beat AIs, but we’d need the help of another AI to teach us how.
