Monday, March 23, 2026

A Cozy Drive Through Misty Ireland, by Stephen Dalton


Tonight, we’ll journey to the misty valleys of Ireland, where you’ll embark on a tranquil drive through the countryside in your own campervan. As rain gently taps on the roof and the mist rolls over the hills, you’ll discover the serene beauty of Ireland’s west. Feel the peaceful rhythm of the van beneath you, and allow the calming landscape to lull you into a restful sleep. 😴 

My journey in attempting to relax and decompress continues. I think I've finally found a cure for my insomnia, and it's these lovely sleep stories by Stephen Dalton. Often, I have trouble sleeping, not due to any underlying medical condition (at least, I don't think so) but because my brain refuses to slow down. I feel like I spend all day red-lining in first gear, and it is really hard to come down from the adrenaline and cortisol. I've been experimenting with different nighttime routines to get my eyes off screens and let my body start resting, but the effect wasn't significant. My sleep was unfortunately still light and patchy at best (I tracked it with a smartwatch to confirm). 

On a whim, I discovered that the meditation app I've been using (Insight Timer) has sleep stories, and I randomly decided to try one by Stephen Dalton. I was sleepily blown away. For those who don't know, these are like bedtime stories for adults. They feature soothing stories, relaxing music, tranquil sound design, and sometimes a little wind-down meditation at the beginning. 

Stephen's stories seem to work particularly well for me; I'm not quite sure why. The little relaxation session at the beginning is exactly the right tone and length to help my mind slow down a bit--like he's reaching out a hand and catching me in frantic flight, slowing me down just enough to be receptive to the story. And then the story itself just slips in, and before I know it, I'm out like a light. I don't even need to track my sleep anymore--I can tell when I knocked out based on the last detail of the story I remember. I was shocked to find it was usually no more than six or seven minutes in, since for years it has taken me thirty to forty-five minutes to fall asleep. Sometimes I'm out during the relaxation session and don't even get to hear the story!

They also just put me in a good mood for sleep, even if I don't use them all the way through. More often than not these days, I don't quite fall asleep to them; I get sleepy enough, pop my earbuds back in their case (they're a little uncomfortable to sleep with, but not disruptively so), and fall straight asleep on my own. Other times I do fall asleep with them in, they slip out on their own, and I wake up with them underneath me. I hope that won't damage them.

This story (A Cozy Drive Through Misty Ireland) was the first one I ever listened to, and for some reason I keep coming back to it. The piano is perfect, and the peaceful rolling thunder in the distance is so relaxing coupled with the tapping rain. In this story, you drive a camper van through the misty western mountains of Ireland. He takes you over rolling roads, by large dark loughs, and along ancient stone walls marking boundaries that have stood for hundreds of years. Something about the mist and the fog and the "verdant green of the land", as Stephen describes it, fills my heart in the best way and finally gives me permission to relax. 

The video is available in 4k, which seems...counterproductive.

The full list of stories I've been using is available in this YouTube playlist. I add any stories I like; though there are quite a few there, I have a few favorites, including:

  • The Scribe of Alexandria (absolutely love this one)
  • Saul the Sleepy Sloth (short but perfection)
  • A Magical Forest Night with a Sleepy Owl
  • Readings from the Shipping Forecast
  • Finding Harmony in the Himalayas
  • The Sleepy Donkey

I might write reviews for a few of these since I find the very act of reflecting on the sleep stories and writing a few words about them wonderfully relaxing in and of itself, a perfect way to wind down before sleep.

 

Previously on this blog:

A viewer reached out with a paper that they had written with an LLM. When I looked closer, I got worried.

A few weeks ago I posted the results of a rather simple experiment designed to test some of the claims being made about LLMs. The response of the community was amazing--we got a ton of great feedback and ideas for how to continue exploring these ideas, and there was clear interest. 

As a physicist, I am pretty constantly bombarded by emails from people effectively saying, "AI helped me write this paper about my huge discovery, can you endorse it for arXiv/tell me what you think?" I usually ignore these--the vast majority are wild, grandiose claims that, at a glance, are unlikely to be meaningful. However, this week I received a paper from a viewer that did not seem ridiculous. In fact, at first glance it seemed quite reasonable: it made a restrained, testable claim about a sensible observation and didn't have any obvious red flags besides the usual LLM deficiencies (bad at citations, etc.). I decided to give this one a shot and proposed a challenge to the viewer: I'd review the paper on camera, and if it was good, I'd endorse him for arXiv. If not, I'd explain how the paper could be improved.

A very fair reaction you might be having now is, "this is a waste of time!" Certainly, I can't do this for every paper I get, nor do I want to fill my time reading AI slop. However, I think there's a valuable exercise here, one where a little effort can go a long way and perhaps reach some people who really need to hear this. A few comments criticized the original video for deconstructing an argument they felt nobody was making (effectively, "nobody actually thinks these things can do science!"), but viXra submissions and my own email inbox would suggest otherwise. My intent for this discussion is to help crystallize the issues with LLM-driven science by taking one of the best attempts I've seen yet and showing problems that are common to the method. Hopefully, I can point future emailers to this video so that they can re-assess their own work without me needing to break down every LLM paper I receive.

I break down the paper in the video (including the science behind the claim), but the key issues are these: 

  1. Lots of inaccuracies. There are many wrong statements in the paper. The primary formula that the key result revolves around is a possibly incorrect simplification of a significantly more complex calculation, which is not addressed anywhere in the paper. At worst, the methodology of the paper is incorrect; at best, it is unjustified.
  2. The paper is completely underwritten (a common problem in LLM-driven papers). There's zero literature review (more on this later). Choices in methods and figures are left completely unjustified. The paper analyzes a sample of 175 galaxies but includes only 10 in the analysis, without explaining why or how the selection was made. There is no quantitative discussion or any attempt to compare with past results. The primary result is hand-wavingly stated without deeper exploration or motivation.
  3. The primary result is simply uninteresting, bordering on tautological. The study takes a statistical correlation that has been very well established across many galaxies in a sample, then looks at a few of those galaxies and finds that the correlation holds for each galaxy individually. This is very obviously true and not a discovery at all, but it is presented as completely novel. The analogy I draw is: imagine it is well known that tall people tend to weigh more. Then a new paper comes along, measures someone's weight once a year, finds that as they get taller they weigh more, and claims it as a new discovery.
  4. There is complete disengagement with the literature. As I mentioned earlier, there are basically no citations in the paper. This is a problem from an ethical and procedural perspective, and it makes it impossible to verify where certain statements are coming from. But the lack of literature review is very problematic for another reason: as I was catching up on the literature of this field to review the paper, I immediately came across several other papers that did exactly what this paper claims to do, but better and in a more interesting way. See, for example, Li et al. (2018), published in A&A, called "Fitting the Radial Acceleration Relation to Individual SPARC Galaxies". Or Lelli et al. (2017), which literally made a movie showing how each individual SPARC galaxy adds to the RAR. The LLM paper's Figure 1 is essentially a static version of this animation, presented as a novel finding. 
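The height-and-weight analogy in point 3 is easy to make concrete. Here's a quick hypothetical sketch in Python (invented numbers, nothing from the actual paper): if individuals are drawn from the same underlying relation that produced the population-level correlation, "re-discovering" that correlation within one individual is guaranteed by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population-level relation (the well-established result):
# weight rises roughly linearly with height, with some scatter.
heights = rng.normal(170, 10, size=1000)                    # cm
weights = 0.9 * heights - 90 + rng.normal(0, 5, size=1000)  # kg
pop_r = np.corrcoef(heights, weights)[0, 1]

# The "new study": measure one person once a year as they grow.
# Their measurements obey the exact same relation, so the
# correlation reappears -- resampling, not discovery.
h_by_year = np.linspace(150, 180, 15)
w_by_year = 0.9 * h_by_year - 90 + rng.normal(0, 5, size=15)
ind_r = np.corrcoef(h_by_year, w_by_year)[0, 1]

print(f"population correlation:    {pop_r:.2f}")
print(f"single-person correlation: {ind_r:.2f}")
```

Both correlations come out clearly positive, and the second one tells you nothing the first didn't.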

I go into this in more detail in the video, but this is the gist. I also present general advice to the viewer on how they can have more success doing a science project such as this. But the paper worried me significantly. LLM capabilities have not improved at all in terms of producing meaningful science in the last year or two, but their ability to produce meaningless science that looks meaningful has wildly improved. I am concerned that this will present serious problems for the future of science as it becomes impossible to find the actual science in a sea of AI slop being submitted to journals.

LLMs are painted as democratizing science, but I actually worry that soon journals won't even let you submit unless senior faculty at a major institution vouch for you, because editors can't keep up with the tide of garbage that will be cheap to produce and submit at scale. If you were a journal trying to maintain a standard of quality while also making sure that the good papers get through, how would you do it without an army of reviewers working around the clock? I seriously worry that this will lead to academia becoming more closed, not less.

 

Previously on this blog:

Saturday, March 14, 2026

Where is the LLLine?

My personal uses and struggles with large language models (LLMs) cannot be fully expounded upon in one post and I will make no such attempt here. However, I have been mulling over the genuine utility of LLMs versus the undeniable impact they have on my personal ability to process and understand information. There are generally two diametrically opposed camps: 

  1. The AI-haters: these people believe that there is absolutely no place for LLMs, regardless of utility. They point not only to the impact LLMs have on people, but also to the disastrous moral, ethical, environmental, and economic impact they could be argued to have. I won't evaluate that impact here, but this is their rationale. These people use generative LLMs for absolutely nothing and despise them in all forms. 
  2. The AI die-hards: these people believe that true AGI is right around the corner, and that these tools will completely reshape the way we use technology, and perhaps our role as human beings. They point to the impressive capabilities of LLMs and argue that a massive upheaval is right around the corner, as companies lay off workers left and right to be seemingly replaced by LLMs. I won't evaluate these claims here, but this is their rationale. These people use LLMs for everything they can think of, from genuinely useful work to helping choose their clothes to writing text messages to their moms. 

I do not find either of these camps truly appealing. While I think I lean more hater than die-hard--as an artist I am extremely alarmed by the rise of slop and the willingness with which LLM companies disregard notions of intellectual property--I also do use LLMs myself, on a near-daily basis. They are objectively immensely useful, even after just a few years of development, and have made certain things about my work more enjoyable and efficient. 

However, as with all things that seem too good to be true, there is a tradeoff. With each task I use LLMs for, I feel a token of practice escape, a chance to become better evaporate. The time saved on the task is replaced by the weight of knowing that I have just made myself easier to replace, or robbed myself of an opportunity to learn. This feeling is not so strong on tasks I am already very good at--for example, making plots in Matplotlib--but it waxes dramatically when I use LLMs to save time on the learning process itself; i.e., to automate a task that I am not fully capable of doing myself. My wonder at the technology contorts into an intense, epistemic discomfort. Like a student cheating on an exam or a person skipping a workout, I know deep down that I may be saving time and energy, but in the long run I am only hurting myself.

I have been struggling with these two extremes, knowing these tools can help me, but not knowing when I am taking it too far. Today, I had a discussion with my very close friend and colleague, the brilliant and handsome Joaquin. He presented an analogy for LLM usage that clicked so well I genuinely stopped in my tracks. 

Joaquin compares using an LLM to using a chess engine (e.g., Stockfish or Komodo). This comparison has an immediate appeal: while both are breathtaking examples of modern technology and innovation, they are just about equally close to being true artificial intelligence (that is to say, not at all). However, you might be fooled into thinking otherwise, whether you are seemingly "chatting" with Claude or watching Magnus Carlsen get flattened by AlphaZero. Upon reflection and discussion with Joaquin, I realized this analogy was unbelievably 1:1, and indeed questions about LLMs that seem impossible to answer become almost trivial when applied to a chess engine--and the answers transfer directly. 

 

A picture of me learning chess with my father in the early 2000s. Chess has always been a huge part of my life from a very young age, and thinking about LLMs in terms of something that comes to me very intuitively has been a game-changer. 

The particular question I am interested in answering is: in my own work as an academic, where is the line between "using LLMs to help me" and "abusing LLMs in a way that hurts me"? I consider the following litmus tests, based on how I have used chess engines as a player myself to reach 1700 Elo. 

  1. You must use these tools to learn, and not to cheat. The analogy here is extremely obvious. Every serious chess player in the world uses engines to analyze their games, understand their weaknesses, and hone their strengths. But there is a clear and intuitive point where this becomes cheating--when they take the engine onto the battlefield. Why is this so hard to transfer to LLMs? Because it is not clear where, as academics, our battlefield is. Is it at our desk? In the conference room? In a presentation? Within the journal submission? The line is vague and unclear. I believe answering this question--where is the battlefield of academia?--is the key to understanding where the line is.
My best answer is that the battlefield is in our ability as academics to communicate with our peers--in any format. Every successful academic has an intuitive understanding of understanding--a sense, a click, when you really understand something, when a concept goes from something you are regurgitating to something you have internalized. When you are reading a textbook, struggling with a concept, an LLM can be an extremely useful personal tutor, just as a chess engine can help you analyze a game (this also plays to an LLM's strengths, since textbook knowledge tends to be over-represented in training data compared to niche, bleeding-edge concepts). But when it's time to take what you've learned to the battlefield, if you cannot leave the engine behind, you are likely relying on it too much. When you stand up and talk to a colleague, can you defend the ideas you've learned with LLM assistance? Can you teach them to someone yourself? When your colleague asks questions about your idea, can you answer with an innate sense of the boundary of your own knowledge? Or are you left stalling as you make a mental note to forward the question to your LLM of choice later? 
Framing the question in this way has helped me draw the line--your goal of using the LLM to learn should be to eventually not need it. These machines are the most powerful learning tools ever made. But when you ask it to teach you a concept, the goal must be true understanding, not automation or substitution. Every time, no exceptions.
  2. LLMs can do some things better and faster, and that's okay. I struggled the most with this concept, as anything I offloaded onto LLMs felt like a skill I was giving up or an opportunity for improvement I was missing. But the reality is, there are just some things at which LLMs are fantastic. The chess engine analogy has helped me realize this isn't the end of the world. There are some things in chess we have offloaded to engines, likely permanently, such as the study of openings. It is simply not feasible to expect players to calculate the outcomes of common openings from scratch, every single time. So we use powerful tools like Chess.com's Opening Explorer to walk through every opening sequence imaginable, learning the right moves to make and the wrong moves to avoid. The engine helps us experiment--what happens if we try this? Make that move? Take that piece? Every question, answered immediately, with the cold confidence of a calculating machine vastly beyond our own capabilities. And yes, sometimes we just memorize some of these openings, like a cuber memorizing a solution algorithm for a particular pattern. In the end, it doesn't change the fun of the game; it just helps you get a better start more often than not. In the same way, LLMs can genuinely improve your efficiency and productivity when used carefully and in the right situations. And when you find those situations, it's okay to use them freely. However, I will finish this point with a word of caution. While engine-explored openings are useful in 99% of situations, you will be vulnerable against a true master. Magnus Carlsen is well known for deliberately playing offbeat openings to force engine-reliant opponents out of their preparation, where he has the upper hand due to his unmatched calculating ability. In whatever area you allow LLMs to supplement you more fully, you will have this same vulnerability. A real expert will be able to tell. 
Proceed with caution. 
  3. Sometimes, human intuition IS better. The reality is, LLMs have their limitations. This cuts both ways: in one scenario, you may not want to use an LLM because it is simply incapable of performing a task (such as making a genuinely new discovery outside its training data the way humans can); in another, you may not want an LLM to teach you something a particular way because its dataset-driven approach is simply not ideal compared to the way humans think. There is a really direct parallel with chess engines here as well. Despite chess engines being many, many times stronger than any human opponent could ever hope to be, they can still fail or be useless in several key ways. First, their computations boil down to brute force (this is a drastic oversimplification, but it's largely what they do better than us). We can't do what they do--and that means we can't always rely on their solutions. If I'm trying to understand the best move in a position, sometimes the mathematically best move, which the computer had to look twelve moves ahead to find, isn't the best move for me. Sometimes the best move is the one that is maybe 0.05 pawns less optimal, but that I understand deeply and intuitively, so that next time I am in this position I will be able to find it right away. That is an accommodation of our own failings, but the limitations of engines go beyond this. There are real scenarios where humans are just better at assessing the situation than machines: sometimes the human and machine disagree, and the human turns out to be right, even though the machine can calculate far more extensively than our soft, wrinkly brains. I've drawn this analogy out a bit far, but I find it fascinating--and the conclusions transfer directly to LLMs. Turning to an LLM will not always be the best decision, especially at the elite level of anything. 
Knowing where those gray areas and blind spots are requires developing your own expertise and understanding--but it is effort that will pay dividends in the short- and long-term. 

I have more thoughts, but I will stop here for now. Reflecting on the question posed above in this way has given me significantly more confidence in how I choose to use LLMs, and significantly more peace with the emotions and uncertainty this struggle has created in me. Thanks again to Joaquin for suggesting this amazing analogy!