• 6 Posts
  • 415 Comments
Joined 3 years ago
Cake day: August 29th, 2023


  • Theoretically if the people responsible for that training and reinforcement did their jobs well then those patterns should only include true statements but if it was that easy then you wouldn’t have [insert the entire intellectual history of the human species].

    I’m chiming in to agree with Architeuthis and add a citation that explains more. LLMs have a hard minimum rate of hallucinations determined by the rate of “monofacts” in their training data (https://arxiv.org/html/2502.08666v1). Basically, facts that appear independently and only once in the training data teach the LLM that some proportion of statements are disconnected one-offs appearing nowhere else, so it in turn generates output at a similar rate that matches that pattern — and such output is, in practice, essentially random and thus all but guaranteed to be false.
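    As a toy illustration of the “monofact” statistic, here is a sketch of how you might estimate it from a corpus. The function name and the toy corpus are mine, not from the paper; per the paper, the model’s minimum hallucination rate is tied to this singleton rate.

```python
from collections import Counter

def monofact_rate(facts):
    # Fraction of facts that appear exactly once in the corpus.
    # This singleton ("monofact") rate serves as a rough lower-bound
    # estimate on the model's hallucination rate.
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(facts)

# Toy corpus: each string stands in for one extracted "fact".
corpus = ["a", "a", "b", "c", "c", "c", "d"]
print(monofact_rate(corpus))  # "b" and "d" are monofacts: 2/7 ≈ 0.286
```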

    And as Architeuthis says, the ability of LLMs to “generalize” basically means they compose true information together in ways that are sometimes false. So to the extent you want your LLM to ever “generalize”, you also get an unavoidable minimum of hallucinations that way.

    So yeah, even with an absurdly bigger training data source that was also magically perfectly curated, you wouldn’t be able to iron out the intrinsic flaws of LLMs.


  • -3 upvotes and 0 karma, but the article is absolutely right (they hate this post because it tells the truth). If Eliezer wants to influence public discourse and policy on an international level, he absolutely does need a respectable image (with maybe an allowable touch of eccentricity). But apparently (what he thinks is) the literal end of the world isn’t enough to make him actually try for a normie public image. Or maybe he has some galaxy-brained plan about how looking like a weirdo actually helps his cause? If he does, I strongly suspect it is a rationalization.




  • Zitron’s analogy is excellent because the bubble is multifactorial and the analogies that we can make are factor-to-factor. Here are some things that caused the dot-com bubble; people were overly optimistic about:

    Ed has also been clear that a few factors make this bubble worse (for the economy and the general public) than the dot-com bubble. For one, Ed is strongly convinced that GPU lifecycles are much shorter than fiber-optic lifecycles. You build fiber-optic infrastructure and it will last for decades; meanwhile, GPUs run constantly at max load have lifecycles of 3-5 years. The end result of the internet is also much more useful, and less of a double-edged sword, than the slop generators churning out propaganda and spam.
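    The lifecycle gap can be put in rough numbers with straight-line depreciation. The specific lifetimes below are illustrative assumptions of mine, not Ed’s exact figures:

```python
# Straight-line depreciation on the same normalized capex outlay.
capex = 1.0
fiber_life_years = 25   # assumption: fiber plant lasts decades
gpu_life_years = 4      # assumption: within the 3-5 year range above

fiber_annual_burn = capex / fiber_life_years
gpu_annual_burn = capex / gpu_life_years
print(gpu_annual_burn / fiber_annual_burn)  # 6.25x faster capex burn
```

    In other words, a dollar of GPU build-out has to be re-earned several times faster than a dollar of fiber ever did.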


  • I am a pretty big fan of Ed’s work, so I’m going to hold my nose and read Kelsey’s work thoroughly enough to do a line by line debunking:

    Over the last two years, he has called the top repeatedly:

    Well yes, but he has also explicitly said that the bubble peaking and popping would be a multi-year process. I’ve only kept up with every one of his articles for the past year, but over that year his median guess for when the bubble pop becomes undeniable was 2027. I guess making timelines with big events in 2027 and hedging on the median is only allowed for the rationalists? Also, we are already starting to see the narrative fray as Anthropic and OpenAI experiment with price hikes and struggle to get ready for IPOs, which would count as meeting his predictions for the start of the bubble pop.

    In 2026, the focus is much more on alleging widespread, Enron- or FTX-tier outright fraud.

    This is basically an admission that he can’t make the case in terms of the economics anymore.

    ??? Ed has been making the case for circular financing and investors being deceived because he thinks there are circular financing deals and investors being deceived. Ed has slightly softened on his position on exactly how useless or not LLMs are, but he is still holding to his economic case that the amount they cost isn’t worth the value they provide, extremely blatantly so once consumers start paying the real cost and not the VC-subsidized cost.

    By almost every metric, AI progress from 2024 to 2026 has been much faster than AI progress from 2022 to 2024.

    And she is quoting a rat-adjacent think tank as proof that AI improvement has been exponential. Even among the rationalists, the case has been made that the benchmarks don’t reflect real-world usage/value and that costs are growing along with “capabilities”.

    It can no longer argue that costs aren’t falling; they are.

    Even accepting the premise that real costs have fallen, Kelsey fails to address Ed’s case that the prices LLM companies charge are massively subsidized. If real costs are 10x the current subsidized prices (which have already been pushed up as far as they can be without losing customers), and model inference costs miraculously drop 5x (which Kelsey would treat as a given, but which I think is pretty unlikely barring some radical paradigm shift), that still leaves a 2x gap.
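    Spelling out that arithmetic, with all ratios hypothetical per the hedges above:

```python
subsidized_price = 1.0             # what customers pay today (normalized)
real_cost = 10 * subsidized_price  # assumed true cost of serving
optimistic_cost = real_cost / 5    # after a miraculous 5x cost drop
gap = optimistic_cost / subsidized_price
print(gap)  # 2.0: costs would still be double the prices charged
```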

    It is a straightforward crime to claim $2 billion in monthly revenue if you mean that you are giving away services that would have a $2 billion market value.

    Yes, exactly. Technically OpenAI and Anthropic play games with ARR and “gross” revenue (i.e. magically excluding the cost of training the model in the first place), but in a just nation it would straightforwardly be a crime. Why does she find this hard to believe?

    Epoch AI has an in-depth analysis of the same financial questions from the same public information

    (Looks inside the Epoch AI article):

    So what are the profits? One option is to look at gross profits. This only considers the direct cost of running a model

    Ed has gone into detail repeatedly about why excluding the cost of training the model is bullshit.

    (More details from the article)

    But we can still do an illustrative calculation: let’s conservatively assume that OpenAI started R&D on GPT-5 after o3’s release last April. Then there’d still be four months between then and GPT-5’s release in August, during which OpenAI spent around $5 billion on R&D. But that’s still higher than the $2 billion of gross profits. In other words, OpenAI spent more on R&D in the four months preceding GPT-5 than it made in gross profits during GPT-5’s four-month tenure.

    Oh, that is surprising: the Epoch AI article actually acknowledges that these models are wildly unprofitable once you account for the training cost! Of course, they throw away their own point in the next section by just magically assuming LLMs will prove to be massively valuable in the near future! (One of the exact things Ed has complained about!)
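    Just to spell out the quoted calculation, with the numbers taken directly from the passage above (in billions of USD):

```python
gross_profit = 2.0  # GPT-5's gross profit over its four-month tenure
r_and_d = 5.0       # OpenAI's R&D spend in the four months before release
net = gross_profit - r_and_d
print(net)  # -3.0: deeply negative once training/R&D is counted
```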

    He’s found too many grounds for dismissing all the financial information we have as dishonest or irrelevant to seriously engage with what any of it would imply if it were true.

    He has shown in detail how the companies use barely-technically-not-lying, obfuscated bullshit metrics like gross profit and ARR to inflate their numbers, and that if you try to un-obfuscate them, the numbers look a lot worse.

    Kelsey goes on to try to claim how much value LLMs provide:

    Making them more productive is a big deal, and in 2026, AI makes them more productive.

    Zitron can’t really contest this with contemporary data, so he cites 2024 and 2025 studies of much weaker AIs with much weaker productivity impacts.

    Two years to… 4 months ago! Such outdated information! In the first place, there have been very few rigorous studies of how much of a productivity boost LLM coding agents actually provide, and one of the few with even a passing attempt at rigor (though still below good academic standards) was METR’s study (and keep in mind METR is a rat-adjacent think tank, not proper academics), which showed programmers thought they got a productivity boost but actually suffered a net productivity decrease!

    From this set of beliefs, you could, in fact, defend a delightful bespoke AI bubble take: that AI would have been a catastrophic investment bubble, but the AI companies were saved from their mistakes by the determined NIMBYs of America killing off the excess data center build-out.

    But that’s not Zitron’s stance. He seems to account “the build-out is too aggressive” and “the build-out is not happening as planned” as both independent strikes against AI — both things that show it’s bad, and the more of those he finds, the more bad it is.

    It could in fact be all three! The hyped-up build-out, such as that indicated by OpenAI’s and Oracle’s 300 billion dollar deal, was completely, insanely too aggressive (for it to pay off, Ed calculated, LLMs would have to drastically exceed Netflix plus Microsoft Office in terms of ubiquity and price point), not achievable given realistic build times for data centers (Ed has also brought the numbers here), and even at the reduced actual rate of build-out, still not financially viable (simply because the LLM companies aren’t charging enough). So yes, both things are bad, and one type of badness partly mitigates the other, but it is still all bad!


  • I advise being very cautious about consuming Zitron’s posts

    He has a dramatic and vitriolic style, but as dgerard says, he has also dug through the numbers. I see lots of criticism of Ed’s style, but nothing nearly as substantial about the hard numbers he has come up with. The LLM companies put out contradictory and obfuscated numbers, and taken naively they seem to contradict Ed’s; but as Ed has shown many, many times, when you start trying to un-obfuscate them, they start looking really bad for everyone betting on LLMs.

    Many coders are using chatbots, but I don’t know of evidence that it makes them more productive

    So more and more coders are coming around to “actually, AI code is okay”… but as we’ve seen repeatedly with LLM-generated content, it is very easy for people to “Clever Hans” themselves into believing LLMs are contributing more than they actually are, so I am not going to trust anecdotal reports.





  • I am somewhat surprised by the number of upvoted comments pushing back. But most of that push-back is in the typical rationalist style: loaded with jargon, full of charitable readings of what habryka must have really meant or how they might be misunderstanding him, and altogether not nearly close enough to a cleanly stated “no, wtf is wrong with you”. So I’m still disgusted and annoyed. And it is still a highly upvoted article, so nah, I’m not giving any of the lesswrongers credit on this one, even the ones trying to push back.









  • LLMs sample the next token from a probability distribution conditioned on the previous context of tokens (not an average of the entire internet), and post-training shifts those odds a bit further in a relatively useful direction. So given the right context, the LLM will mostly consistently regurgitate content stolen from PhDs and academic papers, maybe even managing to shuffle it around in a novel way that is marginally useful.
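    A minimal sketch of that autoregressive step, with made-up logits standing in for what a real model would compute from the context:

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    # Convert raw scores into a probability distribution (stable softmax),
    # then draw one token id from it. The next token is *sampled* from
    # context-conditioned odds, not looked up from a stored average.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    cumulative = 0.0
    for token_id, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return token_id
    return len(probs) - 1  # guard against floating-point rounding

random.seed(0)  # reproducible draw for this toy example
# Hypothetical logits for a 4-token vocabulary:
token = sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0.8)
print(token)
```

    Because the draw is probabilistic, the highest-scoring token usually wins but is never guaranteed to, which is exactly the wiggle room the next paragraph is about.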

    Of course, that is only the general trend given the right prompt. Even with a prompt that looks mostly right, one seemingly innocuous word in the wrong place can nudge the odds and get you the answer of a moron from /r/hypotheticalphysics in response to a physics question. Or asking for a recipe gets you Elmer’s glue on your mozzarella pizza from a Reddit joke answer.

    if they took the time and energy to curate it out the way they would need to to correct that they wouldn’t be left with a large enough sample to actually scale off of

    The labs do take steps like training the model generally on the desired languages, random internet bullshit and all, and then fine-tuning it on the actually curated material. So that shifts the odds, but again, not enough to actually guarantee anything.
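    A toy mixture model of that odds-shifting, with all probabilities made up purely for illustration:

```python
def p_wrong(p_wrong_junk, p_wrong_curated, curated_weight):
    # Chance of a wrong answer if the model's behavior is a weighted
    # blend of what it absorbed from junk vs. curated data.
    return (1 - curated_weight) * p_wrong_junk + curated_weight * p_wrong_curated

p_junk, p_curated = 0.30, 0.01
before = p_wrong(p_junk, p_curated, curated_weight=0.1)  # pretraining only
after = p_wrong(p_junk, p_curated, curated_weight=0.9)   # after fine-tuning
print(before, after)  # fine-tuning improves the odds, but never to zero
```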

    So, tl;dr: you’re right, but because curating, post-training, and prompting can get you somewhat better than average internet junk, LLM boosters and labs have convinced themselves they are just a few more iterations of data curation, training approaches, and prompting techniques away from eliminating the problem entirely, when the best they can do is make it less likely.