Just to keep my fellow lemmygradians updated on what AI tools are capable of, and also because I’m pretty stoked for this project.
I put $5 into the DeepSeek API (sidenote: I like that you have to top up a credit balance and they don’t auto-bill), then downloaded crush. Crush is an agentic coding tool, meaning it basically instructs the LLM to do stuff automatically.
It made me a complete Python script to first download all of the ProleWiki content pages into txt files (which also means we can do backups now, even if it’s a little hacked together).
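ProleWiki runs MediaWiki, so a dump script along these lines should work. This is a hedged sketch, not the actual script the agent wrote; the endpoint path and output folder are assumptions:

```python
import json
import os
import re
import urllib.parse
import urllib.request

API = "https://en.prolewiki.org/api.php"  # assumed endpoint path


def api_get(api, params):
    """GET a MediaWiki API call and decode the JSON response."""
    url = api + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())


def safe_filename(title):
    """Turn a page title into a filesystem-safe file name."""
    return re.sub(r'[\\/:*?"<>|]', "_", title)


def fetch_wikitext(api, title):
    """Fetch the raw wikitext of one page via the standard query API."""
    data = api_get(api, {"action": "query", "prop": "revisions",
                         "rvprop": "content", "rvslots": "main",
                         "titles": title, "format": "json"})
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]["slots"]["main"]["*"]


def dump_all_pages(api=API, out_dir="pages"):
    """Walk list=allpages with continuation, saving each page as a .txt."""
    os.makedirs(out_dir, exist_ok=True)
    params = {"action": "query", "list": "allpages",
              "aplimit": "500", "format": "json"}
    while True:
        data = api_get(api, params)
        for page in data["query"]["allpages"]:
            path = os.path.join(out_dir, safe_filename(page["title"]) + ".txt")
            with open(path, "w", encoding="utf-8") as f:
                f.write(fetch_wikitext(api, page["title"]))
        if "continue" not in data:  # MediaWiki pagination marker
            break
        params.update(data["continue"])
```

The `continue` loop is how MediaWiki pages through results 500 titles at a time, so a 5000-page wiki takes about ten list calls plus one content fetch per page.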
Then with a second script we are running these txts through a (local) LLM to translate them for our French instance. The problem is there are 5000 pages on the EN instance and a grand total of 3 in French, so nobody is interested in joining and writing pages from scratch when you could “just” find them on the EN instance.
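The translation loop can be sketched like this, assuming the local model sits behind an OpenAI-compatible endpoint (llama.cpp’s server, Ollama and LM Studio all expose one). The URL, model name and prompt here are illustrative, not the actual ones used:

```python
import json
import urllib.request

SYSTEM_PROMPT = ("Translate this MediaWiki article from English to French. "
                 "Preserve all wiki markup, templates and internal links.")


def build_body(text, model="mistral-small-3.2"):
    """Assemble the chat-completions request body (pure, easy to test)."""
    return {"model": model,
            "messages": [{"role": "system", "content": SYSTEM_PROMPT},
                         {"role": "user", "content": text}],
            "temperature": 0.2}


def translate(text, url="http://localhost:8080/v1/chat/completions"):
    """Send one page's text to the local model and return the translation."""
    req = urllib.request.Request(
        url, data=json.dumps(build_body(text)).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

A low temperature keeps the model from getting creative with the markup; the real script presumably loops `translate()` over every downloaded txt.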
For these two scripts (which are running right now) I’ve paid a whopping 67 cents in API costs. It amounts to a few hours of prompting and then, of course, waiting for the agent to work.

Cache hits on DeepSeek are a godsend for agentic work as they’re basically free (less than 2 cents per 1M tokens), and with a codebase you constantly feed it the same code over and over. This is why my cache hit rate is so high.
Compare that to GPT-5, which charges 12 cents per 1M tokens on a cache hit.
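To make the price gap concrete, here’s the arithmetic at the two quoted cache-hit rates (the 50M-token session volume is a made-up example, but long agent sessions really do re-read the codebase that many times):

```python
def cache_hit_cost(tokens, usd_per_million):
    """Cost of re-reading `tokens` cached tokens at a per-million price."""
    return tokens / 1_000_000 * usd_per_million


# 50M cached input tokens over a long agent session (illustrative volume):
deepseek = cache_hit_cost(50_000_000, 0.02)  # ~$1.00 at "under 2 cents"/M
gpt5 = cache_hit_cost(50_000_000, 0.12)      # ~$6.00 at 12 cents/M
```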
What’s pretty amazing (and scary, it’s very scary using crush) is that you can just go do something else while it works and puts everything together. Go have dinner while the agent is on the task, or watch a YouTube video.
The third and final script will be used to upload the translated files to the wiki. I still need to think about what exactly I want it to do (write API access is not a problem; the problem is just the logic of it all).
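Whatever the logic ends up being, the standard MediaWiki flow for the upload step is: log in (e.g. with a bot password), fetch a CSRF token, then POST an `action=edit`. A minimal sketch, with the payload building kept pure; `session` is assumed to be an already-logged-in requests.Session or anything with the same get/post interface:

```python
def build_edit_payload(title, text, token,
                       summary="Imported machine translation"):
    """Form fields for a MediaWiki action=edit POST (pure, easy to test)."""
    return {"action": "edit", "title": title, "text": text,
            "summary": summary, "bot": "1", "token": token,
            "format": "json"}


def upload_page(session, api, title, text):
    """Fetch a CSRF token, then create/overwrite one page."""
    csrf = session.get(api, params={"action": "query", "meta": "tokens",
                                    "format": "json"}
                       ).json()["query"]["tokens"]["csrftoken"]
    return session.post(api, data=build_edit_payload(title, text, csrf)).json()
```

The open question from the post (what the script should actually decide, e.g. skipping pages that already exist in French) would live around the `upload_page` call, not inside it.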
As for running the translation job, if you’re curious: it saves its progress so I can stop and resume any time I want, and I estimate around 6-8 days of continuous running to go through everything (there’s a lot of material). Yes, we could use an API or even rent a GPU and multithread, but eh, I figured I only have to do this once. And there’s a LOT of tokens to translate, you won’t escape that. Even using a cloud API it would probably take a few days of continuous querying.
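Stop-and-resume doesn’t need much machinery: a file listing which pages are done, checked at startup and appended to after each translation. A minimal sketch (the file name and JSON format are assumptions; the actual script’s tracking may differ):

```python
import json
import os

PROGRESS = "progress.json"  # assumed file name


def load_done(path=PROGRESS):
    """Set of files already translated; empty on the first run."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return set(json.load(f))
    return set()


def mark_done(name, done, path=PROGRESS):
    """Record one finished file so a restart skips it."""
    done.add(name)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)


# main loop idea: for each .txt not in load_done(), translate it,
# write the output, then mark_done() so a crash loses at most one page.
```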
But compare that to doing it by hand, which, well, we haven’t even started despite the instance existing for 4 years. So it’s basically 4+ years vs 8 days of work.
Later I can adapt this script to work on books and bring more exclusive theory to English, like we did for the CIA’s Shining Path, which was done with what is now an almost obsolete model lol (and I’ve definitely improved the prompting since then). I might actually redo CIA’s Shining Path with Mistral just to see how it differs.
The problem, if anything, is that this is making me learn stuff like git so I can make it FOSS and downloadable, and make it more robust to handle more use cases lol
About crush:
Before I started using crush I didn’t really get what an agent actually did or how it helped. This isn’t just putting a prompt into the web interface and asking it to generate Python code. The agent takes care of everything, including writing functions, writing tests, and fixing bugs. That’s right, this thing fixes its own code automatically.
It calls tools and terminal commands by itself, and can edit files. When it does you get a git-like preview of the lines edited.
To use crush you just prompt the LLM: “Okay, now I want to do this, now I want to do that, there’s a bug, here’s the log”, and it will work through the problem by itself. It’s scary how fast it does it.
You can extend its capabilities with LSPs and MCPs but I haven’t looked into that yet. Wish it was more user-friendly to set up, but I got there in the end.
Caveats:
DeepSeek boasts a pretty comfy 128k-token context window, but you run through it quickly because it has to read and understand the entire project. Crush handles this (it makes the LLM write to a crush.md file and then restarts the last command sent when the context resets), but you’re still limited. However, with tools like DeepSeek-OCR, if they ever start integrating it, you have potentially infinite context. Clearly they’re going to come up with something; they’re already working on it. But you won’t be recreating Twitter with an LLM just yet.
You don’t want a model specifically fine-tuned for coding here, as it needs to understand the file structure and the readmes. However, I have run into situations where the LLM did stuff it shouldn’t have, for example deleting the database that keeps track of which files we’ve already worked through, because it didn’t know that was the ‘live’ prod data.
Mind you, I’m pretty much cobbling this together, so I don’t git it or anything; it’s just a one-time script for our specific needs. And I shouldn’t put the content files in the same folder as the script, that’s just good practice. I def recommend keeping two copies of your project if you’re not going to use git: crush works on one copy, and then you copy the files over to the other folder.
Oh, also, no chance of crush deleting system32, as it opens in a specific folder and can’t leave it. Before running a script it also lets you review the code and asks for permission to run it.
This is not replacing devs. It’s a great addition for non-devs and devs alike. For non-devs, it lets us write our own scripts and solve our own problems. For devs, you spend more time thinking through and planning your app and then hand the writing of it to the LLM. As a designer this speaks to me because we plan things a lot lol. And if you know your stuff, you can avoid some of the pitfalls the LLM might fall into if you don’t specifically prompt against them.
If you don’t know some libraries or APIs very well, it can also handle them for you. You can totally give crush working code you wrote yourself; it’s just that this might not be the most efficient way to use it, since it could have written that code for you in the first place.
Your workflow is basically 3-10x more efficient with this and that’s valuable - take a coffee break while it works, you deserve it. You become more of an engineer than a coder and imo this is where dev work is heading.
Translation work:
As for the translation, which is handled by Mistral Small 3.2 Instruct (a 24B model that fits in my 16GB of VRAM and generates at 15 tokens per second; honestly, good job France, I gotta hand it to you), it’s pretty good but you have to prompt it right first. The prompt for this task is ~600 tokens, which is a lot but also not a lot considering I can easily have a 16k context window with this tool.
imo a lot of the “we spend more time fixing the translation than we would have spent doing it ourselves” comes from clients prompting incorrectly (but what else is new lol), translators not necessarily using tools to automate bulk edits, and older models not doing as good a job. DeepSeek is actually pretty solid at translating because of the thinking, though we didn’t use a thinking model for this task.
Translating the filenames is messier and more prone to hallucinating random characters. I think it’s because it just doesn’t have a lot to work with; you’re asking it to translate five seemingly random words. Translating the page content goes much better, and some pages I checked are pretty amazing.
Not all languages work equally well. I used Mistral specifically because it’s French, so we assume it understands French better. Some languages don’t have ‘enough’ presence in the training data to be learned effectively, and others are just not a priority for devs. Chinese LLMs are seemingly better at Persian, for example, but still not ‘great’.
Another thing is it sometimes translates jargon two different ways. It would need a glossary or something like that which says “this word is always translated as X”. I’m sure this will come, and in fact a simple dictionary lookup is probably already an old-school method for an LLM. But you would also need to build that glossary, and with 5000 pages of content I just don’t know where you would even begin.
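One low-tech way to approximate that glossary idea, purely as a sketch (the entries here are invented examples): inject the fixed translations into the prompt, then flag outputs that ignored them for manual review:

```python
GLOSSARY = {  # invented example entries
    "mode of production": "mode de production",
    "dialectical materialism": "matérialisme dialectique",
}


def glossary_prompt(glossary=GLOSSARY):
    """Lines to append to the system prompt so terms stay consistent."""
    rules = [f'Always translate "{en}" as "{fr}".'
             for en, fr in glossary.items()]
    return "\n".join(rules)


def flag_inconsistencies(source, translation, glossary=GLOSSARY):
    """Glossary terms present in the source whose fixed rendering is
    missing from the output; returns English terms to review by hand."""
    return [en for en, fr in glossary.items()
            if en in source.lower() and fr not in translation.lower()]
```

This doesn’t solve building the glossary for 5000 pages, but it means each term only has to be decided once, and violations surface automatically instead of by rereading everything.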
Even with those caveats it gets us 80-90% of the way there, and the remaining work will be fixing stuff manually as we come across it, or with mass regex edits. If we can drum up interest in the FR instance with this, as one of our editors has alluded to, then we can also count on crowdsourcing the rest of it over time.
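The mass-regex pass could look like this; the patterns are invented examples of the kind of systematic fix you’d batch across every translated file:

```python
import re

FIXES = [  # invented example (pattern, replacement) pairs
    (r"\bEtats-Unis\b", "États-Unis"),  # restore dropped accents
    (r"\bseconde guerre mondiale\b", "Seconde Guerre mondiale"),
]


def apply_fixes(text, fixes=FIXES):
    """Run every (pattern, replacement) pair over one page's text."""
    for pattern, repl in fixes:
        text = re.sub(pattern, repl, text)
    return text
```

Running `apply_fixes` over the whole output folder turns each recurring mistake into a one-line entry in `FIXES` instead of 5000 manual edits.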
Conclusion:
We’re doing pretty exciting things for 67 cents.


Is crush foss? First time hearing about it
FSL-1.1-MIT https://github.com/charmbracelet/crush?tab=License-1-ov-file