Yeah, LLMs are great. I still have weird ego moments when I open up Google or the documentation, and I’m trying hard to kill that bad habit; GPT and Claude run circles around any documentation, because you can keep asking them to clarify and simplify until you reach a good base of understanding. The only flaw with an LLM is that it can confidently lie to you and give you information or an explanation that sounds reasonable but turns out to be false, which always leaves me paranoid. But then again, documentation can also lie because of human error.
I haven’t looked at local LLMs; I imagine they are even cheaper to train and run for local context.
Oh, I think Grok is a sleeper. IME it is way better: more straightforward, less sycophantic in the way it responds, and surprisingly non-US-centric. A couple of months ago I used Grok to rework the document taxonomy (categories and subcategories) for property-management purposes. Whereas ChatGPT kept adding US-centric elements (lots of compliance and legal sub-options, for example, that are not as pertinent in Greece) and was oblivious to Greece’s particularities (tax types, permits), I found Grok not only a better discussion counterpart (more to the point, to put it simply), but also surprisingly knowledgeable about Greek specificities that I wasn’t even aware of (I confirmed they weren’t BS with Google Search).
Not to mention that three months ago I was trying to get 4o to implement something in TSX and it had me running in circles for a couple of hours. It couldn’t get unstuck and kept giving me back the same or older solutions it had already offered, which didn’t work the way it thought they should. I then started going back and forth between 4o and Gemini 1.5 Flash. Gemini added tons of comments but also couldn’t solve the issue.
Finally, I took the best response from Gemini, gave it to Grok and told it what 4o and Gemini had tried, and why their “solutions” didn’t work. Grok fixed it in one shot.
That was the moment I started paying for SuperGrok.
I find that it pays off to be paranoid, both with an LLM-generated piece of code and with code written by a human, whether by myself or by someone junior. There’s a lot of sensationalist FUD about LLMs making people think less, and for sure that will happen for many (most?) LLM users, but I find that it can be the exact opposite: prompting and the back-and-forth force me to explain a problem, and in the process I understand it better. It also costs very little to explore alternative approaches.
Not infrequently I start typing my next prompt and halfway through writing it I realize what I need to change, without any LLM help.
If there’s a “bicycle for the mind”, LLMs are it (or can be it).
Local LLMs can also be great. codegemma:7b can be useful, but I turned off the Privy extension in Codium and don’t use the Ollama API in Zed, because having a model autocomplete suggestions breaks my flow.
There are also specific instruction-tuned LLMs that deliver great results. We self-host the Ollama API with an RTX 4070 Super, and Breek.gr uses a Greek-finetuned model for document classification, titling, summarization, and other purposes. No worries about API tokens or surprise bills. The RTX was a one-off 600 € business expense a year ago and it has paid for itself in spades. With smaller models, even CPU inference is tractable (that’s the main premise behind Elixir Chatbot Alchemy, with qwen2.5:1.5b).
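For anyone curious what the glue looks like, here’s a minimal sketch of calling a self-hosted Ollama instance from Elixir with Req. The prompt and the use of qwen2.5:1.5b here are illustrative, not what Breek.gr actually runs:

```elixir
# Minimal sketch: summarize a document via a self-hosted Ollama instance.
# Assumes Ollama is listening on localhost:11434 and the Req library is available.
defmodule DocSummary do
  @ollama "http://localhost:11434/api/generate"

  def summarize(text, model \\ "qwen2.5:1.5b") do
    # stream: false returns the whole completion in a single JSON response
    resp =
      Req.post!(@ollama,
        json: %{
          model: model,
          prompt: "Summarize the following document in two sentences:\n\n" <> text,
          stream: false
        },
        receive_timeout: 120_000
      )

    # Ollama puts the generated text under the "response" key
    resp.body["response"]
  end
end
```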
For multilingual purposes, aya-expanse:8b also does a great job at generating summaries, though it’s not instruction-tuned, so if you ask it for JSON it responds with "Certainly, here’s the JSON you requested…" and so on.
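When a model insists on wrapping the JSON in chat like that, something along these lines (a rough sketch, assuming Jason as the JSON library) is usually enough to salvage the payload:

```elixir
# Rough sketch: pull the first JSON object out of chatty model output,
# e.g. ~s(Certainly, here's the JSON you requested: {"title": "..."}).
defmodule ChattyJson do
  def extract(output) do
    # ~r/\{.*\}/s matches greedily from the first "{" to the last "}", across newlines
    case Regex.run(~r/\{.*\}/s, output) do
      [json] -> Jason.decode(json)
      nil -> {:error, :no_json_found}
    end
  end
end
```

Ollama also accepts `"format": "json"` in the request body, which constrains the output and sidesteps the preamble for models that cooperate.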
“treat them as if they are only capable of doing things like summarization and pattern transformation”
They are not even capable of this, only of giving the appearance of being so. It’s on the prompter to check the accuracy.
I think the biggest direct risk for most users of LLMs is that the LLM companies are aiming to become gatekeepers. All these efforts to integrate with LLMs are free work for these companies, who will increase the rents once they get enough people dependent on them.
I haven’t read the article, but I’m getting great results from Claude Code tonight on my first Elixir project with it.
I have some custom instructions: basically use TDD and YAGNI. (The instructions I’d give to a junior programmer.) I now have the role of a programming manager: approving and discussing to-do lists and MVP feature sets, and making observations about algorithmic improvements.
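For reference, the project-level instructions are nothing fancy; the CLAUDE.md in the repo root looks roughly like this (paraphrased, not the exact file):

```
# CLAUDE.md (rough sketch)

- Work test-first (TDD): write a failing ExUnit test, make it pass, then refactor.
- YAGNI: implement only what the current feature needs; no speculative abstractions.
- Before coding, propose a short to-do list and an MVP feature set and wait for approval.
- Run `mix test` and `mix format` before declaring a task done.
```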