A task class that's going to wait for at least a year before I try giving it to Claude again

Claude is really good. It has definitely become an indispensable part of my toolkit. Maybe even the most indispensable after the OS itself, the editor and the compiler (and git of course).

However, over the last two days I’ve definitely learned which type of tasks it (still) can’t handle well. This one was not even too complex; it just takes a bit of that “coming back to the problem and wrapping your head around it again”:

The change request was effectively to refactor a specific pattern, typically consisting of two LiveComponents, one of which is additionally wrapped in a function component, and to turn it all inside out, effectively swapping the slotted child with its parent. It required some thinking about the assigns, the order and syncing of rendering relative to fetching the data, making sure that nothing that didn’t flash or get hidden while waiting/syncing before starts doing so now (both LCs are self-managed in terms of fetching their respective contents and reporting any errors to the user), and so on.

My personal estimate for doing this “manually” was 4-5 hours for the pattern and the first use case leveraging it, including all the testing.

Claude and I have been on it for about 2 work days now and it’s almost done (almost, because it’s finally working, but we’re now polishing Claude’s leftovers).

The takeaways:

  1. When faced with across-the-board changes (applying changes across several related Elixir modules, some of which “inherit” from (use) other modules, some having their templates slotted into the templates of others, etc.), it only seemingly does this better than me (because I initially feel no burden myself), but payback time comes as soon as testing begins.
  2. Some of the types of bugs it introduces are ones a skilled developer would never make, such as placing conditional LV renders at will with zero acknowledgement of possible repercussions, or freely inventing new names/terms (which mask the old ones used elsewhere, sometimes even with inverted meaning in the case of booleans) in what seems like OO-style encapsulation (directly conflicting with the easy-to-search-for-and-find-later “paradigm”).
  3. Even when pointed to patterns/modules to reuse, it tends to forget them after a while and implement everything all over again (generating redundant code that doesn’t conform to the pattern at all).
  4. It does tiny, messy things such as adding conditionally hidden <div/> wrappers where only one side of the condition is covered (i.e. “hidden”, while forgetting the “contents” needed for a flex layout item).
  5. Last but not least, it’s way too slow for many cross-template or cross-module operations that I can do faster and better.
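
To make item 4 a bit more concrete, here is a minimal sketch of the one-sided vs. two-sided class selection. It assumes Tailwind-style `hidden` and `contents` utility classes; the module and function names are hypothetical and not taken from the actual codebase.

```elixir
defmodule WrapperClass do
  # Hypothetical helper, for illustration only.

  # One-sided: only the "hidden" branch is handled. When the wrapper is
  # visible it gets no class at all, so it stays an ordinary block-level
  # <div> sitting between the flex container and its items.
  def one_sided(hidden?) do
    if hidden?, do: "hidden"
  end

  # Two-sided: the visible branch gets "contents" (assumed to map to
  # `display: contents`), so the wrapper does not take part in the layout
  # and its children behave as the flex items.
  def two_sided(hidden?) do
    if hidden?, do: "hidden", else: "contents"
  end
end

IO.inspect(WrapperClass.one_sided(false))
IO.inspect(WrapperClass.two_sided(false))
```

In a template the result would presumably end up in the wrapper's `class` attribute; the point is only that both branches of the condition need an explicit answer.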

On the other hand, what it’s truly good at is diagnostics, and adding incremental code/features within a limited, well-defined and isolated context.

UPDATE: Almost forgot - the main issue remains state tracking (or the lack thereof).

Btw, Claude just made me laugh:

Yes, and the decomposition is small:

Today XYModule conflates two roles:
- (a) ..
- (b) ..
- (c) ..

:joy:

I have never used Claude or GPT, being a total cheapskate, but even with some of the newer open-source models I find it easy to fall into the trap of thinking about the LLM as something that thinks. Lately I have been trying to avoid that trap by referring to LLMs as PSEs (probabilistic selection engines) or other funny names. They are not capable of thinking, and their design and implementation make it certain that you will get wrong answers and unintended results. I think they will get better and better over time, but in their current form you just can’t trust the results without verification. It’s sort of like playing a game of charades: what gesturing do I need to perform in order to get the answer I want? It’s a fascinating piece of technology.

Yeah, they’re far from perfect, but at the same time they (Claude at least) are very helpful. I just wanted to point out and share the kind of tasks it doesn’t really excel at.

Still trying to find the formula for making better informed decisions on when to use it and when not.

For the time being, what I vaguely seem to understand is that the quality of its output is inversely proportional to the complexity of the input - not of the task itself (as stated in natural human language), but of the codebase it is given. It seems that the less it needs to adapt to, the better its output.

Yeah, that’s a good guard-rail to keep in your brain, I agree.

However, ultimately it does not matter one bit to me whether it “thinks” or not. Over time I have gradually found a really good formula for communicating with Claude in particular that maximizes my returns from it.

One universal law: you must think about the problem and shape it in your head… in detail. If you leave an LLM with open-ended questions, it starts to assume. This only gets worse the longer the session is, and you absolutely will find yourself super confused about WTF the code you got at the end of it is even supposed to do, and how, and why.

As mentioned in other threads: construct a closed feedback loop for the LLM and it does excellently. Any “looseness” is going to work against you and sabotage you, hard.
