Even that screenshot is bogus. When there's no understanding, there can be no misunderstanding either. It's misleading to treat the LLM as if there is understanding (and for the LLMs themselves to claim they do, although this anthropomorphization is part of their success). It's like asking the LLM "do you know about X?" It just makes no sense.
This is no different from reviewing code from actual humans: someone could have written great-looking code with excellent test coverage and still have missed a crucial edge case or obvious requirement. In the case of humans, there are obvious limits and known approaches to scaling up. With LLMs, who knows where they will go in the next couple of years.
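To make that concrete, here is a hypothetical Python sketch (my own illustration, not code from the thread) of exactly that situation: the tests pass, coverage looks complete, and the diff reads cleanly, yet an obvious input is never exercised.

    # Hypothetical example: plausible-looking code with passing tests
    # that still misses an obvious edge case.

    def average(values):
        """Return the arithmetic mean of a list of numbers."""
        return sum(values) / len(values)  # ZeroDivisionError on []

    def test_average_basic():
        assert average([1.0, 2.0, 3.0]) == 2.0

    def test_average_single():
        assert average([42.0]) == 42.0

    # Both tests pass and coverage of average() is 100%, but
    # average([]) crashes. Neither the tests nor a quick review
    # of the diff would surface the missing requirement.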
It is, because a human would have used "thinking" to create this piece of code. There could be errors and mistakes, but at least you know there is a human logic behind it, and you just have to check for the things that are easy mistakes for a human to make.
With AI, in the current state at least, there is no global logic involved in the whole thing that was created. It is a random set of probabilities that generated somewhat valid code. But there is no global "view" in which it all makes sense.
So when reviewing, you basically have to do in your head the same mental process an original human contributor would have gone through, to check that the whole thing makes sense in the first place.
Worse than that, when reviewing such a change, you should assume that the AI probably generated a few invalid versions of the code and randomly iterated until something passed the "linter" definition of valid code.
In order to get the full benefit of AI we must apply it irresponsibly.
That’s what it boils down to.
At first I thought this was a typo, but actually I fully agree with this. If we use LLMs (in their current state) responsibly we won’t see much benefit, because the weight of that responsibility is roughly equivalent to the cost of doing the task without any assistance.
Which has also always been the same with people.
That’s what AI fanboys say every single time I make this point. But the “it’s the same for humans” argument only works if you are referring to little children.
Indeed, my airline pilot brother once told me that a carefully supervised 7 year old could fly an airliner safely, as long as there was no in-flight emergency.
And indeed, hiring children, who are not accountable for their behavior, does create a supervision problem that, for many kinds of work, can easily exceed the value you get.
I can’t trust AI the way I can trust qualified adults.
Well, you employ different adults than I do, then. Every person I know (including me) can be either thorough or fast, as the post says, and there's no way to get both.