The following are some very interesting insights from a Tencent tech lead.
LLMs can improve efficiency at individual points in software development. Here are some potential applications:
- Intelligent code prompting
- Intelligent generation of code snippets
- Intelligent generation and optimization of SQL statements
- More efficient and accurate static code checking and automatic repair (not rule-based)
- Intelligent assistance in code review and code refactoring
- Automatic generation of unit test and interface test codes
- More advanced duplicate code checking (semantic duplicate checking)
- Automatic analysis and attribution of failed cases
- More precise technical Q&A
Looking at this, you may conclude that programmers will be largely unemployed. Is that really the case? To answer this question, we need to look at the problem from a global perspective. What has LLM changed in software development? What has not changed?
From the above examples, you should be able to appreciate the various possibilities of LLM’s single-point efficiency improvement in software development. These capabilities show us the changes in software development, which I summarize as: the democratization of basic coding knowledge, leading to localized efficiency improvement.
In the past, individual engineers needed a long learning cycle to master a computer language and its corresponding data structures and algorithms. Many experiences and patterns required individual engineers to summarize through extensive practice. Every individual engineer was repeating this process. Now, LLM allows an individual who has not received systematic training to have the same ability. The difference in ability between individuals is leveled by LLM. This is the democratization of knowledge.
If ChatGPT has achieved the democratization of knowledge in the digital age, then large code language models like Codex have achieved the democratization of basic coding ability, thereby bringing about local efficiency improvement in software development.
LLM has lowered the threshold for software development, allowing more people interested in software development to participate more easily. At the same time, LLM has improved the efficiency and quality of programming, allowing us to complete more work in less time, thus leaving us more time to think.
Not long ago, Matt Welsh, a former computer science professor at Harvard University who has held senior engineering positions at Google and Apple, released a video. Its main point was that “LLM will represent the end of programming”. He believes that programmers will be eliminated and that only product managers and code reviewers will remain in the future. What do you think about this?
In my opinion, while holding a sense of awe, we should not rush to conclusions. Why? Because there is still much in software development that has not changed, and these unchanged aspects are the core issues and main contradictions in software engineering.
What we face are the problems of software engineering. Programming is not the same as software engineering; it is only one part of it. The four inherent characteristics of software engineering (complexity, inconsistency, changeability, and invisibility) have not fundamentally changed with the advent of LLM. These are the main contradictions facing software engineering.
From the perspective of complexity, the complexity of the problem domain itself has not changed, and the essential complexity has not changed. What may have changed is only a part of the accidental complexity. Although local coding has become simpler, or more efficient, requirements analysis and software design have not become simpler due to LLM. We’ll discuss this later.
From the perspective of consistency, software development is still in essence a “mass collaboration of knowledge craftsmen”, so we need consistency. A consistent system is one in which similar things are done in similar ways. Making mistakes is not terrible; what is terrible is making them in a myriad of different ways. The emergence of LLM has not improved the consistency of software development. In fact, because LLM output is probabilistic, the same prompt can produce different code each time, so the inconsistency problem is amplified when code is generated with LLM. We’ll expand on this later.
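The probabilistic point can be illustrated with a toy sketch. It assumes a made-up list of three candidate completions with hypothetical weights; nothing here calls a real model, and the function names are invented for this example. Sampling with different seeds tends to return different completions for the same prompt, while always taking the most likely candidate is repeatable.

```python
import random

# Hypothetical candidate completions and weights for one and the same prompt.
# These values are invented for illustration; no real model is involved.
CANDIDATES = ["for-loop version", "list-comprehension version", "recursive version"]
WEIGHTS = [0.4, 0.35, 0.25]


def sample_completion(seed: int) -> str:
    """Pick one candidate at random, in proportion to its weight."""
    rng = random.Random(seed)
    return rng.choices(CANDIDATES, weights=WEIGHTS, k=1)[0]


def greedy_completion() -> str:
    """Always pick the highest-weight candidate (deterministic)."""
    best_index = max(range(len(WEIGHTS)), key=WEIGHTS.__getitem__)
    return CANDIDATES[best_index]


if __name__ == "__main__":
    sampled = {sample_completion(seed) for seed in range(50)}
    print("distinct sampled completions:", len(sampled))
    print("greedy completion:", greedy_completion())
```

In a team setting, each engineer effectively draws with a different seed, which is one way to picture how similar requirements can end up implemented in dissimilar ways.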
From the perspective of changeability, software evolves and changes with requirements, so architecture design and module abstraction can only face the present. They are inherently short-sighted or limited. Even the best architects cannot overcome this limitation.
In the agile development model, this problem is even more prominent. Moreover, requirements are scattered, and targets are vague. Without a global view, architecture is naturally limited, so it needs to be iteratively changed. Each iteration only provides a tiny piece of information in the grand view, far from the whole picture, and LLM is helpless in this.
From the perspective of invisibility, software has no physical, spatial form that can be drawn directly. Different points of focus produce different diagrams. These diagrams are difficult to overlay, and forcing everything into one picture yields an exceptionally complex diagram that loses the value of visualization. The inability to visualize a design restricts effective communication and exchange.
If you add the scale effect of large software, which includes the scale of the software system itself and the scale of the software development team, the problem becomes more serious. It significantly increases the communication cost, decision cost, cognitive cost, and trial and error cost in the software development process. These are the essence of software engineering problems. These essential problems have never changed, and LLM is largely powerless against them.
Based on the above analysis, we can see that the core contradiction of software engineering has not changed. Modern software engineering addresses various problems in a large-scale scenario. The programming efficiency achieved by LLM is just a small part of it. The most important requirements and code evolution patterns have not fundamentally changed. Let’s discuss each of these next.
Only when our requirements are clear enough, will the generated code be accurate. Therefore, accurately and comprehensively describing requirements becomes crucial. For natural language programming, firstly, you need the ability to articulate well. But the question is: can you?
In practice, we found that the effort needed to describe requirements precisely enough for an LLM to write correct code seems to match or even exceed the effort of writing the code itself. There are two main reasons for this.
Firstly, most code implementation is imperative, while requirement description is declarative. The two place entirely different demands on people. As programmers, we are trained to write programs, not to describe requirements. This means that programmers are inherently better at coding than at describing requirements.
Secondly, under the current development model, programmers implicitly fill the gaps in the requirements written by product managers with code. Much of what the requirements never state explicitly is simply implemented by the programmer (compensation). Now the order is reversed: every detail of the requirement must be clarified first, which may not match a programmer’s current work habits. Besides, the information entropy of code is actually greater than that of natural language. Programmers are better at describing things in code than in natural language.
For example, how do you clearly describe the requirements of a sorting function, sort? “The numbers output by sort must be arranged from small to large.” Is this enough to describe the requirements? Far from it. How should repeated numbers be handled? Is there an upper limit on the number of items to sort? If so, how should the caller be told when the limit is exceeded? Does sorting need a timeout? Is the check made before sorting starts or while it is running? Are there specific requirements for algorithm complexity? Does the algorithm need to deal with concurrency? At what scale? And so on.
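To make this concrete, here is a minimal sketch of a sort function in which a few of those open questions have been answered explicitly. The name sort_numbers, the TooManyItemsError exception, and the default limit are all hypothetical choices made for this illustration; a real requirement could settle each point differently.

```python
from typing import Iterable, List


class TooManyItemsError(ValueError):
    """Raised when the input exceeds the agreed size limit."""


def sort_numbers(values: Iterable[float], max_items: int = 1_000_000) -> List[float]:
    """Return the values in ascending order.

    Decisions that a one-line requirement leaves open (all assumed here):
    - duplicates are kept, not collapsed;
    - the size limit is checked before sorting starts, not during;
    - the input is not modified; a new list is returned.
    """
    items = list(values)
    if len(items) > max_items:
        raise TooManyItemsError(f"got {len(items)} items, limit is {max_items}")
    return sorted(items)


if __name__ == "__main__":
    print(sort_numbers([3, 1, 2, 1]))  # duplicates survive: [1, 1, 2, 3]
```

Even this short function encodes three decisions that “arranged from small to large” never mentions, and it still says nothing about timeouts, complexity, or concurrency.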
The requirements of software are not just functional. There are many non-functional requirements that need to be clearly described. Moreover, when implementing code, consideration must be given to design for testability, extensibility, maintainability, observability, etc. A lot of these were previously compensated for by development. Now, to generate code from requirements, you must explain these in advance.
Therefore, our conclusion is: “Software practitioners overestimate the complexity of programming, but underestimate the profundity of function and design”.
Under the current software development paradigm, when requirements change, the existing code is usually modified rather than regenerated from scratch. In this case, what LLM essentially provides is assistance with local programming (pair programming). Such assistance often calls for local modifications to the code, and these are often not easy.
We know that the information entropy of code is greater than that of natural language. It is difficult to describe code, and especially to pinpoint several positions within a large section of code, using natural language with its lower information entropy. Imagine telling someone where to modify the code over online chat instead of pointing at the screen or using a dedicated code review (CR) tool: the efficiency gap is huge.
Describing how to modify further would be more difficult, because it probably needs a lot of descriptions related to the code context, so the requirements for the prompt’s expression and length are high.
Besides, the output an LLM produces after receiving a modification suggestion (prompt) is unstable, non-convergent, and uninterpretable. The LLM does not edit the existing code according to the suggestion; it writes a new version from scratch based on it. People must then read and understand the output code all over again, which raises the cognitive cost.
At the same time, the way LLM works gives it a tendency to “talk nonsense seriously”, mixing in things that do not exist. These falsehoods can be seen as the AI answering confidently despite its ignorance, and this is disastrous in code generation. For example, it will mix different types of SQL statements together, or confuse Go’s os.Kill with Python’s os.kill(). This problem may need to be alleviated by using AI to audit AI.
As mentioned earlier, modifying existing code requires the existing code as context rather than starting from scratch. A simple way to provide it is to paste the entire project code into the prompt, but this is not realistic. At the time of writing, GPT-3.5 can accommodate only up to 4096 tokens and GPT-4 up to 8192 tokens, so unless the project is very small, it will not fit. This problem might need LangChain to solve.
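A rough back-of-the-envelope check shows how quickly a project outgrows these limits. The sketch below assumes about 4 characters per token, which is only a rule of thumb and not a real tokenizer; the function names and the 500-token reserve for the reply are also assumptions made for illustration.

```python
# Context limits quoted in the text (in tokens).
CONTEXT_LIMITS = {"GPT-3.5": 4096, "GPT-4": 8192}

# Assumption: roughly 4 characters per token. This is only a rule of thumb;
# a real tokenizer would give different counts, especially for code.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of a text from its length."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(text: str, model: str, reserved_for_answer: int = 500) -> bool:
    """Check whether the text leaves room for a reply within the model's limit."""
    return estimate_tokens(text) + reserved_for_answer <= CONTEXT_LIMITS[model]


if __name__ == "__main__":
    project_source = "x = 1\n" * 5000  # stand-in for 30,000 characters of code
    for model in CONTEXT_LIMITS:
        print(model, fits_in_context(project_source, model))
```

Under this estimate, about 30,000 characters of source, which is a small project, already comes close to filling the larger of the two windows.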
LangChain is middleware that links user-facing programs and LLM. It “customizes” an LLM by feeding it your own knowledge base. LangChain uses embeddings to build a vector knowledge base specific to the project and thereby enables “question answering based on specific documents”.
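The retrieval idea behind such a vector knowledge base can be sketched without any framework. The code below does not use LangChain’s API; it substitutes a toy bag-of-words vector for a real embedding model, and the file names and contents are hypothetical. It only shows the principle: embed each chunk of the project, embed the question, and put the most similar chunks into the prompt instead of the whole code base.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_chunks(question: str, chunks: Dict[str, str], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k chunk names most similar to the question, with scores."""
    q = embed(question)
    scored = [(name, cosine(q, embed(body))) for name, body in chunks.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical project files, invented for this example.
    project = {
        "auth.py": "def login(user, password): check password hash and create session",
        "orders.py": "def create_order(cart): compute total price and save order",
    }
    best = top_chunks("where is the password checked at login", project)
    print(best[0][0])
```

Only the selected chunks are sent to the model, which is how this approach works around the context limit for projects that do not fit in a single prompt.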
In the software development process, once the pseudocode-level design is completed, the last mile of coding implementation will be taken over by LLM, because simple, repetitive, memory-based coding is not a human strength but a machine strength.
This part of the job currently belongs to “code monkeys”, also known colloquially as CRUD workers or API boys, so many coders whose work does not involve design may be replaced by large models.
Engineers, on the other hand, need to focus on business understanding, requirement breakdown, architectural design, and design trade-offs, and learn to cooperate with AI based on these foundations, thereby achieving a 1+1 >2 effect of “Engineer + LLM”. This is symbiosis.
It is worth noting that this kind of symbiosis must always preserve human agency. Machines must be Copilots, that is, intelligent co-pilots, and humans must remain the main drivers. Only such a human-machine relationship can develop healthily in the long term. This is also the fundamental reason why Microsoft’s current CEO, Satya Nadella, emphasizes that Copilot is more advanced than Autopilot.
In addition, it is worth mentioning that in the short term, engineers who learn to use LLM first will benefit, but soon everyone will master it, and ability levels will even out again. This is very similar to the point made in the previous article “Delivery Riders Stuck in the System”. So, as symbiotic engineers, we need to strengthen our abilities in the following three aspects:
- Ability to understand, analyze, and break down requirements
- Ability to design architecture, analyze architecture, make design trade-offs, and promote the documentation and standardization of design
- Ability to understand the essence of a problem rather than just learning to apply solutions (teaching a man to fish is better than giving him a fish)
Consideration 2: Beneficial for Controlling the Scale of R&D Teams and Maintaining the Advantage of Small Teams
As the scale of software expands, more and more people join the project, the division of labor becomes finer and finer, and the amount of communication needed between people increases sharply. Soon the time spent on communication exceeds the time saved by the division of labor. In plain words, beyond a certain point, more people bring more chaos rather than more help. A job that one person can complete in 12 months cannot necessarily be completed by 12 people in 1 month; those 12 people might not even finish it in 12 months.
The Mythical Man-Month proposes a kind of organization called the “surgical team”. Just as an operation has a chief surgeon, a software project should have a chief programmer, with everyone else providing support. This way, you get both the product integrity that comes from a few minds and the overall productivity of multiple assistants, while greatly reducing the amount of communication.
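One way to put numbers on this growth is to count person-to-person pairs, assuming every pair of team members may need to coordinate. That is a simplifying assumption (real teams communicate through fewer, structured channels), and the function name is invented for this sketch.

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct person-to-person pairs in a team of the given size."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


if __name__ == "__main__":
    for size in (1, 3, 6, 12):
        print(size, communication_channels(size))
```

Under this model, going from 3 people to 12 multiplies the head count by 4 but the number of potential channels by 22 (from 3 to 66), which is why adding people does not shorten a schedule proportionally.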
But as software scales up, more programmers will inevitably be needed, and the team scale will definitely accelerate its expansion. However, the emergence of LLM, which automates basic programming work to some extent, is very beneficial for controlling the scale of R&D teams and maintaining the efficiency advantage of small teams.
The success of large models largely comes from learning from existing Internet text corpora and professional books and other materials. Correspondingly, in the field of software engineering, what needs to be learned is not just code, but also requirements and design.
However, many requirements and designs do not exist as documents; they live in the minds of programmers and architects, or in discussions. Even where documents exist, the documents and the code are highly likely to be out of sync. Even where they are in sync, behind the documents (requirements and designs) there is often a great deal of comparison and deliberation between alternative plans, and even many design compromises made on top of existing technical debt, and these decision-making processes are generally not explicitly recorded. We call this kind of undocumented knowledge “tacit knowledge”.
It is said that, given enough data, large models can learn the knowledge of requirements and design. But this “tacit knowledge” is inherently difficult to capture, so the premise of “enough data” may be difficult to meet in requirement analysis and software design.
In addition, in actual software development, requirements may not be expressed clearly at one time, and need to be gradually written clearly while developing. This is especially true for agile development. So for some general problems that do not require specific domain knowledge, LLM’s performance will be better, but for those specialized problems that require specific domain knowledge (private domain knowledge), LLM may not be very good.
In summary, “You can think of more than you can say, you can say more than you can write down.” So this naturally limits the upper limit of LLM’s ability.
Let’s make a bold assumption. Suppose that when software requirements change, we no longer change the code; instead, we modify the prompt corresponding to the requirements and then regenerate the complete code from that prompt. This would be a change in the paradigm of software development.
Under this paradigm, we need to ensure that the code is never modified by humans and is always generated directly from the prompt. We would then also need to put prompts under version control, and a new kind of tool, something like Git for prompts, may appear.
At this point, fundamentally speaking, the prompt is the code and the original code is no longer code. This truly realizes programming in natural language (prompt), and the programming paradigm changes from “prompt to code” to “prompt as code”.
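What such bookkeeping might look like can be sketched in a few lines. This is purely speculative: the GeneratedArtifact record, the function names, and the use of a truncated SHA-256 digest as a version identifier are all assumptions made for illustration, not an existing tool. The sketch ties generated code to the exact prompt text that produced it and detects manual edits to that code.

```python
import hashlib
from dataclasses import dataclass


def digest(text: str) -> str:
    """Short content hash used as a version identifier."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class GeneratedArtifact:
    """Generated code plus the version of the prompt it came from."""
    prompt_version: str
    code: str
    code_digest: str


def record_generation(prompt: str, generated_code: str) -> GeneratedArtifact:
    """Tie generated code to the exact prompt text that produced it."""
    return GeneratedArtifact(digest(prompt), generated_code, digest(generated_code))


def is_untouched(artifact: GeneratedArtifact, code_on_disk: str) -> bool:
    """True if the code still matches what was generated (no manual edits)."""
    return digest(code_on_disk) == artifact.code_digest


if __name__ == "__main__":
    prompt = "Sort numbers in ascending order; keep duplicates."
    code = "def f(xs):\n    return sorted(xs)\n"
    artifact = record_generation(prompt, code)
    print(artifact.prompt_version, is_untouched(artifact, code))
```

A check like is_untouched could run in a build pipeline to enforce the rule that humans change the prompt and never the generated code.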
To think further, when prompt as code is implemented, do we still need code, and are many engineering practices related to code still important? Now we think code engineering is important because code is written by humans and maintained by humans. But when code is written by LLM and maintained by LLM, is the existing software architecture system still applicable? At this time, maybe the evolution of the software development paradigm has truly been realized.
Consideration 5: Directly executable, the possibility of prompt to executable software development paradigm
Thinking one step further, will infrastructure appear that executes prompts directly, going from prompt to executable?
Code is just a part of software engineering, far from all of software engineering. Think about how much time you spend coding. Generally speaking, after the coding is completed, it often has to go through a series of engineering practices such as CI and CD to deliver value to end users.
So can the new software paradigm realize the direct transition from prompt to executable program instance? At present, Serverless may be one of the possible architectures.
After the emergence of LLM, I think there are two levels of reflection on computer education:
Firstly, the change of research direction in computer science. Previously, NLP, knowledge graph, code understanding, code defect discovery, test oracle generation, etc. were all independent research directions. However, the AGI ability shown by LLM seems to make the research of these vertical fields lose its meaning, because the AGI ability of LLM can solve them, perhaps even better.
So where will these research directions go is what we need to think about. Some people say that LLM is a new milestone in NLP, but others think it is more like the epitaph of NLP, which very well expresses my view.
Secondly, LLM has repeatedly proved that by “memorizing + simple reasoning”, it can pass most human exams and technical interviews. So what is the ultimate goal of education? Advanced artificial intelligence attempts to cultivate machines into humans, while backward education attempts to cultivate humans into machines. Computer education, in fact, our entire education is at a time when we need to reflect thoroughly.
Or are we all wrong?!
Peter Drucker once said, “The greatest risk of turbulent times is not turbulence itself, but trying to cope with turbulence with yesterday’s logic.” I am still analyzing today’s impact of LLM on software engineering with the old logic, and that foundation may be wrong in the first place. A new era requires a new way of thinking. We will have to wait and see.