AI Engineering Mindsets and Practices for Computational Biologists (Part I)

Over the past month, I worked on a few curiosity-driven side projects unrelated to my computational biology and bioinformatics research. None of them was particularly complex or technically demanding. But they made one thing very clear to me: what I truly lack may not be mastery of a specific programming language, package, or command, but a more top-down engineering mindset.

I had seen the argument before: in the age of AI, the most important ability may no longer be writing code itself, but the ability to define and organize problems in an engineering-oriented way. For a long time, I did not feel this very deeply. To be honest, I also did not find AI dramatically more convenient for day-to-day bioinformatics analysis.

But when I was developing computational biology algorithms, I had already felt something similar. Code is only a small part of the work. The harder questions are: What problem am I actually trying to solve? How can this problem be decomposed into modules? Which parts should be built from scratch, and which parts can rely on existing tools? How should different modules communicate with each other? How do I know whether the entire system actually works, rather than merely looking successful in a local demo?

At the time, because AI was not yet powerful enough, these questions were partly obscured by the equally important challenge of writing the code itself. However, since late last year or early this year, the agent ecosystem has entered a phase of explosive growth. I suddenly realized that coding ability itself is becoming less central than before. This is especially true for computational biology, where our code usually does not need to deal with many real-world business constraints. In most cases, we mainly need to focus on the scientific question itself.

So, at least from the perspective of computational method development, engineering design has become a core issue that sits above the act of writing code. AI can help with implementation, but it is still very difficult for AI to replace me in defining the problem, structuring the system, and deciding what should be built.

Looking back, I think this may have something to do with my undergraduate training. A science background trains us very well to understand and explain a problem: why a phenomenon matters, what mechanism may underlie it, how we can test it through experiments or analysis, and how we eventually arrive at an explanatory answer.

But engineering problems follow a different logic. Engineering begins with a goal: Can I use the tools, code, models, and knowledge available to me to build a system that runs reliably? Scientific problems often pursue “why.” Engineering problems first ask, “Can this be built, and can it work stably?” Only after that do we iterate.

A great example is OpenClaw. Today, OpenClaw and its related projects are still developing rapidly. But half a year ago, it was still just a relatively ordinary agent project based on Pi, with the goal of connecting to messaging platforms. If we managed this kind of project purely with a scientist’s mindset, perhaps it would never have been released at all. Every time it was about to ship, we might discover that its benchmark performance was not as strong as some newly released framework. Every time a new feature was added, we might also find it hard to fully explain why it worked. The project would then be postponed indefinitely.

Even when we are simply “building a tool,” computational biologists may have very different design orientations. Some tools are driven by biological meaning, some by algorithmic novelty, and others by community extensibility. These differences may seem subtle, but even though I have not been in this field for that long, I can already feel them.

And these subtle differences in design orientation are precisely what begin to distinguish people when low-level code implementation becomes increasingly easy to delegate to AI. What truly determines the final product may no longer be how many syntactic details or command-line flags one can remember, but whether one can define a clear goal, design a reasonable structure, identify weak points in the system, and turn an idea into a genuinely usable tool through testing and iteration.

Starting from this post, I will share some reflections on engineering practices. The first topic in this series is SDD: Spec-Driven Development.

Spec-Driven Development: From Slot-Machine Coding to Controlled System Building

One major reason many people resist using AI for serious work is that using AI often feels like gambling. You write a prompt, pull the lever, and never quite know what will come out. It may generate an impressive demo in one shot. But it may also choose the wrong tech stack, misunderstand your requirements, introduce strange abstractions, or turn what should have been a long-term maintainable project into a disposable script.

But this problem is not entirely caused by the model being “not smart enough.” Very often, the real problem is that we have not given the model a stable, persistent, and verifiable project context.

GitHub’s Spec Kit describes SDD as an approach that puts specifications at the center of AI-assisted software development. The first sentence of their repository says:

“An open source toolkit that allows you to focus on product scenarios and predictable outcomes instead of vibe coding every piece from scratch.”

The idea, as explained on the GitHub Blog, is simple: a spec is no longer a document that is written once and then thrown away. It is a living artifact that evolves with the project and serves as the shared source of truth among humans, tools, and AI agents. It records project details and development requirements, such as API conventions, the approximate frontend UI and features, testing standards, and acceptance criteria.

Many people are familiar with Codex’s Plan Mode. In my view, it can be understood as a one-off SPEC markdown file. A SPEC folder, by contrast, is more like a persistent collection of requirement documents and implementation guides (Table 1, cite).

Table 1. Vibe coding versus SDD.

| Dimension | Vibe Coding | SDD |
| --- | --- | --- |
| Primary artifact | Natural language prompts | Executable specifications |
| Scope | Full application generation | System-wide architectural contracts |
| Validation mechanism | Manual review, if any | Build fails on spec divergence |
| AI governance | None built in | Constitutional constraints and checkpoints |
| Where truth lives | Prompt history | Versioned specification |

A Very Simple Example: Building a Web-Based Calculator

Let’s use a very simple example. Suppose you are taking a class, and the assignment is to build a basic web-based calculator. Under the mindset of vibe coding, you might open an agent and directly type:

“Please help me build a web-based calculator. I want it to be simple, but fully functional.”

The agent will probably produce a calculator. But very soon, you may notice a series of problems: it uses JavaScript, while you prefer TypeScript; the color scheme is not what you had in mind; the buttons are arranged in a simple grid rather than laid out like a physical calculator; it does not consider mobile devices; and the state management design is not suitable for adding more operators later.

So you start adding requirements round after round, until the result finally looks good enough to submit.

The real problem appears next week, when the instructor asks you to continue building on the same calculator and add more operators and rules. You open the agent again, only to find that it has long forgotten the implicit agreements you went back and forth on last week. Maybe the color can still be reproduced, but the component structure, button layout, and extensibility of the calculation logic start drifting again.

Eventually, you realize that fixing the old project may cost more than rewriting it. For a small assignment, that may still be acceptable. But what if this were a large system?

This is the failure of prompt history as project memory.

SDD approaches the problem very differently. Even if you do not use Spec Kit, you can simply say to the agent:

“I want to build a web-based calculator. Before writing any code, please ask me questions to define the project requirements. After I answer, write a detailed SPEC.md file as the long-lasting constitution for this project.”

This step may look slow, but what it is really doing is turning vague requirements into stable context. A sufficiently capable agent will begin asking questions: Do you want TypeScript or JavaScript? Should React be used? Should the buttons imitate a physical calculator? Should keyboard input be supported? Do you need calculation history? Should parentheses and operator precedence be supported? How should the interface behave on mobile? How should invalid input be handled? Will scientific functions be added in the future?

You do not even need to know, at the very beginning, what aspects should be considered when building a calculator. You only need to answer the questions. The requirements gradually emerge through the conversation and are eventually organized into a SPEC file.

Next week, when the assignment asks you to continue developing the calculator, you ask the agent to update the SPEC first, and then generate the plan and tasks based on the updated SPEC. In this way, continuous development becomes possible.
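
One way to keep a SPEC verifiable across weeks is to store part of its acceptance criteria as executable checks. The following is only a minimal sketch, written in Python for brevity even though the calculator itself might be in TypeScript. The names `OPERATORS`, `ACCEPTANCE_CASES`, `evaluate`, and `failed_cases` are hypothetical and are not part of Spec Kit; the cases simply mirror the questions about precedence, parentheses, and invalid input raised earlier.

```python
import ast
import operator

# Hypothetical operator registry: a new operator required by next week's
# assignment becomes a one-line addition here.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

# Hypothetical acceptance criteria copied from SPEC.md: (input, expected result).
ACCEPTANCE_CASES = [
    ("1 + 2 * 3", 7),      # operator precedence
    ("(1 + 2) * 3", 9),    # parentheses
    ("8 / 4", 2.0),        # division
    ("-5 + 2", -3),        # unary minus
]


def _eval_node(node):
    """Recursively evaluate a parsed expression, allowing only registered operators."""
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        return OPERATORS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand)
    raise ValueError("unsupported syntax for this calculator")


def evaluate(expression):
    """Evaluate an arithmetic expression; any invalid input raises ValueError."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"invalid input: {expression!r}") from exc
    try:
        return _eval_node(tree.body)
    except ZeroDivisionError as exc:
        raise ValueError("division by zero") from exc


def failed_cases():
    """Return the acceptance cases the current implementation does not satisfy."""
    return [(expr, want) for expr, want in ACCEPTANCE_CASES if evaluate(expr) != want]
```

When the SPEC is updated, new rows are added to `ACCEPTANCE_CASES` first, and the agent's work is accepted only once `failed_cases()` returns an empty list. The point of the design is that the agreement lives in a versioned file rather than in last week's chat history.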

Why SDD Matters for Bioinformaticians

Bioinformatics is naturally compatible with SDD. The reason is simple: biological intent is already a form of specification.

The problem is that, in the past, these specifications were often scattered across many places. Some were in the biological question. Some were in the experimental design. Some were in the Methods section of a paper. Some were discussed during lab meetings. Some were written as comments in notebooks. And a large part of them existed only in our own heads.

For example, in a single-cell RNA-seq analysis project, the real requirement is never as simple as “help me run a Seurat pipeline.” Behind that sentence, there are many questions:

- What biological question is this project actually trying to answer?
- Which conditions, time points, tissues, or treatments do the samples come from?
- Which samples should be included, and which should be excluded?
- How should QC thresholds be set? Why?
- Should doublet detection be performed?
- How should batch effects be handled?
- How should the clustering resolution be chosen?
- What criteria should define marker genes?
- What are the comparison groups for differential expression?
- Which intermediate results must be saved?
- Should the final deliverable be figures, tables, a report, or a reusable pipeline?
- What counts as success? What indicates that the pipeline has failed?
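
The questions above can be written down in a structured form so that unanswered ones become visible before any pipeline code is generated. The sketch below assumes a plain Python dataclass; every field name, and the choice of which fields may stay empty, is a hypothetical illustration rather than a standard schema or a recommended set of parameters.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class AnalysisSpec:
    """Hypothetical container for decisions that often live only in our heads."""
    biological_question: Optional[str] = None
    included_samples: list = field(default_factory=list)
    excluded_samples: list = field(default_factory=list)
    min_genes_per_cell: Optional[int] = None        # QC threshold
    max_mito_fraction: Optional[float] = None       # QC threshold
    qc_rationale: Optional[str] = None              # the "why" behind the thresholds
    run_doublet_detection: Optional[bool] = None
    batch_correction: Optional[str] = None          # a method name, or "none"
    clustering_resolution: Optional[float] = None
    marker_gene_criteria: Optional[str] = None
    de_comparisons: list = field(default_factory=list)    # e.g. [("treated", "control")]
    saved_intermediates: list = field(default_factory=list)
    deliverables: list = field(default_factory=list)      # figures, tables, report, ...
    success_criteria: list = field(default_factory=list)


# Assumption: excluding no samples is a legitimate answer, so this field may stay empty.
OPTIONAL_FIELDS = {"excluded_samples"}


def undecided(spec):
    """Return the names of the fields that still need a decision."""
    missing = []
    for f in fields(spec):
        if f.name in OPTIONAL_FIELDS:
            continue
        value = getattr(spec, f.name)
        # False is a valid decision (for example, "no doublet detection"),
        # so only None, empty strings, and empty lists count as undecided.
        if value is None or value == "" or value == []:
            missing.append(f.name)
    return missing
```

Calling `undecided(AnalysisSpec(biological_question="..."))` lists everything that has not been decided yet, which gives the agent a concrete set of questions to ask before it writes any analysis code.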

For bioinformatics projects, the most dangerous errors are often not cases where the code crashes. The most dangerous errors are cases where the code runs without any error, while the analysis goal has quietly drifted.
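
Silent drift of this kind can be made loud by comparing the parameters a run actually used against the parameters the spec requires. The following is a minimal sketch under the assumption that both are available as flat dictionaries; `find_drift` and `assert_no_drift` are hypothetical helper names, and the parameter values in the usage note are placeholders, not recommendations.

```python
import math


def _is_number(value):
    # bool is a subclass of int, so exclude it from numeric comparison.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def find_drift(spec_params, run_params, rel_tol=1e-9):
    """List human-readable differences between the spec and what a run used."""
    problems = []
    for key, expected in spec_params.items():
        if key not in run_params:
            problems.append(f"{key}: required by spec but not recorded by the run")
            continue
        actual = run_params[key]
        if _is_number(expected) and _is_number(actual):
            same = math.isclose(expected, actual, rel_tol=rel_tol)
        else:
            same = expected == actual
        if not same:
            problems.append(f"{key}: spec says {expected!r}, run used {actual!r}")
    # Parameters the run introduced without the spec knowing about them.
    for key in sorted(run_params.keys() - spec_params.keys()):
        problems.append(f"{key}: used by the run but absent from the spec")
    return problems


def assert_no_drift(spec_params, run_params):
    """Turn a quiet divergence into a hard failure."""
    problems = find_drift(spec_params, run_params)
    if problems:
        raise RuntimeError("analysis drifted from spec:\n" + "\n".join(problems))
```

For example, with a spec of `{"min_genes_per_cell": 200, "clustering_resolution": 0.8}` and a run that recorded `{"min_genes_per_cell": 200, "clustering_resolution": 1.2}`, `find_drift` returns `["clustering_resolution: spec says 0.8, run used 1.2"]`, and `assert_no_drift` raises instead of letting the pipeline finish as if nothing had changed.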

When people talk about an “experienced bioinformatics engineer,” I think this is exactly what they mean: someone who can complete the project robustly without letting the goal drift. By contrast, the biological interpretation of the results may actually be something that scientists are better positioned to handle.

Therefore, I think the significance of SDD for bioinformaticians is not merely that it makes AI-generated code more stable. More importantly, it helps us translate biological intent into engineering constraints.

It allows us to persistently record the biological meaning of a project, the overall pipeline design, parameter choices, QC standards, replaceable modules, intermediate results, and acceptance criteria. In this way, the agent is no longer just a temporary assistant that helps you write scripts. It becomes more like a collaborator working under the same project constitution.

Of course, learning how to write good SPEC files also requires experience and foundational knowledge. In fact, the entry barrier may be no lower than when we first learned to write code by hand. But I believe this working logic is absolutely part of the future.

For computational biology algorithm development, some tasks are even more naturally suited to this approach. Benchmarking is one example: the goal is clear, the tools are clear, the metrics are clear, and the acceptance criteria are clear. In principle, after receiving a good SPEC, an agent can complete all the experiments. I would even say that, in some cases, its stability and objectivity may be better than a human’s.
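
To illustrate what "clear acceptance criteria" might look like in a benchmarking SPEC, here is a small sketch that checks a results table against declared thresholds. The `Criterion` class, the `evaluate_benchmark` function, the metric names, and all numbers in the usage note are hypothetical placeholders, not real benchmark results.

```python
import operator
from dataclasses import dataclass

# Comparators a criterion is allowed to use.
COMPARATORS = {">=": operator.ge, "<=": operator.le}


@dataclass(frozen=True)
class Criterion:
    """One acceptance criterion from the SPEC, e.g. metric >= threshold."""
    metric: str
    comparator: str
    threshold: float


def evaluate_benchmark(results, criteria):
    """Map each tool to the criteria it fails; an empty list means accepted."""
    report = {}
    for tool, metrics in results.items():
        failures = []
        for c in criteria:
            value = metrics.get(c.metric)
            if value is None:
                # A missing metric is a failure, not a silent pass.
                failures.append(f"{c.metric} missing")
            elif not COMPARATORS[c.comparator](value, c.threshold):
                failures.append(f"{c.metric}={value} violates {c.comparator} {c.threshold}")
        report[tool] = failures
    return report
```

With placeholder criteria `[Criterion("ari", ">=", 0.7), Criterion("runtime_min", "<=", 60)]` and placeholder results `{"tool_a": {"ari": 0.82, "runtime_min": 12}, "tool_b": {"ari": 0.65, "runtime_min": 5}}`, the report is `{"tool_a": [], "tool_b": ["ari=0.65 violates >= 0.7"]}`. Because the thresholds are fixed in the SPEC before any experiment runs, neither the agent nor a human can adjust them after seeing the results.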

In short, we should open up our thinking. For tasks that are suitable for agents, we should learn to assign them through a SPEC system.

As a famous Chinese saying goes: taking time to sharpen the axe does not delay the work of chopping wood. In the age of AI coding, perhaps this saying can offer us a new kind of inspiration.
