How the Research Radar Works: An Automated Literature Curation Pipeline (Not Just for Fun)

Every morning at 09:15, a scheduled task starts running on my computer.

It scans the local NetNewsWire database for papers from the past 24 hours, pulls the unread entries from more than twenty journals and preprint sources, and sends the titles and abstracts to a model, which helps me choose the 13 to 15 papers most worth reading that day. I call this small agent Clawdie. The name comes from Claude Code, because that was the first tool that really pulled me into the AI agent rabbit hole.

Hermes Agent provides the scheduling layer, cron controls the timing, and DeepSeek-V4-Pro handles the summarization and judgment. The final result is written into my Jekyll website repository, and GitHub Pages builds it automatically. By the time I open my computer in the morning, the Research Radar page is usually already updated.

It does not read papers for me. Its job is narrower: remove noise, filter out papers that are completely unrelated to my interests, and leave me with the day’s themes, recommendation reasons, and DOI links. My real job is still to decide which papers deserve opening and reading carefully.

Why I Built This

Academic publishing really does move too fast.

Nature, Science, Cell, and their sibling journals publish new articles every day. Add bioRxiv, medRxiv, and the specialized journals on top of that, and manually checking journal websites quickly becomes mechanical labor, with a return that does not justify the time.

The hardest part is not simply that there are too many papers. It is that most papers have nothing to do with the question I actually care about on a given day. I tried several approaches before: RSS readers, email alerts, Zotero keyword searches, and posts people shared on social media. I still use RSS, mainly NetNewsWire. It works well, but the problem is straightforward: I still have to screen everything myself.

The RSS reader I use every day on macOS: NetNewsWire.

Looking at dozens of titles every day, opening some of them, scanning abstracts, and deciding “irrelevant”, “maybe relevant”, or “later” is not hard in isolation. Repeating it every day consumes attention. Worse, the few truly important papers are often mixed into a pile of unrelated material, which makes them easy to miss.

So the goal of Research Radar is simple. It automates the most annoying first layer of screening: the machine reads the abstracts first and ranks them by my research interests; I decide which ones are worth serious reading. I do not think an agent can tell you which scientific questions are important, so it can only serve as an initial filter, helping you spend time on more important problems.

What the Daily Page Looks Like

The morning digest is roughly split into two parts.

The first part is the day’s hot topic. Clawdie looks at the papers selected that day, combines them with my recent Zotero additions and the research-direction hints written into the system prompt, and finds the clearest thread connecting several papers. It usually generates a short summary and a few signal points.

This part is useful to me because it is not just a paper list. It answers a more practical question: among today’s new papers, is there any trend worth noticing?

The second part is the article list. Right now I split it into several sections:

  • Computational, usually 5 papers, mainly methods, algorithms, models, and AI tools.
  • Biomedicine, usually 5 papers, focused on biological findings related to my research.
  • Other Fields, usually 3 papers, mostly AI or computational work that does not fully belong to biomedicine but still seems worth a glance.

On Fridays, there is an extra Biotech News Delivery section with about 5 items, mainly industry updates.

Each paper includes several fields: a very short contribution summary, a “why it matters” note, a more personal “why for Yiru” note, and a recommendation level, such as “read carefully”, “skim”, or “awareness”.

Research Radar example: a daily digest with hot topic and curated article sections.

I deliberately ask the model to write “why”, rather than just give a score. A plain relevance score is not that helpful, because it is hard for me to make a real decision from the difference between 8 and 9. But if it explains how a paper connects to spatial omics, single-cell analysis, computational immunology, or the tumor microenvironment, I can decide much faster whether to open the original paper.

The Workflow

The system is not especially complicated. At a high level, it has three parts: collect sources, let the model curate, and generate the web page.

1. Sources: Control the Entry Point First

All paper sources enter NetNewsWire first.

I subscribe to more than twenty RSS feeds, mainly advance online publication (AOP) feeds from the Nature family, Cell Press, and Science, plus several computational biology journals and preprint sources, such as PLOS Computational Biology and the methods-oriented sections of bioRxiv.

These feeds are organized into folders in NetNewsWire. The advantage is that I do not need to maintain a separate paper crawling system. NetNewsWire already handles RSS updates, read/unread state, and local storage for me. More conveniently, the NetNewsWire article database is just a local SQLite file. This is important. My pipeline can read that SQLite database directly, without applying for API keys, using OAuth, or worrying that some third-party service will change its interface one day. As long as NetNewsWire updates normally, Research Radar can get the latest article state.
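
As an illustration, here is roughly what that direct read can look like. This is a minimal sketch: the database path, table names, and columns are assumptions from my own install, and NetNewsWire's schema is not a public API, so verify everything locally before relying on it.

```python
import sqlite3
import time
from pathlib import Path

# Assumed path for the local "On My Mac" account; verify on your machine.
DB = Path.home() / "Library/Application Support/NetNewsWire/Accounts/OnMyMac/DB.sqlite3"

def fetch_recent_unread(hours: int = 24) -> list[dict]:
    """Unread articles from the past `hours`, newest first."""
    cutoff = time.time() - hours * 3600
    # Open read-only: NetNewsWire owns this file, so the pipeline must never write to it.
    conn = sqlite3.connect(f"file:{DB}?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        """
        SELECT a.title, a.summary, a.url, a.datePublished
        FROM articles AS a
        JOIN statuses AS s ON s.articleID = a.articleID
        WHERE s.read = 0 AND a.datePublished >= ?
        ORDER BY a.datePublished DESC
        """,
        (cutoff,),  # caveat: confirm which epoch datePublished uses on your install
    ).fetchall()
    conn.close()
    return [dict(r) for r in rows]
```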

In short, NetNewsWire is responsible for “what is new”; my script is responsible for “what is worth reading”.

2. AI in the Loop: Let the Model Read Abstracts First

Every day at 09:15, a Hermes Agent cron task starts. It connects to the NetNewsWire SQLite database, retrieves unread articles from the past 24 hours, formats their titles, sources, abstracts, and links as input, and sends them to DeepSeek-V4-Pro.

The prompt contains my research profile. Right now it mainly includes spatial omics, single-cell analysis, computational immunology, tumor microenvironment, biomedical AI, and the use of large models and foundation models in scientific research.
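
The call itself is unremarkable. A minimal sketch, assuming DeepSeek's OpenAI-compatible endpoint; the model identifier and prompt here are placeholders, and the real prompt is much longer:

```python
import os
from openai import OpenAI  # DeepSeek exposes an OpenAI-compatible API

# Triggered by the Hermes Agent cron task: 15 9 * * *
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com")

SYSTEM_PROMPT = """You curate a daily research digest.
Interests: spatial omics, single-cell analysis, computational immunology,
tumor microenvironment, biomedical AI, LLMs and foundation models in science.
Rank papers by relevance, fill each section, explain every pick,
and extract the day's hot topic."""

def curate(articles: list[dict]) -> str:
    # One title/source/abstract/link block per candidate paper.
    payload = "\n\n".join(
        f"TITLE: {a['title']}\nSOURCE: {a.get('feed', 'unknown')}\n"  # feed name needs a separate lookup
        f"ABSTRACT: {a['summary']}\nLINK: {a['url']}"
        for a in articles
    )
    resp = client.chat.completions.create(
        model="deepseek-chat",  # placeholder model ID
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user", "content": payload}],
    )
    return resp.choices[0].message.content  # Markdown digest with front matter
```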

The model does more than score papers. It needs to complete several steps:

  • First, judge how relevant each paper is to my research interests.
  • Then, select the papers most worth reading for each section.
  • Next, write a contribution summary, importance explanation, and personal relevance note for each paper.
  • Finally, extract the day’s hot topic from the selected papers.

These contents are written into a Markdown file with YAML front matter at the top, which is exactly the format Jekyll expects.
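
To make the format concrete, here is roughly how such a file might be written. The field names are illustrative, not a fixed schema:

```python
import datetime
import yaml  # PyYAML

def write_digest(front_matter: dict, body_md: str) -> str:
    # front_matter might carry: title, date, hot_topic, and a papers list,
    # where each paper has title, doi, summary, why_it_matters,
    # why_for_yiru, and recommendation (read carefully / skim / awareness).
    path = f"_research_radar/{datetime.date.today():%Y-%m-%d}.md"
    with open(path, "w", encoding="utf-8") as f:
        f.write("---\n")
        yaml.safe_dump(front_matter, f, sort_keys=False, allow_unicode=True)
        f.write("---\n\n")
        f.write(body_md)
    return path
```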

There is one practical problem, though: model-generated YAML is not always stable.

Early on, it once missed a closing ---, and the entire digest rendered as a blank page on the website. Sometimes the array format was wrong, or the DOI field was empty. So I later added a second cron task that runs at 10:00 specifically for QA.

The QA task checks YAML structure, delimiters, required fields, DOI values, array format, and similar details. If the issue is small, it fixes it automatically. If the issue is more serious, the bad file should not reach the live page.
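
The checks themselves are the boring kind. A minimal sketch, assuming the illustrative field names from above; the real task also repairs small issues in place rather than only reporting them:

```python
import re
import yaml  # PyYAML

REQUIRED = {"title", "date", "papers"}     # illustrative field names
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")  # rough shape of a DOI

def check_digest(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file may go live."""
    parts = text.split("---", 2)
    if len(parts) < 3:  # front matter must be closed by a second '---'
        return ["missing or unclosed front matter"]
    try:
        meta = yaml.safe_load(parts[1]) or {}
    except yaml.YAMLError as exc:
        return [f"YAML parse error: {exc}"]
    problems = [f"missing field: {k}" for k in sorted(REQUIRED - set(meta))]
    papers = meta.get("papers")
    if not isinstance(papers, list):  # catches the wrong-array-format case
        problems.append("papers is not a YAML list")
        papers = []
    for paper in papers:
        if not isinstance(paper, dict):
            problems.append(f"malformed paper entry: {paper!r}")
            continue
        doi = str(paper.get("doi", ""))
        if not DOI_RE.match(doi):
            problems.append(f"bad or empty DOI: {doi!r}")
    return problems
```

If the checker returns anything it cannot fix, the file simply never gets committed.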

This layer is usually invisible, but necessary. In the week it has been running, no broken digest has reached the site.

3. Publishing: From Markdown to the Website

The Markdown file generated by the model is placed in the website repository under _research_radar/YYYY-MM-DD.md.

Hermes Agent commits this file and pushes it to GitHub. After that, GitHub Pages automatically rebuilds the website. Under normal conditions, it takes only a few dozen seconds from push to live page.
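
For completeness, the publish step amounts to little more than this sketch (Hermes Agent runs the equivalent; the repository path handling is simplified):

```python
import subprocess

def publish(repo: str, date: str) -> None:
    # Stage only the new digest, commit, and push; GitHub Pages does the rest.
    rel = f"_research_radar/{date}.md"
    subprocess.run(["git", "add", rel], cwd=repo, check=True)
    subprocess.run(["git", "commit", "-m", f"Research Radar {date}"],
                   cwd=repo, check=True)
    subprocess.run(["git", "push"], cwd=repo, check=True)
```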

On the Jekyll side, I use collections to manage Research Radar entries, then Liquid templates generate the index page, individual pages, and RSS feed.

The final result is a fixed-format digest every day, automatically archived, with permanent links and RSS subscription support.

I try to keep this part boring. Content generation is already unstable enough; the publishing path does not need extra tricks.

Design Tradeoffs

The system runs stably mostly because of a few conservative choices.

The first is using SQLite instead of connecting a pile of APIs.

NetNewsWire’s local database already records which articles are new and which ones have been read. Reading it directly is enough. There are no API rate limits, no token rotation, and no sudden breakage because some service redesigned its interface. For this kind of personal automation, a local file is actually the most reliable interface.

The second is asking the model to summarize, not just filter.

At first I thought asking the model to decide “relevant” or “irrelevant” would be enough. It was not. What I really needed was not a binary classifier, but a filter that could explain its judgment.

I want to know why a paper is worth reading, or why it can be skipped. If the explanation is clear enough, then even when I decide not to read the paper, I still roughly know what I am missing.

The third is separating generation, checking, and display.

The model only generates content. The QA task only checks structure and repairs problems. Jekyll only renders the content into pages. That means any layer can be replaced independently.

For example, if I later want to switch from DeepSeek to another model, the website template does not need to change. If I want to redesign the Research Radar page, the prompt does not need to change.

The fourth is not letting automation cross the boundary.

Research Radar only selects papers, ranks them, labels them, and writes explanations. It does not automatically add papers to Zotero, decide research directions for me, or delete anything.

I still want the final judgment to stay with me. Automation can save attention, but it should not use attention on my behalf.

How It Feels So Far

Research Radar is not a complicated system. In essence, it is cron, SQLite, one model call, Markdown, Jekyll, and GitHub Pages stitched together.

But it solves a problem I face every day: I do not want to spend the clearest part of my morning filtering out dozens of irrelevant papers.

Now that part goes to the machine first. It narrows the scope, gives reasons, and puts the result somewhere I already check every day. Which paper to actually read, how to read it, and whether to follow up afterward are still my decisions.

I think this is the right place for this kind of research automation: not letting the machine do research for you, but letting it waste a little less of your attention. I hope this post helps, and I am happy to talk more about it. The pipeline is not open source for now; it is fairly simple, and the ideas in this post should be enough to build something similar.