© 2026 nXsi Intelligence. All rights reserved.
Tutorial · Intermediate · February 21, 2026 · 27 min read · 30 min hands-on

Build an AI News Digest with n8n and Claude: Complete Tutorial (27 Nodes, $0.10/day)

Step-by-step guide to building an automated AI news pipeline that monitors 10 RSS feeds, deduplicates articles, extracts full text, analyzes each one with Claude, and delivers a curated digest to Discord, Slack, or email.

n8n · claude · automation · rss · ai-news · tutorial

In the build log, I walked through the 7-hour process and every error. This tutorial is the clean version — how to build it yourself without repeating my mistakes.

The result is a 27-node n8n workflow that monitors 10 RSS feeds, extracts full article text, analyzes each article with Claude Haiku, compiles a professional digest with Claude Sonnet, and delivers it to Discord, Slack, or email. It runs daily at 7 AM, takes about 2 minutes, and costs roughly $0.10 per run.


What You Are Building

The complete pipeline:

Schedule Trigger (7 AM daily)
  |
  v
Init Run --> Insert Run Record (PostgreSQL)
  |
  v
Feed List (10 categorized RSS/API sources)
  |
  v
Fetch Feed (HTTP Request, sequential, error-tolerant)
  |
  v
Parse Articles (handles RSS, Atom, HN API, Reddit JSON)
  |
  v
24-Hour Filter --> Any Articles? (IF)
  |                     |
  v                     v [no articles]
Dedup Articles         Update Run "no_articles" --> stop
  |
  v
DB Dedup Check (PostgreSQL, 7-day lookback)
  |
  v
Remove DB Dupes --> Select Top 25
  |
  v
Jina Extract (full-text extraction, free tier)
  |
  v
Assemble Extracted --> Prep Analysis Batch
  |
  v
Analyze Article (Claude Haiku, per-article, structured JSON)
  |
  v
Score and Rank (topic weights + social signals)
  |
  v
Select Top 12 --> Compile Digest (Claude Sonnet)
  |
  v
Format Outputs (Discord embeds, Slack blocks, markdown)
  |
  v
Send Discord / Send Slack / Send Email
  |
  v
Save Articles + Update Run Stats (PostgreSQL)

Expected output: A curated digest with a lead story analysis, 4 top stories with 2-sentence summaries, and 6-8 quick-hit one-liners. Color-coded by importance on Discord, Block Kit formatted on Slack, clean HTML for email. Cost and article count in the footer.

Total nodes: 27 (21 standard nodes + 2 AI chain nodes + 2 LLM sub-nodes + 2 delivery nodes)

Daily cost: ~$0.10 (Haiku for batch analysis, Sonnet for the final compilation)


Prerequisites

Before you start, make sure you have these set up. See the Guides for detailed setup links and project-specific notes.

  • n8n -- self-hosted (Docker recommended) or n8n Cloud. I built and tested this on v2.6.4.
  • PostgreSQL -- version 14+. If you do not already have one running, I will show you a Docker Compose setup in Phase 1.
  • Anthropic API key -- sign up at console.anthropic.com and generate an API key. A few dollars covers months of runs.
  • At least one delivery channel -- Discord webhook, Slack bot token, or SMTP credentials.

You do not need to be an n8n expert. I will explain every node configuration. But you should be comfortable navigating the n8n editor and have a basic understanding of what a Code node does.


Phase 1: Database Setup

The digest needs 4 tables to track feeds, log runs, archive articles, and store configuration.

The 4 Tables

Table            Purpose
digest_feeds     Feed registry with URL, category, feed type, health tracking
digest_runs      Execution log -- one row per daily run, with cost and stats
digest_articles  Article archive with dedup hashes, AI analysis, importance scores
digest_config    Key-value configuration (topic weights, limits)

Why separate tables instead of one big one? digest_feeds is your feed registry -- you manage it. digest_articles is ephemeral processed data that grows daily. digest_runs is the audit trail so you can answer "what did Tuesday's digest look like?" and "how much did I spend last month?" digest_config means you can change topic weights without editing workflow code.
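The Prerequisites section mentioned a Docker Compose setup for PostgreSQL. If you do not already have a database running, here is a minimal sketch -- the image tag, credentials, and volume path are placeholders, so change them for your environment:

```yaml
services:
  postgres:
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_USER: digest        # placeholder -- pick your own
      POSTGRES_PASSWORD: change-me # placeholder -- pick your own
      POSTGRES_DB: digest
    ports:
      - "5432:5432"
    volumes:
      - ./pgdata:/var/lib/postgresql/data
```

Start it with `docker compose up -d`, then apply the migration via stdin: `docker compose exec -T postgres psql -U digest -d digest < 001_create_tables.sql`.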

Run the Migration

Create a file called 001_create_tables.sql and run it against your database:

BEGIN;

CREATE TABLE IF NOT EXISTS digest_feeds (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name VARCHAR(200) NOT NULL,
  url TEXT NOT NULL UNIQUE,
  category VARCHAR(50) NOT NULL DEFAULT 'general',
  feed_type VARCHAR(20) DEFAULT 'rss',
  is_active BOOLEAN DEFAULT TRUE,
  last_fetched_at TIMESTAMPTZ,
  last_success_at TIMESTAMPTZ,
  last_error TEXT,
  consecutive_failures INTEGER DEFAULT 0,
  article_count INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS digest_runs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  run_date DATE NOT NULL,
  status VARCHAR(30) DEFAULT 'running',
  feeds_checked INTEGER DEFAULT 0,
  feeds_failed INTEGER DEFAULT 0,
  articles_found INTEGER DEFAULT 0,
  articles_after_dedup INTEGER DEFAULT 0,
  articles_extracted INTEGER DEFAULT 0,
  articles_analyzed INTEGER DEFAULT 0,
  articles_selected INTEGER DEFAULT 0,
  lead_story_title VARCHAR(500),
  digest_markdown TEXT,
  cost_input_tokens INTEGER DEFAULT 0,
  cost_output_tokens INTEGER DEFAULT 0,
  cost_usd DECIMAL(10,6) DEFAULT 0,
  jina_tokens_used INTEGER DEFAULT 0,
  duration_seconds INTEGER,
  error_log JSONB DEFAULT '[]',
  started_at TIMESTAMPTZ DEFAULT NOW(),
  completed_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS digest_articles (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  url_hash VARCHAR(64) NOT NULL,
  title_normalized VARCHAR(500) NOT NULL,
  url TEXT NOT NULL,
  title VARCHAR(500) NOT NULL,
  source_name VARCHAR(200),
  published_at TIMESTAMPTZ,
  full_text TEXT,
  summary TEXT,
  importance_score INTEGER DEFAULT 5,
  categories JSONB DEFAULT '[]',
  sentiment VARCHAR(20),
  key_entities JSONB DEFAULT '[]',
  why_it_matters TEXT,
  reading_time_min INTEGER DEFAULT 3,
  digest_id UUID REFERENCES digest_runs(id),
  included_in_digest BOOLEAN DEFAULT FALSE,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_digest_articles_url_hash
  ON digest_articles(url_hash);

CREATE TABLE IF NOT EXISTS digest_config (
  key VARCHAR(100) PRIMARY KEY,
  value JSONB NOT NULL,
  description TEXT,
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

COMMIT;

Seed 10 Starter Feeds

INSERT INTO digest_feeds (name, url, category, feed_type) VALUES
  ('Hacker News',     'https://hn.algolia.com/api/v1/search_by_date?tags=story&hitsPerPage=50', 'tech-general', 'hn_api'),
  ('TechCrunch AI',   'https://techcrunch.com/category/artificial-intelligence/feed/',           'ai-news',      'rss'),
  ('The Verge AI',    'https://www.theverge.com/rss/ai-artificial-intelligence/index.xml',       'ai-news',      'rss'),
  ('Ars Technica',    'https://feeds.arstechnica.com/arstechnica/index',                         'tech-general', 'rss'),
  ('Simon Willison',  'https://simonwillison.net/atom/everything/',                              'ai-tools',     'atom'),
  ('MIT Tech Review', 'https://www.technologyreview.com/feed/',                                  'ai-research',  'rss'),
  ('r/LocalLLaMA',    'https://www.reddit.com/r/LocalLLaMA/hot.json?limit=50',                  'ai-models',    'reddit'),
  ('r/selfhosted',    'https://www.reddit.com/r/selfhosted/hot.json?limit=50',                  'devops',       'reddit'),
  ('Anthropic Blog',  'https://www.anthropic.com/rss.xml',                                       'ai-research',  'rss'),
  ('OpenAI Blog',     'https://openai.com/blog/rss.xml',                                         'ai-research',  'rss')
ON CONFLICT (url) DO NOTHING;

INSERT INTO digest_config (key, value, description) VALUES
  ('max_articles_extract', '25',   'Max articles to send to Jina for full-text extraction'),
  ('max_articles_digest',  '12',   'Max articles to include in final digest'),
  ('topic_weights', '{"ai-models":1.5,"ai-agents":1.5,"ai-tools":1.3,"ai-research":1.2,"security":1.3,"open-source":1.2,"devops":1.1}', 'Topic weight multipliers for importance scoring')
ON CONFLICT (key) DO NOTHING;

Notice the feed_type column. This is important -- each feed returns a completely different data format. Hacker News uses a JSON API, Reddit returns JSON with a nested structure, Simon Willison's blog uses Atom XML, and most others use RSS XML. The parser needs to know which format to expect.

Common Mistake: Do not skip the ON CONFLICT clauses. If you run the seed more than once (which you will during development), duplicates will cause constraint violations without them.


Phase 2: RSS Fetching (4 Nodes)

Open n8n and create a new workflow. We are going to build this left to right, one node at a time.

Node 1: Schedule Trigger

Click the canvas and add a Schedule Trigger node. Set it to Cron Expression mode and enter 0 7 * * * -- that is 7:00 AM daily in whatever timezone your n8n instance uses.

Field             Value
Trigger Interval  Cron Expression
Expression        0 7 * * *

n8n uses the GENERIC_TIMEZONE environment variable. If you set GENERIC_TIMEZONE=America/Chicago, then 0 7 * * * means 7 AM Central.

Node 2: Init Run (Code)

Add a Code node connected to the Schedule Trigger. Rename it "Init Run."

var now = new Date();
return [{
  json: {
    runDate: now.toISOString().split('T')[0],
    startedAt: now.toISOString()
  }
}];

This normalizes the current timestamp into a clean YYYY-MM-DD date string for the database INSERT. The Schedule Trigger's output does not give us that format.

Node 3: Insert Run (Postgres)

Add a Postgres node. Set Operation to "Execute Query."

INSERT INTO digest_runs (run_date)
VALUES ('{{ $json.runDate }}')
RETURNING id, run_date, started_at

The RETURNING clause is important -- it gives us the auto-generated UUID for this run. Every downstream node that writes to the database needs this ID to link records together.

Node 4: Feed List (Code)

Add a Code node and rename it "Feed List." This outputs one item per feed, each carrying the run ID so it survives through the entire pipeline.

var items = $input.all();
var runId = items[0].json.id;
var feeds = [
  { url: 'https://hn.algolia.com/api/v1/search_by_date?tags=story&hitsPerPage=50',
    name: 'Hacker News', category: 'tech-general', feedType: 'hn_api' },
  { url: 'https://techcrunch.com/category/artificial-intelligence/feed/',
    name: 'TechCrunch AI', category: 'ai-news', feedType: 'rss' },
  { url: 'https://www.theverge.com/rss/ai-artificial-intelligence/index.xml',
    name: 'The Verge AI', category: 'ai-news', feedType: 'rss' },
  { url: 'https://feeds.arstechnica.com/arstechnica/index',
    name: 'Ars Technica', category: 'tech-general', feedType: 'rss' },
  { url: 'https://simonwillison.net/atom/everything/',
    name: 'Simon Willison', category: 'ai-tools', feedType: 'atom' },
  { url: 'https://www.technologyreview.com/feed/',
    name: 'MIT Tech Review', category: 'ai-research', feedType: 'rss' },
  { url: 'https://www.reddit.com/r/LocalLLaMA/hot.json?limit=50',
    name: 'r/LocalLLaMA', category: 'ai-models', feedType: 'reddit' },
  { url: 'https://www.reddit.com/r/selfhosted/hot.json?limit=50',
    name: 'r/selfhosted', category: 'devops', feedType: 'reddit' },
  { url: 'https://www.anthropic.com/rss.xml',
    name: 'Anthropic Blog', category: 'ai-research', feedType: 'rss' },
  { url: 'https://openai.com/blog/rss.xml',
    name: 'OpenAI Blog', category: 'ai-research', feedType: 'rss' }
];
var result = [];
for (var i = 0; i < feeds.length; i++) {
  feeds[i].runId = runId;
  result.push({ json: feeds[i] });
}
return result;

When n8n processes the next HTTP Request node, it will loop through these 10 items one at a time, fetching each feed URL sequentially.

Node 5: Fetch Feed (HTTP Request)

Add an HTTP Request node. This is where the feed data actually gets downloaded.

Field            Value
Method           GET
URL              ={{ $json.url }}
Response Format  Text
Timeout          15000

Critical setting: Go to the node settings (the three-dot menu at the top right of the node), scroll to On Error, and set it to Continue (Using Regular Output). This is the single most important configuration in the entire workflow. Without it, if TechCrunch is temporarily down, the entire workflow dies. With it, that feed outputs an error and the pipeline continues with the other 9.

Also add a custom header -- some feeds block requests without a User-Agent:

Header      Value
User-Agent  AI-News-Digest/1.0

Why "Text" instead of "JSON" or "XML" for the response format? Because we are fetching 4 different formats (RSS XML, Atom XML, HN JSON, Reddit JSON). Setting it to "Text" gives us the raw response body, and we parse it ourselves in the next node.

Node 6: Parse Articles (Code)

This is the largest node in the workflow -- about 120 lines. Add a Code node, rename it "Parse Articles," and set Mode to Run Once for All Items (this is critical -- we need all feeds at once to cross-reference).

The code handles 4 completely different feed formats:

Feed Type  Structure             Title Location   Date Field                 Social Score
hn_api     JSON hits[]           hit.title        hit.created_at             points + comments * 2
reddit     JSON data.children[]  post.title       post.created_utc * 1000    score + comments * 2
atom       XML <entry> tags      <title> content  <published> or <updated>   0
rss        XML <item> tags       <title> content  <pubDate>                  0

Here is the core parsing logic. I am showing the key helper functions and the HN/RSS parsers -- the Atom and Reddit parsers follow the same pattern:

var items = $input.all();
var cutoff = Date.now() - 86400000; // 24 hours ago
var allArticles = [];
var feedErrors = [];

function extractTag(xml, tag) {
  var re = new RegExp('<' + tag + '[^>]*><!\\[CDATA\\[([\\s\\S]*?)\\]\\]></' + tag + '>|<' + tag + '[^>]*>([\\s\\S]*?)</' + tag + '>', 'i');
  var m = xml.match(re);
  if (!m) return '';
  return (m[1] || m[2] || '').trim();
}

for (var i = 0; i < items.length; i++) {
  var feed = items[i].json;
  var body = feed.data || feed.body || '';
  if (typeof body !== 'string') body = JSON.stringify(body);

  try {
    if (feed.feedType === 'hn_api') {
      var hn = typeof body === 'string' ? JSON.parse(body) : body;
      var hits = hn.hits || [];
      for (var h = 0; h < hits.length; h++) {
        var hit = hits[h];
        var pubTime = new Date(hit.created_at).getTime();
        if (pubTime < cutoff) continue;
        allArticles.push({
          title: hit.title || '',
          url: hit.url || ('https://news.ycombinator.com/item?id=' + hit.objectID),
          source: 'Hacker News',
          category: feed.category,
          publishedAt: hit.created_at,
          socialScore: (hit.points || 0) + ((hit.num_comments || 0) * 2),
          excerpt: hit.title || '',
          runId: feed.runId
        });
      }
    } else if (feed.feedType === 'rss') {
      var rssItems = body.split(/<item[\s>]/i);
      rssItems.shift(); // remove preamble
      for (var r = 0; r < rssItems.length; r++) {
        var chunk = rssItems[r];
        var pubDate = extractTag(chunk, 'pubDate');
        if (pubDate) {
          var pubTime = new Date(pubDate).getTime();
          if (pubTime < cutoff || isNaN(pubTime)) continue;
        }
        allArticles.push({
          title: extractTag(chunk, 'title'),
          url: extractTag(chunk, 'link'),
          source: feed.name,
          category: feed.category,
          publishedAt: pubDate || new Date().toISOString(),
          socialScore: 0,
          excerpt: extractTag(chunk, 'description').substring(0, 500),
          runId: feed.runId
        });
      }
    }
    // ... similar blocks for 'atom' and 'reddit' feed types
  } catch (err) {
    feedErrors.push({ feed: feed.name, error: err.message });
  }
}

if (allArticles.length === 0) {
  return [{ json: { url: '', _noArticles: true, runId: items[0].json.runId } }];
}
return allArticles.map(function(a) { return { json: a }; });
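The elided Atom and Reddit branches follow the same shape as the two shown above. As one example, here is the Reddit branch pulled out as a standalone sketch -- the field names match Reddit's public JSON listing format, but treat this as a starting point rather than the exact code from the workflow:

```javascript
// Sketch of the Reddit branch as a standalone function.
// body:   raw JSON string from the HTTP Request node
// feed:   { name, category, runId } from the Feed List node
// cutoff: Date.now() - 86400000
function parseRedditFeed(body, feed, cutoff) {
  var out = [];
  var rd = JSON.parse(body);
  var children = (rd.data && rd.data.children) || [];
  for (var c = 0; c < children.length; c++) {
    var post = children[c].data;
    var pubTime = post.created_utc * 1000; // Reddit timestamps are Unix seconds
    if (pubTime < cutoff) continue;
    out.push({
      title: post.title || '',
      url: post.url || ('https://www.reddit.com' + post.permalink),
      source: feed.name,
      category: feed.category,
      publishedAt: new Date(pubTime).toISOString(),
      socialScore: (post.score || 0) + ((post.num_comments || 0) * 2),
      excerpt: (post.selftext || post.title || '').substring(0, 500),
      runId: feed.runId
    });
  }
  return out;
}
```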

Why regex XML parsing? n8n Code nodes run in a sandboxed VM where DOMParser does not exist, and external npm modules are blocked by default (self-hosted instances can allow them via the NODE_FUNCTION_ALLOW_EXTERNAL environment variable, but then the workflow stops being portable). Regex keeps everything self-contained, and it handles the 4 known feed formats reliably.

The 24-hour filter (cutoff = Date.now() - 86400000) drops anything published more than 24 hours ago. Without it, feeds with slow update cycles would pad the digest with week-old articles.

Social score matters. HN and Reddit articles carry engagement data (points, upvotes, comments). RSS and Atom feeds do not. The social score gets used later in ranking -- an article with 500 HN points is probably more significant than one with 3.

Node 7: Any Articles? (IF)

Add an IF node. Condition: {{ $json.url }} is not empty. Set Type Validation to Loose.

The true branch continues the pipeline. The false branch connects to a Postgres node that updates the run record with status = 'no_articles' and stops.

A common trap here: n8n's IF node v2.2 with strict type validation fails when comparing different types (like number vs string "0"). Always use Loose validation unless you have a specific reason not to.


Phase 3: Deduplication (3 Nodes)

The raw parse can produce 50-80 articles from 10 feeds. Many are duplicates -- the same breaking story appears on TechCrunch, The Verge, and Hacker News with slightly different headlines. This phase removes them.

Node 8: Dedup Articles (Code)

This implements two-tier in-memory deduplication. Mode: Run Once for All Items.

Tier 1 -- URL Hash (djb2):

function djb2(str) {
  var hash = 5381;
  for (var i = 0; i < str.length; i++) {
    hash = ((hash << 5) + hash) + str.charCodeAt(i);
    hash = hash & hash;
  }
  return Math.abs(hash).toString(16).padStart(8, '0');
}

Two articles with the same URL produce the same hash. Simple, fast, catches exact URL duplicates.

Tier 2 -- Fuzzy Title (Levenshtein Distance):

After normalizing titles (lowercase, strip punctuation, collapse whitespace), we compute the edit distance between every pair. If the distance is less than 20% of the longer title's length, it is a duplicate.

Why 20%? I tested different thresholds against real data:

  • "OpenAI Releases GPT-5" vs "OpenAI Has Released GPT-5" -- about 8% distance, correctly caught
  • "OpenAI Releases GPT-5" vs "Google Announces Gemini 3" -- about 60% distance, correctly ignored
  • 20% is the sweet spot between catching minor headline rewrites and producing false positives
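The tutorial doesn't show the Tier 2 code, but the check can be sketched like this -- a minimal sketch using a standard dynamic-programming Levenshtein and the normalization described above (function names are mine, not the workflow's):

```javascript
// Normalize: lowercase, strip punctuation, collapse whitespace.
function normalizeTitle(t) {
  return t.toLowerCase().replace(/[^a-z0-9\s]/g, '').replace(/\s+/g, ' ').trim();
}

// Classic two-row Levenshtein edit distance.
function levenshtein(a, b) {
  var prev = [], curr = [];
  for (var j = 0; j <= b.length; j++) prev[j] = j;
  for (var i = 1; i <= a.length; i++) {
    curr[0] = i;
    for (var j = 1; j <= b.length; j++) {
      var cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr.slice();
  }
  return prev[b.length];
}

// Duplicate if edit distance < 20% of the longer normalized title.
function isFuzzyDuplicate(titleA, titleB) {
  var a = normalizeTitle(titleA), b = normalizeTitle(titleB);
  var maxLen = Math.max(a.length, b.length);
  if (maxLen === 0) return true;
  return levenshtein(a, b) / maxLen < 0.2;
}
```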

The output is a single item containing { articles: [...], hashListSql: "'hash1','hash2',...", count: N }. That hashListSql string is pre-formatted for the SQL query in the next node.
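Building that hashListSql string is small but worth showing, since getting the quoting wrong breaks the next node's query. A sketch (helper name is mine):

```javascript
// Format url hashes as a quoted, comma-separated list for a SQL IN (...) clause.
// The hashes come from djb2, so they are plain hex strings -- no escaping needed.
function buildHashListSql(articles) {
  return articles.map(function (a) { return "'" + a.urlHash + "'"; }).join(',');
}
```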

Node 9: DB Dedup Check (Postgres)

SELECT url_hash FROM digest_articles
WHERE url_hash IN ({{ $json.hashListSql }})
AND created_at > NOW() - INTERVAL '7 days'

This catches articles that appeared in previous days' digests. A breaking story might appear in Monday's and Tuesday's feeds -- the 7-day lookback prevents it from showing up twice.

Critical setting: Enable alwaysOutputData in the node settings. Without it, if the query returns 0 rows (no duplicates found), n8n considers the node empty and stops execution. This is one of the most common n8n gotchas with Postgres nodes.

Node 10: Remove DB Dupes (Code)

Takes the known hashes from the Postgres query and filters them out:

var pgResults = $input.all();
var knownHashes = {};
for (var i = 0; i < pgResults.length; i++) {
  var hash = pgResults[i].json.url_hash;
  if (hash) knownHashes[hash] = true;
}
var dedupData = $('Dedup Articles').first().json;
var articles = dedupData.articles;
var fresh = [];
for (var i = 0; i < articles.length; i++) {
  if (!knownHashes[articles[i].urlHash]) fresh.push(articles[i]);
}
return [{ json: { articles: fresh, count: fresh.length, runId: dedupData.runId } }];

Notice $('Dedup Articles').first().json -- this is how you access data from a non-adjacent node in n8n. The Postgres node's output is URL hashes, but we need the full article list from the earlier Dedup node. n8n's $() function lets you reach back to any named node in the workflow.

One thing that will bite you: $json in a node after Postgres refers to the Postgres output, not the previous Code node output. If you need data from an earlier node, always use $('Node Name').first().json.


Phase 4: Full-Text Extraction (3 Nodes)

RSS feeds only give you a title and maybe a 200-character excerpt. For Claude to do a meaningful analysis, it needs the actual article text. That is what Jina Reader provides -- for free.

Node 11: Select for Extraction (Code)

Sorts articles by social score and takes the top 25:

var data = $input.first().json;
var articles = data.articles;
articles.sort(function(a, b) { return (b.socialScore || 0) - (a.socialScore || 0); });
var selected = articles.slice(0, 25);
var result = [];
for (var i = 0; i < selected.length; i++) {
  selected[i].runId = data.runId;
  result.push({ json: selected[i] });
}
return result;

Why 25? Jina Reader makes a real HTTP request to each article URL, renders the page, and extracts the text. That takes 1-3 seconds per article. 25 articles means about 1-2 minutes of extraction time. That is enough to guarantee 12 quality candidates after analysis. Testing showed feeds return 30-80 raw articles; extracting all of them would take too long and potentially hit rate limits.

Each article becomes a separate n8n item so the HTTP Request node processes them individually.

Node 12: Jina Extract (HTTP Request)

Field            Value
Method           GET
URL              =https://r.jina.ai/{{ $json.url }}
Response Format  JSON
Timeout          30000
On Error         Continue (Regular Output)

Add these headers:

Header           Value
Accept           application/json
X-Return-Format  markdown

Jina Reader works by prepending r.jina.ai/ to any URL. It fetches the page, strips navigation and ads, and returns clean markdown content. (I tried Mozilla's Readability library first, but it choked on JavaScript-heavy sites like TechCrunch — Jina handles those fine because it actually renders the page.) The free tier works without an API key and allows about 1,000 requests per day -- more than enough for our 25 daily articles.

I originally had an X-Token-Budget: 4000 header to limit response size, but it caused 409 BudgetExceededError on almost every article (typical articles are 10,000-40,000 tokens). Removing it lets Jina return the full content, and we truncate in the next node instead.

Common Mistake: Setting X-Token-Budget too low on Jina Reader. If the article exceeds the budget, Jina returns a 409 error instead of a truncated response. Either remove the header entirely or set it to something generous like 50000.

Node 13: Assemble Extracted (Code)

Collects Jina results back into a single array, matching each extraction to its source article:

var jinaResults = $input.all();
var articles = $('Select for Extraction').all();
var assembled = [];
var totalJinaTokens = 0;

for (var i = 0; i < articles.length; i++) {
  var article = articles[i].json;
  var jina = jinaResults[i] ? jinaResults[i].json : {};
  var fullText = '';
  var hasFullText = false;

  if (jina.data && jina.data.content) {
    fullText = jina.data.content.substring(0, 8000);
    hasFullText = true;
    if (jina.data.usage) totalJinaTokens += jina.data.usage.tokens || 0;
  } else if (jina.content) {
    fullText = String(jina.content).substring(0, 8000);
    hasFullText = true;
  } else {
    fullText = article.excerpt || '';
  }

  article.fullText = fullText;
  article.hasFullText = hasFullText;
  assembled.push(article);
}

return [{ json: { articles: assembled, jinaTokens: totalJinaTokens, count: assembled.length } }];

The fallback chain is important: if Jina succeeded, use the full text. If Jina returned content in a different structure, try an alternate path. If Jina failed entirely, fall back to the RSS excerpt. Every article gets some text for analysis, even if extraction failed.

The substring(0, 8000) truncation keeps token costs predictable. A full extracted article can be 20,000+ characters. 8,000 characters gives Claude enough context for a good analysis without blowing up the per-article cost.


Phase 5: AI Analysis (3 Nodes)

The architecture decision in this phase is the difference between $0.10/day and $0.25/day.

The Basic LLM Chain vs AI Agent Decision

n8n offers two ways to call an LLM: the AI Agent node and the Basic LLM Chain node. I initially built this with the AI Agent node and discovered a critical cost problem.

The AI Agent (conversationalAgent) accumulates conversation history across batch items. When you process 25 articles sequentially:

  • Article 1: 777 input tokens (system prompt + 1 article)
  • Article 5: 3,800 input tokens (system + all 5 previous articles and responses)
  • Article 25: 15,905 input tokens (system + all 25 articles and responses)
  • Total: 203,000 input tokens when it should be 32,000

The AI Agent was designed for multi-turn conversations, not batch processing. It sends the entire chat history with every call because that is how conversational context works. For batch analysis where each article is independent, this is pure waste.

The Basic LLM Chain has no memory. Each call is a fresh system prompt + user message. Same quality, roughly the same node configuration, but 6x fewer tokens because there is no conversation history accumulating.

Use Basic LLM Chain for batch processing. Use AI Agent for conversational workflows. This is probably the most expensive mistake you can make in n8n AI workflows, and I have not seen it documented anywhere.
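The arithmetic behind that multiplier can be sketched directly. These token counts are illustrative assumptions, not measurements -- the exact numbers depend on your prompts:

```javascript
// Agent: each call re-sends the system prompt plus all previous
// article/response pairs, so input grows linearly per call.
function agentTotalTokens(n, systemTokens, perItemTokens) {
  var total = 0;
  for (var i = 1; i <= n; i++) total += systemTokens + i * perItemTokens;
  return total;
}

// Chain: no memory -- each call is system prompt + one article.
function chainTotalTokens(n, systemTokens, perItemTokens) {
  return n * (systemTokens + perItemTokens);
}
```

With 25 articles, ~500 system tokens, and ~600 tokens per article/response pair, the agent model totals 207,500 input tokens versus 27,500 for the chain -- the same order of magnitude as the 203,000 vs 32,000 the author measured.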

Node 14: Prep Analysis Batch (Code)

Builds a per-article prompt and splits articles into individual items:

var data = $input.first().json;
var articles = data.articles;
var result = [];

for (var i = 0; i < articles.length; i++) {
  var a = articles[i];
  var prompt = 'Analyze this article and return ONLY valid JSON:\n\n'
    + 'TITLE: ' + a.title + '\n'
    + 'SOURCE: ' + a.source + '\n'
    + 'CATEGORY: ' + a.category + '\n'
    + 'URL: ' + a.url + '\n\n'
    + 'FULL TEXT:\n' + (a.fullText || a.excerpt || 'No text available') + '\n\n'
    + 'Return this exact JSON structure:\n'
    + '{"summary":"2-3 sentence summary","importance":<integer 1-10>,'
    + '"categories":["primary","secondary"],"sentiment":"positive|negative|neutral|mixed",'
    + '"key_entities":["entity1","entity2"],'
    + '"why_it_matters":"One sentence on practical impact","reading_time_min":<integer>}';

  result.push({ json: { prompt: prompt, article: a, runId: data.runId, index: i } });
}
return result;

Why build prompts in a Code node? The Basic LLM Chain's text parameter can only hold a simple expression. Complex string construction with article data, field fallbacks, and structured output templates needs JavaScript.

Node 15 + 16: Analyze Article (Basic LLM Chain) + Haiku Analyzer

Add a Basic LLM Chain node from the AI section. Then click into it and add an Anthropic Chat Model sub-node underneath.

Basic LLM Chain configuration:

Field        Value
Prompt Type  Define
Text         ={{ $json.prompt }}

Anthropic Chat Model (Haiku Analyzer) configuration:

Field        Value
Model        claude-haiku-4-5-20251001
Max Tokens   2048
Temperature  0.2

Set the system message on the Basic LLM Chain to this:

You are an expert news analyst. You analyze technology articles
and output structured JSON assessments.

SCORING GUIDE (importance 1-10):
10: Industry-reshaping announcement (new major model, regulation, acquisition >$1B)
8-9: Significant development with broad impact (major product launch, critical vulnerability)
6-7: Notable development in a specific domain (new tool, research, meaningful OSS release)
4-5: Incremental update or niche development (version bump, minor feature, limited audience)
1-3: Routine news, announcements, or opinion pieces with no actionable insight

CATEGORIES (use these exact strings):
ai-models, ai-tools, ai-research, ai-agents, security, devops, open-source,
cloud, hardware, startups, regulation, programming

RULES:
- Return ONLY valid JSON. No markdown formatting, no backticks, no explanation.
- If the article text is missing, score importance lower (max 5) and note
  "limited analysis" in summary.
- Be ruthlessly honest about importance. Most articles are 4-6. Reserve 8+
  for genuinely significant events.
- The "why_it_matters" should be actionable: "means X for developers" not
  "this is interesting."

That "ruthlessly honest" instruction matters. Without it, LLMs rate everything 7-8. I went through three iterations of this prompt before the scores spread properly. In the final version, test runs produced scores ranging from 3 to 9 with an average of 6.48 -- genuine differentiation.

Error handling on the Basic LLM Chain node: Set On Error to Continue (Using Regular Output) and enable Retry on Fail (2 retries, 3-5 second wait). Anthropic occasionally returns HTTP 529 (overloaded) during peak hours. The retry handles transient failures, and the error-continue prevents one failed analysis from killing the entire digest.

Why not use the AI Agent node? Because it accumulates conversation history across batch items. It works, but it costs 6-10x more than the Basic LLM Chain. If your AI node processes multiple items sequentially and each item is independent, the Agent node is burning tokens on conversation context that adds zero value.

Node 17: Score and Rank (Code)

Parses AI responses, applies topic weights, and ranks articles. The JSON parsing is defensive because LLMs sometimes produce imperfect output:

// 3-level JSON parsing fallback (excerpt -- runs inside the per-item loop)
var responseText = items[i].json.text || '';
var analysis = null;

// Level 1: Clean parse
var cleaned = responseText.replace(/```json\n?/g, '').replace(/```\n?/g, '').trim();
var js = cleaned.indexOf('{');
var je = cleaned.lastIndexOf('}');
if (js !== -1 && je !== -1) {
  try {
    analysis = JSON.parse(cleaned.substring(js, je + 1));
  } catch(e) {
    // Level 2: Bracket repair
    var partial = cleaned.substring(js);
    var opens = (partial.match(/\{/g) || []).length;
    var closes = (partial.match(/\}/g) || []).length;
    while (closes < opens) { partial += '}'; closes++; }
    try { analysis = JSON.parse(partial); } catch(e2) { analysis = null; }
  }
}

// Level 3: Safe defaults
if (!analysis) {
  analysis = { summary: 'Analysis unavailable', importance: 3,
    categories: ['unknown'], sentiment: 'neutral', key_entities: [],
    why_it_matters: 'Unable to analyze', reading_time_min: 3 };
}

Why three levels? Level 1 handles clean responses (most of the time). Level 2 repairs truncated JSON -- when the LLM output gets cut off mid-object, adding missing closing brackets often fixes it. Level 3 provides safe defaults when everything else fails.

Topic weight multipliers:

var topicWeights = {
  'ai-models': 1.5, 'ai-agents': 1.5, 'ai-tools': 1.3,
  'ai-research': 1.2, 'security': 1.3, 'open-source': 1.2, 'devops': 1.1
};

The scoring formula: weightedScore = (importance * maxTopicWeight) + socialBoost where social boost is min(socialScore / 200, 2). An article about a new AI model (1.5x weight) with 300 HN points (+1.5 social boost) gets a significant ranking advantage over a routine DevOps update with no social engagement.
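As a sketch, the formula described above in code (the function name is mine; the workflow embeds this logic inline):

```javascript
var topicWeights = {
  'ai-models': 1.5, 'ai-agents': 1.5, 'ai-tools': 1.3,
  'ai-research': 1.2, 'security': 1.3, 'open-source': 1.2, 'devops': 1.1
};

function weightedScore(importance, categories, socialScore) {
  var maxW = 1.0; // topics not in the weight table get a neutral 1.0
  for (var i = 0; i < categories.length; i++) {
    var w = topicWeights[categories[i]] || 1.0;
    if (w > maxW) maxW = w;
  }
  var socialBoost = Math.min((socialScore || 0) / 200, 2); // capped at +2
  return importance * maxW + socialBoost;
}
```

So an importance-8 ai-models article with 300 HN points scores 8 * 1.5 + 1.5 = 13.5, while an importance-8 devops article with no engagement scores 8.8.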

The output is a single item with all articles sorted by weightedScore descending.


Phase 6: Digest Compilation (2 Nodes)

Ranked articles go in. A human-quality digest comes out.

Node 18: Select Digest Articles (Code)

Takes the top 12 and classifies them into three tiers:

  • Lead story: #1 by weighted score -- gets a full analysis paragraph
  • Top stories: #2 through #5 -- each gets a 2-sentence summary
  • Quick hits: #6 through #12 -- one line each, just the key fact

The node builds the compilation prompt with all article summaries, importance scores, and "why it matters" text, then outputs it as a single item.
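The tier split itself is trivial -- a sketch of the selection logic (the real node also assembles the compilation prompt, which is omitted here):

```javascript
// ranked: articles sorted by weightedScore descending, top 12 already selected.
function classifyTiers(ranked) {
  return {
    lead: ranked[0],                 // #1: full analysis paragraph
    topStories: ranked.slice(1, 5),  // #2-#5: 2-sentence summaries
    quickHits: ranked.slice(5, 12)   // #6-#12: one-liners
  };
}
```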

Node 19 + 20: Compile Digest (Basic LLM Chain) + Sonnet Compiler

Same pattern as the analysis phase -- Basic LLM Chain with an Anthropic Chat Model sub-node. But this time we use Sonnet instead of Haiku.

Sonnet Compiler configuration:

| Field | Value |
| --- | --- |
| Model | claude-sonnet-4-5-20250929 |
| Max Tokens | 8192 |
| Temperature | 0.4 |

System prompt:

You compile daily AI & tech news digests. Your output is the final digest
that gets sent to Discord, Slack, and email.

VOICE: Professional but approachable. Write like a knowledgeable colleague
summarizing what they read today, not a news anchor. Be direct and opinionated
about why things matter.

RULES:
- Lead analysis should have genuine insight, not just restate the headline
- "Why it matters" must be actionable: what should the reader know or do differently
- Top story summaries: exactly 2 sentences each, no fluff
- Quick hits: maximum one line each, start with the key fact
- If you spot a trend across stories, mention it in trend_note
- Return ONLY valid JSON

Why Sonnet for compilation when Haiku did the analysis? The digest is the user-facing product. Sonnet writes noticeably better lead analysis, catches thematic connections between stories, and produces more engaging "why it matters" insights. Since compilation runs once per digest (not per article), the cost is about $0.03-0.06 per run. It is the one place where paying for quality makes a measurable difference in the output.

The dual-model economics: Haiku for 25 batch analyses ($0.06) + Sonnet for 1 compilation ($0.03-0.06) = ~$0.10/run. Using Sonnet for everything would cost ~$0.45/run. Using Haiku for everything would produce a noticeably worse lead analysis.


Phase 7: Multi-Channel Delivery (3 Nodes)

Node 21: Format Outputs (Code)

The second-largest Code node at about 150 lines. It transforms Claude's digest JSON into delivery formats.

Discord embeds -- color-coded by importance:

  • Red (0xFF0000) for scores 9-10
  • Orange (0xFF8C00) for 7-8
  • Blue (0x3498DB) for 5-6
  • Grey (0x95A5A6) for quick hits

The lead story gets its own embed with the full analysis text and a "Why It Matters" field. Top stories get individual embeds with 2-sentence summaries. Quick hits are consolidated into a single embed with bullet points. A stats embed at the bottom shows articles analyzed, articles selected, and API cost.
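The color mapping can be expressed as a small helper. A sketch -- the actual node inlines this, and the grey fallback for non-quick-hit scores below 5 is my assumption:

```javascript
// Sketch: map an importance score to the Discord embed colors listed above.
function embedColor(importance, isQuickHit) {
  if (isQuickHit) return 0x95A5A6;      // grey for quick hits
  if (importance >= 9) return 0xFF0000; // red: 9-10
  if (importance >= 7) return 0xFF8C00; // orange: 7-8
  if (importance >= 5) return 0x3498DB; // blue: 5-6
  return 0x95A5A6;                      // grey fallback (assumption)
}
```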

Slack Block Kit -- header block with the subject line, context block with stats, section blocks per story with <url|title> rich links, and the trend note as a context block.

Markdown -- clean format with standard headings, links, blockquotes. Good for email or archiving.

Cost calculation built into the output:

var haikuCost = (analysisInputTokens * 1 + analysisOutputTokens * 5) / 1000000;
var sonnetCost = (compileInputTokens * 3 + compileOutputTokens * 15) / 1000000;
var totalCost = Math.round((haikuCost + sonnetCost) * 1000000) / 1000000;

Every digest footer includes the actual API cost. Transparency builds trust and helps you spot if something is costing more than expected.

Node 22: Send Discord (HTTP Request)

| Field | Value |
| --- | --- |
| Method | POST |
| URL | Your Discord webhook URL |
| Content Type | JSON |
| Body | {{ JSON.stringify({ embeds: $json.discordEmbeds }) }} |
| On Error | Continue (Regular Output) |

To get a webhook URL: go to your Discord server, Server Settings, Integrations, Webhooks, New Webhook. Copy the URL.

Node 23: Send Slack (Slack Node)

This one has multiple gotchas. Here is the correct configuration:

| Field | Value |
| --- | --- |
| Authentication | Access Token |
| Resource | Message |
| Operation | Post |
| Channel | Your channel ID |
| Message Type | Block |
| Blocks | ={{ JSON.stringify({ blocks: $json.slackBlocks }) }} |
| Text | ={{ $json.subjectLine }} |

Three things that will trip you up:

  1. The channel ID must be in n8n's resource locator format: {"__rl": true, "mode": "id", "value": "C0123456789"}. A plain string fails silently.

  2. messageType must be "block", not "text". The blocksUi field expects a JSON string wrapping the blocks in {"blocks": [...]} format. n8n parses it internally.

  3. The text field is required. Slack's API rejects messages without it, even when blocks are present. The text value shows in push notifications and screen readers.

I spent 90 minutes debugging the Slack delivery. First it rejected the message entirely (no_text error). Then I got text to work but the blocks did not render. The fix was switching from messageType: "text" with blocks in otherOptions to messageType: "block" with the JSON string in blocksUi.

Do not put blocks in otherOptions.blocks with messageType: "text". This sends the text but silently drops the blocks. You must use messageType: "block" with the blocksUi field.
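Put together, the working node parameters look roughly like this. A sketch of the parameter shape, with placeholder values -- check it against your own exported workflow JSON:

```javascript
// Sketch of the working Slack node parameter shape (placeholder values).
var slackBlocks = [
  { type: 'header', text: { type: 'plain_text', text: 'AI News Digest' } }
];

var slackParams = {
  authentication: 'accessToken',
  resource: 'message',
  operation: 'post',
  // Resource locator object, not a plain string -- a plain string fails silently.
  channelId: { __rl: true, mode: 'id', value: 'C0123456789' },
  messageType: 'block',                              // NOT 'text'
  blocksUi: JSON.stringify({ blocks: slackBlocks }), // JSON string wrapping {blocks: [...]}
  text: 'AI News Digest'                             // required fallback text
};
```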


Phase 8: Persistence (3 Nodes)

Node 24: Save Articles (Code)

Builds a batch INSERT query with proper SQL escaping:

function esc(str) {
  if (str === null || str === undefined) return 'NULL';
  return "'" + String(str).replace(/'/g, "''") + "'";
}

Every field is escaped and cast to the correct Postgres type. JSONB fields like categories and key_entities are cast with ::jsonb.
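For example, a single VALUES row might be assembled like this. A sketch -- the column subset and the `valuesRow` helper name are illustrative:

```javascript
// SQL escaping helper from the node above.
function esc(str) {
  if (str === null || str === undefined) return 'NULL';
  return "'" + String(str).replace(/'/g, "''") + "'";
}

// Sketch: build one VALUES row with a ::jsonb cast (column subset illustrative).
function valuesRow(article) {
  return '(' + [
    esc(article.url_hash),
    esc(article.title), // quotes like O'Reilly become O''Reilly
    esc(JSON.stringify(article.categories)) + '::jsonb'
  ].join(', ') + ')';
}

valuesRow({ url_hash: 'abc123', title: "O'Reilly launch", categories: ['ai-tools'] });
// -> ('abc123', 'O''Reilly launch', '["ai-tools"]'::jsonb)
```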

Node 25: Save Articles DB (Postgres)

Executes the dynamically-built query:

INSERT INTO digest_articles (url_hash, title_normalized, url, title, source_name, ...)
VALUES (...), (...), (...)
ON CONFLICT (url_hash) DO NOTHING

The ON CONFLICT DO NOTHING clause makes the workflow idempotent. If a run partially completed and you re-run it, articles already saved are skipped silently.

Node 26: Update Run Complete (Postgres)

Updates the run record with final stats:

UPDATE digest_runs SET
  status = 'completed',
  feeds_checked = 10,
  articles_found = {{ $('Format Outputs').first().json.articlesFound }},
  articles_selected = {{ $('Format Outputs').first().json.articlesSelected }},
  lead_story_title = {{ $('Format Outputs').first().json.leadTitle }},
  cost_usd = {{ $('Format Outputs').first().json.costUsd }},
  jina_tokens_used = {{ $('Format Outputs').first().json.jinaTokens }},
  completed_at = NOW(),
  duration_seconds = EXTRACT(EPOCH FROM (NOW() - started_at))::int
WHERE id = '{{ $('Insert Run').first().json.id }}'

All values come from the Format Outputs node using $('Node Name').first().json cross-references. The run ID comes from the Insert Run node at the beginning of the pipeline.


Testing and Activation

Manual Test Run

  1. Click Execute Workflow (the play button in n8n)
  2. Watch the execution flow -- it takes about 2 minutes
  3. Check each node's output by clicking on it:
    • Parse Articles should show 30-80 articles from the 10 feeds
    • Dedup Articles should reduce that count
    • Score and Rank should show importance scores from 3-9 (not all 7s)
    • Format Outputs should have discordEmbeds, slackBlocks, and markdown fields
  4. Check your delivery channels for the digest
  5. Verify the database:
    SELECT status, articles_found, articles_selected, cost_usd, duration_seconds
    FROM digest_runs ORDER BY created_at DESC LIMIT 1;
    

What to Expect

From my test runs:

| Metric | Typical Value |
| --- | --- |
| Feeds checked | 10 |
| Articles parsed | 50-80 |
| After dedup | 25-40 |
| Extracted via Jina | 25 |
| Selected for digest | 12 |
| Duration | 90-130 seconds |
| API cost | $0.08-0.12 |
| Score distribution | 3 to 9, average ~6.5 |

Activate the Schedule

  1. Toggle the Active switch in the top-right corner of the workflow editor
  2. If you are running self-hosted n8n in Docker: restart the container after activation. Cron triggers sometimes do not register until after a restart.
    docker restart n8n
    
  3. After restart, verify the workflow is still active in the n8n UI

From my experience: after restarting n8n, deactivate and reactivate the workflow via the UI. This forces n8n to re-register the cron trigger. I had a case where the workflow showed "Active" but the cron never fired until I toggled it off and on again.


Common Mistakes Reference

Here are the errors I hit while building this, ranked by how much time they wasted:

90 minutes: Slack blocks not rendering. Fix: use messageType: "block" with blocksUi, not messageType: "text" with otherOptions.blocks.

60 minutes: Cron trigger not firing after activation. Fix: restart n8n, then deactivate and reactivate the workflow via the n8n UI or API. Add timezone: "America/Chicago" (or your timezone) to the workflow settings.

30 minutes: Manual API execution does not work in n8n queue mode. Fix: use temporary near-future cron triggers for testing, or just click the Execute button in the editor.

20 minutes: Cannot read properties of null in Score and Rank. Fix: add an explicit null check after JSON parsing. If the AI response contains no { character, analysis stays null and crashes.

15 minutes: Anthropic 529 rate limits during batch analysis. Fix: enable Retry on Fail (2-3 retries, 3-5 second wait) on the Analyze Article node.

10 minutes: Jina Reader 409 BudgetExceededError. Fix: remove the X-Token-Budget header entirely.


Cost Summary

| Component | Per Run | Monthly (30 days) |
| --- | --- | --- |
| Claude Haiku (25 article analyses) | ~$0.06 | ~$1.80 |
| Claude Sonnet (1 digest compilation) | ~$0.04 | ~$1.20 |
| Jina Reader (25 extractions) | $0.00 | $0.00 |
| PostgreSQL | $0.00 | $0.00 |
| Total | ~$0.10 | ~$3.00 |

For comparison: if you used Sonnet for everything (analysis and compilation), the per-run cost jumps to about $0.45. If you used the AI Agent node instead of Basic LLM Chain for batch analysis, the conversation history accumulation would push it to $0.25+ per run. The dual-model approach with the right node type keeps costs at $0.10.


What You Can Customize

Once the workflow is running, here are the things most worth tuning:

  • Add feeds by adding entries to the Feed List code node. Swap out feeds that are not relevant to your interests.
  • Change topic weights in the Score and Rank node. If you care more about security than AI research, bump security to 1.5 and drop ai-research to 1.0.
  • Adjust the schedule by changing the cron expression. 0 7 * * 1-5 runs weekdays only. 0 8,18 * * * runs twice daily.
  • Change article counts. The 25 in Select for Extraction and 12 in Select Digest Articles are easy to modify. More articles = slightly higher cost.
  • Tune the prompts. The scoring guide and voice instructions in the system prompts are where most of the output quality comes from. The prompts/ directory in the product kit has annotated versions.

Skip the Build -- Get the Kit

If you would rather import the finished workflow and be running in 30 minutes instead of building each node by hand, I have packaged everything into a ready-to-go product kit:

  • Complete 27-node workflow JSON (import and go)
  • Database migrations and seed data
  • Docker Compose for standalone PostgreSQL
  • Full prompt files with customization annotations
  • 30+ recommended feeds across 8 categories
  • Troubleshooting guide with every error from this tutorial and its fix
  • Cost optimization guide

Get the AI News Digest Kit — Free Download


Built and documented by Dyllan at nxsi.io. This is the exact workflow I run daily. The numbers, errors, and costs in this tutorial are real -- not theoretical.


Series: AI News Digest with n8n + ClaudePart 2 of 3
← Previous

I Built an AI News Digest with n8n and Claude API — Here's Everything That Went Wrong (and How I Fixed It)

Next →

Practical Claude API Prompt Engineering: Lessons from 500+ Automated Article Analyses


