Methodology

How Pulvian collects, scores, and surfaces workplace signals — including confidence, freshness weighting, and current limitations.

Signal-oriented, not definitive. Pulvian surfaces patterns in publicly available data. Scores are directional indicators, not certified assessments. Always combine these signals with your own research.

Scoring pipeline — overview

Collect: RSS, HN, GDELT, SEC filings
Normalize: tokenise, deduplicate, entity-match
Weight: recency decay with a 90-day half-life
Score: keyword engine → stress + risk
Confidence: source breadth + review volume

Where the data comes from

Pulvian aggregates publicly available information from the sources listed below. No private data, no scraping of non-public content, and no paid insider information is used. Each data point is timestamped and stored as a snapshot — the score reflects conditions at a specific point in time, not a real-time feed.

Hacker News. Tech community discussion and sentiment (via Algolia HN Search API).
GDELT. Global news event database that monitors press coverage of companies worldwide.
RSS feeds. Company newsrooms, press releases, and industry publications.
SEC EDGAR. Public company filings: 10-K annual reports, 8-K material events, and proxy statements.
GitHub. Public repository activity: commit cadence, contributor signals, and open-source health indicators.
App Store. Apple App Store customer reviews, collected via Apple's official RSS feed. For consumer-facing companies, sustained low ratings (< 3.0) serve as an operational risk proxy.
Yahoo Finance. Financial news headlines from Yahoo Finance's public RSS feed, covering layoffs, leadership changes, earnings, and regulatory events for publicly traded companies.

Company metadata (founding year, headcount range, industry) is sourced from Wikidata and supplemented by a curated seed dataset. Metadata is not used in scoring — only in filtering and display.

Worked example — one signal traced through the pipeline

“Constant crunch, no work-life balance — management keeps adding scope with no extra time or headcount.”

Hacker News · Acme Corp · 34 days before scoring run

Collect: ingested from HN adapter · timestamped 2026-04-21 · entity-matched to slug "acme-corp"

Normalize: lowercased + tokenised · duplicate fingerprint: none found · source weight: 1.0 (HN, primary tier)

Weight: age = 34 days · decay = e^(−0.693 × 34/90) ≈ 0.77 · signal carries 77% of full strength

Score: "crunch" → stress +8 · "no work-life balance" → stress +12 (negation check: none found) · "adding scope" → stress +5 · raw +25 × 0.77 ≈ +19 stress pts contributed

Confidence: signal count +1 → confidence ticks up; HN already represented so source-breadth score unchanged · final confidence: 58

Every signal in the dataset goes through the same deterministic path. The same document always produces the same output — no LLM, no randomness.
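
The arithmetic in the Weight and Score steps can be reproduced in a few lines. This is a minimal sketch that uses only the numbers quoted in the worked example; the function and variable names are illustrative, not Pulvian's own code.

```python
import math

HALF_LIFE_DAYS = 90.0  # half-life stated in the methodology


def decay_weight(age_days: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay weight: 1.0 for a fresh signal, 0.5 at one half-life."""
    return math.exp(-math.log(2) * age_days / half_life_days)


# Keyword hits from the worked example (phrase -> raw stress points).
hits = {"crunch": 8, "no work-life balance": 12, "adding scope": 5}

raw_stress = sum(hits.values())     # 25 raw stress points
weight = decay_weight(34)           # the signal is 34 days old
contribution = raw_stress * weight  # decayed contribution to the stress score

print(f"raw={raw_stress} weight={weight:.2f} contribution={contribution:.1f}")
# prints: raw=25 weight=0.77 contribution=19.2
```

The result rounds to the +19 stress points shown in the Score step.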

The three scores

Every company snapshot produces three scores, each on a 0–100 scale:

[Diagram: raw signals (reviews, growth, sentiment) are recency-weighted, with older signals counting less than recent ones, and fed into the deterministic scoring model with its 90-day half-life. The model outputs three scores: Stress (employee strain & management friction), Risk (retention risk & instability signals), and Confidence (how much data backs this snapshot).]

Older signals fade; recent signals carry more weight. Scores update when new data is ingested.

Stress (0 – 100). Aggregate signal of employee strain, management friction, and workload pressure. Higher = more stress indicators found in the data.
Risk (0 – 100). Combination of retention risk, leadership instability, and growth deceleration signals. Higher = more risk flags.
Confidence (0 – 100). How much data backs this snapshot. Higher = more sources, more reviews, more corroborating signals. When confidence is low, treat the stress and risk scores with extra caution.

Confidence — check this first

Confidence is the most important score to look at before interpreting stress or risk. A stress score of 75 backed by 3 reviews is very different from the same score backed by 800 reviews across multiple sources. The confidence score encodes that difference.

When confidence is below 30, treat the directional finding as a weak or early signal. When confidence is above 70, the finding is more reliable — though still not a guarantee. The coverage badge visible on each company page gives a plain-language summary (Strong / Moderate / Limited) derived from source count, review count, and confidence score.
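
The two cut-offs above can be expressed as a small helper. This is a sketch under stated assumptions: the function name and the label for the 30 to 70 band are hypothetical, and it does not reproduce the coverage badge, whose exact derivation is not given here.

```python
def confidence_reading(confidence: float) -> str:
    """Map a 0-100 confidence score to the reading guidance in the text."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence < 30:
        return "weak or early signal"
    if confidence > 70:
        return "more reliable, still not a guarantee"
    # The text gives no wording for 30-70; this label is an assumption.
    return "intermediate: interpret with care"


print(confidence_reading(58))  # the worked example's final confidence
# prints: intermediate: interpret with care
```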

Freshness and recency weighting

Not all signals carry equal weight over time. Pulvian applies exponential decay to older data points using a 90-day half-life: a signal that is 90 days old contributes roughly half as much to the score as an equivalent signal collected today.

This means recent spikes in negative sentiment matter more than a stale review from two years ago — which better reflects how workplace conditions actually evolve. The snapshot date shown on each page reflects when the most recent data was ingested, not when the company was first added.
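
The half-life rule can be written as a weight function. The block below only restates the paragraphs above in symbols; w (weight), t (signal age in days), and h (half-life) are notation chosen here, not names taken from Pulvian.

```latex
w(t) = 2^{-t/h} = e^{-\ln 2 \cdot t / h}, \qquad h = 90 \text{ days}

w(0) = 1, \qquad w(90) = \tfrac{1}{2}, \qquad w(180) = \tfrac{1}{4}, \qquad w(730) \approx 0.004
```

Under this form a two-year-old review keeps well under 1% of the weight of an equivalent signal collected today.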

Stress levels and risk levels

In addition to numeric scores, each snapshot carries a categorised level (low / medium / high) derived from fixed thresholds applied to the numeric score. These thresholds do not currently vary by industry or company size.

Low (0 – 34). Few stress or risk indicators found in the available data.
Medium (35 – 64). A moderate number of signals present; worth monitoring.
High (65 – 100). Elevated concentration of stress or risk indicators.

Future versions may introduce industry-relative benchmarking, which would make the level more contextually meaningful.
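
The fixed thresholds translate directly into a lookup. The sketch below assumes whole-number scores; the function name is hypothetical, and treating any fractional score below 35 as low is a choice made here, since the text only lists integer bands.

```python
def level(score: float) -> str:
    """Classify a 0-100 stress or risk score as low, medium, or high."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 35:
        return "low"      # 0 - 34
    if score < 65:
        return "medium"   # 35 - 64
    return "high"         # 65 - 100


for s in (34, 35, 64, 65):
    print(s, level(s))
```

The same function serves both stress and risk, because the thresholds do not vary by score type, industry, or company size.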

Summaries

Where available, snapshots include a short summary of the signals found. Unlike the scores — which are fully deterministic and use no machine learning — the optional summaries are AI-generated: a large language model writes them from the same public signals, with a deterministic rule-based fallback when no model is available. Each summary is labelled with the model (or "deterministic") that produced it. They are supplementary context that may not capture the full picture — not editorial opinions, and not an input to the scores.

Positives and negatives listed in a summary reflect patterns in the source data. When no summary is available, the scorecard is still shown — scores do not depend on summary generation.

Signal extraction and negation

Signals are extracted from public documents using a deterministic, keyword-based rule engine. Each rule maps a set of phrases (e.g. "layoffs", "hiring freeze", "record revenue") to a sentiment value. When a keyword is matched, the surrounding context is checked for negation cues — words like "avoids", "no", "not", or "despite" that reverse the meaning. Negated matches are suppressed to reduce false positives.

This approach is transparent and reproducible: the same document always produces the same signal. It does not use machine learning or large language models for scoring.
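
A keyword rule with a negation check might look like the sketch below. The phrases and negation cues come from the text; the sentiment values, the three-token window, and the choice to look only at tokens before the match are assumptions made for illustration.

```python
import re

# Hypothetical rules: phrase -> sentiment value (negative = adverse signal).
RULES = {"layoffs": -10, "hiring freeze": -6, "record revenue": 8}
NEGATION_CUES = {"avoids", "no", "not", "despite"}
WINDOW = 3  # assumed number of tokens inspected before each match


def extract_signals(text: str) -> list[tuple[str, int]]:
    """Return (phrase, value) for every rule match that is not negated."""
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    signals = []
    for phrase, value in RULES.items():
        words = phrase.split()
        n = len(words)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] != words:
                continue
            context = tokens[max(0, i - WINDOW):i]
            if NEGATION_CUES.intersection(context):
                continue  # negated match: suppress it
            signals.append((phrase, value))
    return signals


print(extract_signals("The company avoids layoffs and reports record revenue."))
# prints: [('record revenue', 8)]
```

Because the function has no random or learned components, calling it twice on the same text returns the same list, which is the reproducibility property described above.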

Company comparison and discovery

The Similar companies section on each company page shows other companies in the same industry, ordered by score proximity. This helps you explore related organisations and notice patterns across a sector.
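
Ordering by score proximity can be sketched as a sort on absolute score difference. The text does not say which score is compared, so using the stress score here is an assumption, as are the class and field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    slug: str
    industry: str
    stress: int  # assumed to be the score used for proximity


def similar_companies(target: Company, others: list[Company]) -> list[Company]:
    """Same-industry companies, nearest score first (ties broken by slug)."""
    peers = [
        c for c in others
        if c.industry == target.industry and c.slug != target.slug
    ]
    return sorted(peers, key=lambda c: (abs(c.stress - target.stress), c.slug))


acme = Company("acme-corp", "software", 60)
pool = [
    Company("gamma", "software", 80),
    Company("beta", "software", 58),
    Company("delta", "retail", 61),  # different industry, excluded
]
print([c.slug for c in similar_companies(acme, pool)])
# prints: ['beta', 'gamma']
```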

The Compare feature lets you view two or more companies side by side. Scores, levels, and metadata are presented in parallel columns so differences are immediately visible. Comparison does not introduce any additional scoring logic — it is a presentation layer over the same underlying snapshots.

Current limitations

Uneven coverage. Publicly traded and large established companies typically have higher review volumes and therefore stronger confidence scores. Startups and smaller companies are tracked wherever public data exists, but confidence scores will reflect the available coverage — always check the coverage badge before interpreting the score.
No private data. Pulvian only sees what is publicly posted. Internal culture that is never written about online is invisible to the model.
No real-time feed. Snapshots have a date. Significant events (layoffs, leadership changes) that occur after the last snapshot are not reflected until the company is rescored.
Fixed thresholds. Stress and risk levels use static cut-offs, not industry or region-adjusted benchmarks.
English-language bias. Review platforms skew toward English-language content. Companies whose employees primarily write in other languages may be underrepresented.

Methodology versioning

Every snapshot is stamped with the scoring-methodology version that produced it (exposed as methodology_version in the API). Scores are directly comparable within the same version; when the formula changes, the version is bumped and recorded here so you always know whether two scores were computed the same way.
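
A consumer of the API could guard comparisons with a version check along these lines. Only the methodology_version field name comes from the text; the Snapshot class, its other fields, and the "v4"-style version strings are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    company: str
    stress: int
    methodology_version: str  # field name as exposed in the API


def directly_comparable(a: Snapshot, b: Snapshot) -> bool:
    """Scores are directly comparable only within one methodology version."""
    return a.methodology_version == b.methodology_version


old = Snapshot("acme-corp", 62, "v3")
new = Snapshot("acme-corp", 55, "v4")
print(directly_comparable(old, new))
# prints: False
```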

v4 — July 2026 (current). Formalises two June changes: confidence shrinkage (scores regress toward neutral 50 when confidence is low) and evidence-scaled risk boosts (a couple of negative articles can no longer saturate the risk score on thin evidence).
v3 — April 2026. Correctness pass on the scoring pipeline.
v2 — March 2026. Freshness decay: signals lose weight exponentially with age (half-life weighting).
v1 — March 2026. Initial methodology: weighted stress, risk, and confidence scores from public signals.
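
The confidence shrinkage introduced in v4 can be illustrated with a simple linear form. The changelog states only that scores regress toward a neutral 50 when confidence is low; the linear formula and the function name below are assumptions, not the published formula.

```python
NEUTRAL = 50.0  # neutral midpoint named in the v4 changelog entry


def shrink_toward_neutral(score: float, confidence: float) -> float:
    """Pull a 0-100 score toward 50 in proportion to missing confidence.

    Assumption: linear shrinkage, with confidence / 100 as the trust factor.
    """
    trust = max(0.0, min(confidence, 100.0)) / 100.0
    return NEUTRAL + (score - NEUTRAL) * trust


print(round(shrink_toward_neutral(75, 20), 1))   # thin evidence
print(round(shrink_toward_neutral(75, 100), 1))  # full confidence
# prints: 55.0 then 75.0
```

With this form, a raw score of 75 backed by little evidence is reported much closer to 50, while the same raw score at full confidence is left unchanged.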

What this is not

Not a replacement for due diligence or professional HR advice
Not a ranking of "best" or "worst" employers
Not derived from proprietary, private, or leaked data
Not a real-time service — always check the snapshot date

Legal notice

The scores, summaries, and signals on Pulvian represent opinions derived from publicly available data. They are not certified assessments, statements of fact about any employer, or professional advice of any kind. Nothing on this site constitutes financial, investment, employment, or legal advice. Use these signals as one input among many, and always conduct your own due diligence before making career, investment, or business decisions.

If you believe a score is inaccurate or a company's data should be corrected or removed, submit a correction or removal request.
