Pulvian

Methodology

How Pulvian collects, scores, and surfaces workplace signals — including confidence, freshness weighting, and current limitations.

Signal-oriented, not definitive. Pulvian surfaces patterns in publicly available data. Scores are directional indicators, not certified assessments. Always combine these signals with your own research.
Scoring pipeline overview

  01 Collect: RSS, Glassdoor, HN, GDELT, SEC filings
  02 Normalize: tokenise, deduplicate, entity-match
  03 Weight: recency decay with a 90-day half-life
  04 Score: keyword engine → stress + risk
  05 Confidence: source breadth + review volume

Where the data comes from

Pulvian aggregates publicly available information from the sources listed below. No private data, no scraping of non-public content, and no paid insider information is used. Each data point is timestamped and stored as a snapshot — the score reflects conditions at a specific point in time, not a real-time feed.

  • Glassdoor: employee reviews, ratings, and interview feedback from verified workers (periodically collected; coverage may be partial)
  • Layoffs.fyi: tracked layoff events (dates, headcount, and sources)
  • Hacker News: tech community discussion and sentiment (via Algolia HN Search API)
  • GDELT: global news event database that monitors press coverage of companies worldwide
  • RSS feeds: company newsrooms, press releases, and industry publications
  • Comparably: employee compensation, culture, and satisfaction ratings published publicly on Comparably's platform
  • Trustpilot: customer review aggregates, used as a risk proxy signal (customer trust patterns can correlate with operational strain; see note below)
  • SEC EDGAR: public company filings (10-K annual reports, 8-K material events, and proxy statements)

Company metadata (founding year, headcount range, industry) is sourced from Wikidata and supplemented by a curated seed dataset. Metadata is not used in scoring — only in filtering and display.

Note on Trustpilot data: Trustpilot reviews reflect customer experience, not employee experience. They are included as a supplementary risk proxy only — sustained low customer trust (≤ 2.5 / 5) can indicate operational or reputational dysfunction. Trustpilot signals are weighted conservatively and are never the sole basis for a high stress or risk classification.

Worked example — one signal traced through the pipeline

“Constant crunch, no work-life balance — management keeps adding scope with no extra time or headcount.”

Glassdoor · Acme Corp · 34 days before scoring run

  • Collect: ingested from Glassdoor adapter · timestamped 2026-04-21 · entity-matched to slug "acme-corp"
  • Normalize: lowercased + tokenised · duplicate fingerprint: none found · source weight: 1.0 (Glassdoor, primary tier)
  • Weight: age = 34 days · decay = e^(−0.693 × 34/90) ≈ 0.77 · signal carries 77% of full strength
  • Score: "crunch" → stress +8 · "no work-life balance" → stress +12 (negation check: none found) · "adding scope" → stress +5 · raw +25 × 0.77 ≈ +19 stress pts contributed
  • Confidence: review count +1 → confidence ticks up; Glassdoor already represented, so source-breadth score unchanged · final confidence: 58

Every signal in the dataset goes through the same deterministic path. The same document always produces the same output — no LLM, no randomness.
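
The arithmetic in the trace above can be reproduced with a minimal sketch. The three phrase weights are copied from the worked example; the function names, the simple substring matching, and the idea that weights live in a single dictionary are illustrative assumptions, not Pulvian's actual implementation.

```python
import math

HALF_LIFE_DAYS = 90  # stated in the methodology: 90-day half-life

# Phrase weights copied from the worked example only; the full rule table
# is not published on this page, so this dictionary is illustrative.
EXAMPLE_WEIGHTS = {
    "crunch": 8,
    "no work-life balance": 12,
    "adding scope": 5,
}

def decay(age_days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay factor: 1.0 for a signal from today, 0.5 at one half-life."""
    return math.exp(-math.log(2) * age_days / half_life)

def stress_contribution(text: str, age_days: float) -> float:
    """Sum the weights of matched phrases, then scale by the recency factor."""
    lowered = text.lower()
    raw = sum(w for phrase, w in EXAMPLE_WEIGHTS.items() if phrase in lowered)
    return raw * decay(age_days)

# The review from the worked example (punctuation simplified).
review = ("Constant crunch, no work-life balance - management keeps adding "
          "scope with no extra time or headcount.")

print(round(decay(34), 2))                     # 0.77
print(round(stress_contribution(review, 34)))  # 19
```

Because nothing in the sketch depends on random state or external services, running it twice on the same review and age gives the same result, which is the property the paragraph above describes.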

The three scores

Every company snapshot produces three scores, each on a 0–100 scale:

[Diagram: raw signals (reviews, growth, sentiment), recency-weighted from older to recent, feed a deterministic scoring model with a 90-day half-life, which outputs three scores: Stress (employee strain and management friction), Risk (retention risk and instability signals), and Confidence (how much data backs this snapshot).]

Older signals fade; recent signals carry more weight. Scores update when new data is ingested.

  • Stress (0–100): Aggregate signal of employee strain, management friction, and workload pressure. Higher = more stress indicators found in the data.
  • Risk (0–100): Combination of retention risk, leadership instability, and growth deceleration signals. Higher = more risk flags.
  • Confidence (0–100): How much data backs this snapshot. Higher = more sources, more reviews, more corroborating signals. Low confidence means treat the score with extra caution.

Confidence — check this first

Confidence is the most important score to look at before interpreting stress or risk. A stress score of 75 backed by 3 reviews is very different from the same score backed by 800 reviews across multiple sources. The confidence score encodes that difference.

When confidence is below ~30, treat the directional finding as a weak or early signal. When confidence is above ~70, the finding is more reliable — though still not a guarantee. The coverage badge visible on each company page gives a plain-language summary (Strong / Moderate / Limited) derived from source count, review count, and confidence score.
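
As a rough illustration of the two bands described above, the sketch below maps a confidence score to a plain-language reading. The cut-offs of 30 and 70 come from the text, where they are stated as approximate; the function name and the "intermediate" label for the middle band are assumptions. It does not reproduce the coverage badge, whose exact rule is not given here.

```python
def confidence_reading(confidence: float) -> str:
    """Map a 0-100 confidence score to a rough reading of the stress/risk scores."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence < 30:
        return "weak or early signal"
    if confidence > 70:
        return "more reliable, still not a guarantee"
    # The text does not name the middle band; this label is an assumption.
    return "intermediate"

print(confidence_reading(58))  # intermediate
```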

Freshness and recency weighting

Not all signals carry equal weight over time. Pulvian applies exponential decay to older data points using a 90-day half-life: a signal that is 90 days old contributes roughly half as much to the score as an equivalent signal collected today.

This means recent spikes in negative sentiment matter more than a stale review from two years ago — which better reflects how workplace conditions actually evolve. The snapshot date shown on each page reflects when the most recent data was ingested, not when the company was first added.
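
Written out, and assuming the weight depends only on the signal's age t in days, a 90-day half-life corresponds to the following decay factor (the symbol w is introduced here for illustration):

```latex
w(t) = 2^{-t/90} = e^{-\ln 2 \cdot t/90}, \qquad
w(0) = 1,\quad w(90) = 0.5,\quad w(180) = 0.25,\quad w(730) \approx 0.004
```

Under that assumption, a two-year-old review carries well under 1% of the weight of an equivalent review collected today.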

Stress levels and risk levels

In addition to numeric scores, each snapshot carries a categorised level (low / medium / high) derived from fixed thresholds applied to the numeric score. These thresholds do not currently vary by industry or company size.

  • Low (0–34): Few stress or risk indicators found in the available data.
  • Medium (35–64): A moderate number of signals present; worth monitoring.
  • High (65–100): Elevated concentration of stress or risk indicators.

Future versions may introduce industry-relative benchmarking, which would make the level more contextually meaningful.
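
The fixed thresholds can be expressed as a small lookup. This is a sketch: the function name is hypothetical, and treating 35 and 65 as inclusive lower bounds for fractional scores (for example, 34.5 counting as low) is an assumption, since the published bands are given only as whole numbers.

```python
def level(score: float) -> str:
    """Categorise a 0-100 stress or risk score using the fixed thresholds."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 65:
        return "high"
    if score >= 35:
        return "medium"
    return "low"

print(level(34), level(35), level(65))  # low medium high
```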

Summaries

Where available, snapshots include a short summary of the signals found. Summaries are currently generated by a deterministic, rule-based system that selects phrases based on score levels and signal content. Each summary is labelled with the model that produced it. They are supplementary context, not editorial opinions.

Positives and negatives listed in a summary reflect patterns in the source data. When no summary is available, the scorecard is still shown — scores do not depend on summary generation.

Signal extraction and negation

Signals are extracted from public documents using a deterministic, keyword-based rule engine. Each rule maps a set of phrases (e.g. "layoffs", "hiring freeze", "record revenue") to a sentiment value. When a keyword is matched, the surrounding context is checked for negation cues — words like "avoids", "no", "not", or "despite" that reverse the meaning. Negated matches are suppressed to reduce false positives.

This approach is transparent and reproducible: the same document always produces the same signal. It does not use machine learning or large language models for scoring.
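
A minimal sketch of keyword matching with negation suppression follows. The phrases and negation cues are the ones quoted above; the sentiment values, the three-token window, and the choice to inspect only the tokens before a match are assumptions made for illustration and may differ from the real engine.

```python
import re

# Phrases are taken from the text; the numeric values are hypothetical.
RULES = {"layoffs": -1.0, "hiring freeze": -1.0, "record revenue": 1.0}
NEGATION_CUES = {"avoids", "no", "not", "despite"}
WINDOW = 3  # hypothetical: number of preceding tokens checked for a cue

def extract_signals(text: str) -> list[tuple[str, float]]:
    """Return (phrase, value) pairs for matches that are not negated."""
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    signals = []
    for phrase, value in RULES.items():
        words = phrase.split()
        n = len(words)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] != words:
                continue
            context = tokens[max(0, i - WINDOW):i]
            if NEGATION_CUES.intersection(context):
                continue  # negated match: suppress it
            signals.append((phrase, value))
    return signals

print(extract_signals("The company announced layoffs and a hiring freeze."))
print(extract_signals("The company avoids layoffs despite pressure."))
```

The first call yields both negative signals; the second yields an empty list because "avoids" appears within the window before "layoffs".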

Company comparison and discovery

The Similar companies section on each company page shows other companies in the same industry, ordered by score proximity. This helps you explore related organisations and notice patterns across a sector.
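
Ordering by score proximity could look like the sketch below. The text does not say which score is compared, so using the stress score alone is an assumption, as are the `Company` fields, the function name, and the sample data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Company:
    slug: str
    industry: str
    stress: int  # assumption: proximity is measured on the stress score

def similar_companies(target: Company, pool: list[Company], limit: int = 5) -> list[Company]:
    """Same-industry companies, closest stress score first."""
    peers = [c for c in pool
             if c.industry == target.industry and c.slug != target.slug]
    return sorted(peers, key=lambda c: abs(c.stress - target.stress))[:limit]

# Hypothetical sample data.
acme = Company("acme-corp", "software", 60)
pool = [
    acme,
    Company("alpha", "software", 40),
    Company("beta", "software", 63),
    Company("gamma", "retail", 60),
]
print([c.slug for c in similar_companies(acme, pool)])  # ['beta', 'alpha']
```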

The Compare feature lets you view two or more companies side by side. Scores, levels, and metadata are presented in parallel columns so differences are immediately visible. Comparison does not introduce any additional scoring logic — it is a presentation layer over the same underlying snapshots.

Current limitations

  • Uneven coverage. Publicly traded and large established companies typically have higher review volumes and therefore stronger confidence scores. Startups and smaller companies are tracked wherever public data exists, but confidence scores will reflect the available coverage — always check the coverage badge before interpreting the score.
  • No private data. Pulvian only sees what is publicly posted. Internal culture that is never written about online is invisible to the model.
  • No real-time feed. Snapshots have a date. Significant events (layoffs, leadership changes) that occur after the last snapshot are not reflected until the company is rescored.
  • Fixed thresholds. Stress and risk levels use static cut-offs, not industry or region-adjusted benchmarks.
  • English-language bias. Review platforms skew toward English-language content. Companies whose employees primarily write in other languages may be underrepresented.

What this is not

  • Not a replacement for due diligence or professional HR advice
  • Not a ranking of "best" or "worst" employers
  • Not derived from proprietary, private, or leaked data
  • Not a real-time service — always check the snapshot date

Legal notice

The scores, summaries, and signals on Pulvian represent opinions derived from publicly available data. They are not certified assessments, statements of fact about any employer, or professional advice of any kind. Nothing on this site constitutes financial, investment, employment, or legal advice. Use these signals as one input among many, and always conduct your own due diligence before making career, investment, or business decisions.

If you believe a score is inaccurate or a company's data should be corrected or removed, submit a correction or removal request.

© 2026 Pulvian · Workplace signals, made legible.