Methodology: how we reach a verdict

A light estimate, computed — not measured, not edited

Every Syftly answer carries a confidence label. Today it reads light estimate: the verdict is aggregated from public benchmarks — with sources and dates — and computed from structured fields, not produced by first-hand Syftly measurement and not chosen by an editor. Here is exactly how it works, and where it is weak.

A light estimate rests on public research and benchmarks (with attribution), with AI-as-a-judge used to synthesise them. It is broad and useful, but always labelled as light. Hard tested would rest only on first-hand measured facts: latency, price and uptime probed by Syftly, and accuracy scored against a verified ground truth. That tier does not exist yet; nothing here is presented as hard tested. AI-as-a-judge never counts as hard.
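The labelling rule can be written down as a small function. This is a minimal sketch, not Syftly's code: the `Evidence` enum and `confidence_label` are hypothetical names, and the hard tested branch is shown only to illustrate the rule, since that tier does not exist yet.

```python
from enum import Enum


class Evidence(Enum):
    # Hypothetical evidence kinds, named after the tiers described above.
    FIRST_HAND_MEASUREMENT = "first-hand measurement"
    PUBLIC_BENCHMARK = "public benchmark"
    AI_JUDGE = "AI-as-a-judge"


def confidence_label(evidence: list[Evidence]) -> str:
    """Return the label an answer would carry for the given evidence."""
    # Hard tested requires every input to be a first-hand measurement.
    # A single public benchmark or AI-judged input keeps the answer light.
    if evidence and all(e is Evidence.FIRST_HAND_MEASUREMENT for e in evidence):
        return "hard tested"
    return "light estimate"


today = confidence_label([Evidence.PUBLIC_BENCHMARK, Evidence.AI_JUDGE])
mixed = confidence_label([Evidence.FIRST_HAND_MEASUREMENT, Evidence.AI_JUDGE])
```

Under this sketch both `today` and `mixed` come out as light estimate, because any AI-judged input rules out the hard label.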

Each category is produced by a fixed, version-controlled research recipe: a source hierarchy (Tier-1 provider docs and recognised benchmarks carry the ranking; marketing never does), then a two-phase pipeline — deterministic extraction of the hard fields, then judged synthesis on top of that grounded data. The winner on each axis is then computed from those structured fields — “cheapest” is literally a min() over the prices. So a new recipe run changes the numbers and the winners recompute by themselves. Credibility comes from provenance — a dated source plus a confidence label plus attribution — not from human approval.
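To make "computed, not edited" concrete, here is a minimal sketch of a winner derived from structured fields. The `Offering` record, its field names and the prices are illustrative assumptions, not Syftly's schema or real data; the only part taken from the description above is that "cheapest" is a min() over the extracted prices.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Offering:
    # Hypothetical structured record produced by the extraction phase.
    name: str
    price_per_minute: float  # directly comparable per-minute price
    source: str              # attribution for the price
    source_date: date        # date attached to the source
    confidence: str = "light estimate"


def cheapest(offerings: list[Offering]) -> Offering:
    # "Cheapest" is a min() over the extracted prices, nothing more.
    return min(offerings, key=lambda o: o.price_per_minute)


# Placeholder data for illustration only.
run_1 = [
    Offering("provider-a", 0.010, "provider-a pricing page", date(2024, 1, 1)),
    Offering("provider-b", 0.006, "provider-b pricing page", date(2024, 1, 1)),
]
# A later recipe run extracts a changed price for provider-a.
run_2 = [
    Offering("provider-a", 0.004, "provider-a pricing page", date(2024, 6, 1)),
    run_1[1],
]

winner_1 = cheapest(run_1)  # provider-b
winner_2 = cheapest(run_2)  # provider-a, with no editorial step in between
```

The winner changes between the two runs only because the extracted number changed, and each winner still carries its source, date and confidence label.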

A free-text question is mapped deterministically, with no LLM call, onto one of the decision axes listed below; its winner is then computed from the ranking. If nothing matches, the answer falls back to the category default (the top of the ordered ranking). An in-category question never returns a 404.
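One way such a deterministic mapping could look is a fixed keyword table with a fallback. This is a sketch under assumptions: the keyword sets, the axis identifiers and `map_question_to_axis` are hypothetical, and the real matching rules are not described here.

```python
import re

# Hypothetical keyword table: axis identifier -> trigger words.
# Dicts keep insertion order, so the first matching axis always wins.
AXIS_KEYWORDS = {
    "cheapest": {"cheapest", "cheap", "price", "pricing", "cost"},
    "most_accurate": {"accurate", "accuracy", "wer"},
    "lowest_latency": {"latency", "fastest", "fast"},
    "most_multilingual": {"multilingual", "languages"},
}
DEFAULT_AXIS = "category_default"  # stands for the top of the ordered ranking


def map_question_to_axis(question: str) -> str:
    """Map a free-text question onto an axis without any model call."""
    # Match on whole words so that "wer" does not fire inside "answer".
    words = set(re.findall(r"[a-z]+", question.lower()))
    for axis, keywords in AXIS_KEYWORDS.items():
        if words & keywords:
            return axis
    # Nothing matched: fall back instead of failing.
    return DEFAULT_AXIS
```

The same question always yields the same axis, and a question that matches nothing still gets an answer from the category default.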

The computed decision axes and the rule each uses
Axis                     Computed by
Cheapest                 lowest directly-comparable per-minute price
Most accurate            lowest word error rate (WER)
Most multilingual        highest supported-language count
Lowest latency           lowest published streaming latency
Best price-to-accuracy   lowest price × WER
Capability filter        top-ranked offering that has diarization, word-timestamps or custom vocabulary
Language                 most accurate offering that supports the asked language (e.g. Dutch, Spanish)
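The rules in the table can be sketched as plain functions over structured rows. The `Row` type, its field names and the three sample rows are assumptions made for illustration; the numbers are placeholders, not benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical structured fields for one offering.
    name: str
    price_per_minute: float
    wer: float                  # word error rate, lower is better
    latency_ms: float           # published streaming latency
    languages: frozenset[str]
    capabilities: frozenset[str]


def cheapest(rows):
    return min(rows, key=lambda r: r.price_per_minute)


def most_accurate(rows):
    return min(rows, key=lambda r: r.wer)


def most_multilingual(rows):
    return max(rows, key=lambda r: len(r.languages))


def lowest_latency(rows):
    return min(rows, key=lambda r: r.latency_ms)


def best_price_to_accuracy(rows):
    # Lower price and lower WER are both better, so the product is minimised.
    return min(rows, key=lambda r: r.price_per_minute * r.wer)


def capability_filter(ranking, capability):
    # `ranking` is already ordered; take the first row with the capability.
    return next((r for r in ranking if capability in r.capabilities), None)


def best_for_language(rows, language):
    supported = [r for r in rows if language in r.languages]
    return min(supported, key=lambda r: r.wer, default=None)


# Placeholder rows, listed in ranking order.
rows = [
    Row("a", 0.010, 0.05, 300, frozenset({"en", "nl"}), frozenset({"diarization"})),
    Row("b", 0.006, 0.08, 200, frozenset({"en", "es"}), frozenset({"word-timestamps"})),
    Row("c", 0.004, 0.15, 500, frozenset({"en", "es", "nl", "de"}), frozenset()),
]
```

With these placeholder rows, `cheapest(rows)` picks c, `most_accurate(rows)` picks a, and `best_price_to_accuracy(rows)` picks b, which shows that each axis can crown a different winner from the same data. The two filter functions return `None` when no offering qualifies; how such a case is presented to the reader is not covered by this sketch.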