Methodology
How we scanned 97,304 EU websites — sample source, scanner architecture, infrastructure, and limitations.
Scan Dataset
- Scan window: April 7–9, 2026
- URLs submitted: 114,748
- Successful scans: 97,304 (84.8%)
- Failed scans: 17,444 (15.2%; causes include unreachable sites, TLS errors, and timeouts)
- EU countries covered: 25 (all EU member states with ccTLD presence in Tranco)
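The failure groups listed above can be illustrated with a small classifier. This is a minimal sketch, assuming failures surface as Chromium network error strings such as `net::ERR_NAME_NOT_RESOLVED`. The `FailureKind` type and `classifyFailure` function are hypothetical names for illustration, not the scanner's actual code. The `main` function also re-derives the failed-scan count and success rate from the published totals.

```go
package main

import (
	"fmt"
	"strings"
)

// FailureKind mirrors the failure groups named in the dataset summary.
// The type and its values are illustrative, not the scanner's own schema.
type FailureKind string

const (
	Unreachable FailureKind = "unreachable"
	TLSError    FailureKind = "tls_error"
	Timeout     FailureKind = "timeout"
	Other       FailureKind = "other"
)

// classifyFailure maps a Chromium network error string to a failure group.
// Timeouts are checked first so that ERR_CONNECTION_TIMED_OUT counts as a
// timeout rather than as an unreachable site.
func classifyFailure(netErr string) FailureKind {
	switch {
	case strings.Contains(netErr, "TIMED_OUT"):
		return Timeout
	case strings.Contains(netErr, "ERR_CERT_"), strings.Contains(netErr, "ERR_SSL_"):
		return TLSError
	case strings.Contains(netErr, "ERR_NAME_NOT_RESOLVED"),
		strings.Contains(netErr, "ERR_CONNECTION_REFUSED"),
		strings.Contains(netErr, "ERR_CONNECTION_RESET"),
		strings.Contains(netErr, "ERR_ADDRESS_UNREACHABLE"):
		return Unreachable
	default:
		return Other
	}
}

func main() {
	// Totals taken from the dataset summary.
	submitted, succeeded := 114748, 97304
	failed := submitted - succeeded
	rate := 100 * float64(succeeded) / float64(submitted)
	fmt.Printf("failed=%d success_rate=%.1f%%\n", failed, rate)

	for _, e := range []string{
		"net::ERR_NAME_NOT_RESOLVED",
		"net::ERR_CERT_AUTHORITY_INVALID",
		"net::ERR_CONNECTION_TIMED_OUT",
	} {
		fmt.Println(e, "->", classifyFailure(e))
	}
}
```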
Sample Source
- Domain list: Tranco Top 1M (research-grade domain ranking, snapshot L7684)
- Selection method: Filtered by 25 EU country-code TLDs, round-robin by country to prevent large-market bias
- Country distribution: Germany 22,696 / Netherlands 8,752 / Italy 8,598 / France 8,463 / Poland 7,881 / Spain 4,735 + 19 other EU countries (full breakdown in the report)
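The selection method can be sketched as follows. This is one plausible reading of "round-robin by country", assuming the input is a rank-ordered domain list and that each round takes the next-best domain from every ccTLD that still has one. The function name `roundRobinByTLD` and the sample domains are hypothetical. Under this reading, ccTLDs with few ranked domains run out early and later rounds are filled by the larger ones, so per-country counts need not be equal.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// roundRobinByTLD groups rank-ordered domains by ccTLD, then takes one
// domain per TLD per round until limit domains have been selected or all
// buckets are exhausted.
func roundRobinByTLD(ranked []string, tlds []string, limit int) []string {
	allowed := make(map[string]bool, len(tlds))
	for _, t := range tlds {
		allowed[strings.ToLower(t)] = true
	}

	// Bucket domains by their final label, preserving rank order.
	buckets := make(map[string][]string)
	for _, d := range ranked {
		i := strings.LastIndex(d, ".")
		if i < 0 {
			continue
		}
		tld := strings.ToLower(d[i+1:])
		if allowed[tld] {
			buckets[tld] = append(buckets[tld], d)
		}
	}

	// Sort TLDs so the visiting order is deterministic.
	order := make([]string, 0, len(buckets))
	for t := range buckets {
		order = append(order, t)
	}
	sort.Strings(order)

	var out []string
	for round := 0; len(out) < limit; round++ {
		progressed := false
		for _, t := range order {
			if round < len(buckets[t]) {
				out = append(out, buckets[t][round])
				progressed = true
				if len(out) == limit {
					return out
				}
			}
		}
		if !progressed {
			break // every bucket is exhausted
		}
	}
	return out
}

func main() {
	ranked := []string{"a.de", "b.de", "c.de", "x.fr", "y.com", "z.nl"}
	fmt.Println(roundRobinByTLD(ranked, []string{"de", "fr", "nl"}, 4))
}
```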
Scanner
- Engine: Go-based scanner using headless Chromium via Chrome DevTools Protocol (CDP)
- Browser: Chromium (headless, sandboxed, clean profile per scan)
- Viewport: Desktop 1920×1080
- Scan location: Hetzner Cloud, Falkenstein, Germany (EU)
- CMP detection: 45 consent management platforms recognized via script signatures, DOM selectors, and JavaScript API probing
- Selector version: 2026-04-06-v23
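Of the three detection signals listed above, script signatures are the simplest to illustrate. The sketch below matches the hosts of a page's script URLs against a small table. The three entries are commonly seen public CMP script hosts chosen for illustration only; they are an assumption, not the scanner's actual list of 45 signatures, and `detectCMP` is a hypothetical name. DOM selectors and JavaScript API probing need a live browser session and are not shown.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// signature pairs a CMP name with a script host that indicates its presence.
type signature struct {
	cmp  string
	host string
}

// Illustrative entries only; the real signature set is not published.
var signatures = []signature{
	{"Cookiebot", "consent.cookiebot.com"},
	{"OneTrust", "cdn.cookielaw.org"},
	{"Usercentrics", "app.usercentrics.eu"},
}

// detectCMP returns the first CMP whose script host matches one of the
// given script URLs. It reports false when nothing matches, in which case
// a caller would fall back to generic heuristics.
func detectCMP(scriptURLs []string) (string, bool) {
	for _, raw := range scriptURLs {
		u, err := url.Parse(raw)
		if err != nil {
			continue // skip malformed URLs
		}
		host := strings.ToLower(u.Hostname())
		for _, s := range signatures {
			if host == s.host || strings.HasSuffix(host, "."+s.host) {
				return s.cmp, true
			}
		}
	}
	return "", false
}

func main() {
	scripts := []string{
		"https://example.com/app.js",
		"https://cdn.cookielaw.org/scripttemplates/otSDKStub.js",
	}
	if name, ok := detectCMP(scripts); ok {
		fmt.Println("detected:", name)
	} else {
		fmt.Println("no known CMP, falling back to generic heuristics")
	}
}
```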
Infrastructure
- App server: Hetzner CX-class VPS (web application, PostgreSQL, Redis)
- Worker servers: 2 Hetzner VPS instances (15 concurrent scan slots total)
- DNS cache: Local Unbound resolver with periodic pre-warming
- Total runtime: ~60 hours for the full corpus
What Each Scan Measures
Limitations
- Desktop only. Mobile viewport behavior may differ.
- Single scan location. All scans originated from one EU location; geo-targeted consent banners may behave differently when a site is accessed from outside the EU.
- Point-in-time observation. Websites change after scanning. Results represent the state during the scan window.
- Bot detection. Some sites detect automated browsers and may alter their behavior. We flag this in the results when we detect it.
- CMP coverage. 45 CMPs recognized; sites using unrecognized or custom consent solutions fall back to generic heuristic detection.
- No legal interpretation. Risk scores are technical indicators, not legal compliance determinations.
Data Access
- Aggregate statistics are published in our blog posts and research report.
- Individual domain classifications are not published. Site operators can request their scan result by running a free scan or contacting us.
- Correction requests: If you believe your site's classification is incorrect, run a new scan or contact us by email.
Reproducibility
The scanner is proprietary software. The aggregate dataset is provided for verification of published statistics. The Tranco domain list used is publicly available at tranco-list.eu.