I've implemented a segmented wheel sieve with an orthogonal prefilter (mod-60 base + mod-7 independent extension) that demonstrates unusually linear performance scaling.
*Benchmark Results (C++, i7-6700K, consumer hardware):*
| Range   | Time   | Primes Found | Density |
|---------|--------|--------------|---------|
| 10^7    | 1.5ms  | 664K         | 6.64%   |
| 10^8    | 18ms   | 5.7M         | 5.76%   |
| 10^9    | 212ms  | 50.8M        | 5.08%   |
| 10^10   | 3.12s  | 455M         | 4.55%   |
| 10^11   | 106.6s | 4.1B         | 4.12%   |
| 3×10^11 | 484s   | 11.8B        | 3.95%   |
*Key Observation:* Nearly perfect linear O(n) scalability across 4 orders of magnitude. All results validated with Miller-Rabin primality testing (100 iterations, zero false positives).
*Why This Matters:* Traditional segmented sieves show near-linear, but not strictly linear, performance at extreme scales. The orthogonal extension approach (mod-60 + mod-7) appears to avoid the structural overhead that typically emerges at billion+ ranges.
*Open Research Questions:*
1. Does linear scaling continue beyond 10^12?
2. How do other languages (Rust, Java, Julia) perform on the same algorithm?
3. Where are the CPU/memory bottlenecks on modern architectures?
4. What's the theoretical limit of orthogonal extension beyond mod-7?
5. Can SIMD/GPU acceleration further improve performance?
*Reference Implementation:* Full C++ source, benchmark methodology, and validation suite available. MIT licensed, contributions welcome.
I'm the author. Some additional context:
*Performance Notes:*
- Measured on an Intel i7-6700K (consumer hardware, 2015)
- Single-threaded, no SIMD optimizations yet
- C++17, compiled with /O3 and /arch:AVX2
- Times include full validation pipeline
*Validation Methodology:*
- Miller-Rabin: 100 iterations per candidate
- Sample range: 10^7 to 3×10^11
- Zero false positives observed across all ranges
*Why Linear Scaling?* The orthogonal approach (mod-60 + mod-7) maintains constant structural overhead as the range increases, unlike traditional combined wheels, whose tables grow multiplicatively with each added prime (mod-60 → mod-420 → mod-4620 → ...).
*What I'm Most Curious About:*
1. Rust/Julia implementations - can they match C++?
2. SIMD potential - AVX-512 should help significantly
3. GPU feasibility - is the algorithm parallelizable?
4. Theoretical bounds - what's the limit of orthogonal extension?
Contributions, feedback, and collaboration welcome. Full source on GitHub, MIT licensed.