I've implemented a segmented wheel sieve with an orthogonal prefilter (mod-60 base + mod-7 independent extension) that demonstrates unusually linear performance scaling.
*Benchmark Results (C++, i7-6700K, consumer hardware):*
| Range   | Time   | Primes Found | Density |
|---------|--------|--------------|---------|
| 10^7    | 1.5ms  | 664K         | 6.64%   |
| 10^8    | 18ms   | 5.7M         | 5.76%   |
| 10^9    | 212ms  | 50.8M        | 5.08%   |
| 10^10   | 3.12s  | 455M         | 4.55%   |
| 10^11   | 106.6s | 4.1B         | 4.12%   |
| 3×10^11 | 484s   | 11.8B        | 3.95%   |
*Key Observation:* Nearly perfect linear O(n) scalability across 4 orders of magnitude. All results validated with Miller-Rabin primality testing (100 iterations, zero false positives).
*Why This Matters:* Traditional segmented sieves show near-linear, but not strictly linear, performance at extreme scales. The orthogonal extension approach (mod-60 + mod-7) appears to avoid the structural overhead that typically emerges at billion+ ranges.
*Open Research Questions:*
1. Does linear scaling continue beyond 10^12?
2. How do other languages (Rust, Java, Julia) perform on the same algorithm?
3. Where are the CPU/memory bottlenecks on modern architectures?
4. What's the theoretical limit of orthogonal extension beyond mod-7?
5. Can SIMD/GPU acceleration further improve performance?
*Reference Implementation:* Full C++ source, benchmark methodology, and validation suite available. MIT licensed, contributions welcome.
I'm the author. Some additional context:
*Performance Notes:*
- Measured on an Intel i7-6700K (consumer hardware, 2015)
- Single-threaded, no SIMD optimizations yet
- C++17, compiled with /O3 and /arch:AVX2
- Times include full validation pipeline
*Validation Methodology:*
- Miller-Rabin: 100 iterations per candidate
- Sample range: 10^7 to 3×10^11
- Zero false positives observed across all ranges
*Why Linear Scaling?* The orthogonal approach (mod-60 + mod-7) maintains constant structural overhead as the range increases, unlike traditional combined wheels, whose tables grow multiplicatively with each added prime (mod-60 → mod-420 → mod-4620 → ...).
*What I'm Most Curious About:*
1. Rust/Julia implementations - can they match C++?
2. SIMD potential - AVX-512 should help significantly
3. GPU feasibility - is the algorithm parallelizable?
4. Theoretical bounds - what's the limit of orthogonal extension?
Contributions, feedback, and collaboration welcome. Full source on GitHub, MIT licensed.