How We Made IPFS Content Publishing 10x Faster

(probelab.io)

115 points | by dennis-tra 6 hours ago

34 comments

$embedding-shape 4 hours ago

> Return control back to the user after most (not all) of the PUT RPCs have succeeded and continue with the remaining ones in the background.
Making things faster by doing less (and not the same) has been speeding up computing since forever! I can't help but feel it's slightly misleading to call the providing ("publishing") faster when it's not actually doing the same thing; most parts just turned async instead of waiting for confirmation.
Wouldn't this lead to the problem where the user thinks everything has been provided properly, but when others try to find it, the records haven't been published yet? As far as I understand, it'd still take mostly the same amount of time until all the records for a CID (not just some of them) are available to others; the only thing that got "faster" is the UX for the one providing?
- $Groxx 3 hours ago
  
  The "Early Return" sections describe it more; I don't think it's as bad as it sounds in that first bullet. They're returning after 15 out of 20 complete, and it sounds like even if only those 15 end up succeeding it'll still generally be fine. (Exactly how fine / is that violating some common expectations and will cause problems: I dunno. Not familiar enough with IPFS's internals)
  That said:
  >In practice, at least one of the 20 follow-up requests fails in the vast majority of operations, and a single unresponsive peer can stall the entire phase waiting for a timeout.
  It continually surprises me how often systems lack a Fast Fallback-like strategy¹. Or at least sound like it. Just an absolute flood of apps and websites and systems that try to do something once and then never try an alternate route until that finishes, something like a minute or two later... for a process that usually takes less than a second. It's maddening. By the time you're considering one to be "stalled" and delaying everything unnecessarily, you probably should've already started trying two or three alternate routes!
  ¹ https://wikipedia.org/wiki/Happy_Eyeballs
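As a rough illustration of the staggered-fallback idea (the pattern behind Happy Eyeballs), here is a minimal Python sketch. It is not from the article or from any IPFS implementation; `staggered_race`, `attempt`, the peer names and the delays are all hypothetical. Each new attempt starts once the earlier ones have had `stagger` seconds to finish (or as soon as one fails), and the first success wins.

```python
import asyncio


async def staggered_race(factories, stagger: float):
    """Start attempts one by one, `stagger` seconds apart; return the first success."""
    remaining = list(factories)  # zero-argument callables returning coroutines
    running = set()
    errors = []
    try:
        while remaining or running:
            if remaining:
                running.add(asyncio.create_task(remaining.pop(0)()))
            # While alternatives remain, wait at most `stagger` before starting
            # the next one; otherwise wait until some attempt finishes.
            timeout = stagger if remaining else None
            done, running = await asyncio.wait(
                running, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
                errors.append(task.exception())
        raise RuntimeError(f"all {len(errors)} attempts failed")
    finally:
        # Whatever is still in flight is no longer needed.
        for task in running:
            task.cancel()


async def attempt(name: str, delay: float) -> str:
    """Hypothetical stand-in for a network request: sleeps, then answers."""
    await asyncio.sleep(delay)
    return name


async def main() -> str:
    # "peer-a" stalls for 5 s; "peer-b" is started 50 ms later and answers quickly.
    return await staggered_race(
        [lambda: attempt("peer-a", 5.0), lambda: attempt("peer-b", 0.01)],
        stagger=0.05,
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The stalled first attempt costs only the stagger delay here, not its full timeout, which is the point being made above.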
  - $WorldMaker an hour ago
    
    > (Exactly how fine / is that violating some common expectations and will cause problems: I dunno. Not familiar enough with IPFS's internals)
    I felt the article addressed that a bit further down. 20 copies is a somewhat arbitrary knob in the Kademlia DHT design IPFS is based on, and this lab's research suggested that 15 was probably close to good enough for GET requests to succeed at about the same time cost. Rather than lowering the knob for the entire DHT (redundancy is always useful in the long run), they went with the Early Return plus a secondary process called the Reprovide Sweep that still tries to push the network towards the minimum of 20 live copies it desires.
    I'm assuming the Reprovide Sweep was work previously done/documented because it seems like something that might have been more interesting to discuss at longer length in relevant parts of the article.
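The early-return behaviour discussed in this sub-thread (hand control back after 15 of 20 PUTs, let the rest finish in the background) can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not the article's or Kubo's actual code: `put_record`, `provide`, and the delays are hypothetical, and the PUT RPC is simulated with a sleep.

```python
import asyncio


async def put_record(peer_id: int, delay: float) -> bool:
    """Hypothetical stand-in for a PUT RPC; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    return True


async def provide(delays, threshold: int):
    """Fire all PUTs, return once `threshold` have succeeded.

    Returns the success count so far and the set of still-running tasks,
    which keep going in the background.
    """
    pending = {
        asyncio.create_task(put_record(i, d)) for i, d in enumerate(delays)
    }
    succeeded = 0
    while pending and succeeded < threshold:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        succeeded += sum(
            1 for t in done if t.exception() is None and t.result()
        )
    return succeeded, pending


async def main():
    # 15 responsive peers and 5 slow ones (assumed numbers for the demo).
    delays = [0.01] * 15 + [0.3] * 5
    succeeded, background = await provide(delays, threshold=15)
    early = (succeeded, len(background))  # state at the moment of early return
    await asyncio.gather(*background)     # the remaining PUTs still complete
    return early


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this model the caller is unblocked as soon as the fast peers answer, while the slow PUTs are not dropped; they simply stop being on the critical path.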
- $pocksuppet 3 hours ago
  
  As far as I understand, the producer is publishing to the 20 nearest nodes it finds, but the consumer is also searching the 20 nearest nodes it finds, and there is quite a big safety margin built into that number 20. Almost all consumers should still be able to find your object once it has only published to 10 or 15.
  This is a probabilistic system anyway. Even if publication finishes to 20 nodes, why is that enough to return to the caller? Shouldn't it be 30, or 50, just in case?
  I'd say it makes sense to return control once zero PUTs have been made and do the whole thing in the background, to avoid serializing operations that usually don't need to be serialized, such as publishing multiple objects.
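A back-of-the-envelope way to see the safety margin, under a deliberately crude model that is not from the article: assume each peer holding the record is independently reachable and returns it with probability `p` (a made-up parameter), so a lookup succeeds if at least one of `k` holders answers.

```python
def lookup_success(k: int, p: float) -> float:
    """P(at least one of k record holders answers), assuming independence."""
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # p = 0.5 is an arbitrary, pessimistic guess used only for illustration.
    for k in (10, 15, 20):
        print(k, f"{lookup_success(k, p=0.5):.6f}")
```

Even with only half of the holders answering, ten records already give about a 99.9% success rate in this model, and going from 15 to 20 changes the result only in the fifth decimal place. Real DHT lookups are not independent coin flips, so treat this as intuition rather than a measurement.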
$boramalper 3 hours ago

Is anyone still using (or has anyone ever used) IPFS in production?
I’m not talking about technology demos such as Wikipedia-on-IPFS (which indeed worked and was impressive) but where IPFS is actually being relied on for some functionality.
- $ydj 2 hours ago
  
  At Meta, there was a project for delivering binaries of internally built libraries / binaries to dev laptops using a private IPFS network. This was live for at least some period of time.
  - $boramalper 38 minutes ago
    
    Very interesting! I wonder if it’s still live and there is any writing on it?
- $errpunktjose 3 hours ago
  
  https://swap.cow.fi uses it for order metadata registering iirc
- $MattCruikshank 3 hours ago
  
  It doesn't seem like it's popular to put old game ROMs on IPFS...? And that surprises me...
  - $boramalper 3 hours ago
    
    And why would you do that? As opposed to, say, distributing via BitTorrent or serving them using a good-old HTTP server?
    edit: Not opposed to the idea, just curious what makes you pick IPFS over the existing alternatives.
    - $topgrain2 an hour ago
      
      The idea of simply mounting a filesystem and selecting from a list of titles which roms to download and add to your local games, unloading them and transparently re-downloading when you need to free up space, all without relying on a centralized host even for the file index, is pretty appealing. You can do similar things with torrents but it's not quite as "natural".
      Most of the emulator frontends I've seen are pretty against integrating this kind of ease-of-piracy stuff, though. They will recognize and fill in metadata for well-known ROMs, but won't make it easy to integrate with remote libraries of ROMs... except tools that run on "hacked" consoles, which seem to love just giving you a list of games with a "tap A/X to pirate" UI.
      - $boramalper 34 minutes ago
        
        > The idea of simply mounting a filesystem
        You can use fuse-btfs [0] for mounting torrents as filesystems! Last I checked it was a fairly mature piece of software so hopefully it doesn’t feel unnatural.
        [0] https://github.com/johang/btfs
    - $darkwater an hour ago
      
      Maybe fear of Nintendo coming to bite you?
- $Borg3 2 hours ago
  
  Yeah.. IPFS is a bit of a disappointment. I was a bit excited about it back in the day. Recently, I wanted to download something large from archive.org, I used torrent (and my legacy torrent client) and it worked like a charm!
  It seems pure HTTP tracker + Torrent is good enough.
  - $boramalper an hour ago
    
    I think the biggest sin of IPFS is not working natively in web browsers—instead, requiring the use of either HTTP gateways or native apps running outside the browser.
- $pixel_popping 3 hours ago
  
  It's funny because even in Piracy, IPFS has never really taken off and that's a massive use case.
  - $boramalper 3 hours ago
    
    It was slowly taking off (e.g. Library Genesis on IPFS [0]), but then IPFS introduced the Bad Bits Denylist [1], which killed it on arrival.
    [0] https://freeread.org/ipfs.html
    [1] https://badbits.dwebops.pub/
    - $RobotToaster 25 minutes ago
      
      Suddenly it looks a lot less decentralised.
      - $throwaway8388 4 minutes ago
        
        Well, Bad Bits are only enforced on the centralized HTTP gateways. LibGen CIDs would still resolve fine using the DHT as the decentralised discovery mechanism.
  - $frollogaston 3 hours ago
    
    Also public key lists like what WhatsApp now publishes
- $frollogaston 3 hours ago
  
  NFT artwork, if you count that. Briefly checked, the ones that were traded for the most were using IPFS rather than HTTP. But I also don't trust that these aren't self-wash sales (easy given the "NF" part), also NFTs are dumb.
  - $boramalper 3 hours ago
    
    I don’t think NFTs (should) count: My first impressions of web3 by Moxie Marlinspike
    https://moxie.org/2022/01/07/web3-first-impressions.html
    - $frollogaston 2 hours ago
      
      I agree that purely in the dApp sense, NFTs never fully took off. The blockchain tech made it theoretically distributed, but the interest in NFTs died off way before that mattered, so we only ever saw effectively centralized versions of it.
      I personally had no interest in seeing the decentralized one either, but there are people who collect digital things for some reason. In that case it would've needed convergence on one JSON format at least for the 99% use case of still images, and agreement to put the heavy assets on IPFS instead of HTTP (it was a mix). Maybe axe some of the confusing features like editions.
$hannesfur 2 hours ago

Having worked on libp2p's DHT (Double Hashing for rust-libp2p) for a bit two years ago, it's really great to see that there are improvements. To get to CDN-level speeds on dense networks, though, I still see it as an architectural flaw to not somehow encode network topology into the PeerID / identity in the DHT. A start would be to use the five RIRs. If you want to be more sophisticated, and I spent a lot of time theorising about this, you could have a decentrally governed anycast IP address of Geo DNS to bootstrap new peers into their neighbourhood and couple that into their DHT identity. But do you want to put BGP into the hands of a decentralised system? Could you even do it in the governance structure of the internet?
Btw when we were working on our project HyveOS, we used batman-adv's routing table to quickly (really really quickly) bootstrap new peers into the system.
Ah… sometimes I really miss working on this.
$someonebaggy 4 hours ago

Is it also possible to speed up lookup? I never used IPFS much as it took several minutes to find a CID.
- $throwaway8388 4 hours ago
  
  Actually, lookup is super fast: CID lookup is consistently <200ms from the EU [0]. The original slowness came mostly from stale records and NAT’d peers that were indexed in the DHT, which has since been mostly resolved.
  [0] https://probelab.io/ipfs/dht/#chart-ipfs-dht-lookup-performa...
- $yiannisbot 4 hours ago
  
  When’s the last time you tried? It must be much faster now. Check: https://probelab.io/ipfs/dht/#chart-ipfs-dht-lookup-performa...
  - $esperent 3 hours ago
    
    Do you have any examples of actual content to look up, rather than benchmark graphs?
$davidwritesbugs 2 hours ago

Slightly tangential to the article, which seems interesting, but the main issue with IPFS was the horrendous performance of clients, which I seem to recall was related to refresh storms, sparse routing tables, unreachable peers, and slow lookups. Mostly the reputation was so bad that people didn't bother with it; I dismissed it for my own project. If your only users are crypto-grift projects you're in a bad place.
$nekusar 4 hours ago

Are the defaults still leaking your whole internal and external IP allocations to the DHT?
Its security posture was absolutely fucking gross the last time I reviewed it.
And of course, there's a shitcoin bolted on as well. Last thing I want to do is feed into Filecoin. Of course, everything new these days has some financial interaction crap bolted on to entice speculators and their ilk.
$catapart 4 hours ago

I'll add to the "is it still...?" questions.
Last I was told about it, there was no way to delete stuff from IPFS. Nothing enforceable, at least. Setting aside that public stuff is "impossible" to delete on the internet, there's something appealing to me about being able to shut off my server. Feels like that is less possible with IPFS hosted content.
Does anyone have some perspective for me about removing content?
- $deno 3 hours ago
  
  Imagine you created a torrent (and/or magnet link) with a file and then stopped seeding after some time. If it was popular it will probably live on, if not then eventually it disappears.
  - $catapart 3 hours ago
    
    Thanks! Yeah, I kind of figured that was still the case. Not really any use cases I have that I would feel comfortable with that paradigm, but I'm glad it's available!
    - $somat 3 hours ago
      
      Is that not the same with anything published to the internet? For example, I could keep your comment published and available for as long as I had interest in doing so, despite any effort you may make to remove it from HN. I mean, I guess tech like IPFS and BitTorrent tries to automate this process (keeping something on the internet as long as there is interest), but once you let something out on the internet it could stay there a long while. Or it could go poof and disappear; it depends on how much interest there is in the subject.