Zml-smi: universal monitoring tool for GPUs, TPUs and NPUs

(zml.ai)

70 points | by steeve 5 days ago

11 comments

$serialx 6 hours ago

Look into all-smi (https://github.com/lablup/all-smi). It supports every GPU imaginable, including Apple Silicon, and many AI accelerator cards.
$rdyro 11 hours ago

Looks cool!
nvtop can actually support TPUs too via https://github.com/rdyro/libtpuinfo/ https://github.com/Syllo/nvtop/blob/76890233d759199f50ad3bdb...
$mrflop 5 days ago

Renaming fopen64 to intercept library calls feels like a brittle hack masquerading as "sandboxing." Why not just upstream this hardware support to nvtop instead of fragmenting the ecosystem?
- $steeve 5 days ago
  
  Sadly, sandboxing is something that can't be upstreamed. This way, sandboxing is kept in zml instead of patching Mesa.
  As for nvtop, it's a great program, but we missed a few features (such as sandboxing).
  - $pstuart 9 hours ago
    
    It looks cool and I was excited to get monitoring for the NPU on my Ryzen AI 395+. Unfortunately, it does not show up. NPU support in Linux really seems to be an afterthought.
    - $steeve 9 hours ago
      
      Weird, because we tried it. It doesn't show anything?
      We use amdsmi to get metrics. I'll investigate.
- $marwanet 9 hours ago
  
  If this logic were pushed into nvtop, wouldn't the codebase become unmaintainable? Each vendor's interception method is going to be different.
$synergy20 2 hours ago

Would be nice to have CPU usage added so I have everything in one tool.
Currently I use btop, which shows basic GPU load along with CPU, network, etc.
$imcritic 3 hours ago

Is it capable of exposing metrics in Prometheus format?
- $steeve 3 hours ago
  
  consider it done
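
For readers unfamiliar with the format being asked about, here is a minimal sketch of the Prometheus text exposition format for a gauge metric. It is not zml-smi's implementation: the metric name `accelerator_memory_used_bytes`, the `device` and `vendor` labels, and the helper functions are all hypothetical and chosen only for illustration.

```python
def escape_label_value(value):
    """Escape a label value per the Prometheus text format rules."""
    # Backslash must be escaped first so later escapes are not doubled.
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def format_gauge(name, help_text, samples):
    """Render one gauge family as Prometheus text exposition lines.

    `samples` is a list of (labels_dict, numeric_value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is stable between scrapes.
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical readings; a real exporter would query the device driver.
    print(
        format_gauge(
            "accelerator_memory_used_bytes",
            "Device memory in use.",
            [({"device": "0", "vendor": "nvidia"}, 1024)],
        ),
        end="",
    )
```

A scraper would typically fetch this text over HTTP from a `/metrics` endpoint; serving it is left out here to keep the sketch short.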
$152334H 8 hours ago

"NPU" seems to refer to Trainium only?