Contents of the blog are themselves written by LLM.
https://github.com/coollabsio/llmhorrors.com/blob/main/CLAUD...
The whole website seems to be focused on promoting the author and their projects more than sharing the information. Just link to the original.
https://www.reddit.com/r/googlecloud/comments/1reqtvi/82000_...
Posted to HN twice recently.
https://news.ycombinator.com/item?id=47231708
https://news.ycombinator.com/item?id=47184182
What do you expect from a website named llmhorrors.com?
I would expect it to not be written by an LLM. Molly White didn’t run Web3 is Going Great on the blockchain.
https://www.web3isgoinggreat.com/
And looking at her main website https://www.citationneeded.news/ there is a tip jar, but it doesn't accept crypto. I'd have expected her to take at least the major coins like ADA, ETH and BTC, but she's consistent in her views.
False equivalence, Tesla also does not run their website from a Model S.
The joke is: LLM Horrors is anti-LLM, and Web3 is Going Just Great is anti-Web3. The equivalent for Tesla would be putting an ICE inside their Model 2 if they didn't believe in EVs.
Another plea for @dang to integrate pangram into all story and comment submissions
Yeah, right...
> Conclusion: Always set billing caps and alerts on cloud API keys.
Sadly, that's way easier said than done in the case of GCP. It's been a real reason for me to avoid GCP deployments for smaller projects with LLM use cases.
I remember looking into this a while back assuming it would be a sane feature to expect. But for some reason it's surprisingly non-trivial with GCP to set budgets. Especially if the only thing you want is a Gemini API key with finite spending.
IIRC you could either set (rate) limits via quotas, but quotas are extremely granular (like, per region, per model), meaning you need to both set tons of values and understand which quotas to relax; or you could do some bubblegum-and-duct-tape solution where you build an event-driven pipeline that reacts to cost increases in your own project.
I understand that exact budgets are hard to enforce in real-time, especially for their more complex infra offerings.
However, (1) even if it's not exactly real-time but instead enforced every hour, that already goes a long way, and (2) PAYG LLM usage is billed roughly linearly by the number of tokens you use, so if there were an easy way to set a dollar amount and have it expressed as a budget, that would already get you part of the way there.
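Since PAYG billing is roughly linear in tokens, that dollar-to-token conversion can even be approximated client-side. A minimal sketch, assuming hypothetical per-token prices (look up the real rates for your model; these are placeholders):

```python
# Client-side sketch: translate a dollar budget into tracked token spend.
# Prices below are illustrative placeholders, NOT actual Gemini rates.
PRICE_PER_1M_INPUT = 0.10   # USD per 1M input tokens (assumed)
PRICE_PER_1M_OUTPUT = 0.40  # USD per 1M output tokens (assumed)

class TokenBudget:
    """Track estimated spend against a dollar cap, per API key."""

    def __init__(self, dollar_cap: float):
        self.dollar_cap = dollar_cap
        self.spent = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Token usage comes back in every API response, so this can be
        # updated after each call.
        self.spent += input_tokens / 1e6 * PRICE_PER_1M_INPUT
        self.spent += output_tokens / 1e6 * PRICE_PER_1M_OUTPUT

    def allow(self) -> bool:
        return self.spent < self.dollar_cap

budget = TokenBudget(dollar_cap=100.0)
budget.record(input_tokens=2_000_000, output_tokens=500_000)
print(f"${budget.spent:.2f} spent, allowed: {budget.allow()}")  # -> $0.40 spent, allowed: True
```

This only guards your own well-behaved client, of course; it does nothing against a leaked key, which is exactly why a provider-side cap matters.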
Anyway, the current state of GCP budgeting makes me avoid it for production usage until I'm ready to commit significant effort to hardening it. For small projects the free-tier tokens are a safe bet, but their extremely low rate limits make them rarely a good fit.
Yeah, it's an utter joke and a UX/UI crime that has been going unpunished for way too long. Wonder what all those geniuses are doing.
Thankfully Google has some basic protection for this. I accidentally committed my Google API token as part of an OTEL trace JSON file, and within a few minutes the key was automatically locked by Google and marked as leaked (with an exact link pointing to where it happened).
"Some basic protection": it wasn't always like this. A few years back you could easily harvest API keys for any web service by typing certain keywords into GitHub search, and that included all Google APIs, but since the Microsoft acquisition it's not as simple anymore....
The tokens are not stolen; they are public. How can you steal public tokens?
It's Google's blunder that they allowed public tokens to be used for paid functionality.
They are used to exchange goods and services without the consent of the owner. Kind of like picking up a wallet full of cash off the ground (with or without identification).
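A cheap complement to the server-side leak scanning described above is catching key-shaped strings before they are ever committed. Google API keys have a recognizable shape (the prefix `AIza` followed by 35 URL-safe characters), so a simple grep-style pre-commit check works; the sample key below is fake, just the right shape:

```python
import re

# Google API keys match "AIza" plus 35 URL-safe characters. Scanning
# staged files for this pattern catches the common accidental-commit
# case before the key ever reaches a public repo.
GOOGLE_API_KEY_RE = re.compile(r"AIza[0-9A-Za-z_\-]{35}")

def find_keys(text: str) -> list[str]:
    """Return all Google-API-key-shaped strings found in `text`."""
    return GOOGLE_API_KEY_RE.findall(text)

# Fake key of the right shape (not a real credential), embedded the way
# it might appear in an OTEL trace dump:
leaked = '{"otel_trace": {"api_key": "AIzaSyFAKEFAKEFAKEFAKEFAKEFAKEFAKEFAKE1"}}'
print(find_keys(leaked))  # -> ['AIzaSyFAKEFAKEFAKEFAKEFAKEFAKEFAKEFAKE1']
```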
This is one of the main reasons I prefer to use openrouter instead. It's prepaid.
Yeah, I couldn't figure out how to set billing caps on the gemini API. Here's what the chatbot said:
Me: Help me cap gemini API request costs ... limit total billing for this project to max $100 a month
GC: Hello! While it's not possible to set a hard spending cap on Gemini API requests, you can set up billing alerts to monitor your costs and avoid surprises.
Me: How to set hard budget limit tied to billing account
GC: Based on your account information, it is not possible to set a hard budget limit that automatically stops charges for a billing account.
Me: How to set quota for gemini api?
GC: Sorry, I'm not able to answer that question.
Slightly unrelated question: how would you spend $82k on prompts in 48 hours? Just phishing?
I'd guess they are selling access to other people somehow. Like it used to be the case that a stolen phone would rack up enormous overseas call charges until it was reported and disabled.
OpenClaw or a bunch of agents.
If your goal is to just burn as much money as possible, as fast as possible, simply spamming expensive image/video generation requests would probably do the trick, if the key's rate limits are high enough.
There's also a practice, which seems to occur primarily in China, where stolen keys are resold via proxy services. A single key can provide access to thousands of users, racking up costs very fast (again, assuming the rate limits are high enough).
This might have something to do with https://news.ycombinator.com/item?id=47156925
Is there a way to limit spending on Google Cloud?
As far as I saw you can only set up billing alerts, no hard limit.
There is a way to trigger a script when a budget is hit, but they don't make it easy. You set up a billing notification that triggers a script, which can disable resources (like APIs) automatically.
https://docs.cloud.google.com/billing/docs/how-to/control-us...
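The pattern in that doc is to route a budget's Pub/Sub notifications to a function that detaches the project's billing account. A minimal sketch, assuming the `costAmount`/`budgetAmount` fields that budget notifications carry; the project ID is a placeholder, and the Billing API call is left as a comment so the decision logic stays testable offline:

```python
import base64
import json

def over_budget(pubsub_data: str) -> bool:
    """Decide from a budget notification whether spend exceeded the cap.

    Budget Pub/Sub notifications arrive as base64-encoded JSON that
    includes `costAmount` and `budgetAmount` fields.
    """
    notification = json.loads(base64.b64decode(pubsub_data))
    return notification["costAmount"] > notification["budgetAmount"]

def handle_budget_event(event: dict, project_id: str = "my-project") -> bool:
    """Cloud Function-style entry point (sketch)."""
    if not over_budget(event["data"]):
        return False
    # In a real deployment you would detach the billing account here,
    # roughly like this (untested sketch; requires the
    # google-api-python-client package and the Cloud Billing API):
    #
    #   from googleapiclient import discovery
    #   billing = discovery.build("cloudbilling", "v1")
    #   billing.projects().updateBillingInfo(
    #       name=f"projects/{project_id}",
    #       body={"billingAccountName": ""},  # empty name detaches billing
    #   ).execute()
    return True

# Simulated notification: $120 spent against a $100 budget.
msg = base64.b64encode(
    json.dumps({"costAmount": 120.0, "budgetAmount": 100.0}).encode()
).decode()
print(handle_budget_event({"data": msg}))  # -> True
```

Note that detaching billing this way hard-stops everything in the project, not just the expensive API, which is part of why providers are reluctant to offer it as a one-click feature.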
Google Cloud does make it easy to set up soft budget alerts via email, though, which is something I had to use a third-party service for with AWS.
Those budget alerts usually aren't instant though; they only fire when the cloud gets around to reconciling your usage, some number of hours or even days after the damage is done. It's better than nothing, but with runaway spending you can still blow way past your limit.
One caveat with alerts (and automatically acting on alerts) is that there are delays[0] between costs being incurred and alerted on. I can't find a Google source for what the delay is, but one source online says it could be "24 hours [to] a few days."[1]
This has been a major reason why I reach for OpenAI models before Gemini, and also why I'd rather use services like RunPod for training jobs. For a small bootstrapped company like mine, it feels terrifyingly easy to rack up a company-ending AI bill.
The cloud companies try to limit these accidents by cranking your quotas down to nothing, but this also means that my small company can't just spin up an 8xH100 node without major ceremony, and I have routinely been denied the GPU quotas I needed for projects.
Accidentally leaving that kind of node running for the 24 hours it might take to get an alert would rack up a $2,000+ bill, compared to about $500 on RunPod, which will also stop the instance when you run out of money.
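As a sanity check on those numbers, here's the back-of-envelope arithmetic with assumed hourly rates (both rates are illustrative figures, not quoted prices from any provider):

```python
# Rough arithmetic behind the 24-hour runaway-node comparison above.
HOURS_UNTIL_ALERT = 24
HYPERSCALER_8XH100_PER_HOUR = 90.0  # assumed on-demand USD/hour
RUNPOD_8XH100_PER_HOUR = 20.0       # assumed USD/hour

hyperscaler_bill = HOURS_UNTIL_ALERT * HYPERSCALER_8XH100_PER_HOUR
runpod_bill = HOURS_UNTIL_ALERT * RUNPOD_8XH100_PER_HOUR
print(hyperscaler_bill, runpod_bill)  # -> 2160.0 480.0
```

With prepaid credit, the loss is additionally capped at whatever balance remains, which post-paid billing can't guarantee no matter the hourly rate.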
I've loved working with major cloud providers at growing VC-funded startups that have credits, TAMs and bigger budgets for errors. But hyperscalers are fairly difficult for a pre-scale bootstrapped business, and arguably not designed or optimized for it.
[0] https://docs.cloud.google.com/billing/docs/how-to/disable-bi... [1] https://support.terra.bio/hc/en-us/articles/360057589931-How...
There is no practical way to do this effectively.
There are several rather tedious and incomplete hacks you can apply to attempt to prevent billable actions after limits are hit.
But to be frank, they're a cop-out compared to a real spending cap.
You'd hope these companies would address this themselves - but it's not profitable for them to resolve (it's somewhat involved and requires them to allow people to pay them less)... So my strong vote is to make the contracts that allow this sort of "un-cappable" spending for automated actions void in court.
Would be very disappointing if that's true, but I've not known Google not to find ways to disappoint.
It's true, neither AWS nor GCP support spending limits. Only alerting.
It is worth noting that both products have had "student" tiers or similar, that had fixed credit limits with a cliff.
Therefore, they have implemented hard limits. So not offering hard limits is a business decision, NOT a technical one. They're essentially hiding functionality they already have.
Make of that what you will. Anyone justifying it should be met with skepticism.
I have never heard of nor seen AWS student accounts.
There is a free tier, but it varies per service and won't limit anything anyway. It works as if it just gives you some credit to offset the costs.
AWS Educate "Starter" accounts were exactly that[0]. They didn't ask for, or need, a credit card, and there was functionally no way to exceed the limit.
[0] https://www.geeksforgeeks.org/cloud-computing/aws-educate-st...
They also offered (may still offer) the same thing with AWS Academy.
Soft limits would be ideal (x/day with maximum peak of x/minute), but hey, that's literally negative value to them (work to code, CPU time to implement, less income out of "mistakes")
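The "x/day with a peak of x/minute" scheme above is straightforward to express. A minimal sliding-window sketch (the limits are illustrative, and a provider would enforce this server-side rather than in the client as shown here):

```python
from collections import deque

class SoftLimiter:
    """Allow requests under both a daily cap and a per-minute peak."""

    def __init__(self, per_day: int, per_minute: int):
        self.per_day = per_day
        self.per_minute = per_minute
        self.events: deque = deque()  # timestamps of allowed requests

    def allow(self, now: float) -> bool:
        # Drop events older than 24 hours, then check both windows.
        while self.events and now - self.events[0] >= 86_400:
            self.events.popleft()
        last_minute = sum(1 for t in self.events if now - t < 60)
        if len(self.events) >= self.per_day or last_minute >= self.per_minute:
            return False
        self.events.append(now)
        return True

lim = SoftLimiter(per_day=1000, per_minute=3)
print([lim.allow(t) for t in (0.0, 1.0, 2.0, 3.0)])  # -> [True, True, True, False]
```

The fourth request is refused because three already landed in the same minute; a request a minute later would go through again, which is exactly the "soft" behavior the comment asks for, as opposed to a hard kill.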
That's because you pay for stuff like storage. If you had a spending limit, they'd have to delete your data to stop your spend.
Or do what every other industry does and trigger a conversation. Or simply stop you from storing more, or restrict access. Why the need to delete?
'By the way old chap, you have gone over your storage limit. Do you want to buy more or delete some stuff?'
>By the way old chap, you have gone over your storage limit. Do you want to buy more or delete some stuff?
Why does my AWS counselor sound British. Am I in eu-west-2?
Why shouldn't it? It's just a machine. Wouldn't the world be better if these messages varied a bit!
That's what alarms that you set up are for.
I've heard that Google keeps Google Drive data around for up to two years if your subscription expired and your account is over quota. They could certainly do the same with other cloud storage.
If I reduce my gdrive subscription they don’t simply delete what I have over the new (lower) limit. There is a grace period and it’s standard practice. Why should it be any different in this case?
If only there were a way to pause all the other stuff and only let storage keep costing you...
There is, and it would cause an outage while still not achieving the supposed goal of staying under budget. You don't want to be killing your customer's production over potential misconfigurations or forgotten budgets, especially when you'd continue to bill them for storage and other static things like IPs.
It's so much easier for them to have support waive accidental overages.
If only we had the technology to exempt storage from spending limits.
As if that would solve anything? Depending on use, storage could be the largest line item (storage across databases, VMs, object storage).
Not really, no.
You can set up a Cloud Function to monitor billing and automatically disable billing for a project when it exceeds your limits, though.
I understand that automatically stopping cloud resources beyond a certain spend is problematic and challenging in many ways, e.g. do you just destroy provisioned compute, storage, and data?
But for those stupid API keys the corporations have zero excuse not to have configurable limits with a sensible default.
Billing caps? Google? Ha ha ha ha... OK, I'm sad now.
Is this part of the "keys didn't use to be a secret, now they are" issue with Google? [1] If so, they have a good case on their hands.
[1] https://news.ycombinator.com/item?id=47156925
Oof, $82k in 48 hours is brutal. Makes me even more glad I run everything local where possible.
What are you running locally? ClawdBot by chance...?
[flagged]
My understanding is that AWS budget actions also operate on a delay. I love using AWS at work but I'm never giving it my personal credit card as long as I can't turn off auto-billing.