It's interesting that we're so used to being tracked at this point that no one balks at being opted in by default. A flag called DO_NOT_TRACK sounds like a good idea, but also suggests the default is CONSENT_TO_TRACK=1, and I find that creepy.
Do not track WHEN?
This flag is sent by my browser when I connect to SOMEONE ELSE’s SERVER.
The internet only took off because of the primary business model, which ran on ads and on the information that servers derive from their users.
It’s not fun. It’s not private or secure. It’s not illegal (in most jurisdictions for most industries). The flag exists as a response to the de facto and de jure state of the world, not some fairytale scenario.
> The internet only took off because of the primary business model, which ran on ads
No? It took off before advertising was widespread as a primary or sole funding business model. Also, there's literally nothing about advertising that requires data collection about users. Sure, they love to do it, and they might even believe that it helps their profits in some way. But it's not inherent; they got along just fine with billboards and newspaper classifieds. TV ads never required personal information. Nor did pre-roll cinema ads or radio adverts. Nobody was bemoaning in the streets that they couldn't possibly find anything to buy.
The article quite literally talks about tracking by CLI tools you run on your own computer, half of which are there to drive products that you pay for with your own money.
Get off your high horse.
I would advocate for not getting your horse high to begin with, or hide your stash better.
Wow, I guess I grew up so close to actual cowboys that this is an interpretation I just never considered. Not sure why, though, as it's right there for the taking.
This is set up for the same fate as DNT in browsers. Collecting all the "do not track" env vars into a single "do_not_track.env" file, however, may not be a bad idea...
https://toptout.me exists and handles a lot of these problems, if you're not looking to reinvent the wheel.
Though if you just want a simple ENV var that handles this WHILE honoring the specification on this page: https://github.com/alloydwhitlock/do-not-track-cli
Advertisers chose to ignore DNT because they claimed Microsoft making DNT enabled by default took agency away from the user. In reality, they probably weren't going to honor it anyway.
There's an inherent conflict. No one _wants_ to be tracked; there is no direct benefit to being tracked, only downsides. And advertisers want to track you. So there was no way to respect the flag other than making it obscure so only a few dedicated people turned it on.
To play devil's advocate, there is a direct benefit to being tracked: at least theoretically, search and ads will be more relevant to you. I get that no one wants ads, but you do see ads here and there. It would arguably be better for you if every one of them was relevant than not. Similarly, search or even LLM answers could be better if the preferences of the asker are known.
No, I'm not making excuses for tracking, and I do lots of stuff myself to avoid being tracked.
I’m only responding to the false premise that there are no benefits. There are. You can just choose to believe they aren’t worth the cost. I believe they aren’t, but I have friends who opt into all tracking and even register their presence with multiple apps. They believe they’ll make more positive connections.
> No one _wants_ to be tracked
Plenty of people seem to genuinely believe that “personalized ads” are good for them.
Microsoft is too sophisticated to plead ignorance; they are responsible for that outcome and I think we can assume they knowingly chose it. (Though now Microsoft browsers are such a small portion of the market that it doesn't matter.)
The biggest failure of DNT was browser makers - including Mozilla - removing it. It has zero performance impact (1 bit?) or development cost. As long as it was out there, when there was momentum against tracking, advocates had evidence of both demand for privacy and of trackers ignoring user wishes.
> advocates had evidence of both demand for privacy and of trackers ignoring user wishes.
This evidence both still exists and is also completely useless for anything. The more important consideration, by far, is that the DNT flag was actively harmful to users in the real world because, if it was acknowledged at all, it was used maliciously to help fingerprint and track users. There is no reason for browsers to continue providing to their users a toggle that not only misleads them about what will happen with the setting enabled, but actively contributes to the opposite outcome because we live in a world where being evil is the norm.
Lately, I've come across websites that instead of a cookie banner display a banner that states they recognize and honor my wish to not be tracked. Whether they really do or not is something I did not spend time looking into. The first time I saw it I thought it was a fluke, and then it happened a few more times within a short time period. Couldn't tell you what sites they were, though, as it was just something from search results.
Just here to say yeah, I've seen more of this lately: "The do-not-track signal has been followed" or somesuch.
Wow. I've never seen that. It would be great if it became more widespread.
But isn't DNT deprecated in most browsers? Maybe I misremember.
Love it. This is an annoying problem, and this is likely the actual solution, rather than asking folks to use a universal one. I'll put something together as a starting point.
I was surprised how hard it was to stop the Python transformers library from phoning home to Hugging Face. I set HF_HUB_DISABLE_TELEMETRY=1, and when I called Wav2Vec2CTCTokenizer.from_pretrained I explicitly passed local_files_only=True, but I still got a warning about not having a valid HF_TOKEN. It wasn't until I stumbled upon HF_HUB_OFFLINE=1 that I became somewhat confident that I'm not making outgoing connections to HF every time I load a wav2vec2 model from disk.
I wouldn't have realized this was happening at all if it weren't for the obnoxious HF_TOKEN warning.
HF is notorious for making it difficult to work offline (or at least not waste time trying to connect when everything needed is offline) and is constantly changing how it is being handled. Previously, there was TRANSFORMERS_OFFLINE, HF_DATASETS_OFFLINE, etc.
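A minimal sketch of pinning those settings from inside a Python program, using only the variable names mentioned in the comments above. Which of them a given library version actually honours is an assumption to check against its documentation, and `force_hf_offline` is a hypothetical helper name.

```python
import os

# Variable names taken from the comments above. Whether a particular
# library version reads each of them is an assumption, not a guarantee.
HF_OFFLINE_VARS = {
    "HF_HUB_DISABLE_TELEMETRY": "1",
    "HF_HUB_OFFLINE": "1",
    "TRANSFORMERS_OFFLINE": "1",
    "HF_DATASETS_OFFLINE": "1",
}


def force_hf_offline(env=os.environ):
    """Overwrite the offline/telemetry variables in the given mapping."""
    for name, value in HF_OFFLINE_VARS.items():
        env[name] = value
    return {name: env[name] for name in HF_OFFLINE_VARS}


# Call this before the first `import transformers`: settings like these
# may be read once at import time, so changing them later can be too late.
applied = force_hf_offline()
print(applied)
```

Setting the variables in code rather than in the shell keeps them out of the default environment, at the cost of having to run this before any import that might read them.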
Does something like Little Snitch catch these to help find the things doing hidden shenanigans?
Looks like a helpful honeypot! Any tool that publicly announces support for this spec is a tool I know to avoid, because it collects telemetry without explicit opt-in in the first place.
DO_NOT_TRACK support doesn't mean tracking is not an explicit opt-in.
Example: the software crashes, and there is a crash handler that asks you if you want to send a crash dump. With DO_NOT_TRACK, the crash handler is disabled entirely, no question, no dump.
If it gets some adoption, that's probably how it will work. Those who have a financial interest in using tracking (e.g. ads) probably won't support such an option.
I can't think of a single CLI that is possibly collecting analytics for ads.
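The crash-handler example above could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, and treating any value other than empty, `0`, or `false` as "do not track" is one possible reading rather than something the thread specifies.

```python
import os


def do_not_track(env=None):
    """True if DO_NOT_TRACK is set to something other than empty/0/false."""
    env = os.environ if env is None else env
    value = env.get("DO_NOT_TRACK", "").strip().lower()
    return value not in ("", "0", "false")


def handle_crash(report, ask_user, send, env=None):
    """Return what happened: 'disabled', 'declined', or 'sent'."""
    if do_not_track(env):
        # Handler disabled entirely: no question, no dump.
        return "disabled"
    if not ask_user("Send crash report?"):
        # Explicit opt-in was offered and refused.
        return "declined"
    send(report)
    return "sent"


if __name__ == "__main__":
    outbox = []
    print(handle_crash("dump", lambda q: True, outbox.append, {"DO_NOT_TRACK": "1"}))
```

Note that the tool in this sketch is opt-in even without the variable: the flag only suppresses the prompt.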
Most services are already collecting telemetry, them announcing support for it won't change that.
Well, don't look too deep else you won't be using many modern tools.
Hey, it's a list of services to feed fake data to!
The same thing was suggested a few years ago and it went nowhere.
https://web.archive.org/web/20200613155957/https://consoledo...
It's probably easier to run your own DNS and blacklist the offending domains. There are good blacklists with millions of telemetry domains, e.g. https://github.com/hagezi/dns-blocklists.
Better yet, don't allow such spyware crap on your computer.
pfft, just don't have a computer and you'll be good
Some hobbies are more fun than others.
That is the correct way of handling this.
Everyone proclaiming a "standard" is just adding to the long list of (unofficial) alternatives.
obligatory: https://xkcd.com/927/
The most useful part of this page is the list of optout commands to stick in my shellrc.
Is anyone maintaining a more complete list of those?
an LLM would do a fine job for most common things, doesn't really matter if a few of them get hallucinated
> Many CLI tools, SDKs, and frameworks collect telemetry data by default.
Any of those are using a dark pattern and before exploring new ways to opt out you should look for and spend your energy on an alternative which respects your freedoms upfront.
Exactly, new “standard” won’t fix it
just sinkhole the domains
https://dpaste.com/E7RZ34MVD
https://github.com/StevenBlack/hosts
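As a rough illustration of what sinkholing amounts to, the sketch below turns a list of domains into hosts-file lines and checks a hostname against a blocked set. The function names and the example domain are hypothetical. The parent-domain matching mimics what a DNS-level blocklist can do; a plain hosts file only matches exact names.

```python
def to_hosts_lines(domains, sink="0.0.0.0"):
    """Render de-duplicated, normalized domains as hosts-file entries."""
    cleaned = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    return [f"{sink} {name}" for name in sorted(cleaned)]


def is_blocked(hostname, blocked):
    """True if the hostname or any of its parent domains is in `blocked`."""
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


if __name__ == "__main__":
    # Hypothetical domain, used only to show the output format.
    for line in to_hosts_lines(["Telemetry.example.com", "telemetry.example.com."]):
        print(line)
```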
I thought it would be a sh script to automatically set the flags for all known do not track env vars.
It worked so well on the browser already
Given the URL and list of different opt-outs I thought this was going to be a shell script to set all these for you. In fact, I've just had an idea...
Exactly what I was thinking.
I don't think there is any way to stop people from tracking you. Technically speaking, you can pretty much always be tracked. Even if you eliminated all third party requests you could still be tracked. Downloads, logins, queries, etc all can be tracked. Virtually all software now has the "continuously upgrade to the latest version" bullshit so you are tracked every time you open the app. Even if you turn it off, they stop the app from working until you upgrade, so they force you to be tracked.
I think the only solution is to make it law that you can't track anyone for any reason without their consent, and can't sell consensual tracking data without an additional consent agreement. It would be a huge blow to the advertising industry, so it will never be made law, but it's the only thing that would work.
Also, every time you install a program, Microsoft, Apple, or Google knows, depending on the device. For your safety, of course. The tracking is so pervasive, and the majority of people do not care.
It’s already a law in Europe. GDPR and ePrivacy. You have to get consent from the user. Having worked for European companies, they take it seriously.
The assumption that telemetry is not allowed by GDPR is flawed
https://gdpr-info.eu/recitals/no-26/
Anonymous telemetry is allowed – and I don’t have a problem with that.
I'm old enough to remember Nancy Reagan's "Just Say No!" I think this has the same effect.
Domain blocking is my preference but I would imagine that trackers probably also try to weed out data that contains racism, sexism, lewdness or some combination thereof. People can get very creative with ASCII art. AI surely does not accept such things.
The issue is that it is not enforced. My version of My IP will tell you if 'Do Not track' and 'Global Privacy Control' are set by your browser but it is up to the website to honour your requests. Check if your browser is sending them by visiting: https://fshot.org/utils/myip.php
That's great, but isn't DNT deprecated?
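The check such a page performs can be sketched on the server side as follows. It assumes the browser sends the `DNT: 1` and `Sec-GPC: 1` request headers for the two signals; the function names are hypothetical, and nothing here forces a site to act on the result.

```python
def privacy_signals(headers):
    """Report which opt-out signals a request carries (header names are case-insensitive)."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return {
        "dnt": normalized.get("dnt") == "1",      # Do Not Track
        "gpc": normalized.get("sec-gpc") == "1",  # Global Privacy Control
    }


def tracking_permitted(headers):
    """A site that honours the signals would skip tracking if either is set."""
    signals = privacy_signals(headers)
    return not (signals["dnt"] or signals["gpc"])


if __name__ == "__main__":
    print(privacy_signals({"DNT": "1", "User-Agent": "example"}))
```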
Was wondering if there was a list of known opt outs as we are looking at a default opt out in Renovate[0] - we'll also look to set `DO_NOT_TRACK`
[0]: https://github.com/renovatebot/renovate/discussions/42932
This is just sad. Luckily I do not use any of the listed programs. I threw out Homebrew many years ago when they started this nonsense.
The only tool I have installed currently that does %/"($& like this is Deno (required for yt-dlp now). It happily phones home even if you wrap it in a wrapper script that forces the env variable (there's no way I'll pollute my default environment with stuff like this):
I wish bad dreams to whoever puts such crap into their software! Thankfully I have Little Snitch to catch most of those kinds of invasions of my privacy.
No, it should be a required (by law) opt-in TRACK_ME_I_DO_NOT_CARE_OR_AM_A_TEAPOT=418.
The proposed way just normalizes tracking.
And setting that env var should require a notarized consent-to-track contract that expires after at most 60 days and carries penalties of jail time if any data related to that telemetry, anonymized or not, is shared with a third party for any reason, including but not limited to fulfilling the service the business purports to be providing.
It should be much more difficult to collect data than to opt out of collection.
Also this, we disable it when building or deploying apps in DollarDeploy
export SEMGREP_SEND_METRICS=off
export COLLECT_LEARNINGS_OPT_OUT=true
export STORYBOOK_DISABLE_TELEMETRY=1
export NEXT_TELEMETRY_DISABLED=1
export SLS_TELEMETRY_DISABLED=1
export SLS_NOTIFICATIONS_MODE=off
export DISABLE_OPENCOLLECTIVE=true
export NPM_CONFIG_UPDATE_NOTIFIER=false
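Several comments in this thread ask for a single file that sets all of these at once. A minimal sketch of generating one is below. The variable names and values are only the ones quoted in this thread, the list is far from complete, and `render_env_file` is a hypothetical helper.

```python
import shlex

# Opt-out variables quoted in this thread; not an exhaustive list.
OPT_OUTS = {
    "DO_NOT_TRACK": "1",
    "HF_HUB_DISABLE_TELEMETRY": "1",
    "SEMGREP_SEND_METRICS": "off",
    "COLLECT_LEARNINGS_OPT_OUT": "true",
    "STORYBOOK_DISABLE_TELEMETRY": "1",
    "NEXT_TELEMETRY_DISABLED": "1",
    "SLS_TELEMETRY_DISABLED": "1",
    "SLS_NOTIFICATIONS_MODE": "off",
    "DISABLE_OPENCOLLECTIVE": "true",
    "NPM_CONFIG_UPDATE_NOTIFIER": "false",
}


def render_env_file(opt_outs):
    """Render one shell-quoted `export NAME=value` line per entry, sorted by name."""
    lines = [
        f"export {name}={shlex.quote(value)}"
        for name, value in sorted(opt_outs.items())
    ]
    return "\n".join(lines) + "\n"


# Print the file contents; redirect to e.g. do_not_track.env and source it.
print(render_env_file(OPT_OUTS), end="")
```

The output can be sourced from a shell rc file, which only helps with tools that actually honour their own flag.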
Love the idea, but is an env var enough? Are there some sessions (docker?) that may not get it?
I'd prefer TRACK_ME as an opt in.
https://xkcd.com/927/
It feels like this should be no_track, for consistency with no_color
You can also use network namespaces to simply block internet access for certain processes. It can even be finetuned with whitelists or blacklists.
Could you provide more details? Many applications use multiple processes, and use some intermittently. It seems like quite a bit of work to enumerate every process used and then to keep the white/blacklist updated as usage and software changes - every new application or command you use, every update, every OS change that affects networking or system calls etc ...
Yes, with security comes inconvenience, this is inevitable.
I'm not a daily user of network namespaces, and would probably write a script to do the configuration within a shell (it works a bit like containers). The configuration is inherited by child processes, so you only have to do it once. Basically, whitelist the URLs you typically use, and maybe let the script pop up a dialog asking you to allow access when the firewall catches a domain that is not in the whitelist yet.
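For the simplest case, blocking all network access for one command, a sketch along these lines may be enough. It assumes a Linux system with util-linux's `unshare` and a kernel that allows unprivileged user namespaces; the helper names are hypothetical, and the whitelist and dialog ideas described above are not covered.

```python
import shutil
import subprocess


def no_network_argv(argv):
    """Prefix a command so it starts in a fresh, empty network namespace."""
    # --map-root-user lets an unprivileged user create the namespaces (the
    # command then sees itself as root inside them); --net gives it a
    # network namespace with no usable interfaces.
    return ["unshare", "--map-root-user", "--net", "--", *argv]


def run_without_network(argv):
    """Run argv with networking cut off; child processes inherit the namespace."""
    if shutil.which("unshare") is None:
        raise RuntimeError("unshare (util-linux) not found on PATH")
    return subprocess.run(no_network_argv(argv), check=False)


if __name__ == "__main__":
    print(no_network_argv(["some-cli", "--version"]))
```

Because the namespace is inherited, wrapping a shell this way cuts off everything started from that shell.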
I'd be interested in,
1. a SOME-TRUST model: a list of opt-outs for the known software that collect telemetry; so that I can just paste that into an env file and be done with it.
2. a ZERO-TRUST model [preferable]: where I control if an application can send any telemetry data; instead of depending on a flag that the distributor may or may not respect.
Privacy should be treated as a right, not something that can be abused for money. Love the idea of this
If the solution were real, it would be DO_TRACK=1, not the inverse.
I’m morally opposed to the notion of optimizing the opt-out mechanism. I want a standardized opt-in mechanism, like:
export ALLOW_TRACKING=telemetry,crash_dumps
and the absence of such a setting means “fuck off, don’t spy on me”. It’s not my responsibility to turn off apps wanting to track me. It’s their responsibility to get me to authorize their specific flavor of tracking.
> It’s their responsibility to get me to authorize their specific flavor of tracking.
And they do by burying it in the user agreement you probably agreed to.
Like it or not, it is your responsibility. I agree it shouldn’t be, but let’s be realistic.
Then it's my responsibility to feed them fake data.
They didn't opt out of my data, after all.
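A standardized opt-in of the kind argued for in this sub-thread could be honoured by a tool roughly as sketched here. The variable name `ALLOW_TRACKING`, the category names `telemetry` and `crash_dumps`, and both helper functions are illustrative assumptions, not an existing standard.

```python
import os


def allowed_tracking(env=None):
    """Parse a comma-separated opt-in list; an unset variable allows nothing."""
    env = os.environ if env is None else env
    raw = env.get("ALLOW_TRACKING", "")
    return frozenset(
        part.strip().lower() for part in raw.split(",") if part.strip()
    )


def may_collect(category, env=None):
    """True only if the user explicitly listed this category."""
    return category in allowed_tracking(env)


if __name__ == "__main__":
    example = {"ALLOW_TRACKING": "telemetry,crash_dumps"}
    print(sorted(allowed_tracking(example)))
```

The design choice is that the default (variable absent or empty) is an empty set, so a tool has to find its own category listed before collecting anything.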
Default opt-in tracking should be illegal and enforced with such fines and prison sentences, that companies wouldn't even dare to have anything remotely capable of tracking in the runtime.
Unfortunately, big corporations can always find a way to make regulators see no problem.
> Default opt-in
This is called opt out.
I'm sure this will be about as effective as putting yourself on the do not call list for domestic phone telemarketers, which has absolutely no effect whatsoever on overseas scam call centers.
This does not make sense to support. Businesses that have proper privacy controls and security do not want to be lumped together with random shady apps and want users to explicitly opt out. Another issue with this header is that users could set it and then accidentally opt out of other sharing without realizing it, since this header is being set somewhere random. Standardizing a per-app way to revoke consent, along with showing the privacy policies and measures the apps have put in place for guarding security, would be a more sensible alternative that could gain traction.
Gathering information without real consent is shady.
Honest question, what's the problem with crash dumps that include no personal info? They just help make the software less buggy. I also don't see an issue with anonymized usage patterns (this feature was used X times this month, this one Y times, etc).
Can someone expound on what they see as a problem?
> Honest question, what's the problem with crash dumps that include no personal info?
In addition to the other response: crash dumps are difficult to anonymize, both because useful crash dumps include something like a minidump (or some other small alternative to a core file), and because even without that, any random information from a backtrace may be sensitive (e.g. a URL).
There's nothing wrong with saving a crash dump and giving the user control of whether to submit a bug report.
I'm more thinking Python crashes, where you just get the lines that executed, and ~zero identifiable data.
I would suggest that the default to enrolling people in supplying such information is the issue. In a world driven by surveillance capitalism, even "anonymous" data can be used for much broader purposes (think, for example, of when and where people are using tools geographically and at what times: you can start to track the behaviour of people in this way).
Users should never be opted in through usage alone of free or paid-for tooling to supply information that isn't part of the function of the tool. Where that is required for a service or product, you should opt-in explicitly, not implicitly.
That's fair, thanks.
He’s better off vibecoding an include.sh that sets all the known do not track env vars for you.
Am I the only one who also finds it comical that rejecting cookies requires a cookie?