There are two issues I see here (besides the obvious “Why do we even let this happen in the first place?”):
1. What happened to all the data Copilot trained on that was confidential? How is that data separated and deleted from the model’s training? How can we be sure it’s gone?
2. This issue was found; unfortunately, without a much better security posture from Microsoft, we have no way of knowing what issues are currently lurking that are as bad as, if not worse than, what happened here.
There’s a serious fundamental flaw in the thinking and misguided incentives that led to “sprinkle AI everywhere”, and instead of taking a step back and rethinking that approach, we’re going to get pieced-together fixes and still be left with the foundational problem that everyone’s data is just one prompt injection away from being taken, whether it’s labeled as “secure” or not.
> "The Microsoft 365 Copilot 'work tab' Chat is summarizing email messages even though these email messages have a sensitivity label applied and a DLP policy is configured."
I'd add (3) - a DLP policy is apparently ineffective at its purpose: monitoring data sharing between machines. (https://learn.microsoft.com/en-us/purview/dlp-learn-about-dl...).
Directly from the DLP feature page:
> DLP, with collection policies, monitors and protects against oversharing to Unmanaged cloud apps by targeting data transmitted on your network and in Microsoft Edge for Business. Create policies that target Inline web traffic (preview) and Network activity (preview) to cover locations like:
> OpenAI ChatGPT—for Edge for Business and Network options
> Google Gemini—for Edge for Business and Network options
> DeepSeek—for Edge for Business and Network options
> Microsoft Copilot—for Edge for Business and Network options
> Over 34,000 cloud apps in the Microsoft Defender for Cloud Apps cloud app catalog—Network option only
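To make (3) concrete, here's a rough sketch of what a network-level DLP egress check is supposed to boil down to. This is just an illustration in Python with made-up host names and label values, not anything resembling Microsoft's actual implementation:

    # Minimal sketch, not Microsoft's implementation: the kind of egress check a
    # collection policy is supposed to apply before content reaches a cloud AI app.
    # Host names and label values below are made up for illustration.
    MONITORED_AI_HOSTS = {
        "chat.openai.com",
        "gemini.google.com",
        "chat.deepseek.com",
        "copilot.microsoft.com",
    }

    BLOCKED_LABELS = {"Confidential", "Highly Confidential"}

    def dlp_allows_egress(destination_host: str, sensitivity_label: str | None) -> bool:
        """Return True only if sending this content to the destination is permitted."""
        if destination_host not in MONITORED_AI_HOSTS:
            return True   # destination not covered by the policy
        if sensitivity_label in BLOCKED_LABELS:
            return False  # labeled content must not reach a monitored AI app
        return True

    # The reported behavior amounts to a check like this being skipped (or the
    # label never being consulted) on Copilot's own summarization path.
    assert dlp_allows_egress("copilot.microsoft.com", "Confidential") is False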
> a DLP policy is apparently ineffective at its purpose
/Offtopic
Yes, MSFT's DLP/software malfunctioned, but getting users to MANUALLY classify things as confidential is already an uphill battle. Those labels are for the rare subset of people who are aware of and compliant with NDAs/Confidentiality Agreements!
Who can blame them, when in the end, it gets ignored anyway?
I'm an AI researcher; here are my beliefs (it'll be clear in a second why I say beliefs and don't claim objective facts).
1) You can't be sure it's gone. It's even questionable whether data can be removed at all (longer discussion needed). These are compression machines, so the very act of training is compressing that information into the model. The question really becomes how well that information is compressed or embedded. On one hand, the models (typically) aren't invertible, so the information is less likely to be compressed losslessly. On the other hand, because the models aren't invertible, reversing them is probabilistic and they are harder to analyze in this sense. (There's a toy sketch at the end of this comment of why removing trained-in data isn't like deleting a record.)
2) As you may gather from 1), there are almost certainly more issues like this. There are many unknown unknowns waiting to be discovered. Personally, this is why I'm very upset that the field is so product focused and that a large portion of it regards theory as pointless. Theory does two things for us, because it builds a deeper and more nuanced understanding. First, advancing theory lets us develop faster, since we can iterate on paper rather than purely through experimentation; that lets us better search the solution space and even understand our own understanding. Second, it leads to safer models, because you have to understand them to understand where they fail and how to prevent those failures. Experimentation alone is incredibly naïve. It is like proving the correctness of your programs through testing (see the issues with TDD). Tests are great, but they are bounds, not proofs. They can suggest safety and give you some level of confidence in it, but they cannot guarantee it. We all know that the deeper your understanding of your code, the better the tests you can write, and it's the same thing here: theory reduces your unknown unknowns, and even before strong proofs arrive we get wider coverage in our testing.
I think we're so excited right now that we're blinding ourselves. If we're cutting off or reducing fundamental research, then we are killing the pipeline of development. Theory is the foundation that engineering sits on top of. What worries me is that there are so many unknown unknowns and everyone is eagerly saying "we just need 'good enough'" or "what's the minimum viable product". These are useful tools/questions, but they have limits, and it gets dangerous when you're putting out the minimum at scale.
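To make the compression point a bit more concrete, here's that toy sketch (a character n-gram counter in Python, nothing like a real LLM, with an obviously made-up "secret"). The point is only that once training has happened there is no single record of the confidential line left to delete; it's smeared across the counts, yet greedy generation can still reproduce it:

    # Toy illustration of training-as-compression; not a real model.
    from collections import defaultdict

    corpus = [
        "meeting notes: ship the feature next week",
        "CONFIDENTIAL: acquisition price is 4.2B",   # the made-up "secret"
        "lunch options: soup, salad, sandwiches",
    ]

    ORDER = 6
    counts = defaultdict(lambda: defaultdict(int))
    for doc in corpus:
        # "Training": count which character follows each 6-character context.
        for i in range(len(doc) - ORDER):
            counts[doc[i:i + ORDER]][doc[i + ORDER]] += 1

    def complete(prompt, length=40):
        out = prompt
        for _ in range(length):
            nxt = counts.get(out[-ORDER:])
            if not nxt:
                break
            out += max(nxt, key=nxt.get)  # greedy: most frequent continuation
        return out

    # Regurgitates "...acquisition price is 4.2B" even though no single row of
    # training data exists anywhere in `counts` to be deleted after the fact.
    print(complete("CONFIDENTIAL: acquisit"))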
Copilot is not a model, to my knowledge. When you’re asking about the data that it was trained on, you are most likely referring to an OpenAI or, in some circumstances, an Anthropic model. Customer data is not used for training the models that run Copilot.
All the vendors paraphrase user data, then use the paraphrased data for training. This is what their terms of service say.
They have significant experience in this. Microsoft software since 2014, for the most part, is also paraphrased from other people's code they find lying around online.
> All the vendors paraphrase user data, then use the paraphrased data for training. This is what their terms of service say.
It depends. E.g. OpenAI says: "By default, we do not train on any inputs or outputs from our products for business users, including ChatGPT Team, ChatGPT Enterprise, and the API."[0]
[0] https://openai.com/policies/how-your-data-is-used-to-improve...
"By default" is a fantastic escape catch in the language used there. So... What are the exceptions?
Why would they want to train on random garbage proprietary emails?
If their models ever spit out obviously confidential information belonging to their paying customers they'll lose those paying customers to their competitors - and probably face significant legal costs as well.
Your random confidential corporate email really isn't that valuable for training. I'd argue it's more like toxic waste that should be avoided at all costs.
> Microsoft software since 2014, for the most part, is also paraphrased from other people's code they find lying around online.
That was pretty funny and explains a lot.
I wish I could do more :(
Instead I always break things when I paraphrase code without the GeniusParaphrasingTool
This is exactly why I moved to self hosted code in 2017.
While I couldn’t have predicted the future, even classic data mining posed a risk.
It is just reality that if you give a third party access to your data, you should expect them to use it.
It is just too tempting of a value stream and legislation just isn’t there to avoid the EULA trap.
I was targeting a market where fractions of a percentage point of advantage were important, which did drive what was at the time labeled my paranoia.
Seems like every day there's another compelling reason to switch to Linux. Microsoft is doing truly incredible work this year!
I recently switched my work laptop from a Dell to a MacBook. I found out that Windows 11 has so much corporate bloat that even MS apps like Outlook, Office, and OneDrive function better on a Mac than on Windows 11.
It’s been this way forever.
The problem would still exist if you use Linux. This is a cloud service issue, not an OS issue.
Apple not doing much better, but from the other end.
Microsoft releasing overly ambitious features with disastrous consequences.
Apple releasing features so unambitious it's hard to remember they're there.
Performance is also degrading on iPhones as software bloats, and/or they're up to their old shenanigans and making older phones unbearable to force people to buy the newest ones.
Big tech is reaping what they've sown in a very satisfying way.
We can safely assume that Apple will do much better compared to MS until they put AI into the Finder and Dock.
Don't forget Apple handwaving away serious security issues in their devices - users still cannot even check whether their devices are compromised, and the only thing Apple offers here is "lockdown mode" - which, again, after a compromise is likely useless anyway.
The problem with the Microsoft features is really not excessive ambition.
Half of the time it's open user hostility and blatant incompetence. The other half it's just the incompetence. Ambition doesn't enter the picture at all.
Eh. I think it is ambition. It's a lot of product managers coming up with ideas, I think, and teams with a mandate to release those ideas.
Yes, and those ideas are user hostile and poorly conceived, badly executed, and incompetently built.
A remote code execution exploit in Notepad?! That's not professional, or skillful, or well done. Unnecessary feature bloat and change for the sake of change, because some MBA dork wants to justify their department and continued employment by checking boxes on spreadsheets.
There's no innovation or skillful, well built features. There's hardly any consideration of users at all, except as net continuing depositors of money into Microsoft coffers. Features and updates are nothing more than marketing slop and manipulation of enterprise into renewing subscriptions and purchasing the latest version of new hardware.
edit:
I just don't think you can point at a company whose entire foundational product is Windows, the operating system that's pretty much the default for most of the world, and say that it's not completely and utterly failing as a company when its single most compelling "feature" is that the OS can run Excel.
It's the year of the Linux desktop, fire it up and never look back!
I agree except for Microsoft "failing". Windows is failing. Microsoft has moved onto other things.
Azure is just as bad, if not worse.
Microsoft somehow sees a future where LLMs have access to everything on your screen. In that dystopia, adding "confidential" tags or prompt instructions to ignore some types of content is never going to be enough. If you don't want LLMs to exfiltrate content, then they cannot have access to it, period.
Microsoft wants access to everything on your screen (as well as the contents of your personal files), and feeding that to an LLM just makes it easier for them to profit from that data.
> However, this ongoing incident has been tagged as an advisory, a flag commonly used to describe service issues typically involving limited scope or impact.
How is having Copilot breach trust and privacy an “advisory”? Am I missing something?
Advisory doesn't have the same meaning in security research as it does in the English language.
Unfortunately, an "Advisory" is a report written about a security incident, like an official statement about the bug, its impact, and how to fix it -- which differs from the everyday English meaning... it's not meant to mean to "advise" people or to "take something under advisory" (which is typically a very soft statement).
https://www.merriam-webster.com/dictionary/advise meaning 2: to give information or notice to : INFORM
An advisory gives notice and/or warns about something, and may give recommendations on possible actions (but doesn’t have to).
Words have multiple meanings depending on context, and here it's at best ambiguous. In the context of security incidents, logging, auditing, etc., "advisory" is often used as a severity level (and one of the lower ones at that).
So, yes, technically, it's de-facto advisory to publish this information, but assigning "advisory" as a severity tag here is questionable.
The LLM that wrote this nearly content-free story doesn't know what it's talking about.
The basic distinction in the infosec industry is that advisories are what you publish to tell customers that you had a bug in your product that might have exposed them or their data to attacks and that you want them to take some specific action (e.g., upgrade a package, review logs). An incident report is what you publish when you know that the damage happened, it involved your infrastructure, and you want to share some details about what happened and how you're going to prevent it from happening again.
Because the latter invites a lot more public attention and regulatory scrutiny, a company like Microsoft will go out of their way to stick to advisories whenever possible (or just keep incidents under wraps). It might have happened at some point in their history, but off the top of my head, I don't recall Microsoft ever publishing a first-party security incident report.
If you inflate severity, people simply ignore incident warnings.
What's the actual action needed here by a security team? None. You can hate it or not care, but at the end of the day there's no remediation or imminent harm, just a potential issue with DLP policies. Don't make it look like a 0-day that they actually have to deal with.
Reads to me like it is not accessing other users' mailboxes; it's just accessing the current user's mailbox (like it's meant to), but it's supposed to ignore the current user's emails that have a 'confidential' flag, and that bit had a bug.
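If that reading is right, the intended logic is roughly the sketch below. This is purely hypothetical Python with invented names; none of it is an actual Copilot API:

    # Hypothetical sketch of the filter that should sit in front of the summarizer.
    from dataclasses import dataclass

    @dataclass
    class Email:
        subject: str
        body: str
        sensitivity_label: str | None = None  # e.g. "Confidential"

    def eligible_for_summary(msg: Email) -> bool:
        # Intended behavior: skip anything carrying a sensitivity label.
        return msg.sensitivity_label is None

    def summarize_mailbox(mailbox: list[Email]) -> list[str]:
        # The reported bug is equivalent to this filter being dropped, so labeled
        # mail flows to the summarizer along with everything else.
        return [f"summary of: {m.subject}" for m in mailbox if eligible_for_summary(m)]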
I think the issue is that the confidential information is being sent to cloud AI, against DLP policies.
I think that Microsoft would rather not acknowledge that one. It's much easier to hide behind a simple "bug" than to admit to such a massive security breach.
Not a bug, a “code issue”.
I.e. LLM slop code that wasn't adequately tested.
It's a feature now.
Exactly.
The article doesn't say whether the confidentiality labels were created with encryption. I've been using the latter (with Purview DLP) to prevent emails leaking out to _external_ integrations, which can't access the keys. With MS internal tooling, it's feasible that it has access to the keys, in which case this would be even worse. Does anyone know if this happened?
All these government contractors are forced to pay astronomical cloud bills to get "GCC-High" because it passes the right security-theater checklist, and then it totally ignores the DLP settings anyway!
Is this a real bug, or is it a "let's train on more emails" by being careless?
I assume that whatever is processed by the AI service is generally retained for product improvements (training).
This company is an absolute joke now; if you're not desperately trying to jump ship at this point, then you will go down with it.
This is one of many reasons we are taking all our current and future private repos off of GitHub.
A bug here and a bug there...
I more and more see a bug in my mouth that tries to encourage my boss to cancel Microsoft 365. I haven't found the root cause yet.
"...including messages that carry confidentiality labels."
Trusted operating system Mandatory Access Control, where art thou?
calling it a bug is generous. the whole point of these tools is to read everything you have access to. the 'bug' is that it worked exactly as designed but on the wrong emails
An exemplar BaaF corporation (Bug as a Feature).
I wonder, is Microsoft doing “outsider trading”, where they covertly pipe analytical data to the executives’ independently-owned stock trading houses as “tips”? They’ve had access to so many corporate internal emails for so long, with MS365, but Copilot is the perfect way to mask such analysis. Also Copilot would be good at analysing emails and providing useful “tips”.
Just my whacky conspiracy theory of the day!
microsoft may very well be the MOST sinking ship to ever sink.
Initial date of issue 3rd Feb 2026
Microsoft deploying buggy software is hardly news.
Why was this bug not found in testing?
AI is such garbage. There is considerable overlap between the security practices of AI and those of the slowest interns in the office.
None of this should surprise anyone by now. You are being lied to, continually.
You guys need to read the actual manifestos these AI leaders have written. And if not them, then read the propagandist stories they have others write like The Overstory by Richard Powers which is an arrogant pile of trash that culminates in the moral:
humans are horrible and obsolete and all should die and leave the earth for our new AI child
Which is of course, horseshit. They just want most people to die off, not all. And certainly not themselves.
They don't care about your confidential information, or anything else about you.
Everyone should go back and watch The Matrix again
This is one of the things that boggles my mind the most.
I guess everyone just ended up agreeing with Cypher, after all...
I'm shocked. Shocked!
Oh, poor desperate Microsoft. No amount of bug fixing is going to fix Microsoft. Now that they've embarked on the LLM journey they're not going to know what's going to hit them next.