I had to block Meta's ASN on my personal cgit server a few weeks ago because they were ignoring robots.txt and torching it. Like hundreds of megabytes of access logs just from them, spread around different network blocks to clearly try and defeat IP-based limiting. I couldn't believe it.
Yeah, I don't know how anybody stays sane without it. I have a list of over a thousand ASNs I blackhole at this point...
Mine is a daily bash cronjob that fetches a text-based database and uses grep to build an nftables-apply script with all the IPs for the blocked ASNs. I keep meaning to share it, but it's embarrassingly messy and I haven't had time to clean it up...
It's a real pain in the ass because in the absence of ASN-based blocking, you often have to give something a long list of IP ranges in CIDR notation, and be certain you don't "miss" even one IPv4 /23 or /24 or a crawler will get through.
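For anyone wanting to try the ASN-to-nftables approach described above, here is a rough sketch in Python rather than bash. It is not the script mentioned in the thread. It assumes a hypothetical text database with one `prefix<TAB>ASN` pair per line, and an nftables set that already exists with `flags interval`; the table name `filter` and set name `blocked_v4` are placeholders, and the ASNs and prefixes are documentation-only values.

```python
import ipaddress
from collections import defaultdict


def parse_prefix_table(lines):
    """Parse 'prefix<whitespace>ASN' lines (assumed format) into {asn: [networks]}."""
    by_asn = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        prefix, asn = line.split()[:2]
        asn_number = int(asn.upper().removeprefix("AS"))
        by_asn[asn_number].append(ipaddress.ip_network(prefix, strict=False))
    return by_asn


def blocked_networks(by_asn, blocked_asns, version=4):
    """Collect every prefix of the blocked ASNs and merge adjacent/overlapping ones."""
    nets = [
        net
        for asn in blocked_asns
        for net in by_asn.get(asn, [])
        if net.version == version
    ]
    return list(ipaddress.collapse_addresses(nets))


def nft_script(networks, table="filter", set_name="blocked_v4"):
    """Render nft commands that replace the contents of an existing interval set."""
    lines = [f"flush set inet {table} {set_name}"]
    if networks:
        elements = ", ".join(str(net) for net in networks)
        lines.append(f"add element inet {table} {set_name} {{ {elements} }}")
    return "\n".join(lines) + "\n"


# Documentation-only sample data (RFC 5737 prefixes, RFC 5398 ASNs).
sample_db = [
    "192.0.2.0/25\tAS64500",
    "192.0.2.128/25\tAS64500",
    "198.51.100.0/24\tAS64501",
    "203.0.113.0/24\tAS64500",
]

nets = blocked_networks(parse_prefix_table(sample_db), {64500})
script = nft_script(nets)
print(script)
```

The generated text can be written to a file and loaded with `nft -f`, which applies the flush and the add together. Collapsing prefixes first keeps the set small and means a missed /24 shows up as a gap in the database rather than in a hand-maintained list.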
A lot of people would be very pleased if this leads to Zuckerberg getting even the statutory minimum damages ($750?) on each infringement.
The previous infringement case with Anthropic said that while training an AI was transformative and not itself an infringement, pirating works for that purpose still was definitely infringement all by itself. The settlement was $1.5bn, so close to $3k for each of the 500k they pirated, so if Zuckerberg pirated "millions" (plural) it is quite plausible his settlement could be $6bn.
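The arithmetic behind those figures, as a quick sketch. The $1.5bn and 500k numbers come from the comment above; reading "millions" as exactly 2 million works is an assumption taken as the lower bound of the plural, and $750 is the statutory minimum mentioned earlier in the thread.

```python
# Figures quoted in the thread.
settlement_usd = 1_500_000_000
works_in_settlement = 500_000

# Assumption: "millions" (plural) read as a lower bound of 2 million works.
assumed_works = 2_000_000
statutory_minimum_usd = 750

per_work_usd = settlement_usd // works_in_settlement
projected_settlement_usd = per_work_usd * assumed_works
statutory_floor_usd = statutory_minimum_usd * assumed_works

print(f"per work: ${per_work_usd:,}")
print(f"projected at the same rate: ${projected_settlement_usd:,}")
print(f"statutory minimum floor: ${statutory_floor_usd:,}")
```

So even the statutory minimum on 2 million works lands at the same $1.5bn as the earlier settlement, and matching its per-work rate gives the $6bn figure.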
A company being "worth" some amount doesn't mean it has that much money and real property; it means there exist people willing to buy shares, on the margin, at a price which works out like that. One of the common (very rough) approximations is that a business is worth as much as the profit it's expected to make over the next 20 years. But one of the reasons (there are many) that this is only a rough guide is that if you tried to sell too much of a big company all in one go, it usually depresses the price a lot, and the other way around (trying to buy a whole company) tends to raise the price a lot; both effects are because most people have different ideas about how much any given company is really worth despite that rough guide, and trade their shares at different prices while you're doing it. You may note this is a circular argument; this is indeed part of the problem.
IIRC, Facebook's cash is more like $81-82 billion.
At the same time, isn't Zuck's worth based on his shares of evilCorp, while evilCorp's shares are what you just said? Ergo, the Zuck isn't worth all that either???
Yup. All the headlines following the pattern "${billionaire} {gains|loses} ${x} billion this week" are mostly just fluff, the marginal share price of any given stock wanders all over the place even without forced sales or people trying to buy them out.
There's some interesting exceptions, like how Musk has managed to sell Tesla shares totalling more or less as much as the business itself has made in total lifetime revenue; but even then, Musk's theoretical net worth is very different from how much he could get if he was forced to sell all his shares suddenly.
Owner-CEOs like Musk and Zuckerberg get all the effects of such randomness, but the only examples I can think of such people getting into billion-dollar legal troubles tend to be examples which go on to sink their companies completely, so I'm not sure what impact a fine of "merely" 10% of cash reserves would have on investor confidence as expressed in share price. And this is not the only legal case Meta's facing right now.
MacKenzie Scott (Jeff Bezos' ex-wife) shows it can be turned into real money. As of December 2025 she had given away $7.1 billion in 2025 charitable donations, and $26.3 billion since 2019.
In reality there is the ability to execute on the shares to turn them into real money.
Jeff Bezos holds less than 10% of Amazon stock himself. Which is a huge amount of money, and a not insignificant amount of which can be turned into "real" money and even with some decline is still a phenomenal amount.
In that same time period the stock valuation has more than doubled.
That's why billionaires use shares as collateral to get loans. It's money once removed, and it continues to be spendable so long as the share price stays high.
I sincerely doubt that Meta's share price would crash as a result of Zuckerberg getting an expensive judgement.
Plus, the money he borrows is not taxable. If he sold stock he would have to pay taxes before he could spend the income. Sure, he now owes money to someone, but he can refinance those loans again and again, and live tax-free the rest of his life while we, poor working stiffs, pay the taxes that built the airport where he parks the private jet he bought with the money he borrowed.
People seem to get the weird idea that borrowing against their stock holdings is some special thing rich people get to do with products that the rest of us don't have access to. It's not. Margin loans are widely available to the tune of fed funds + 1%ish or lower, and while your brokerage's publicly offered rates are probably a ripoff, they're almost certainly negotiable. The bar for access to "institutional" rates is basically $100k, the regulatory requirement for portfolio margin.
Yes, there are specialized products catered to billionaires. But those aren't getting them better rates than someone with a $200k portfolio (Zuck is not conventionally a less risky borrower than the Options Clearing Corporation!). They exist to work around the fact that some borrowers can't just casually liquidate their stock on the open market, let alone at face value. By all accounts these products are more expensive than retail.
Mostly this is an expensive (but maybe still less expensive than taxes, depending on the rate environment—it's more of a no-brainer in ZIRPland) way to diversify out of a single-stock portfolio without selling by adding leverage. At Zuck's age, it's still very unlikely to make sense to borrow instead of sell to spend. He's been known to pay real taxes in the past, they just look small relative to his imputed wealth growth because rich people don't spend a lot relative to their wealth growth because they, quite by definition, have a lot of wealth.
I think people take issue with the taxes loophole. They have GAINED from the VALUE of their stocks, but they don't pay taxes on that. It should be law if you realize value from stocks you pay capital gains on those stocks. So if a loan is collateralized by $1,000,000 worth of stock value taxes should be paid on $1,000,000.
I've wondered what the legalese justification is for letting liability evaporate as it does so often with corps. So far the reasons I'm left with are 'shrugs' and 'the relevant provisions (seemingly? apparently?) simply don't apply', neither of which are any good.
I was going to make a joke about how we should attach magnets to Aaron Swartz' corpse, since that'd make for a pretty potent energy source, given how fast he must be spinning. But honestly, I think he would have seen this sort of thing coming, given how his case was handled and how things really haven't gotten any better.
Alternate reality Aaron Swartz escaped canonization and is now running an AI/crypto startup that pays you to upload training data with his YC alum buddies
I should hope that if Zuckerberg isn't severely punished for this, it at least sets a legal precedent for every other person to do the same with impunity.
All the Aaron Swartzes of the future could freely share scientific papers with the world.
I'd love for that to come out during discovery when the lawsuit hits, but it probably never will. Blowing the whistle is also not a great option in this economy, although I wish more people did.
When the AI scrapers were just getting started, that is basically what I thought - their plan was to scrape / suck up everything they possibly could before people realized what was happening and blocked them.
The rate at which they were spidering and scraping was so far beyond what any other supposedly legit spider was doing, it seemed like the logical explanation.
Just gonna say... Aaron Swartz faced years of prison time and ultimately decided to take his own life... for downloading scientific journal articles... to share freely with the world (aka not even profiting from it).
But a multi-billion dollar corporation downloading millions of copyrighted creative works so that they can reshape the entire labor market by training a new type of artificial intelligence model on that data set? Meh, sounds like Silicon Valley disruption, give the man a medal!
Aaron Swartz was treated unjustly because copyright sucks. we should oppose such laws and treatment, not wield them as retributive tools against our opponents
it is wrong to advocate for everyone to be treated equally unjustly. better to advocate for the removal of the bad laws/structures
One man illegally downloading copyrighted material is a crime. Multinational corporations illegally downloading copyrighted material is the only remaining growth area in the US economy and vital to national security.
Tired of the double standard where CEOs get away with it when bad things happen (because they can’t be everywhere all the time) but get all the benefits when the company makes a great profit (because they’re personally driving results!).
> a Meta spokesperson said, “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use. We will fight this lawsuit aggressively.”
> Authors have sued AI companies for copyright infringement before - and lost.
they'll litigate how meta acquired those materials to train. you can do whatever you want with a book after it's in your house. but how did it get there?
They’re already on record as hoovering up Library Genesis and Anna’s Archive. For their “fair use” copyright bonfire to train their LLM.
So not only are these publishers rightfully pissed, Meta didn’t even give them the $6.99 for each epub to begin with. They’ve stolen the whole thing as part of this “fair use” campaign to destroy human authorship free of even the most basic remuneration.
Until Sony, Nintendo, Disney... sue them and Zuck craps his pants. And the NSA too, because for sure they are half-backed by them. If they keep pirating Japanese and European media, Europe and Japan can just wipe their asses with US licenses and declare all US media uncopyrightable there.
How are these fruits "stolen" if they still have what was allegedly stolen?
Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act.
And even if, arguendo, sure, it's stolen. The purpose of copyright is "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries".
And you would be hard pressed to prove that LLMs haven't advanced the arts and sciences, so it's at bare minimum transformative, i.e. fair use.
I think you are confusing the idiom "stolen fruits" with an actual accusation of criminal theft. Aside from its use in this phrasing, neither "theft" nor "steal" appears anywhere else in the article.
>But the latest lawsuit alleges that Meta and Zuckerberg deliberately circumvented copyright-protection mechanisms — and had considered paying to license the works before abandoning that strategy at “Zuckerberg’s personal instruction.” The suit essentially argues that the conduct described falls outside protections afforded by fair-use provisions of the U.S. copyright code.
I don't have strong opinions on Zuck needing to be punished for this, because I have friends and family doing the same thing, although perhaps not at the same scale. I myself do not download copyrighted content. I think "rules for thee, not for me" goes both ways.
Who will be the first to implement a one-layer three-weight model and add it to BitTorrent? Let it “train” on all downloaded files. That makes it fair use. Am I doing this right?
Shouldn't this stuff trigger RICO? Why do torrent site operators get led off in cuffs for running operations that usually lose money, but Zuck doesn't?
RICO specifically cites "criminal infringement of a copyright" as laid out in 18 U.S. Code § 2319. If the CEO tells his employees to download hundreds of thousands of works illegally in order to carry out his money-making scheme, how is that not organized crime even if (dubiously) LLM training on the material is fair use?
> As used in this chapter — (1) “racketeering activity” means (A)[...]; (B) any act which is indictable under any of the following provisions of title 18, United States Code: [...], section 2319 (relating to criminal infringement of a copyright),[...]
> (c) It shall be unlawful for any person employed by or associated with any enterprise engaged in, or the activities of which affect, interstate or foreign commerce, to conduct or participate, directly or indirectly, in the conduct of such enterprise’s affairs through a pattern of racketeering activity[...].
“Meta — at Zuckerberg’s direction — copied millions of books, journal articles, and other written works without authorization, including those owned or controlled by Plaintiffs and the Class, and then made additional copies of those works to train Llama,” the suit says. “Zuckerberg himself personally authorized and actively encouraged the infringement. Meta also stripped [copyright management information] from the copyrighted works it stole. It did this to conceal its training sources and facilitate their unauthorized use.”
> Meta also stripped [copyright management information] from the copyrighted works it stole. It did this to conceal its training sources and facilitate their unauthorized use.
I think this is an easy distinction to make: copyright is bullshit and knowledge should be free. I have no problem with pirates sharing information freely. I do have a problem with a company taking someone else's work and profiting from it. The only thing worse than copyright as it exists is copyright that can be selectively ignored when the powerful will it. Attempt to use copyright to promote Free software with the GPL? Ha, nope, copyright for me and not for thee; I'll train on your code and sell it back to you. You want to preserve access to a game or film that's unavailable or unplayable? Time to send the C&D and destroy you. Only bad things are possible.
Until we progress as a society to the point that we can put this system behind us we should at least fight to make enforcement uniform. In fact, uniform enforcement is probably a good starting point for arguing for abolition, as the pain of that enforcement is felt by proles and elites alike.
People who don't believe in copyright shouldn't be punished for "breaking" it.
Corporations believe in copyright so if they "break" it they should get punished for breaking rules they made up themselves.
Generally the law should be more strict for corporations than for real people.
edit: People downvoting can you argue why you disagree? I do think it's fair for the law to be more strict on the powerful rather than on the powerless.
I'm gonna have to go dig up the link, but isn't there a guy that Nintendo basically has on indentured servitude for the rest of his life?
Ah, found it:
>In April 2023, a 54-year-old programmer named Gary Bowser was released from prison having served 14 months of a 40-month sentence. Good behaviour reduced time behind bars, but now his options are limited. For a while he was crashing on a friend’s couch in Toronto. The weekly physical therapy sessions, which he needs to ease chronic pain, were costing hundreds of dollars every week, and he didn’t have a job. And soon, he would need to start sending cheques to Nintendo. Bowser owes the makers of Super Mario $14.5m (£11.5m), and he’s probably going to spend the rest of his life paying it back.
I'm not even a tiny bit supportive, but there is precedent.
American executives have been pushing to criminalise copyright infringement for decades, and America has worked hard to pressure countries all round the world to do this as part of trade deals. There is, for example, a Brit serving an eleven year sentence right now *.
"American executives have been pushing to criminalise copyright infringement...Why should Zuckerberg be exempt?" Implicit relevance in the comment to which I'm replying.
Zuckerberg saying anything about copyright infringement is irrelevant to the actions Meta has taken in consuming and promoting the practice, and he should face criminal liability.
The non-strawman way to interpret the parent comment is that they want them to be treated the same as normal copyright violators. Jail is a common result of (criminal) copyright prosecution, with 44% of convicted offenders being imprisoned, averaging 25 months [0].
Now, I personally find the idea of imprisoning people for copyright offenses horrific, but I don't think it's remotely insane that someone else might come to that conclusion, given that we broadly accept it as a society.
From [0]: "In fiscal year 2017, there were 80 copyright/trademark infringement offenders who accounted for 0.1% of all offenders sentenced under the guidelines." This is such a low number that I assume most prosecuted cases are settled without ever making it to sentencing, or alternatively copyright infringement is just hardly ever prosecuted criminally at all.
I don't understand how the fact that 80 people were prosecuted for copyright violation in one year is an argument that one person shouldn't be prosecuted for copyright violation.
For better or for worse, the idea behind incorporation is that you, as an owner of part or all of the company, are separated from it financially and legally in most circumstances.
Zuckerberg may be CEO, majority shareholder, and on the board of Meta, but he didn't break copyright law, Meta did. So if there were to be a consequence, Meta would pay out the fine. Not sure how you jail a company.
Now, in a company with a real corporate governance structure, the board would look at the loss incurred by said fine, look at Zuckerberg, and immediately fire him for causing the loss. However, like I said before, Zuck's in charge of Meta, so that's not going to happen, and the fine is unlikely to be enough to drastically impact the company's profitability enough to sink his shares, which are the main repository of his wealth. So if he thinks he can make himself richer violating copyright law in the future, he will likely direct Meta to do so.
TL;DR, in the famous words of Bender from Futurama, "Hooray, the system fails again!"
> Zuckerberg may be CEO, majority shareholder, and on the board of Meta, but he didn't break copyright law, Meta did.
I'm still stuck on how Z telling Meta (or the relevant people at Meta, whatever) to go out there and do illegal shit doesn't make a court say that he's functionally done said illegal shit, or at least encouraged the company to do so, and that he should thus be liable for that. It's not like there's much plausible deniability here. It'd be one thing if the lower ranks thought it'd be fine and did it of their own accord. It's quite another for Z to tell people to go nuts doing illegal shit.
The DMCA makes facilitation of copyright infringement illegal. Telling people to do copyright infringement is surely facilitation of copyright infringement. Surely then, Z having broken the DMCA is a fairly open and shut case, modulo calculating the damages. But apparently not?
I’ve sometimes pondered this about the legal personhood of a company - it has most of the rights as a human being but can’t suffer any of the major consequences, such as jail.
It could be possible to construct a legalistic jail for a company whereby if it has committed the type of crime that a human could be jailed for, then it could be frozen for the duration, say ten years, and all its assets, shareholder funds, contracts, everything were frozen and impounded.
Of course this seems completely ludicrous because it’s so “out there” but it’s worth having the thought experiment. Things like “corporate manslaughter” really have few consequences for the corporation itself - if it was actually jailed for twenty years and shareholders and officers left frozen out and on pause, then it might be the kind of punishment that really counted for something.
There aren't enough things an executive can go to jail for.
Fines don't do anything to deter bad behavior. Either:
* The company pays
* They pay and the company mysteriously increases next year's comp / grants a "loan" / etc
* D&O insurer pays
In all three cases the money comes out of the shareholders' hides. It provides zero personal deterrence. The payoff matrix, as seen by a sociopath, makes it rational to always defect against the common good.
The only punishment that can really focus attention is physical imprisonment in a facility they can't choose.
SOX did this for financial reporting and gee shucks it turned out executives can follow the law after all!
> I'm all for strong justice, but you want to imprison an executive for decades for copyright violations?
They stole the life's work of millions of people.
In less civilized times, they likely would have been drawn and quartered by strong horses, and had their limbs dragged to the 4 corners of the continent as a warning to anyone else that would consider doing it again.
The human savant will remember where they read it and give you credit. It might lead more people to read your work, and ultimately you make money.
The AI won't even know where the page of text it's seeing came from, and people will avoid your book as they can just ask the AI. So you make less money. (Talking about specialized technical books here.)
There's a huge difference in scale. The human mind can only process a limited portion of all works available over a lifetime. Human learning is therefore naturally limited to small-scale reuse, which serves to keep it proportional.
A machine training on all copyrighted materials in the world for commercial purposes at an industrial scale makes it disproportionate.
It would hardly make a dent. And if you hired hundreds of savants, the knowledge would still be spread over hundreds of separate minds.
And even if we grant that those savants are also very skilled at creating "market substitutes" based on their training that are capable of competing with the original works, their maximum creative output would only be a relatively small number of new works, because they can only work at human speed.
This goes back to the original purpose of copyright, which is to serve as an economic incentive for individual creators and artists to make more art, by securing exclusive rights to use their own works commercially for a specified time. The goal is both to create more works and to protect the economic viability of artists.
This principle is quite universal and can be found in many places, including the US constitution and US (supreme) court decisions, many international jurisdictions, treaties and conventions.
I don't understand why it should be allowed for one savant to study and answer questions about one book, but wrong for a company to hire one million savants to answer questions about one million books.
And I'm asking where in the law or case law this is supported.
No one is asking human savants about what they read 1 million times per day.
Suppose they did, and some guy was filling stadiums regularly to hear him recite an entire audio book. That would probably get the attention of someone's lawyers.
I don't think anyone is arguing that the consumption is illegal. It's the reproduction that is illegal.
Read a book, that's fine. Write a book, that's fine. Read a book and then write a book that is 99.9% the same as the book that you read and sell it for profit without a license from the original author, that's infringement.
No, if you read the article, the point is in the training, not the reproduction.
That's what all these lawsuits are about - it's the training not the reproduction. I already agreed in my first comment that the reproduction is off limits.
In this case, it appears that Meta torrented illegal copies of the work to do the training. Obviously that's bad. But conflating that with training itself doesn't follow.
The point of these lawsuits is the piracy. My parent comment was about the general situation, not this specific article.
Pirating content is illegal, regardless of if it is to train an LLM.
Usage of LLMs trained on unlicensed content (basically all of them) might or might not be illegal.
Using any method to reproduce a copyrighted work by using that original as input in a way that supplants the market value of the original is probably illegal.
Well - maybe so. But the common belief is that training itself is a violation of copyright, no matter how it's done. That's the argument I'm countering here.
The issue is that the trainers have not sought licenses for the data and instead outright pirated it.
I don't think anyone thinks that all training is a copyright violation if all the training data is licensed. For example a LLM trained on CC0 content would be fine with basically everyone.
The problem is that training happens on data that is not licensed for that use. Some of that data also is pirated which makes it even clearer that it is illegal.
But why should separate licensing be required at all? A search engine reads and indexes every word of every page it crawls. No one argues that requires licensing, only that the outputs must respect copyright. Why should training be different?
Sharing copyrighted material is illegal. Presumably, if Meta blocked all seeding on the torrents they downloaded, they wouldn't have broken copyright, right?
If copyright law doesn't extend to the works being used for training, why should it extend to the model that is produced as a result? AI model creators have set up an ethical scenario where the right thing to do is ignore copyright laws when it comes to AI, which includes model use. It might never be legal, but it has become ethical to pirate models, distill them against ToS, etc.
>The problem is producing the copyrighted work, not processing it beforehand.
the distinction isn't particularly clear cut with an open source model. If it is able to reproduce copyright protected work with high fidelity such that the works produced would be derivative, that's like trying to get around laws against distribution of protected works by handing them to you in a zip file.
It's a kind of copyright washing to hand you the data as a binary blob and an algorithm to extract them out of it. That wouldn't really fly with any other technology.
And that's really where a lot of the value is mind you, these models are best thought of as lossily compressed versions of their input data. Otherwise Facebook ought to be perfectly fine to train them on public domain data.
I tend to agree - but you assume that it would not be possible to create a model that can train on copyrighted work and only output text which would be considered fair use.
That seems very possible to me, and undermines the "training is copyright violation" argument. It's not the training, it's the output.
So is it a problem when humans produce and monetize competing works? My understanding is that there is quite an industry in humans reading books and synthesizing their points. Cliff's Notes, for example.
I did some quick googling and most Cliff's Notes guides are on public domain works, so no problem there. They've also paid to license content, and have also been protected by fair use as parody.
There's nothing in the law to support your argument either. The law however does say, very unambiguously, that copying without permission isn't allowed. There aren't exceptions for "training" just because it's superficially similar to a human activity (reading a book). A human isn't allowed to hand-copy Harry Potter. Even if they bought all the Harry Potter books.
We're not talking about rights, we're talking about illegal acts. If it's illegal for a machine to do it, how can it be ok for a human?
Just from a rational argumentation point of view. Clearly if a law is written saying as much, then sure. But there is no such copyright law like that yet.
My apologies - I'm speaking loosely of course. Translate all my claims about machines breaking the law into claims about humans using machines to break the law.
Sorry, I wasn't trying to be pedantic. I was trying to make the point (which I think is in line with your point) that the fact that AI is involved here doesn't make a difference. It is a tool, but the people using the tool are (as always) responsible for the outcome.
The issue is certainly not so simple. But it seems to me, purely theoretically, that the rules don't necessarily have to be the same for living people and non-living machines.
Well - actually - it is pretty simple. For something to be illegal, there must be a law saying it's illegal. There are no laws distinguishing humans from machines in copyright law.
The problem is people at large companies creating these AI models, wanting the freedom to copy artists’ works when using it, but these large companies also want to keep copyright protection intact, for their regular business activities. They want to eat the cake and have it too. And they are arguing for essentially eliminating copyright for their specific purpose and convenience, when copyright has virtually never been loosened for the public’s convenience, even when the exceptions the public asks for are often minor and laudable. If these companies were to argue that copyright should be eliminated because of this new technology, I might not object. But now that they come and ask… no, they pretend to already have, a copyright exception for their specific use, I will happily turn around and use their own copyright maximalist arguments against them.
>wanting the freedom to copy artists’ works when using it
Learning from copyrighted content is legal - for both humans and AI. If Meta is in hot water for anything, it's piracy and/or storage of copyrighted material.
I think it's more that the little guy gets the book thrown at them while the rich bitch gets a slap on the wrist. This is widespread, and is BAD regardless of your personal opinion on copyright.
I take issue with the tense used in this framing. It's not 'infringed', it's 'infringing', and to say that it happened is wrong; it's happening, and happening continuously, in these models that are in use. To say a one-time payment settles it is missing the whole scope of this theft.
Royalties are owed and continuously owed as these models are deployed and doing inference. How is it any different to paying a small pittance to someone every time a song is played?
Royalties for inference are unrealistic in a way that even royalties for training aren't.
The LLaMA models were released openly. Copies exist everywhere in the world. You aren't going to be able to charge someone for running `llama.cpp`; a court order ceases to have practical relevance at that point.
First, LLMs do not reliably cite works. They are not looking things up in a database and repeating them. I think this false idea occurs a lot in people who don't understand what LLMs are or how they work.
Second, royalties are not required to cite a source.
Can you imagine how disastrous it would be to everything from news reporting to scientific publishing if that was the case?
Yeah well then I want my robot running this crap locally in its brain so I can get it to farm my two acres and haul water for me and I'll unplug from the rest of this nonsense going forward lol.
... LLMs cannot reliably provide citations. If you ask for citations, and the model did not use a web search tool, then whatever "citations" you receive are unreliable. Please do not trust these models to be honest. Just because they can discuss a topic doesn't mean they "know" where the knowledge came from in the same way that you don't need to have studied physics to catch a ball.
Even better, what if you transform that stolen CD into an MP3, so the data isn’t the same as a lossy process was used, then share the MP3 with the world as your own work?
I don’t get why the training process doesn’t count as any other form of transformation but then I’m not a lawyer.
I had to block meta's ASN on my personal cgit server a few weeks ago because they were ignoring robots.txt and torching it. Like hundreds of megabytes of access logs just from them, spread around different network blocks to clearly try and defeat IP based limiting. I couldn't believe it.
IMO ASN-based blocking should be much more common, but unfortunately it is not supported as a first-class configuration option in many common tools.
Yeah, I don't know how anybody stays sane without it. I have a list of over a thousand ASNs I blackhole at this point...
Mine is a daily bash cronjob that fetches a text-based database and uses grep to build an nftables-apply script with all the IPs for the blocked ASNs. I keep meaning to share it, but it's embarrassingly messy and I haven't had time to clean it up...
It's a real pain in the ass because in the absence of ASN based blocking, you often have to give something a long list of IP ranges in CIDR notation, and be certain you don't "miss" even one ipv4 /23 or /24 or a crawler will get through.
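For anyone who wants to try the cronjob approach described above, here is a minimal sketch of the same idea in Python instead of bash + grep. Everything specific in it is an assumption for illustration, not the actual script: the database format (one `<cidr> <asn>` pair per line), the ASNs (taken from the range reserved for documentation), and the table/set names `inet filter`, `blocked_asn_v4` and `blocked_asn_v6`. It also assumes those nftables sets already exist and were declared with `flags interval` so they accept CIDR ranges.

```python
import ipaddress

# Hypothetical sample of a text-based prefix-to-ASN database.
# Real databases differ in layout; adjust parse_prefix_db() to match yours.
SAMPLE_DB = """\
# cidr            asn
192.0.2.0/25      AS64500
192.0.2.128/25    AS64500
198.51.100.0/24   64501
2001:db8::/32     AS64500
"""


def parse_prefix_db(lines):
    """Yield (network, asn) pairs from '<cidr> <asn>' lines, skipping comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cidr, asn = line.split()[:2]
        yield ipaddress.ip_network(cidr, strict=False), int(asn.upper().removeprefix("AS"))


def blocked_networks(pairs, blocked_asns):
    """Return (ipv4, ipv6) lists of networks announced by the blocked ASNs.

    Adjacent and overlapping prefixes are merged so the resulting sets stay small.
    """
    v4, v6 = [], []
    for net, asn in pairs:
        if asn in blocked_asns:
            (v4 if net.version == 4 else v6).append(net)
    return (list(ipaddress.collapse_addresses(v4)),
            list(ipaddress.collapse_addresses(v6)))


def render_nft(v4, v6, table="inet filter",
               set4="blocked_asn_v4", set6="blocked_asn_v6"):
    """Render nft commands that replace the contents of two pre-existing sets."""
    out = []
    for name, nets in ((set4, v4), (set6, v6)):
        out.append(f"flush set {table} {name}")
        if nets:
            elems = ", ".join(str(n) for n in nets)
            out.append(f"add element {table} {name} {{ {elems} }}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    v4, v6 = blocked_networks(parse_prefix_db(SAMPLE_DB.splitlines()), {64500})
    print(render_nft(v4, v6), end="")
```

The output could be written to a file and loaded with `nft -f`, with rules elsewhere in the ruleset dropping traffic whose source address matches either set. Merging prefixes before emitting them is a design choice, not a requirement: it keeps the sets small when an ASN announces many adjacent ranges.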
A lot of people would be very pleased if this leads to Zuckerberg getting even the statutory minimum damages ($750?) on each infringement.
The previous infringement case with Anthropic said that while training an AI was transformative and not itself an infringement, pirating works for that purpose still was definitely infringement all by itself. The settlement was $1.5bn, so close to $3k for each of the 500k they pirated, so if Zuckerberg pirated "millions" (plural) it is quite plausible his settlement could be $6bn.
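The back-of-envelope numbers above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the comments: the $1.5bn settlement, the 500k works and the $750 statutory minimum are the figures quoted in the thread, and 2 million works is an assumed lower bound for "millions" (plural), not a number from the lawsuit.

```python
SETTLEMENT_USD = 1_500_000_000   # settlement figure quoted in the thread
SETTLED_WORKS = 500_000          # works covered, as quoted in the thread
STATUTORY_MIN_USD = 750          # per-work minimum quoted in the thread
ASSUMED_WORKS = 2_000_000        # assumption: smallest reading of "millions"


def per_work(settlement_usd, works):
    """Average payout per work implied by a lump-sum settlement."""
    return settlement_usd / works


def projected_total(per_work_usd, works):
    """Total payout if the same per-work amount applied to another set of works."""
    return per_work_usd * works


if __name__ == "__main__":
    rate = per_work(SETTLEMENT_USD, SETTLED_WORKS)
    print(f"per work: ${rate:,.0f}")
    print(f"at the same rate: ${projected_total(rate, ASSUMED_WORKS):,.0f}")
    print(f"at the statutory minimum: ${projected_total(STATUTORY_MIN_USD, ASSUMED_WORKS):,.0f}")
```

Under those assumptions the per-work rate is $3,000, the same rate applied to 2 million works gives $6bn, and even the statutory minimum alone would come to $1.5bn.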
Nothing will happen to him/Meta while DJT is president.
He bought the best protection around for breaking the law.
When you're a big Trump donor they let you do it.
Grab them by the Epstein Files
There will be not a single consequence for any of this.
For context, his net worth is ~$220 billion.
And meta's worth is much more than that. He's not personally paying.
A company being "worth" some amount doesn't mean it has that much money and real property; it means there exist people willing to buy shares, on the margin, at a price which works out like that. One of the common (very rough) approximations is that a business is worth as much as the profit it's expected to make over the next 20 years. But one of the reasons (there are many) that this is only a rough guide, is that if you tried to sell too much of a big company all in one go, it usually depresses the price a lot, and the other way around (trying to buy a whole company) tends to raise the price a lot; both effects are because most people have different ideas about how much any given company is really worth despite that rough guide, and trade their shares at different prices while you're doing it. You may note this is a circular argument, this is indeed part of the problem.
IIRC, Facebook's cash is more like $81-82 billion.
At the same time, isn't Zuck's worth based on his shares of evilCorp while evilCorp's shares are what you just said. Ergo, the Zuck isn't worth all that either???
Yup. All the headlines following the pattern "${billionaire} {gains|loses} ${x} billion this week" are mostly just fluff, the marginal share price of any given stock wanders all over the place even without forced sales or people trying to buy them out.
There's some interesting exceptions, like how Musk has managed to sell Tesla shares totalling more or less as much as the business itself has made in total lifetime revenue; but even then, Musk's theoretical net worth is very different from how much he could get if he was forced to sell all his shares suddenly.
Owner-CEOs like Musk and Zuckerberg get all the effects of such randomness, but the only examples I can think of where such people got into billion-dollar legal troubles tend to be ones which went on to sink their companies completely, so I'm not sure what impact a fine of "merely" 10% of cash reserves would have on investor confidence as expressed in share price. And this is not the only legal case Meta's facing right now.
It doesn't seem to be mostly just fluff to me.
MacKenzie Scott (Jeff Bezos' ex-wife) shows it can be turned into real money. As of December 2025, she had given away $7.1 billion in 2025 charitable donations, and $26.3 billion since 2019.
In reality there is the ability to execute on the shares to turn them into real money.
Jeff Bezos holds less than 10% of Amazon stock himself. Which is a huge amount of money, and a not insignificant amount of which can be turned into "real" money and even with some decline is still a phenomenal amount.
In that same time period the stock valuation has more than doubled.
That's why billionaires use shares as collateral to get loans. It's money once removed, and it continues to be spendable so long as the share price stays high.
I sincerely doubt that Meta's share price would crash as a result of Zuckerberg getting an expensive judgement.
Zuck can just take out loans against his equity. He doesn’t need to sell any of it to benefit from Meta’s “worth”.
Plus, the money he borrows is not taxable. If he sold stock he would have to pay taxes before he could spend the income. Sure, he now owes money to someone, but he can refinance those loans again and again, and live tax-free the rest of his life while we, poor working stiffs, pay the taxes that built the airport where he parks the private jet he bought with the money he borrowed.
People seem to get the weird idea that borrowing against their stock holdings is some special thing rich people get to do with products that the rest of us don't have access to. It's not. Margin loans are widely available to the tune of ff+1%ish or lower, and while your brokerage's publicly offered rates are probably a ripoff, they're almost certainly negotiable. The bar for access to "institutional" rates is basically 100k, the regulatory requirement for portfolio margin.
Yes, there are specialized products catered to billionaires. But those aren't getting them better rates than someone with a $200k portfolio (Zuck is not conventionally a less risky borrower than the Options Clearing Corporation!). They exist to work around the fact that some borrowers can't just casually liquidate their stock on the open market, let alone at face value. By all accounts these products are more expensive than retail.
Mostly this is an expensive (but maybe still less expensive than taxes, depending on the rate environment—it's more of a no-brainer in ZIRPland) way to diversify out of a single-stock portfolio without selling by adding leverage. At Zuck's age, it's still very unlikely to make sense to borrow instead of sell to spend. He's been known to pay real taxes in the past, they just look small relative to his imputed wealth growth because rich people don't spend a lot relative to their wealth growth because they, quite by definition, have a lot of wealth.
I think people take issue with the taxes loophole. They have GAINED from the VALUE of their stocks, but they don't pay taxes on that. It should be law if you realize value from stocks you pay capital gains on those stocks. So if a loan is collateralized by $1,000,000 worth of stock value taxes should be paid on $1,000,000.
Looking forward to the personal liability.
I've wondered what the legalese justification is for letting liability evaporate as it does so often with corps. So far the reasons I'm left with are 'shrugs' and 'the relevant provisions (seemingly? apparently?) simply don't apply', neither of which is any good.
I was going to make a joke about how we should attach magnets to Aaron Swartz' corpse, since that'd make for a pretty potent energy source, given how fast he must be spinning. But honestly, I think he would have seen this sort of thing coming, given how his case was handled and how things really haven't gotten any better.
Alternate reality Aaron Swartz escaped canonization and is now running an AI/crypto startup that pays you to upload training data with his YC alum buddies
Every now and then, I feel like we live in the worst possible world. Then I realise it could be much worse.
This does not comfort me.
I should hope that if Zuckerberg isn't severely punished for this, it at least sets a legal precedent for every other person to do the same with immunity.
All the Aaron Swartzes of the future could freely share scientific papers with the world.
"You can be unethical and still be legal that’s the way i live my life"
- Mark Zuckerberg
I personally know of a case of an engineer who was told to do something despite all the legal problems, because the company had lawyers for a reason.
I'd love for that to come out during discovery when the lawsuit hits, but it probably never will. Blowing the whistle is also not a great option in this economy, although I wish more people did.
So... "move fast and steal things"?
When the AI scrapers were just getting started, that is basically what I thought - their plan was to scrape / suck up everything they possibly could before people realized what was happening and blocked them.
The rate at which they were spidering and scraping was so far beyond what any other supposedly legit spider was doing, it seemed like the logical explanation.
It started at the top and at the beginning.
The biggest theft from the working class that has ever happened.
Always Has Been
Just gonna say... Aaron Swartz faced years of prison time and ultimately decided to take his own life... for downloading scientific journal articles... to share freely with the world (aka not even profiting from it).
But a multi-billion dollar corporation downloading millions of copyrighted creative works so that they can reshape the entire labor market by training a new type of artificial intelligence model on that data set? Meh, sounds like Silicon Valley disruption, give the man a medal!
And Jstor dropped the lawsuit when Aaron deleted his local copy. DOJ didn't drop theirs.
I doubt Meta has deleted their local copy though ...
And also I think MIT didn't defend Aaron but maybe I'm wrong about that
Aaron Swartz was treated unjustly because copyright sucks. we should oppose such laws and treatment, not wield them as retributive tools against our opponents
it is wrong to advocate for everyone to be treated equally unjustly. better to advocate for the removal of the bad laws/structures
One man illegally downloading copyrighted material is a crime. Multinational corporations illegally downloading copyrighted material is the only remaining growth area in the US economy and vital to national security.
Well, Meta also shared their AI models freely with the world
Truly ahead of his time
Had Aaron copied Snapchat 5 times the DOJ would've been fine with it all. His fault for not having the foresight
(I'm being sarcastic. Zuck gets rewarded for continually copying Snapchat features into his products)
Waiting for the perp walk.
Tired of the double standard where CEOs get away with it when bad things happen (because they can’t be everywhere all the time) but get all the benefits when the company makes a great profit (because they’re personally driving results!).
> a Meta spokesperson said, “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use. We will fight this lawsuit aggressively.”
> Authors have sued AI companies for copyright infringement before - and lost.
So, basically nothing will come out of this
they'll litigate how meta acquired those materials to train. you can do whatever you want with a book after it's in your house. but how did it get there?
They’re already on record as hoovering up Library Genesis and Anna’s Archive. For their “fair use” copyright bonfire to train their LLM.
So not only are these publishers rightfully pissed, Meta didn’t even give them the $6.99 for each epub to begin with. They’ve stolen the whole thing as part of this “fair use” campaign to destroy human authorship free of even the most basic remuneration.
Until Sony, Nintendo, Disney... sue them and Zuck craps his pants. And the NSA themselves, too, because for sure they are half-backed by them. If they keep pirating Japanese and European media, those countries can just wipe their asses with US licenses and declare all media from the US uncopyrightable in Europe and Japan.
"They then copied those stolen fruits"
How are these fruits "stolen" if they still have what was allegedly stolen?
Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act
And even if, arguendo, sure, it's stolen. The purpose of copyright is "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries"
And you would be hard pressed to prove that LLMs haven't advanced the arts and sciences, so at bare minimum it's transformative, i.e. fair use.
I think you are confusing the idiom "stolen fruits" with an actual accusation of criminal theft. Aside from its use in this phrasing, neither "theft" nor "steal" appears anywhere else in the article.
The article references the complaint. And even then, why use it at all?
Rules for thee but not for me.
Except, as the article says.... it's not copyright infringement. Whether it should be or not is another issue.
>But the latest lawsuit alleges that Meta and Zuckerberg deliberately circumvented copyright-protection mechanisms — and had considered paying to license the works before abandoning that strategy at “Zuckerberg’s personal instruction.” The suit essentially argues that the conduct described falls outside protections afforded by fair-use provisions of the U.S. copyright code.
I don't have strong opinions on Zuck needing to be punished for this, because I have friends and family doing the same thing, although perhaps not at the same scale. I myself do not download copyrighted content. I think "rules for thee, not for me" goes both ways.
How much revenue have your friends and family made from "doing the same thing"?
Some. In some cases they've "stolen" tens of thousands in content. Like I said, not at the same scale, but the same "crime" nonetheless.
I'd much rather prosecution focus on Zuck's more serious crimes against privacy and civilization as a whole. But maybe this is a small start?
> Some. In some cases they've "stolen" tens of thousands in content.
That's not revenue.
Who will be the first to implement a one-layer three-weight model and add it to BitTorrent? Let it “train” on all downloaded files. That makes it fair use. Am I doing this right?
Shouldn't this stuff trigger RICO? Why do torrent site operators get led off in cuffs for running operations that usually lose money, but Zuck doesn't?
RICO specifically cites "criminal infringement of a copyright" as laid out in 18 U.S. Code § 2319. If the CEO tells his employees to download hundreds of thousands of works illegally in order to carry out his money-making scheme, how is that not organized crime even if (dubiously) LLM training on the material is fair use?
-----
RICO: https://www.law.cornell.edu/uscode/text/18/part-I/chapter-96
Definitions: https://www.law.cornell.edu/uscode/text/18/1961
> As used in this chapter — (1) “racketeering activity” means (A)[...]; (B) any act which is indictable under any of the following provisions of title 18, United States Code: [...], section 2319 (relating to criminal infringement of a copyright),[...]
18 U.S. Code § 2319 - Criminal infringement of a copyright: https://www.law.cornell.edu/uscode/text/18/2319
-----
edit:
> 18 U.S. Code § 1962 - Prohibited activities
> (c) It shall be unlawful for any person employed by or associated with any enterprise engaged in, or the activities of which affect, interstate or foreign commerce, to conduct or participate, directly or indirectly, in the conduct of such enterprise’s affairs through a pattern of racketeering activity[...].
https://www.law.cornell.edu/uscode/text/18/1962
From the lawsuit:
“Meta — at Zuckerberg’s direction — copied millions of books, journal articles, and other written works without authorization, including those owned or controlled by Plaintiffs and the Class, and then made additional copies of those works to train Llama,” the suit says. “Zuckerberg himself personally authorized and actively encouraged the infringement. Meta also stripped [copyright management information] from the copyrighted works it stole. It did this to conceal its training sources and facilitate their unauthorized use.”
> Meta also stripped [copyright management information] from the copyrighted works it stole. It did this to conceal its training sources and facilitate their unauthorized use.
WTF
The behavior will continue until a consequence is imposed.
I would rather Zuckerberg do 6 months in jail and probation than fine Meta.
You aren't going to be able to make me anti-piracy just because some corpo benefits from it too.
I think this is an easy distinction to make: copyright is bullshit and knowledge should be free. I have no problem with pirates sharing information freely. I do have a problem with a company taking someone else's work and profiting from it. The only thing worse than copyright as it exists is copyright that can be selectively ignored when the powerful will it. Attempt to use copyright to promote Free software with the GPL? Ha, nope, copyright for me and not for thee; I'll train on your code and sell it back to you. You want to preserve access to a game or film that's unavailable or unplayable? Time to send the C&D and destroy you. Only bad things are possible.
Until we progress as a society to the point that we can put this system behind us we should at least fight to make enforcement uniform. In fact, uniform enforcement is probably a good starting point for arguing for abolition, as the pain of that enforcement is felt by proles and elites alike.
People who don't believe in copyright shouldn't be punished for "breaking" it.
Corporations believe in copyright so if they "break" it they should get punished for breaking rules they made up themselves.
Generally the law should be more strict for corporations than for real people.
edit: People downvoting can you argue why you disagree? I do think it's fair for the law to be more strict on the powerful rather than on the powerless.
I agree, time to start handing out real punishments. I think 6 months is way too small.
If this was you or me, we would be in prison for decades and have a fine in the millions. Time for these people to feel consequences.
As someone said, they will probably settle for around 6 billion, that is the same as say a $100 fine for us.
This comment could get its own DSM classification for how insane it is.
I'm all for strong justice, but you want to imprison an executive for decades for copyright violations?
I'm gonna have to go dig up the link, but isn't there a guy that Nintendo basically has on indentured servitude for the rest of his life?
Ah, found it:
>In April 2023, a 54-year-old programmer named Gary Bowser was released from prison having served 14 months of a 40-month sentence. Good behaviour reduced time behind bars, but now his options are limited. For a while he was crashing on a friend’s couch in Toronto. The weekly physical therapy sessions, which he needs to ease chronic pain, were costing hundreds of dollars every week, and he didn’t have a job. And soon, he would need to start sending cheques to Nintendo. Bowser owes the makers of Super Mario $14.5m (£11.5m), and he’s probably going to spend the rest of his life paying it back.
I'm not even a tiny bit supportive, but there is precedent.
https://www.theguardian.com/games/2024/feb/01/the-man-who-ow...
American executives have been pushing to criminalise copyright infringement for decades, and America has worked hard to pressure countries all round the world to do this as part of trade deals. There is, for example, a Brit serving an eleven year sentence right now *.
Why should Zuckerberg be exempt?
* https://www.bbc.co.uk/news/uk-65697595
Facebook isn't one of the companies that's been pushing for that.
How is that relevant?
"American executives have been pushing to criminalise copyright infringement...Why should Zuckerberg be exempt?" Implicit relevance in the comment to which I'm replying.
I think we're misunderstanding one another.
Zuckerberg saying anything about copyright infringement is irrelevant to the actions Meta has taken in consuming and promoting the practice, and he should face criminal liability.
The non-strawman way to interpret the parent comment is that they want them to be treated the same as normal copyright violators. Jail is a common result of (criminal) copyright prosecution, with 44% of convicted offenders being imprisoned, averaging 25 months [0].
Now, I personally find the idea of imprisoning people for copyright offenses horrific, but I don't think it's remotely insane that someone else might come to that conclusion, given that we broadly accept it as a society.
[0] https://www.ussc.gov/sites/default/files/pdf/research-and-pu...
From [0]: "In fiscal year 2017, there were 80 copyright/trademark infringement offenders who accounted for 0.1% of all offenders sentenced under the guidelines." This is such a low number that I assume most prosecuted cases are settled without ever making it to sentencing, or alternatively copyright infringement is just hardly ever prosecuted criminally at all.
I don't understand how the fact that 80 people were prosecuted for copyright violation in one year is an argument that one person shouldn't be prosecuted for copyright violation.
Decades? Maybe not. A few years at minimum? Hell yeah!
Is this controversial? Executives should be held liable, certainly more so than just regular people sharing files.
For better or for worse, the idea behind incorporation is that you, as an owner of part or all of the company, are separated from it financially and legally in most circumstances.
Zuckerberg may be CEO, majority shareholder, and on the board of Meta, but he didn't break copyright law, Meta did. So if there were to be a consequence, Meta would pay out the fine. Not sure how you jail a company.
Now, in a company with a real corporate governance structure, the board would look at the loss incurred by said fine, look at Zuckerberg, and immediately fire him for causing the loss. However, like I said before, Zuck's in charge of Meta, so that's not going to happen, and the fine is unlikely to be enough to drastically impact the company's profitability enough to sink his shares, which are the main repository of his wealth. So if he thinks he can make himself richer violating copyright law in the future, he will likely direct Meta to do so.
TL;DR, in the famous words of Bender from Futurama, "Hooray, the system fails again!"
> Zuckerberg may be CEO, majority shareholder, and on the board of Meta, but he didn't break copyright law, Meta did.
I'm still stuck on how Z telling Meta (or the relevant people at Meta, whatever) to go out there and do illegal shit doesn't make a court say that he's functionally done said illegal shit, or at least encouraged the company to do it, and that he should thus be liable for that. It's not like there's much plausible deniability here. It'd be one thing if the lower ranks thought it'd be fine and did it of their own accord. It's quite another for Z to tell people to go nuts doing illegal shit.
The DMCA makes facilitation of copyright infringement illegal. Telling people to do copyright infringement is surely facilitation of copyright infringement. Surely then, Z having broken the DMCA is a fairly open and shut case, modulo calculating the damages. But apparently not?
> Not sure how you jail a company.
> the fine is unlikely to be enough to drastically impact the company's profitability enough to sink his shares
You lack imagination :-) but you've identified both the problem and the solution.
I’ve sometimes pondered this about the legal personhood of a company - it has most of the rights as a human being but can’t suffer any of the major consequences, such as jail.
It could be possible to construct a legalistic jail for a company whereby if it has committed the type of crime that a human could be jailed for, then it could be frozen for the duration, say ten years, and all its assets, shareholder funds, contracts, everything were frozen and impounded.
Of course this seems completely ludicrous because it’s so “out there” but it’s worth having the thought experiment. Things like “corporate manslaughter” really have few consequences for the corporation itself - if it was actually jailed for twenty years and shareholders and officers left frozen out and on pause, then it might be the kind of punishment that really counted for something.
> Not sure how you jail a company.
You jail the CEO and the others will stand up and take note.
"But they'll complain" who gives a fuck.
Well I guess the idea of incorporation is wrong then. Execs and major shareholders should absolutely be held personally liable.
I would prefer a harsher punishment, but I would begrudgingly accept throwing him in jail for decades.
I always heard that criminals should be thrown in jail, it's time we started doing it to the real criminals.
There aren't enough things an executive can go to jail for.
Fines don't do anything to deter bad behavior. Either:
* The company pays
* They pay and the company mysteriously increases next year's comp / grants a "loan" / etc
* D&O insurer pays
In all three cases the money comes out of the shareholders' hides. It provides zero personal deterrence. The payoff matrix, as seen by a sociopath, makes it rational to always defect against the common good.
The only punishment that can really focus attention is physical imprisonment in a facility they can't choose.
SOX did this for financial reporting and gee shucks it turned out executives can follow the law after all!
> I'm all for strong justice, but you want to imprison an executive for decades for copyright violations?
They stole the life's work of millions of people.
In less civilized times, they likely would have been drawn and quartered by strong horses, and had their limbs dragged to the 4 corners of the continent as a warning to anyone else that would consider doing it again.
I know people really hate AI training on their work - but is it really any different than a human reading it?
I know there's a complaint that AI can verbatim repeat that work. But so can human savants. No one is suing human savants for reading their books.
Producing copyrighted material, of course. Training on copyrighted material... I just don't see it.
EDIT: Making a perfectly valid point, but it's unpopular, so down I go.
I had to buy the copyrighted material before reading it... Meta apparently operates in a different legal system than me. That's my issue with it.
Yes, I have no objection to that part. It's the arguments that training itself is the problem.
Sarah Silverman as the most prominent example.
The human savant will remember where they read it and give you credit. It might lead more people to read your work, and ultimately you make money.
The AI won't even know where the page of text it's seeing came from, and people will avoid your book as they can just ask the AI. So you make less money. (Talking about specialized technical books here.)
Not necessarily.
There's a huge difference in scale. The human mind can only process a limited portion of all works available over a lifetime. Human learning is therefore naturally limited to small-scale reuse, which serves to keep it proportional.
A machine training on all copyrighted materials in the world for commercial purposes at an industrial scale makes it disproportionate.
I see that as a distinction - but does it make a difference?
If a company hired hundreds of savants, then it would be illegal for them to read books?
I don't follow.
It would hardly make a dent. And if you hired hundreds of savants, the knowledge would still be spread over hundreds of separate minds.
And even if we grant that those savants are also very skilled at creating "market substitutes" based on their training that are capable of competing with the original works, their maximum creative output would only be a relatively small number of new works, because they can only work at human speed.
Ok - but if a company were able to hire one million savants, you feel it should be illegal, because why?
Can you cite something in the copyright laws themselves that suggest this scale distinction?
This goes back to the original purpose of copyright, which is to serve as an economic incentive for individual creators and artists to make more art, by securing exclusive rights to use their own works commercially for a specified time. The goal is both the creation of more works, but also to protect the economic viability of artists.
This principle is quite universal and can be found in many places, including the US constitution and US (supreme) court decisions, many international jurisdictions, treaties and conventions.
But my question is about your point of scale.
I don't understand why it should be allowed for one savant to study and answer questions about one book, but wrong for a company to hire one million savants to answer questions about one million books.
And I'm asking where in the law or case law this is supported.
It’s different.
Hm. I'm not sure I follow your logic.
No one is asking human savants about what they read 1 million times per day.
Suppose they did, and some guy was filling stadiums regularly to hear him recite an entire audio book. That would probably get the attention of someone's lawyers.
I don't see your point. The problem is producing the copyrighted work, not processing it beforehand.
If it's illegal for AIs it should be illegal for humans, too. Is that really what you're arguing? It should be illegal for savants to read books?
I don't think anyone is arguing that the consumption is illegal. It's the reproduction that is illegal.
Read a book, that's fine. Write a book, that's fine. Read a book and then write a book that is 99.9% the same as the book that you read and sell it for profit without a license from the original author, that's infringement.
No, if you read the article, the point is in the training, not the reproduction.
That's what all these lawsuits are about - it's the training not the reproduction. I already agreed in my first comment that the reproduction is off limits.
In this case, it appears that Meta torrented illegal copies of the work to do the training. Obviously that's bad. But conflating that with training itself doesn't follow.
The point of these lawsuits is the piracy. My parent comment was about the general situation, not this specific article.
Pirating content is illegal, regardless of if it is to train an LLM.
Usage of LLMs trained on unlicensed content (basically all of them) might or might not be illegal.
Using any method to reproduce a copyrighted work by using that original as input in a way that supplants the market value of the original is probably illegal.
At least that is my rudimentary understanding.
Well - maybe so. But the common belief is that training itself is a violation of copyright, no matter how it's done. That's the argument I'm countering here.
The issue is that the trainers have not sought licenses for the data and instead outright pirated it.
I don't think anyone thinks that all training is a copyright violation if all the training data is licensed. For example a LLM trained on CC0 content would be fine with basically everyone.
The problem is that training happens on data that is not licensed for that use. Some of that data also is pirated which makes it even clearer that it is illegal.
But why should separate licensing be required at all? A search engine reads and indexes every word of every page it crawls. No one argues that requires licensing, only that the outputs must respect copyright. Why should training be different?
When Google started outputting summaries, people asked the same questions.
If you supplant the value of the original with the original as input then you probably have some legal questions to answer.
Sharing copyrighted material is illegal. Presumably, if Meta blocked all seeding on the torrents they downloaded, they wouldn't have broken copyright, right?
If copyright law doesn't extend to the works being used for training, why should it extend to the model that is produced as a result? AI model creators have set up an ethical scenario where the right thing to do is ignore copyright laws when it comes to AI, which includes model use. It might never be legal, but it has become ethical to pirate models, distill them against ToS, etc.
I'm not sure I follow. Can you say it a different way?
I think the parent is basically saying that if you can legally pirate a book to train an LLM, why can't you legally pirate an LLM?
It's a "rules for thee and not for me" argument.
AH. Thank you.
Training requires making copies. Even if Meta had purchased each work they'd have had to make copies of it to distribute around the training cluster.
Does it though? If they bought a copy for each machine?
Then no copying happened so they'd be on firmer legal ground.
Good, we're agreed. My only point here is that training is not inherently a copyright violation.
>The problem is producing the copyrighted work, not processing it beforehand.
The distinction isn't particularly clear-cut with an open source model. If it is able to reproduce copyright-protected work with high fidelity, such that the works produced would be derivative, that's like trying to get around laws against distribution of protected works by handing them to you in a zip file.
It's a kind of copyright washing to hand you the data as a binary blob and an algorithm to extract them out of it. That wouldn't really fly with any other technology.
And that's really where a lot of the value is, mind you: these models are best thought of as lossily compressed versions of their input data. Otherwise Facebook ought to be perfectly fine training them on public domain data.
I tend to agree - but you assume that it would not be possible to create a model that can train on copyrighted work and only output text which would be considered fair use.
That seems very possible to me, and undermines the "training is copyright violation" argument. It's not the training, it's the output.
reading it after stealing it: gray area. producing & monetizing competing works devaluing the original is a problem
So is it a problem when humans produce and monetize competing works? My understanding is that there is quite an industry in humans reading books and synthesizing their points. Cliff's Notes, for example.
I did some quick googling and most of cliffs notes guides are on public domain works so no problem there, they've also paid to license content, and also have been protected by fair use as parody
To Kill a Mockingbird, The Catcher in the Rye, Beloved, The Kite Runner, The Handmaid's Tale are all copyrighted works with a Cliff's Notes guide.
> I know people really hate AI training on their work - but is it really any different than a human reading it?
Yes it's very different. Humans need to eat, sleep, and pay taxes. You also have to pay them competitive wages.
I'm not sure your argument is supported by the actual law as written.
https://news.ycombinator.com/item?id=48029673
There's nothing in the law to support your argument either. The law however does say, very unambiguously, that copying without permission isn't allowed. There aren't exceptions for "training" just because it's superficially similar to a human activity (reading a book). A human isn't allowed to hand-copy Harry Potter. Even if they bought all the Harry Potter books.
Yes. But training is not copying.
We already covered this: https://news.ycombinator.com/item?id=48029085
Why should an AI have the same rights as a human?
How about granting AI all the other rights then, for example the right to vote? (sarcasm)
We're not talking about rights, we're talking about illegal acts. If it's illegal for a machine to do it, how can it be ok for a human?
Just from a rational argumentation point of view. Clearly if a law is written saying as much, then sure. But there is no such copyright law like that yet.
But machines don't do things. People do things, and they use tools/machines to do those things more easily or efficiently.
My apologies - I'm speaking loosely of course. Translate all my claims about machines breaking the law into claims about humans using machines to break the law.
Sorry, I wasn't trying to be pedantic. I was trying to make the point (which I think is in line with your point) that the fact that AI is involved here doesn't make a difference. It is a tool, but the people using the tool are (as always) responsible for the outcome.
The issue is certainly not so simple. But it seems to me, purely theoretically, that the rules don't necessarily have to be the same for living people and non-living machines.
Well - actually - it is pretty simple. For something to be illegal, there must be a law saying it's illegal. There are no laws distinguishing humans from machines in copyright law.
> There are no laws distinguishing humans from machines in copyright law
Correct. Because until very recently there was no need.
AH. So you agree that it's not illegal.
HN really loves the copyright lobby when it's against someone they hate, huh
The problem is people at large companies creating these AI models, wanting the freedom to copy artists’ works when using it, but these large companies also want to keep copyright protection intact, for their regular business activities. They want to eat the cake and have it too. And they are arguing for essentially eliminating copyright for their specific purpose and convenience, when copyright has virtually never been loosened for the public’s convenience, even when the exceptions the public asks for are often minor and laudable. If these companies were to argue that copyright should be eliminated because of this new technology, I might not object. But now that they come and ask… no, they pretend to already have, a copyright exception for their specific use, I will happily turn around and use their own copyright maximalist arguments against them.
(Copied from a comment of mine written more than three years ago: <https://news.ycombinator.com/item?id=33582047>)
>wanting the freedom to copy artists’ works when using it
Learning from copyrighted content is legal - for both humans and AI. If Meta is in hot water for anything, it's piracy and/or storage of copyrighted material.
I think it's more that the little guy gets the book thrown at them while the rich bitch gets a slap on the wrist. This is widespread, and is BAD regardless of your personal opinion on copyright.
I take issue with the tense used in this framing. It's not 'infringed', it's 'infringing', and to say that it happened is wrong: it's happening, and happening continuously, in these models that are in use. To say a one-time payment settles it is missing the whole scope of this theft.
Royalties are owed and continuously owed as these models are deployed and doing inference. How is it any different to paying a small pittance to someone every time a song is played?
Royalties for inference are unrealistic in a way that even royalties for training aren't.
The LLaMA models were released openly. Copies exist everywhere in the world. You aren't going to be able to charge someone for running `llama.cpp`; a court order ceases to have practical relevance at that point.
Inference might be unreasonable for a royalty agreement, but, in assessing damages, it is certainly relevant.
"I made enough copies for everyone" isn't a valid defense for copyright infringement.
These models can provide citations so I don't see why they can't tick a royalty owed. I'm sure many here could help build this pipeline.
First, LLMs do not reliably cite works. They are not looking things up in a database and repeating them. I think this false idea occurs a lot in people who don't understand what LLMs are or how they work.
Second, royalties are not required to cite a source.
Can you imagine how disastrous it would be to everything from news reporting to scientific publishing if that was the case?
Yeah well then I want my robot running this crap locally in its brain so I can get it to farm my two acres and haul water for me and I'll unplug from the rest of this nonsense going forward lol.
... LLMs cannot reliably provide citations. If you ask for citations, and the model did not use a web search tool, then whatever "citations" you receive are unreliable. Please do not trust these models to be honest. Just because they can discuss a topic doesn't mean they "know" where the knowledge came from in the same way that you don't need to have studied physics to catch a ball.
If you steal a book and read it, should you have to pay every time you use the knowledge gained or recall parts of it from memory?
No. People are not LLMs. And even if some argue that they are mechanically similar, they are legally distinct.
And yet most of the replies are giving examples of human action as if they are legally analogous.
If I charged people for the privilege of listening to me recite relevant parts of the book to them for profit? Yes. Depending on the copyright.
So like a teacher?
If I perform a song in public then yes, I should pay the creator every time I play it. I fail to see the difference here.
What if you are performing your own song which was heavily influenced by other artists?
Also I believe performing covers is legal
What if you steal a CD and then play it on your radio station each morning?
Even better, what if you transform that stolen CD into an MP3, so the data isn’t the same as a lossy process was used, then share the MP3 with the world as your own work?
I don’t get why the training process doesn’t count as any other form of transformation but then I’m not a lawyer.
even better if it is a pirate radio station