How Did Amazon Detect So Much Child Sexual Abuse Material In Its AI Training Data?

Amazon detected hundreds of thousands of pieces of suspected child sexual abuse material in its AI training data, raising concerns over data sourcing and safety.

Published by Sofia Babu Chacko
Published: January 30, 2026 04:24:01 IST

Amazon.com Inc. revealed that last year it detected hundreds of thousands of pieces of content in its AI training data that appeared to contain child sexual abuse material (CSAM).

While the company removed the material before using it to train AI models, child safety officials say Amazon has not shared enough information about where the content came from, making it harder for law enforcement to protect victims and track down offenders.

How Amazon Found the Material

Amazon uses an automatic scanning tool that compares content against a database of known CSAM, a process called hashing.
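
To illustrate the general idea of hash matching, here is a minimal sketch in Python. It is not Amazon's tool, whose internals have not been made public. It assumes exact SHA-256 digests and a small in-memory set of known hashes, whereas real scanners typically rely on perceptual hashes that tolerate minor changes to a file. The names `sha256_hex`, `partition_by_hash`, and the sample data are hypothetical and use harmless placeholder bytes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def partition_by_hash(items, known_hashes):
    """Split (name, data) pairs into kept and flagged names.

    An item is flagged when its digest appears in known_hashes.
    Assumption: exact-match hashing, which only catches byte-identical copies.
    """
    kept, flagged = [], []
    for name, data in items:
        if sha256_hex(data) in known_hashes:
            flagged.append(name)  # would be reported and excluded from training
        else:
            kept.append(name)
    return kept, flagged


# Hypothetical database of known hashes, built from placeholder bytes.
known_hashes = {sha256_hex(b"placeholder-known-item")}

# Hypothetical training items: one harmless, one matching the database.
items = [
    ("a.bin", b"harmless sample"),
    ("b.bin", b"placeholder-known-item"),
]

kept, flagged = partition_by_hash(items, known_hashes)
print("kept:", kept)
print("flagged:", flagged)
```

Because the comparison is against a list of already-identified material, a scanner of this kind can only catch content that is already in the database.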

According to the company, nearly all the reports came from non-proprietary training data obtained from external sources, like publicly available web content.

The company also admitted it tends to over-report potential CSAM to avoid missing anything, which can lead to a high number of false positives.

A Dramatic Increase in Reports

The number of AI-related reports from Amazon jumped dramatically in 2025. The company accounted for most of over 1 million AI-related CSAM reports submitted to the National Center for Missing and Exploited Children (NCMEC), compared with just 67,000 reports from the rest of the tech industry the year before.

Experts say this surge is an outlier, raising concerns about the source of the material and the safeguards in place during AI training.

Challenges for Law Enforcement

While Amazon is required to report suspected CSAM to NCMEC, the company has provided very little detail on where the content came from or who shared it, limiting the ability of authorities to remove the material or investigate offenders. NCMEC officials said that without these details, the reports are often “inactionable,” according to Bloomberg News.

AI Development and the Risks of Fast Data Collection

The spike in reports comes amid a fast-paced AI race, where companies are rapidly gathering large amounts of data to improve their models.

Experts warn that this speed increases the risk that exploitative material can enter AI training pipelines, and training AI on illegal content could unintentionally teach models to manipulate or sexualize images of children.

Amazon’s Response

Amazon said it is committed to preventing CSAM across all its businesses. A spokesperson emphasized that none of the flagged material was AI-generated, and the company’s AI models have not produced any CSAM. They also highlighted that Amazon’s tools scan training data carefully and remove known illegal content before it is used.

Industry Perspective

Other tech companies, including Google, OpenAI, Meta, and Anthropic, also scan AI training data for CSAM. But according to NCMEC, Amazon’s report volume is far higher than that of its peers, while the company provides much less information about the source of the material. Experts say this underscores the need for greater transparency and stronger safeguards in AI development.

Calls for Greater Transparency

Experts like David Thiel, former technologist at the Stanford Internet Observatory, say companies should be more open about where their AI training data comes from and how it is cleaned. Without transparency, there is always a risk that illegal material slips through, and children remain at risk of exploitation.

The discovery of hundreds of thousands of CSAM instances in Amazon’s AI training data highlights the challenges of developing AI responsibly.

While Amazon has systems in place to scan and remove illegal content, experts say more transparency, oversight, and safety measures are urgently needed to protect children and prevent AI from being trained on exploitative material.

Sofia Babu Chacko

Sofia Babu Chacko is a journalist with over five years of experience covering Indian politics, crime, human rights, gender issues, and stories about marginalized communities. She believes that every voice matters, and journalism has a vital role to play in amplifying those voices. Sofia is committed to creating impact and shedding light on stories that truly matter. Beyond her work in the newsroom, she is also a music enthusiast who enjoys singing.
