AI storage is booming: can Filecoin step in and pick up the junk? What are hot tiering and cold storage?
Preface: Filecoin has gone years without seeking partnerships, and Juan Benet has become reclusive. I write about Filecoin because my neighbor, Kang Ge (@tktang88), is a Filecoin whale, and many friends who run large Filecoin mining operations constantly share their knowledge of Filecoin and their expectations for its future. One point Kang raised this time especially caught my interest.
Hence this tweet: not an advertisement, nor an encouragement to buy $FIL, but a new perspective on decentralized storage.
Main text
Two days ago, Micron's earnings outlook cast a shadow over the market; yesterday, better-than-expected results set off a short-term rally that even pushed Micron's market cap above Meta's and Tesla's. The driver: storage demand in the AI era may far exceed what most people imagine.
AI training and inference require high-speed reads and writes; vector databases, KV cache offloading, model parameters, and intermediate inference states all need more memory and storage capacity. This demand is driven at the hardware level, so it is more deterministic and translates more directly into revenue.
However, AI storage demand will not stay confined to high-speed memory and SSDs. As model training, inference, agents, and user-generated content grow, another troublesome class of data will emerge: huge volumes of data that has no short-term value, is accessed extremely rarely, and may never be needed again, yet that companies are reluctant to delete.
That is today's topic: storing junk data!
Data in the AI era is naturally tiered. At the front is hot data, currently used for training and inference and requiring high-speed access; it is served mainly by HBM, DRAM, NVMe SSDs, and high-speed networks.
In the middle is warm data, which may be reused in the near term: model checkpoints, training shards, vector indexes, experiment logs, evaluation data, and datasets still under iteration.
Finally, there is cold data: data whose role in training is finished and that will not be called on in the short term, but that may be needed later for re-training, rollbacks, copyright, regulation, audits, security incidents, or model reproducibility.
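To make the three tiers concrete, here is a minimal sketch of a tiering rule, assuming an object is hot while a running training or inference job reads it, warm if it has been idle for at most 90 days, and cold otherwise. The `DataObject` fields, the `in_active_job` flag, and the 90-day cutoff are hypothetical choices for illustration, not an industry standard.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HOT = "hot"    # HBM / DRAM / NVMe SSD: data in active training or inference
    WARM = "warm"  # checkpoints, shards, vector indexes likely reused soon
    COLD = "cold"  # finished data kept for re-training, audits, rollbacks


@dataclass
class DataObject:
    name: str
    days_since_last_access: int
    in_active_job: bool  # hypothetical flag: read by a running training/inference job


WARM_MAX_IDLE_DAYS = 90  # hypothetical cutoff; a real policy would be tuned per workload


def assign_tier(obj: DataObject) -> Tier:
    """Place an object in a tier based on current use and idle time."""
    if obj.in_active_job:
        return Tier.HOT
    if obj.days_since_last_access <= WARM_MAX_IDLE_DAYS:
        return Tier.WARM
    return Tier.COLD


for obj in [
    DataObject("tokenized-shard-0042", 0, True),
    DataObject("checkpoint-step-120000", 12, False),
    DataObject("raw-crawl-2023-q1", 400, False),
]:
    print(obj.name, assign_tier(obj).value)
```

The three sample objects land in hot, warm, and cold respectively. Real systems would also weigh access frequency and cost per GB, but the point is that the policy is written down rather than left implicit.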
Notably, cold data falls outside Micron's current focus. Micron's strength is the high-speed memory and storage used for training and inference; that data carries the highest value, and the hardware serving it commands the highest prices and is scarce.
Cold data, by contrast, is accessed extremely infrequently: original training data, cleaned data, deduplication logs, annotation records, early user-generated images and videos. It is essentially treated as junk. Most of it is never opened again and may go unread for years, yet it cannot simply be deleted.
That is because future re-training, model rollbacks, output explanations, copyright disputes, regulatory audits, or simply new models may make previously useless data valuable.
So the biggest data headache of the AI era is the ever-growing volume of data combined with the growing risk of deleting any of it.
Many early-stage AI companies manage data coarsely, without separating it into hot, warm, and cold tiers. Low-frequency data sitting on high-cost storage is uneconomical in the long run and drives storage costs up sharply; keeping it on high-performance cloud storage makes even less sense. So can we just toss this cold data into a hard-disk "cold warehouse"?
The answer is no.
If AI data is simply dumped into a cold warehouse without indexes, tags, provenance, model-version mapping, or data-cleaning logs, it is effectively lost even though it still physically exists.
What's needed is hot metadata plus cold data bodies. The data bodies can sit in cold storage, but the catalog, provenance, hash, CID (content identifier), license, creation time, cleaning method, associated models, usage logs, privacy tags, retention period, and recovery test results must live in a hot index layer that is searchable, readable, and auditable.
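A minimal sketch of one record in such a hot index layer, using the fields listed above. The class name `HotIndexEntry`, the field names and types, the default retention of 3650 days, and the JSON serialization are all hypothetical; the placeholders stand in for a real hash and a real content identifier.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import date, timedelta


@dataclass
class HotIndexEntry:
    """One searchable metadata record; the data body itself lives in cold storage."""
    path: str              # logical location in the catalog
    source: str            # provenance: where the data came from
    sha256: str            # content hash of the data body
    cid: str               # identifier used to fetch the body from cold storage
    license: str
    created: date
    cleaning_method: str
    associated_models: list[str] = field(default_factory=list)
    privacy_tags: list[str] = field(default_factory=list)
    usage_log: list[str] = field(default_factory=list)
    retention_days: int = 3650                  # hypothetical default retention period
    last_restore_test_ok: bool | None = None    # None until a recovery test has run

    def retention_expires(self) -> date:
        """Date after which the data body may be considered for deletion."""
        return self.created + timedelta(days=self.retention_days)

    def to_json(self) -> str:
        """Serialize to JSON so the record stays human-readable and auditable."""
        record = asdict(self)
        record["created"] = self.created.isoformat()
        return json.dumps(record, sort_keys=True)


entry = HotIndexEntry(
    path="corpora/web/2023-q1/part-0001",
    source="web-crawl",
    sha256="<sha256 of the data body>",
    cid="<content identifier returned by the storage network>",
    license="CC-BY-4.0",
    created=date(2024, 1, 1),
    cleaning_method="near-duplicate removal + PII scrub",
    associated_models=["model-v1"],
    privacy_tags=["pii-removed"],
)
print(entry.to_json())
```

The record is small enough to keep in a fast database even when the body it points to is terabytes on cold disks, which is the whole point of the split.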
This is why Filecoin and decentralized storage are worth revisiting, especially projects that already operate storage networks at scale.
Filecoin has massive storage capacity on its network. Having lots of disks means little on its own, but disks whose storage is proven on-chain already form a prototype of verifiable cold storage. Compared with traditional cloud storage, Filecoin's distinguishing features are content addressing, storage across multiple providers, and on-chain proofs.
In plain terms, customers don’t have to trust a single cloud provider’s claim that “the data is stored”; they can continuously verify that the data remains unchanged and can be retrieved later via the same content identifier.
This capability is meaningful for AI cold data.
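The idea of retrieving data "via the same content identifier" and checking that it is unchanged can be sketched with a plain SHA-256 digest. This is a simplification under stated assumptions: real IPFS/Filecoin CIDs wrap a multihash with version and codec prefixes and are computed over chunked DAG structures, so `content_id` below is a hypothetical stand-in, not a real CID. It also does not model Filecoin's on-chain storage proofs, which are a separate mechanism showing that providers keep storing the data.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Simplified content identifier: hex SHA-256 of the bytes (not a real CID)."""
    return "sha256-" + hashlib.sha256(data).hexdigest()


def verify_retrieval(expected_id: str, retrieved: bytes) -> bool:
    """True only if the retrieved bytes hash to the identifier recorded at upload."""
    return content_id(retrieved) == expected_id


original = b"training shard 0042"
recorded_id = content_id(original)  # stored in the hot index at upload time

print(verify_retrieval(recorded_id, original))                # True: unchanged
print(verify_retrieval(recorded_id, b"training shard 0043"))  # False: content altered
```

Because the identifier is derived from the content itself, any provider serving the same bytes yields the same identifier, so the client checks the data rather than trusting whoever stored it.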
From this perspective, the real opportunity for decentralized storage may be the AI cold‑data management layer: migrating data from training clusters, cloud object storage, and on‑prem servers, performing deduplication, compression, privacy scanning, copyright tagging, encryption, and sharding, then placing large files into cold storage while retaining a hot index.
When a model needs re‑training, the system can retrieve data by source, time, tags, and model version. Without this ability, Filecoin is merely a warehouse; with it, decentralized storage could become part of AI data infrastructure.
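A toy sketch of that ingest-and-retrieve loop, assuming whole-file deduplication by content hash, zlib compression, and fixed-size sharding. Privacy scanning, copyright tagging, and encryption are omitted; the 1 MiB shard size, the in-memory `cold_store` dict standing in for a cold storage network, and every function name are hypothetical.

```python
from __future__ import annotations

import hashlib
import zlib
from dataclasses import dataclass

SHARD_SIZE = 1 << 20  # hypothetical 1 MiB shards


@dataclass
class IndexEntry:
    digest: str          # SHA-256 of the original (uncompressed) bytes
    source: str
    year: int
    tags: frozenset
    model_version: str
    shard_ids: list      # pointers into cold storage


cold_store: dict[str, bytes] = {}      # stand-in for a cold storage backend
hot_index: dict[str, IndexEntry] = {}  # digest -> metadata, kept in fast storage


def ingest(data: bytes, source: str, year: int, tags: set, model_version: str) -> IndexEntry:
    digest = hashlib.sha256(data).hexdigest()
    if digest in hot_index:              # dedup: identical bytes are stored only once
        return hot_index[digest]
    packed = zlib.compress(data, 9)      # compress before archiving
    shard_ids = []
    for i in range(0, len(packed), SHARD_SIZE):
        shard = packed[i:i + SHARD_SIZE]
        shard_id = hashlib.sha256(shard).hexdigest()
        cold_store[shard_id] = shard     # large bodies go to cold storage
        shard_ids.append(shard_id)
    entry = IndexEntry(digest, source, year, frozenset(tags), model_version, shard_ids)
    hot_index[digest] = entry            # the metadata stays hot and searchable
    return entry


def find(source=None, year=None, tag=None, model_version=None) -> list:
    """Filter the hot index by source, time, tag, and model version without touching cold storage."""
    return [e for e in hot_index.values()
            if (source is None or e.source == source)
            and (year is None or e.year == year)
            and (tag is None or tag in e.tags)
            and (model_version is None or e.model_version == model_version)]


def restore(entry: IndexEntry) -> bytes:
    """Reassemble shards from cold storage, decompress, and check the recorded hash."""
    data = zlib.decompress(b"".join(cold_store[s] for s in entry.shard_ids))
    if hashlib.sha256(data).hexdigest() != entry.digest:
        raise ValueError("restored data does not match its recorded hash")
    return data


corpus = b"cleaned text " * 1000
ingest(corpus, "web-crawl", 2023, {"cleaned"}, "v1")
ingest(corpus, "web-crawl", 2023, {"cleaned"}, "v1")  # duplicate: not stored again
print(len(hot_index))
print(restore(find(tag="cleaned", model_version="v1")[0]) == corpus)
```

Here the second ingest is a no-op, the index holds one entry, and the restore round-trips to the original bytes. Searching never touches the shards; only `restore` does, which mirrors the "hot index, cold storage" split.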
Different decentralized storage projects should be evaluated separately. Filecoin is better suited to verifiable cold-data warehouses, since its core is a storage market plus storage proofs. That fits large files, low-frequency access, version-stable dataset snapshots, model checkpoints, research data, public training corpora, and privacy-processed audit logs.
Arweave is better for permanent public data: model documentation, data provenance records, and immutable public archives. Because data stored on Arweave is meant to be permanent, data involving privacy or a right to deletion is hard to store there for compliance reasons.
Storj and Sia are closer to decentralized object storage. If their user experience and pricing are competitive, they can capture some backup and archival demand, but they must also prove their availability, recovery speed, enterprise support, and long-term economic viability.
Of course, the most important factor is being cheap enough.
Amazon S3 Glacier Deep Archive, Google Cloud's Archive storage class, Azure's Archive tier, enterprise tape libraries, on-prem object storage, disk manufacturers, and cloud providers will all compete for AI cold data.
For ultra-low-frequency data in particular, tape and deep-archive tiers remain competitive. Decentralized storage must first be cheap, but it must also deliver verifiability, multi-provider redundancy, vendor neutrality, and content addressing. Being cheap only gets it in the door.
As AI keeps evolving, the volume of cold or junk data will keep growing and may become one of the biggest cost headaches for AI companies.
That’s why I believe the existing, cheap decentralized storage solutions deserve renewed discussion.
Historically, projects like Filecoin had supply (miners) but lacked real demand. The network has plenty of disks, plenty of storage providers, and a decentralization narrative, yet real paying customers are virtually nonexistent.
If AI cold data becomes a large market and decentralized storage can deliver “hot index, cold storage” cheaper than traditional solutions, those existing disks could see real use.
From an investment perspective, Micron’s rise doesn’t automatically imply Filecoin should follow; their business models are entirely different.
Micron sells hardware; Filecoin’s value depends on paid storage volume, genuine customer count, renewal rate, retrieval success rate, restoration cost, storage provider profit, and whether this growth translates into $FIL demand, staking, fees, or burns.
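To show how some of those yardsticks could be computed, here is a hypothetical calculation over per-deal records. The `Deal` fields and the definitions (renewal rate = renewed expired paid deals / expired paid deals; retrieval success rate = successful retrievals / attempts) are simple illustrative assumptions, not official Filecoin network statistics, and the sketch says nothing about how such figures would feed into $FIL demand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Deal:
    client: str
    paid: bool                # False for free or subsidized deals
    size_tib: float
    expired: bool             # the deal term has ended
    renewed: bool             # the client renewed after expiry
    retrieval_attempts: int
    retrieval_successes: int


def storage_metrics(deals: list[Deal]) -> dict:
    """Compute demand-side metrics over paid deals only."""
    paid = [d for d in deals if d.paid]
    expired = [d for d in paid if d.expired]
    attempts = sum(d.retrieval_attempts for d in paid)
    return {
        "paid_tib": sum(d.size_tib for d in paid),
        "paying_customers": len({d.client for d in paid}),
        "renewal_rate": sum(d.renewed for d in expired) / len(expired) if expired else None,
        "retrieval_success_rate": (
            sum(d.retrieval_successes for d in paid) / attempts if attempts else None
        ),
    }
```

Filtering to paid deals first matters: counting free or subsidized storage would inflate capacity numbers without saying anything about real demand, which is exactly the gap described above.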
Decentralized storage still has a long way to go, especially in implementing a functional “hot index, cold storage” system; that’s where Filecoin projects need to focus.
AI cold‑data demand is likely to materialize, but where it ends up will depend on who can be cheap enough, stable enough, searchable enough, and auditable enough.
If Filecoin can only prove it has many disks, that’s not very meaningful.
If Filecoin can demonstrate that these disks can handle real paid data and retrieve it reliably years later, with full restoration and sustained renewals, then the seemingly unwanted junk data of the AI era could indeed give decentralized storage a second chance.
End
