- Rakuten Mobile said it uses container-compatible object storage to manage its piles of network data
- More than 100 of its network apps feed data into its storage platform
- But the operator noted good data management is key to avoiding the "data swamp" trap
Operators are on a quest to turn their mounds of raw data into AI gold. But unlike the alchemy of old, telcos like Rakuten Mobile have shown this transformation is actually possible — and object storage is a key ingredient.
As we’ve noted before, telcos are sitting on a heap of unstructured data: everything from old emails, web pages, PDFs, purchase orders, invoices, training manuals and repair guides to sensor readings. But when it comes to organizing this, Rakuten Mobile has a leg up.
Though it started as an MVNO in 2014, Rakuten Mobile built out and launched its own greenfield network in 2020. That means it was able to build a network completely hosted on a Kubernetes containerized software stack and leverage container-compatible object storage from MinIO to manage its “humungous” amount of mobile data, a representative for Rakuten Mobile’s AI and Data team told Fierce.
More than 100 apps deployed across Rakuten Mobile store their data in object storage, which in turn yields datasets for training AI models and supporting AI use cases. Specifically, object storage is a core component of MLOps pipelines for storing experimental data, managing data versions and archiving model artifacts, the representative said.
“Its ability to store unstructured and semi-structured data at massive scale and low cost makes it ideal for ingesting raw data before processing,” the rep said of its object storage platform.
Rakuten Mobile's data alchemy strategy
But, while immensely helpful, object storage isn’t a panacea for telco data woes.
Asked what hurdles object storage can present, Rakuten Mobile’s team pointed to data discovery and governance, challenges processing raw unstructured data and data consistency.
“The ‘schema-on-read’ nature of object storage data lakes can lead to a ‘data swamp’ if not properly managed,” the representative said. Schema-on-read simply means that data is stored without a predefined structure and is instead structured during the analysis stage.
While this is great for quickly ingesting large amounts of data, it also means “finding specific datasets, understanding their lineage, and ensuring data quality (metadata management, cataloging) consumes lot of time & effort,” the rep said. So, Rakuten Mobile has a “self-serve catalogue function available in the platform to create catalogues for data processed by data engineers or data scientists.”
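To make the schema-on-read idea concrete, here is a minimal Python sketch. It is illustrative only and is not Rakuten Mobile's or MinIO's code: the in-memory `bucket`, the `put_object` and `read_with_schema` helpers, the object key and the field names are all hypothetical stand-ins. Raw records are written untouched, and a schema is applied only when an analyst reads them.

```python
import json

# Hypothetical in-memory stand-in for an object storage bucket: key -> raw bytes.
bucket = {}


def put_object(key, payload):
    """Store raw bytes as-is; nothing is validated at ingest time."""
    bucket[key] = payload


def read_with_schema(key, schema):
    """Apply a schema (field name -> cast function) only at read time."""
    rows = []
    for line in bucket[key].decode("utf-8").splitlines():
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Fields missing from the raw record surface as None, not as an error.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Ingest: two records with different shapes land in the same object.
put_object(
    "raw/cell-metrics.jsonl",
    b'{"cell_id": "A1", "latency_ms": "12.5", "vendor": "x"}\n'
    b'{"cell_id": "B2", "throughput": 300}\n',
)

# Analysis: the reader decides which fields matter and how to type them.
schema = {"cell_id": str, "latency_ms": float}
rows = read_with_schema("raw/cell-metrics.jsonl", schema)
```

The second record has no `latency_ms` value, and nothing flagged that at ingest. Gaps like this only show up at read time, which is why cataloging and quality checks matter so much for data stored this way.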
There are other challenges beyond this. For instance, raw unstructured data requires a significant amount of processing before it can be used for analytics or AI. “Orchestrating these complex data pipelines efficiently can be challenging,” the rep said. The same goes for maintaining strong data consistency across regions and cloud environments.
That’s in part why Rakuten Mobile is treating a good offense as the best defense against the infamous garbage-in, garbage-out problem.
“AIOps with anomaly detection modules have been deployed, proactively identifying missing data that manual monitoring would miss,” the rep said. “Data quality is continuously measured to detect and alert on drops, skewness or junk data being ingested from sources.”
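As a rough illustration of what continuous data quality measurement can look like, the Python sketch below flags a drop in record volume and a high share of junk records in an incoming batch. It is a simplified assumption, not a description of Rakuten Mobile's AIOps modules: the `check_batch` function, its thresholds and the field names are all hypothetical.

```python
from statistics import mean


def check_batch(history_counts, batch, required_fields,
                drop_ratio=0.5, junk_ratio=0.2):
    """Return alert names for one ingested batch (hypothetical thresholds)."""
    alerts = []

    # Volume drop: the batch is much smaller than the recent average.
    baseline = mean(history_counts)
    if len(batch) < drop_ratio * baseline:
        alerts.append("volume_drop")

    # Junk data: too many records are missing a required field.
    junk = sum(
        1 for record in batch
        if any(record.get(field) is None for field in required_fields)
    )
    if batch and junk / len(batch) > junk_ratio:
        alerts.append("junk_data")

    return alerts


# Recent batches held about 100 records each; the new one has 40, 10 of them junk.
history = [100, 98, 103]
batch = (
    [{"cell_id": "A1", "latency_ms": 12.5}] * 30
    + [{"cell_id": None, "latency_ms": 3.0}] * 10
)
alerts = check_batch(history, batch, ["cell_id", "latency_ms"])
```

A production system would likely replace these fixed ratios with learned baselines, but the principle is the same: measure each batch as it is ingested and raise an alert before bad data reaches a model.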
Data management and object storage
Of course, Rakuten Mobile isn’t the only telco using object storage. MinIO also counts Verizon as a customer, while rival vendor Pure Storage lists KDDI, SoftBank and Virgin Media O2 among its clients.
Notably, Nokia recently shined a spotlight on the importance of data pipelines for telco AI, tapping Pure Storage to provide block, file and object storage for the edge-to-core reference architecture it created with Red Hat.