Data cleaning company

The ETL process of massaging training data into a consistent, normalized format is often one of the most difficult and time-consuming steps in machine learning and data science. While there are plenty of tools and services to help build models, they all require clean input data to be effective. As a result, people who work with data end up spending the vast majority of their time as "data janitors," preparing data for their analyses.

I wonder whether there'd be demand for a data cleaning company. From an economic perspective, outsourcing this work seems like it'd be more efficient: the work would become more specialized, and companies wouldn't need to hold excess capacity for these tasks in-house. The main difficulty would likely be ensuring that the data cleaning staff, working from outside the client company, has the context necessary to develop a useful taxonomy for the data.