Anyone who has worked with data has felt the pain of data preparation. With the wrong tools, it is a struggle to massage data or cleanse it of anomalies, outliers, and just plain old dirty data.
Eric Anderson, a Product Manager at Google working on Cloud Dataprep and a Harvard Business School alumnus, understands how time-consuming it is for data scientists to clean up and ready large data sets for big data and machine learning initiatives. He has first-hand experience of how, in the real world, one of the bumps that slows down big data or machine learning projects is bad data.
He shares how Google Dataprep helps summarize, transform, visualize and clean up data. It focuses on that initial step, for any kind of data work, which is to get your data in position, in the right structure, joined with proper data sets so that it can be analysed.
If, for example, you have address data that is in a single string and you want to parse out the states into a new column, Dataprep could help you do that easily. It’ll even look at your data and alert you that you have union territories, not states, and it’ll ask you what you want to do.
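To make the address example concrete, here is a minimal sketch in plain Python of the kind of step being described. It is not Dataprep's API or how the product works internally. The sample addresses, the "street, city, state postal code" layout, the partial lookup sets, and the function names are all assumptions made up for illustration.

```python
import re

# Hypothetical, deliberately partial lookup sets used only for this sketch.
KNOWN_STATES = {"Karnataka", "Maharashtra", "Tamil Nadu"}
UNION_TERRITORIES = {"Chandigarh", "Delhi", "Puducherry"}


def parse_region(address):
    """Pull the region name out of an address string.

    Assumes the layout "street, city, region postal-code", so the region is
    the last comma-separated field with any trailing digits removed.
    """
    last_field = address.rsplit(",", 1)[-1]
    return re.sub(r"[\s\d-]+$", "", last_field).strip()


def add_state_column(addresses):
    """Return one record per address with a new 'state' column.

    Each record also gets a 'kind' label so that values which are union
    territories (or not recognised at all) can be reviewed rather than
    silently treated as states.
    """
    records = []
    for address in addresses:
        region = parse_region(address)
        if region in KNOWN_STATES:
            kind = "state"
        elif region in UNION_TERRITORIES:
            kind = "union territory"
        else:
            kind = "unknown"
        records.append({"address": address, "state": region, "kind": kind})
    return records


if __name__ == "__main__":
    sample = [
        "12 MG Road, Bengaluru, Karnataka 560001",
        "4 Beach Road, Puducherry, Puducherry 605001",
    ]
    for record in add_state_column(sample):
        print(record["state"], "->", record["kind"])
```

The sketch only labels the values that need attention. Deciding what to do with them (keep, rename, or drop) is left to the person working with the data, which mirrors the prompt described above.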
Find out more about Google Dataprep. Listen in now.