Statistical Data Cleaning with Applications in R

Mark van der Loo & Edwin de Jonge

Language: English

Publisher: Wiley

Published: Jan 25, 2018

Description:

A comprehensive guide to automated statistical data cleaning

The production of clean data is a complex and time-consuming process that requires both technical know-how and statistical expertise. Statistical Data Cleaning brings together a wide range of techniques for cleaning textual, numeric or categorical data. This book examines technical data cleaning methods relating to data representation and data structure. A prominent role is given to statistical data validation, data cleaning based on predefined restrictions, and data cleaning strategy.

Key features:

  • Focuses on the automation of data cleaning methods, covering both the underlying theory and applications implemented in R.
  • Enables the reader to design data cleaning processes for either one-off analytical purposes or for setting up production systems that clean data on a regular basis.
  • Explores statistical techniques for solving issues such as incompleteness, contradictions and...