Anyone who has taken an interest in data science work has probably heard these two sayings:
you will spend more time cleaning your data than running analysis on it; and
garbage in, garbage out.
In any data science project, clean data is almost always a prerequisite for accurate outcomes. There are several reasons for this:
our analysis can affect real lives on the ground if we’re working with community-facing information, and we don’t want to draw biased conclusions; ...