What is data cleaning? How can we do that?

Questions by GeekAdmin   answers by GeekAdmin


srinivas vadlakonda

  • Oct 6th, 2006
 

Cleaning means preparing the data, for example by filtering and merging it, before loading it into the data warehouse.


jessa

  • Oct 25th, 2006
 

Data cleaning is removing discrepancies from records, such as duplication and redundancy. In short, it makes the data as relevant as possible for the ultimate purpose of business analysis.


jyothi

  • Feb 10th, 2007
 

Data cleaning is the process of keeping unnecessary information out during data maintenance. One way to do it is clustering.


chinnodu

  • Mar 2nd, 2007
 

It is a process of identifying and correcting inconsistencies and inaccuracies.


Data cleaning is a self-explanatory term. Most data warehouses source data from multiple systems, systems that were created long before data warehousing was well understood and hence without any plan to consolidate their data in a single repository of information. In such a scenario, the following problems are possible:
1. Missing information for a column from one of the data sources;
2. Inconsistent information among different data sources;
3. Orphan records;
4. Outlier data points;
5. Different data types for the same information among various data sources, leading to improper conversion;
6. Data that breaches business rules.


To ensure that the data warehouse is not contaminated by any of these discrepancies, it is important to cleanse the data using a set of business rules before it makes its way into the data warehouse.
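
Several of the problems listed above can be detected with simple checks before loading. The following is only a rough sketch in plain Python: the tables, column names (`customer_id`, `city`, `amount`, `qty`), the outlier threshold `k`, and the "quantity must be positive" rule are all made-up assumptions for illustration, not part of any particular ETL tool.

```python
from statistics import median

# Hypothetical rows as they might arrive from two source systems.
customers = [
    {"customer_id": 1, "city": "Pune"},
    {"customer_id": 2, "city": None},  # missing information
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 120.0, "qty": 2},
    {"order_id": 11, "customer_id": 1, "amount": 95.0, "qty": 1},
    {"order_id": 12, "customer_id": 1, "amount": 110.0, "qty": 3},
    {"order_id": 13, "customer_id": 2, "amount": 105.0, "qty": 1},
    {"order_id": 14, "customer_id": 9, "amount": 100.0, "qty": 1},   # orphan: no customer 9
    {"order_id": 15, "customer_id": 2, "amount": 9000.0, "qty": 1},  # outlier amount
    {"order_id": 16, "customer_id": 1, "amount": 100.0, "qty": 0},   # breaks the assumed rule qty > 0
]

def missing(rows, column):
    """Rows where the column is absent, None, or an empty string."""
    return [r for r in rows if r.get(column) in (None, "")]

def orphans(children, parents, key):
    """Child rows whose key has no matching parent row."""
    known = {p[key] for p in parents}
    return [c for c in children if c[key] not in known]

def outliers(rows, column, k=5.0):
    """Rows further than k median absolute deviations from the median."""
    values = [r[column] for r in rows]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [r for r in rows if mad and abs(r[column] - med) > k * mad]

def rule_violations(rows, rule):
    """Rows for which the business rule (a predicate) returns False."""
    return [r for r in rows if not rule(r)]

print(missing(customers, "city"))
print(orphans(orders, customers, "customer_id"))
print(outliers(orders, "amount"))
print(rule_violations(orders, lambda r: r["qty"] > 0))
```

Rows flagged by such checks would typically be corrected or routed to a reject table rather than silently dropped, but that choice depends on the business rules in force.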

sub_guha

  • Jan 24th, 2010
 

Data cleaning, technically called "Data Cleansing", is a group of methods for making data more reliable and accurate. Companies usually store data in warehouses so they can make sense of it and take decisions based on it; if the data is unreliable or inaccurate, the decisions tend to be unreliable or inaccurate too, which can lead to losses of millions of dollars.


How can we cleanse:
Standardization - making data follow the same rules, notations, and codes
Enrichment - filling in missing data based on some reference value (e.g. city name)
De-duplication - finding and removing records that appear distinct but are actually duplicates
Validation - commonly used for making sure data follows business rules
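
The four methods above can be chained into a small pipeline. This is a minimal sketch, assuming hypothetical customer records: the field names, the `CITY_BY_POSTCODE` lookup table, and the validation rule (email contains "@" and a city is known) are invented for the example.

```python
# Hypothetical reference data used for enrichment: postcode -> city.
CITY_BY_POSTCODE = {"411001": "Pune", "560001": "Bengaluru"}

def standardize(row):
    """Apply one format to names and emails so equal values compare equal."""
    out = dict(row)
    out["name"] = " ".join(row["name"].split()).title()
    out["email"] = row["email"].strip().lower()
    return out

def enrich(row):
    """Fill in a missing city from the reference table, if the postcode is known."""
    out = dict(row)
    if not out.get("city"):
        out["city"] = CITY_BY_POSTCODE.get(out.get("postcode"))
    return out

def deduplicate(rows, key="email"):
    """Keep the first row seen for each key value."""
    seen = set()
    unique = []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            unique.append(r)
    return unique

def is_valid(row):
    """Assumed business rule: email must contain '@' and city must be known."""
    return "@" in row["email"] and row["city"] is not None

raw = [
    {"name": "  asha  rao", "email": "Asha.Rao@Example.com ", "postcode": "411001", "city": ""},
    {"name": "Asha Rao", "email": "asha.rao@example.com", "postcode": "411001", "city": "Pune"},
    {"name": "ravi k", "email": "ravi.example.com", "postcode": "560001", "city": None},
]

# Standardize first so that duplicates differing only in case or spacing match.
cleaned = deduplicate([enrich(standardize(r)) for r in raw])
accepted = [r for r in cleaned if is_valid(r)]
rejected = [r for r in cleaned if not is_valid(r)]

print(accepted)
print(rejected)
```

Standardization runs before de-duplication on purpose: the first two records only turn out to be the same customer once case and whitespace are normalized.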


