Different Sources of Data for Data Analysis - GeeksforGeeks (2024)


Data collection is the process of acquiring, extracting, and storing large volumes of data, which may be structured or unstructured (text, video, audio, XML files, records, or image files), for use in later stages of data analysis.
In big data analysis, data collection is the first step, carried out before any patterns or useful information can be found in the data. The data to be analyzed must be collected from valid sources.


Collected data is known as raw data. Raw data is not useful as it stands, but once it has been cleaned and analyzed it becomes information, and what is learned from that information is known as “knowledge”. Knowledge can take many forms, such as business knowledge about the sales of enterprise products or knowledge about disease treatment. The main goal of data collection is to collect information-rich data.

Data collection starts with asking questions such as what type of data is to be collected and what the source of collection is. Most collected data falls into two types: “qualitative data”, which is non-numerical data such as words and sentences and mostly focuses on the behavior and actions of a group, and “quantitative data”, which is in numerical form and can be measured and calculated using different scientific tools and sampling methods.
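The difference between the two types shows up in how each one is summarized. The following is a minimal sketch, assuming a small set of hypothetical survey responses (the field names `rating` and `comment_theme` are invented for illustration): numeric fields are treated as quantitative and averaged, while non-numeric fields are treated as qualitative and counted by category.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses: "rating" is quantitative,
# "comment_theme" is qualitative.
responses = [
    {"rating": 4, "comment_theme": "price"},
    {"rating": 5, "comment_theme": "quality"},
    {"rating": 3, "comment_theme": "price"},
    {"rating": 4, "comment_theme": "delivery"},
]

def summarize(records):
    """Summarize numeric fields by their mean and other fields by frequency."""
    summary = {}
    for field in records[0]:
        values = [r[field] for r in records]
        if all(isinstance(v, (int, float)) for v in values):
            summary[field] = {"type": "quantitative", "mean": mean(values)}
        else:
            summary[field] = {"type": "qualitative", "counts": dict(Counter(values))}
    return summary

summary = summarize(responses)
print(summary)
```

In real data the split is rarely this clean; a numeric code such as a postal code is still qualitative, so the type check here is only a rough heuristic.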

The data is then further divided into two main types:

  1. Primary data
  2. Secondary data


1. Primary data:

Data that is raw, original, and extracted directly from the official sources is known as primary data. This type of data is collected directly through techniques such as questionnaires, interviews, and surveys. The data collected must match the requirements of the analysis and of the target audience being studied; otherwise it becomes a burden during data processing.

A few methods of collecting primary data:

1. Interview method:

In this method, data is collected by interviewing the target audience. The person who asks the questions is called the interviewer, and the person who answers is known as the interviewee. Some basic business or product-related questions are asked, and the answers are recorded as notes, audio, or video and stored for processing. Interviews can be structured or unstructured, for example personal or formal interviews conducted by telephone, face to face, or by email.

2. Survey method:

The survey method is a research process in which a list of relevant questions is asked and the answers are recorded as text, audio, or video. Surveys can be conducted both online and offline, for example through website forms and email. The survey answers are then stored for analysis. Examples are online surveys and social media polls.

3. Observation method:

The observation method is a method of data collection in which the researcher closely observes the behavior and practices of the target audience using a data collection tool and stores the observed data as text, audio, video, or any raw format. In this method, the data is collected directly by watching the participants rather than by posing questions to them. For example, a researcher might observe a group of customers and their behavior towards the products. The data obtained is then sent for processing.

4. Experimental method:

The experimental method is the process of collecting data by performing experiments, research, and investigation. The most frequently used experimental designs are CRD, RBD, LSD, and FD.

  • CRD - Completely Randomized Design is a simple experimental design used in data analytics that is based on randomization and replication. It is mostly used for comparing treatments.
  • RBD - Randomized Block Design is an experimental design in which the experimental units are grouped into small units called blocks. Treatments are assigned at random within each block, and results are drawn using a technique known as analysis of variance (ANOVA). RBD originated in the agriculture sector.
  • LSD - Latin Square Design is an experimental design that is similar to CRD and RBD but arranges the blocks in rows and columns. It is an N x N square with an equal number of rows and columns, in which each letter occurs exactly once in every row and every column. Hence the differences can be found easily with fewer errors in the experiment. A completed Sudoku puzzle is an example of a Latin square.
  • FD - Factorial Design is an experimental design in which an experiment has two or more factors, each with a set of possible values, and trials are performed on the combinations of those values.
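The four designs above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a statistical package: the function names, the plot and block labels, and the factor names are all hypothetical, and the Latin square is built with a simple cyclic shift (in practice its rows and columns would usually be shuffled as well).

```python
import random
from itertools import product

def completely_randomized(units, treatments, seed=0):
    """CRD: shuffle all units, then deal treatments out evenly (replication)."""
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

def randomized_block(blocks, treatments, seed=0):
    """RBD: every treatment appears once per block, in a random order."""
    rng = random.Random(seed)
    return {name: rng.sample(treatments, len(treatments)) for name in blocks}

def latin_square(treatments):
    """LSD: cyclic N x N square; each treatment occurs once per row and column."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]

def full_factorial(factors):
    """FD: one run for every combination of the values of each factor."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

treatments = ["A", "B", "C"]
crd = completely_randomized([f"plot{i}" for i in range(6)], treatments)
rbd = randomized_block(["block1", "block2"], treatments)
square = latin_square(treatments)
runs = full_factorial({"fertilizer": ["low", "high"], "water": ["daily", "weekly"]})

for row in square:
    print(" ".join(row))
print(len(runs), "factorial runs")
```

The fixed `seed` only makes the random assignment repeatable for demonstration; a real experiment would record whichever seed or randomization procedure it actually used.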

2. Secondary data:

Secondary data is data that has already been collected and is reused for another valid purpose. It is derived from previously recorded primary data and comes from two types of sources: internal and external.

Internal source:

This type of data can easily be found within the organization, such as market records, sales records, transactions, customer data, and accounting resources. Obtaining data from internal sources takes less cost and time.

External source:

Data that cannot be found within the organization and is obtained from external third-party resources is external source data. It costs more and takes more time to obtain because it involves a huge amount of data. Examples of external sources are government publications, news publications, the Registrar General of India, the Planning Commission, the International Labour Bureau, syndicate services, and other non-governmental publications.

Other sources:

  • Sensor data: With the advancement of IoT devices, the sensors in these devices collect data that can be used in sensor data analytics to track the performance and usage of products.
  • Satellite data: Satellites collect terabytes of images and other data daily through surveillance cameras, and this data can be used to extract useful information.
  • Web traffic: With fast and cheap internet access, data in many formats that users upload to different platforms can be collected, with their permission, for data analysis. Search engines also provide data on the most-searched keywords and queries.

Last Updated : 08 Jul, 2022


FAQs

What are the sources of data and data analysis? ›

Data is either primary or secondary. Information collected firsthand for the analysis at hand is called “primary data,” while information that has already been collected, whether it comes from internal records or from outside references, is called “secondary data.” For data analysis, it all must be collected through primary or secondary research.

How do you Analyse data from different sources? ›

  1. Define your research objectives and questions.
  2. Choose the right data analysis methods and tools.
  3. Clean and prepare your data for analysis.
  4. Analyze your data from different sources separately and together. ...
  5. Visualize and communicate your data analysis results.
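Steps 3 and 4 can be illustrated with a small sketch that cleans one source and then joins it with another. All of the data here is hypothetical: an internal sales export in CSV form and an external population table keyed by region, both invented for the example.

```python
import csv
import io

# Hypothetical internal source: sales records exported as CSV.
internal_csv = """region,units_sold
north,120
south,80
east,
"""

# Hypothetical external source: population figures from a published report.
external_population = {"north": 1000, "south": 500, "west": 700}

def load_sales(text):
    """Parse the CSV and drop rows with a missing units_sold value (cleaning)."""
    rows = csv.DictReader(io.StringIO(text))
    return {r["region"]: int(r["units_sold"]) for r in rows if r["units_sold"]}

def combine(sales, population):
    """Join the two sources on region and derive units sold per 1,000 people."""
    return {
        region: round(1000 * units / population[region], 1)
        for region, units in sales.items()
        if region in population
    }

sales = load_sales(internal_csv)
per_thousand = combine(sales, external_population)
print(per_thousand)
```

Only regions present in both sources survive the join, so `east` (missing value) and `west` (no sales record) drop out; whether to drop or fill such gaps is a choice that depends on the research question.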

What are the different types of data sources explain? ›

The sources of data can be classified into two types: statistical and non-statistical. Statistical sources refer to data that is gathered for official purposes and include censuses and officially administered surveys.

What are the 4 types of data analytics? ›

Analytics is a broad term covering four different pillars in the modern analytics model: descriptive, diagnostic, predictive, and prescriptive. Each type of analytics plays a role in how your business can better understand what your data reveals and how you can use those insights to drive business objectives.

What are the five main sources of data? ›

Examples of primary data sources include surveys and questionnaires, direct observations, experiments, interviews, and focus groups. As a researcher, you must be familiar with primary data sources, which are original data collected firsthand specifically for your research purposes.

What are the 3 most common data analysis? ›

Four Types of Data Analysis
  • Descriptive Analysis.
  • Diagnostic Analysis.
  • Predictive Analysis.
  • Prescriptive Analysis.

How do you choose sources of data? ›

What to Look for When Choosing a Data Source
  1. High-Quality. ...
  2. Continuously updated. ...
  3. Relevant information. ...
  4. Strong online presence. ...

What are three different ways to analyze data? ›

Analyzing the data
  • Descriptive analysis, which identifies what has already happened.
  • Diagnostic analysis, which focuses on understanding why something has happened.
  • Predictive analysis, which identifies future trends based on historical data.
  • Prescriptive analysis, which allows you to make recommendations for the future.
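As a rough illustration of the first and third items, the sketch below computes a descriptive summary of past values and then a predictive estimate by fitting a straight-line trend with least squares. The monthly sales figures are hypothetical, and a straight line is only one of many possible models; diagnostic and prescriptive analysis are not shown.

```python
from statistics import mean

# Hypothetical monthly sales figures (month index and units sold).
months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]

def describe(values):
    """Descriptive: summarize what has already happened."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}

def fit_line(xs, ys):
    """Least-squares slope and intercept for a straight-line trend."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar

def predict(xs, ys, next_x):
    """Predictive: extend the historical trend to a future point."""
    slope, intercept = fit_line(xs, ys)
    return slope * next_x + intercept

summary = describe(sales)
forecast = predict(months, sales, 6)
print(summary, forecast)
```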

What are the different data sources in big data analytics? ›

Main sources of big data can be grouped under the headings of social (human), machine (sensor) and transactional. Social (human) – this source is becoming more and more relevant to organisations. This source includes all social media posts, videos posted etc.

What is an example of a data source? ›

Concretely, a data source may be a database, a flat file, live measurements from physical devices, scraped web data, or any of the myriad static and streaming data services which abound across the internet.

What are the three sources of data? ›

Sources of information or evidence are often categorized as primary, secondary, or tertiary material. These classifications are based on the originality of the material and the proximity of the source or origin.

What is the difference between data analysis and data analytics? ›

Data analytics is a general term for a field that includes the entire management of data and the methods and instruments employed. On the other hand, data analysis is the process of dissecting a given data set into its component pieces and analyzing each one separately, as well as how the parts relate to one another.

What is an example of data analysis? ›

For example, a researcher wants to study children and achievement in math. The researcher will compile data such as age, gender, grade level, and mathematics grades. This raw data is then interpreted through specific statistical programs to show relationships between the different variables.

What are the 3 common categories of data analytics? ›

Descriptive, predictive and prescriptive analytics.

What are the 3 types of data analysis? ›

Four main types of data analytics
  • Predictive data analytics. Predictive analytics may be the most commonly used category of data analytics. ...
  • Prescriptive data analytics. ...
  • Diagnostic data analytics. ...
  • Descriptive data analytics.

What are the sources of data for content analysis? ›

Sources of data could be from interviews, open-ended questions, field research notes, conversations, or literally any occurrence of communicative language (such as books, essays, discussions, newspaper headlines, speeches, media, historical documents). A single study may analyze various forms of text in its analysis.

What are the sources of data in statistical analysis? ›

What are Common Data Sources for Statistical Analyses?
  • Surveys or questionnaires.
  • Social media platforms.
  • Medical records.
  • Sales or financial reports.
  • Experiment or observation results.
  • Government records or databases.


Author: Carlyn Walter
