Fixody

Introduction to Data Gathering

Data Science Guide

Sources of Data

Databases:

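  • Definition: Organized collections of data, either structured (relational tables) or semi-structured (documents).
  • Examples: MySQL, PostgreSQL (relational), MongoDB (document-oriented NoSQL).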
  • Advantages: Easily queried and analyzed.
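As a minimal sketch of what "easily queried" means in practice, the example below uses Python's built-in sqlite3 module with an in-memory database standing in for a server such as MySQL or PostgreSQL. The readings table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a production database (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of city temperature readings.
cur.execute("CREATE TABLE readings (city TEXT, temp_c REAL)")
cur.executemany(
    "INSERT INTO readings (city, temp_c) VALUES (?, ?)",
    [("Oslo", 4.5), ("Oslo", 6.5), ("Madrid", 18.0)],
)
conn.commit()

# One declarative query computes the average temperature per city.
rows = cur.execute(
    "SELECT city, AVG(temp_c) FROM readings GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('Madrid', 18.0), ('Oslo', 5.5)]
conn.close()
```

The same SQL would run largely unchanged against other relational databases; only the connection code differs.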

Surveys:

  • Definition: Primary data collection from individuals or groups.
  • Methods: Questionnaires, interviews, online forms.
  • Advantages: Tailored to research objectives.
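Responses from online forms are often exported as CSV files. A minimal sketch of tallying answers to one question, assuming a hypothetical export with respondent and satisfaction columns:

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export from an online form; the column names are assumptions.
export = """respondent,satisfaction
1,Satisfied
2,Neutral
3,Satisfied
4,Dissatisfied
"""

# DictReader maps each row to the header names.
reader = csv.DictReader(io.StringIO(export))
counts = Counter(row["satisfaction"] for row in reader)
print(counts.most_common())  # [('Satisfied', 2), ('Neutral', 1), ('Dissatisfied', 1)]
```

With a real export, io.StringIO(export) would be replaced by an open file handle.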

APIs (Application Programming Interfaces):

  • Definition: Interfaces that let programs request data from online platforms and services.
  • Examples: Social media, weather services, financial markets.
  • Advantages: Data is accessible programmatically, usually in structured formats such as JSON or XML.
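A minimal sketch of calling a JSON API with Python's standard library. The endpoint https://api.example.com/v1/weather and its city and units parameters are hypothetical; a real service documents its own URL, parameters, and authentication. The demo at the end parses a sample body offline, so only fetch() needs network access.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical weather API endpoint; replace with a real service's documented URL.
BASE_URL = "https://api.example.com/v1/weather"

def build_url(city: str, units: str = "metric") -> str:
    """Build a request URL with URL-encoded query parameters."""
    query = urllib.parse.urlencode({"city": city, "units": units})
    return f"{BASE_URL}?{query}"

def parse_response(body: str) -> dict:
    """Decode a JSON response body into a Python dict."""
    return json.loads(body)

def fetch(city: str) -> dict:
    """Send the GET request and parse the JSON reply (requires network access)."""
    with urllib.request.urlopen(build_url(city), timeout=10) as resp:
        return parse_response(resp.read().decode("utf-8"))

# Offline usage with a sample body shaped like a hypothetical API reply.
sample = '{"city": "Oslo", "temp_c": 4.5}'
print(build_url("Oslo"))       # https://api.example.com/v1/weather?city=Oslo&units=metric
print(parse_response(sample))  # {'city': 'Oslo', 'temp_c': 4.5}
```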

Web Scraping:

  • Definition: Extracting data from websites.
  • Methods: Manual copying, or automated extraction with tools or scripts.
  • Advantages: Access to data not available through APIs.
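At its core, scraping means parsing a page's HTML and keeping the parts you need. A minimal sketch using the standard library's html.parser to collect table cells; the HTML fragment is hypothetical, and a real scraper would first download the page.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data.strip()

# Hypothetical page fragment standing in for downloaded HTML.
html = (
    "<table><tr><td>Oslo</td><td>4.5</td></tr>"
    "<tr><td>Madrid</td><td>18.0</td></tr></table>"
)
parser = CellCollector()
parser.feed(html)
print(parser.cells)  # ['Oslo', '4.5', 'Madrid', '18.0']
```

Third-party parsers are more forgiving of messy real-world HTML; the standard-library version is used here only to keep the sketch self-contained.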

Sensor Data Collection:

  • Definition: Real-time data collection using sensors.
  • Examples: Temperature sensors, GPS devices, cameras.
  • Advantages: Continuous, high-frequency data; quality depends on sensor calibration and maintenance.
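Sensor collection usually comes down to reading a device at a fixed interval and timestamping each value. A minimal sketch, assuming a hypothetical read_temperature() that simulates a sensor; a real device driver or library would replace it.

```python
import random
import time
from datetime import datetime, timezone

def read_temperature() -> float:
    """Hypothetical sensor read; simulates a reading near 20 °C."""
    return round(20.0 + random.uniform(-0.5, 0.5), 2)

def poll(sensor, samples: int, interval_s: float):
    """Read a sensor at a fixed interval and timestamp each reading in UTC."""
    readings = []
    for _ in range(samples):
        readings.append((datetime.now(timezone.utc).isoformat(), sensor()))
        time.sleep(interval_s)
    return readings

for ts, value in poll(read_temperature, samples=3, interval_s=0.1):
    print(ts, value)
```

Recording timestamps in UTC avoids ambiguity when readings from devices in different time zones are combined.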

Data Collection Methods

Manual Entry:

  • Definition: Manually inputting data into a system.
  • Advantages: Straightforward.
  • Disadvantages: Time-consuming, prone to errors.
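Because manual entry is error-prone, entries are often validated as they are typed in. A minimal sketch; the field names (name, age) and the 0 to 120 age range are hypothetical rules chosen for illustration.

```python
def validate_entry(entry: dict) -> list:
    """Return a list of problems found in one manually entered record.

    The field names and rules here are hypothetical examples.
    """
    problems = []
    name = entry.get("name", "").strip()
    if not name:
        problems.append("name is empty")
    try:
        age = int(entry.get("age", ""))
        if not 0 <= age <= 120:
            problems.append("age out of range")
    except ValueError:
        problems.append("age is not a whole number")
    return problems

print(validate_entry({"name": "Ada", "age": "36"}))  # []
print(validate_entry({"name": " ", "age": "3O"}))    # ['name is empty', 'age is not a whole number']
```

The second call shows a typical typo: the letter O typed in place of a zero.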

Web Scraping:

  • Definition: Automated extraction of data from websites.
  • Tools: Scripts built with libraries such as Requests and Beautiful Soup, or frameworks such as Scrapy.
  • Advantages: Quick collection of large amounts of data.
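Automated scrapers commonly check a site's robots.txt before fetching pages. A minimal sketch using the standard library's urllib.robotparser; the robots.txt content, the example.com URLs, and the "my-scraper" user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt for a hypothetical site. In practice you would call
# rp.set_url("https://example.com/robots.txt") and rp.read() instead.
robots_txt = """User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

for url in ["https://example.com/products", "https://example.com/private/data"]:
    print(url, rp.can_fetch("my-scraper", url))
# https://example.com/products True
# https://example.com/private/data False
```

Respecting robots.txt is a convention rather than a legal guarantee; a site's terms of service may impose further limits.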

Sensor Data Collection:

  • Definition: Automated real-time data collection using sensors.
  • Examples: IoT applications, environmental monitoring.
  • Advantages: Continuous, high-frequency data; quality depends on sensor calibration and maintenance.
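Continuous sensor streams are often summarized into fixed time windows before analysis. A minimal sketch of per-minute averaging, assuming hypothetical timestamped readings from an environmental monitor:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical timestamped readings from an environmental monitor (ISO 8601).
readings = [
    ("2024-05-01T10:00:05", 21.0),
    ("2024-05-01T10:00:35", 21.4),
    ("2024-05-01T10:01:10", 22.0),
]

def per_minute_average(rows):
    """Group readings by minute and average each group."""
    buckets = defaultdict(list)
    for ts, value in rows:
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        buckets[minute].append(value)
    return {m.isoformat(): round(mean(v), 2) for m, v in sorted(buckets.items())}

print(per_minute_average(readings))
# {'2024-05-01T10:00:00': 21.2, '2024-05-01T10:01:00': 22.0}
```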

APIs:

  • Definition: Automated retrieval of data from online services.
  • Advantages: Efficient, reliable, suitable for large datasets or real-time data.
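Large datasets are usually served by APIs in pages. A minimal sketch of following cursor-based pagination, assuming each page has the shape {"items": [...], "next": cursor_or_None}; real APIs document their own scheme, and the in-memory PAGES dict is a hypothetical stand-in for HTTP calls.

```python
from typing import Callable, Iterator, Optional

def iterate_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record from a cursor-paginated API.

    Assumes each page looks like {"items": [...], "next": cursor_or_None}.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next")
        if cursor is None:
            break

# Fake in-memory "API" standing in for real HTTP calls (hypothetical data).
PAGES = {
    None: {"items": [{"id": 1}, {"id": 2}], "next": "p2"},
    "p2": {"items": [{"id": 3}], "next": None},
}

ids = [record["id"] for record in iterate_pages(PAGES.__getitem__)]
print(ids)  # [1, 2, 3]
```

Using a generator means records can be processed as they arrive rather than after the whole dataset has been downloaded.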

Data Quality Assessment

Accuracy:

  • Criteria: Values are correct and free from errors.
  • Examples: Incorrect values, typos, duplicate records.
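Some accuracy problems can be caught automatically. A minimal sketch that flags duplicate IDs and implausible values; the records and the -60 to 60 °C plausibility range are hypothetical assumptions.

```python
# Hypothetical records; "id" should be unique and "temp_c" physically plausible.
records = [
    {"id": 1, "temp_c": 21.5},
    {"id": 2, "temp_c": 999.0},   # likely a data-entry or sensor error
    {"id": 1, "temp_c": 21.5},    # exact duplicate of the first record
]

def find_accuracy_issues(rows, low=-60.0, high=60.0):
    """Flag duplicate ids and values outside an assumed plausible range."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        if row["id"] in seen:
            issues.append((i, "duplicate id"))
        seen.add(row["id"])
        if not low <= row["temp_c"] <= high:
            issues.append((i, "out of range"))
    return issues

print(find_accuracy_issues(records))  # [(1, 'out of range'), (2, 'duplicate id')]
```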

Completeness:

  • Criteria: Contains all necessary data points.
  • Examples: Missing values.
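A simple completeness check counts missing values per field. A minimal sketch, assuming hypothetical records in which None or an empty string marks a missing answer:

```python
# Hypothetical survey records; None or "" marks a missing answer.
rows = [
    {"age": 34, "city": "Oslo", "income": None},
    {"age": None, "city": "Madrid", "income": 52000},
    {"age": 29, "city": "", "income": None},
]

def missing_counts(records, fields):
    """Count missing (None or empty-string) values for each field."""
    return {f: sum(1 for r in records if r.get(f) in (None, "")) for f in fields}

counts = missing_counts(rows, ["age", "city", "income"])
print(counts)  # {'age': 1, 'city': 1, 'income': 2}

# Share of non-missing values per field.
print({f: f"{100 * (1 - n / len(rows)):.0f}% complete" for f, n in counts.items()})
```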

Consistency:

  • Criteria: Data uses the same formats, units, and definitions across sources and time periods.
  • Examples: Mixed date formats or units (e.g., Celsius vs. Fahrenheit), which can lead to misleading results.
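Consistency problems are typically fixed by normalizing every source to one convention. A minimal sketch that converts dates to ISO 8601 and temperatures to Celsius; the two sources and their formats are hypothetical, and the second source is assumed to use US month/day/year dates.

```python
from datetime import datetime

# Hypothetical records from two sources using different date formats and units.
source_a = [{"date": "2024-03-01", "temp": 20.0, "unit": "C"}]
source_b = [{"date": "03/02/2024", "temp": 68.0, "unit": "F"}]  # assumed month/day/year

def normalize(row):
    """Convert to ISO 8601 dates and degrees Celsius."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            date = datetime.strptime(row["date"], fmt).date().isoformat()
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognized date: {row['date']}")
    temp = row["temp"] if row["unit"] == "C" else (row["temp"] - 32) * 5 / 9
    return {"date": date, "temp_c": round(temp, 1)}

print([normalize(r) for r in source_a + source_b])
# [{'date': '2024-03-01', 'temp_c': 20.0}, {'date': '2024-03-02', 'temp_c': 20.0}]
```

A date such as 03/02/2024 is ambiguous on its own, so the expected format of each source should be confirmed rather than guessed.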

Relevance:

  • Criteria: Relevant to research objectives.
  • Examples: Irrelevant or unnecessary data.
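In practice, relevance often means keeping only the fields a research question needs. A minimal sketch, assuming a hypothetical question about temperature by city, so device metadata is dropped:

```python
# Hypothetical research question: how does temperature vary by city?
RELEVANT_FIELDS = ("city", "temp_c")

raw = [
    {"city": "Oslo", "temp_c": 4.5, "device_id": "a1", "firmware": "2.3"},
    {"city": "Madrid", "temp_c": 18.0, "device_id": "b7", "firmware": "2.1"},
]

def keep_relevant(rows, fields):
    """Drop fields that do not serve the research objective."""
    return [{f: row[f] for f in fields} for row in rows]

print(keep_relevant(raw, RELEVANT_FIELDS))
# [{'city': 'Oslo', 'temp_c': 4.5}, {'city': 'Madrid', 'temp_c': 18.0}]
```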

Timeliness:

  • Criteria: Data is up to date.
  • Examples: Outdated data not reflecting current trends.
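Timeliness can be checked by comparing each record's last-update time with a maximum acceptable age. A minimal sketch; the records, the updated field, and the 30-day threshold are hypothetical, and a fixed "now" keeps the result reproducible.

```python
from datetime import datetime, timedelta, timezone

def stale_records(rows, max_age, now=None):
    """Return ids of records whose 'updated' timestamp is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return [r["id"] for r in rows if now - r["updated"] > max_age]

# Hypothetical records and a fixed "now" so the result is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
rows = [
    {"id": "a", "updated": datetime(2024, 5, 30, tzinfo=timezone.utc)},
    {"id": "b", "updated": datetime(2023, 11, 15, tzinfo=timezone.utc)},
]
print(stale_records(rows, max_age=timedelta(days=30), now=now))  # ['b']
```

The right threshold depends on the research question: 30 days may be stale for market prices but fresh for census data.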

Conclusion

  • Summary: Data gathering is a critical step that involves identifying data sources, selecting collection methods, and assessing data quality.
  • Importance: Careful data gathering helps ensure reliable and accurate results in later analysis and visualization.