Sources of Data
Databases:
- Definition: Organized collections of data managed by a database system.
- Examples: MySQL, PostgreSQL (relational); MongoDB (document-oriented, NoSQL).
- Advantages: Easily queried and analyzed.
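A minimal sketch of querying a database from Python. It assumes the built-in sqlite3 module as a lightweight stand-in for servers such as MySQL or PostgreSQL, which need their own client libraries; the sales table and its values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a server such as MySQL or PostgreSQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of sales records.
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)
conn.commit()

# Aggregate with SQL: total sales per region (each result row is a (region, total) pair).
totals = dict(
    cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
)
print(totals)  # {'north': 150.0, 'south': 80.0}
conn.close()
```

The same SELECT ... GROUP BY query would run largely unchanged on other SQL databases; only the connection code differs.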
Surveys:
- Definition: Primary data collection from individuals or groups.
- Methods: Questionnaires, interviews, online forms.
- Advantages: Tailored to research objectives.
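Responses from online forms are often exported as CSV and then tallied. The sketch below assumes a hypothetical export with one closed-ended question; the column names and answers are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical export of an online questionnaire (one row per respondent).
raw = """respondent,satisfaction
1,agree
2,neutral
3,agree
4,disagree
"""

rows = list(csv.DictReader(io.StringIO(raw)))
# Tally the answers to the closed-ended question.
counts = Counter(row["satisfaction"] for row in rows)
print(counts.most_common())  # [('agree', 2), ('neutral', 1), ('disagree', 1)]
```

For a real export, io.StringIO(raw) would be replaced by open("responses.csv", newline="") with the actual file name.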
APIs (Application Programming Interfaces):
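- Definition: Interfaces that let programs request data from online platforms automatically.
- Examples: Social media platforms, weather services, financial market data providers.
- Advantages: Data can be retrieved programmatically, usually in a structured format such as JSON.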
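A minimal sketch of calling a web API with Python's standard library. The endpoint and its parameters (city, units) are hypothetical placeholders; a real service defines its own URL, parameters, authentication, and response format.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; real services document their own URLs, parameters, and keys.
BASE_URL = "https://api.example.com/v1/weather"


def build_url(params: dict) -> str:
    """Encode query parameters onto the base URL."""
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_json(url: str, timeout: float = 10.0):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = build_url({"city": "Oslo", "units": "metric"})
print(url)  # https://api.example.com/v1/weather?city=Oslo&units=metric
# data = fetch_json(url)  # requires network access and a real endpoint
```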
Web Scraping:
- Definition: Extracting data from websites.
- Methods: Manual copying, or automated extraction using tools or scripts.
- Advantages: Access to data not available through APIs.
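A minimal sketch of automated extraction using Python's built-in html.parser to pull links out of a page. The HTML string is a hypothetical stand-in for a downloaded page.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page content; a real scraper would download it first.
html = ('<ul><li><a href="/report-2023">Report 2023</a></li>'
        '<li><a href="/data.csv">Raw data</a></li></ul>')
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # [('/report-2023', 'Report 2023'), ('/data.csv', 'Raw data')]
```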
Sensor Data Collection:
- Definition: Real-time data collection using sensors.
- Examples: Temperature sensors, GPS devices, cameras.
- Advantages: Continuous, high-frequency data (quality depends on sensor calibration and placement).
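Sensor collection usually amounts to reading a device at a fixed interval and timestamping each value. In this sketch, read_temperature is a hypothetical stand-in for a real sensor driver (it returns simulated values), and the sampling interval is arbitrary.

```python
import random
import time
from datetime import datetime, timezone


def read_temperature() -> float:
    """Hypothetical stand-in for a sensor driver; returns simulated degrees Celsius."""
    return round(20.0 + random.uniform(-0.5, 0.5), 2)


def poll(sensor, n_samples: int, interval_s: float):
    """Read the sensor n_samples times, interval_s seconds apart, with UTC timestamps."""
    readings = []
    for _ in range(n_samples):
        readings.append({"time": datetime.now(timezone.utc).isoformat(), "value": sensor()})
        time.sleep(interval_s)
    return readings


samples = poll(read_temperature, n_samples=3, interval_s=0.1)
for s in samples:
    print(s["time"], s["value"])
```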
Data Collection Methods
Manual Entry:
- Definition: Manually inputting data into a system.
- Advantages: Straightforward; requires no special tools.
- Disadvantages: Time-consuming and prone to errors such as typos.
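Because manual entry is error-prone, entries are often checked as they are typed in. A minimal sketch, assuming a hypothetical record schema (name, age, visit_date) and plausibility rules chosen only for illustration:

```python
from datetime import date


def validate_entry(entry: dict) -> list:
    """Return a list of problems found in one manually entered record (empty if none)."""
    problems = []
    # Hypothetical schema: name (non-empty text), age (0-120), visit_date (YYYY-MM-DD).
    if not entry.get("name", "").strip():
        problems.append("name is empty")
    try:
        age = int(entry.get("age", ""))
        if not 0 <= age <= 120:
            problems.append("age out of range")
    except ValueError:
        problems.append("age is not a whole number")
    try:
        date.fromisoformat(entry.get("visit_date", ""))
    except ValueError:
        problems.append("visit_date is not YYYY-MM-DD")
    return problems


print(validate_entry({"name": "Ana", "age": "34", "visit_date": "2024-03-01"}))  # []
# Letter O typed instead of zero, and a non-ISO date:
print(validate_entry({"name": "", "age": "3O", "visit_date": "01/03/2024"}))
```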
Web Scraping:
- Definition: Automated extraction of data from websites.
- Tools: Custom scripts or scraping libraries (e.g., Beautiful Soup, Scrapy).
- Advantages: Quick collection of large amounts of data.
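When scraping many pages, a common courtesy is to honor the site's robots.txt and pause between requests. This sketch uses urllib.robotparser from the standard library; the robots.txt rules, URLs, user-agent name, and fake_fetch function are hypothetical, with fake_fetch standing in for a real HTTP download.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; against a live site you would instead call
# rp.set_url("https://example.com/robots.txt") followed by rp.read().
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = RobotFileParser()
rp.parse(robots_lines)


def scrape_all(urls, fetch, delay_s=1.0, user_agent="my-research-bot"):
    """Fetch each URL that robots.txt allows, pausing between requests."""
    results = {}
    for url in urls:
        if not rp.can_fetch(user_agent, url):
            continue  # skip pages the site asks crawlers not to visit
        results[url] = fetch(url)
        time.sleep(delay_s)  # avoid overloading the server
    return results


def fake_fetch(url):
    """Hypothetical stand-in for a real download (e.g. urllib.request.urlopen)."""
    return f"<html>{url}</html>"


pages = scrape_all(
    ["https://example.com/page/1", "https://example.com/page/2", "https://example.com/private/x"],
    fetch=fake_fetch,
    delay_s=0.0,
)
print(list(pages))  # ['https://example.com/page/1', 'https://example.com/page/2']
```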
Sensor Data Collection:
- Definition: Automated real-time data collection using sensors.
- Examples: IoT applications, environmental monitoring.
- Advantages: Continuous, high-frequency data with little manual effort.
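For environmental monitoring, readings are often smoothed over a short window and compared with a threshold. A minimal sketch, assuming a hypothetical stream of temperature readings and an arbitrary window size and alert threshold:

```python
from collections import deque


def monitor(stream, window=3, threshold=30.0):
    """Yield (reading, rolling_mean, alert) for each value in a stream of readings."""
    recent = deque(maxlen=window)  # keeps only the last `window` readings
    for value in stream:
        recent.append(value)
        mean = sum(recent) / len(recent)
        yield value, mean, mean > threshold


# Hypothetical temperature readings (degrees Celsius) arriving one at a time.
readings = [28.0, 29.0, 31.0, 33.0, 32.0]
for value, mean, alert in monitor(readings):
    print(f"{value:5.1f}  mean={mean:5.2f}  alert={alert}")
```

Because monitor is a generator, the same function works on an unbounded stream (for example, values read from a device in a loop) without storing all past readings.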
APIs:
- Definition: Automated retrieval of data from online services.
- Advantages: Efficient, reliable, suitable for large datasets or real-time data.
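Large datasets are usually served by APIs in pages. A sketch of collecting every page, assuming a hypothetical page/page_size scheme in which a short page signals the end; many real APIs use cursors or "next" links instead, so the provider's documentation decides the details. fake_fetch_page simulates the HTTP call.

```python
def fetch_all(fetch_page, page_size=100):
    """Request pages until one comes back short, then return all records."""
    records, page = [], 1
    while True:
        batch = fetch_page(page=page, page_size=page_size)
        records.extend(batch)
        if len(batch) < page_size:  # a short (or empty) page means no more data
            return records
        page += 1


# Hypothetical stand-in for an HTTP call such as GET /items?page=N&page_size=M.
DATA = list(range(250))


def fake_fetch_page(page, page_size):
    start = (page - 1) * page_size
    return DATA[start:start + page_size]


items = fetch_all(fake_fetch_page, page_size=100)
print(len(items))  # 250
```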
Data Quality Assessment
Accuracy:
- Criteria: Values are correct and free from errors.
- Examples: Missing, duplicate, or incorrect entries.
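A minimal sketch of two accuracy checks, exact duplicates and out-of-range values, on hypothetical records; the 0 to 120 age range is an illustrative assumption.

```python
# Hypothetical records; each record should appear once and 'age' should lie in 0-120.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},   # incorrect value
    {"id": 1, "age": 34},   # exact duplicate of the first record
    {"id": 3, "age": 51},
]

seen, duplicates, invalid = set(), [], []
for i, rec in enumerate(records):
    key = tuple(sorted(rec.items()))  # hashable fingerprint of the whole record
    if key in seen:
        duplicates.append(i)
    seen.add(key)
    if not 0 <= rec["age"] <= 120:
        invalid.append(i)

print("duplicate rows:", duplicates)  # duplicate rows: [2]
print("invalid rows:", invalid)       # invalid rows: [1]
```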
Completeness:
- Criteria: Contains all necessary data points.
- Examples: Missing values.
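Completeness can be measured by counting missing values per field. A sketch on hypothetical records, assuming None or an empty string marks a missing value:

```python
# Hypothetical records; None or "" marks a value that was not recorded.
records = [
    {"age": 34, "income": 52000, "city": "Lyon"},
    {"age": None, "income": 48000, "city": ""},
    {"age": 29, "income": None, "city": "Nice"},
]
fields = ["age", "income", "city"]


def is_missing(value):
    return value is None or value == ""


# Missing-value count per field, and the rows with every field present.
missing = {f: sum(1 for r in records if is_missing(r.get(f))) for f in fields}
complete_rows = [r for r in records if not any(is_missing(r.get(f)) for f in fields)]

print(missing)             # {'age': 1, 'income': 1, 'city': 1}
print(len(complete_rows))  # 1
```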
Consistency:
- Criteria: Data agree across sources and time periods (same formats, units, and definitions).
- Examples: Conflicting values or formats across sources, which can lead to misleading results.
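A common source of inconsistency is that sources use different units and formats. This sketch, with hypothetical feeds, normalizes dates to ISO 8601 and temperatures to Celsius, then flags dates where the sources disagree by more than an arbitrary 0.5 degrees.

```python
from datetime import datetime

# Hypothetical feeds describing the same measurement in different conventions.
source_a = [{"date": "2024-03-01", "temp_c": 10.0}]
source_b = [{"date": "03/01/2024", "temp_f": 50.0}]  # US date format, Fahrenheit


def from_a(rec):
    return {"date": rec["date"], "temp_c": rec["temp_c"]}


def from_b(rec):
    # Convert MM/DD/YYYY to ISO 8601 and Fahrenheit to Celsius.
    iso = datetime.strptime(rec["date"], "%m/%d/%Y").date().isoformat()
    return {"date": iso, "temp_c": round((rec["temp_f"] - 32) * 5 / 9, 2)}


combined = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]

# After normalization, check that the sources agree for each date.
by_date = {}
for rec in combined:
    by_date.setdefault(rec["date"], []).append(rec["temp_c"])
conflicts = {d: v for d, v in by_date.items() if max(v) - min(v) > 0.5}
print(conflicts)  # {}
```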
Relevance:
- Criteria: Relevant to research objectives.
- Examples: Irrelevant or unnecessary data.
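Filtering for relevance often means keeping only the rows in scope and the fields the research question needs. A sketch with a hypothetical question ("how does age relate to spending among customers in France?") and a made-up dataset:

```python
# Fields needed for the hypothetical research question.
RELEVANT_FIELDS = ("age", "spending")

raw = [
    {"id": 1, "country": "FR", "age": 34, "spending": 120.0, "browser": "Firefox"},
    {"id": 2, "country": "DE", "age": 41, "spending": 90.0, "browser": "Chrome"},
    {"id": 3, "country": "FR", "age": 27, "spending": 75.5, "browser": "Safari"},
]

# Keep only in-scope rows (France) and only the fields the analysis needs.
relevant = [
    {f: row[f] for f in RELEVANT_FIELDS}
    for row in raw
    if row["country"] == "FR"
]
print(relevant)  # [{'age': 34, 'spending': 120.0}, {'age': 27, 'spending': 75.5}]
```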
Timeliness:
- Criteria: Up-to-date data.
- Examples: Outdated data not reflecting current trends.
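Timeliness can be enforced with a maximum-age rule. A sketch assuming each record carries a hypothetical "updated" date in ISO format and an illustrative 30-day limit; today is passed in explicitly so the result is reproducible (in practice date.today() would be used).

```python
from datetime import date, timedelta


def fresh_records(records, today, max_age_days=30):
    """Keep records whose 'updated' date is no more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if date.fromisoformat(r["updated"]) >= cutoff]


# Hypothetical price records with last-updated dates.
records = [
    {"item": "A", "updated": "2024-05-28"},
    {"item": "B", "updated": "2024-02-10"},
    {"item": "C", "updated": "2024-05-02"},
]
recent = fresh_records(records, today=date(2024, 6, 1))
print([r["item"] for r in recent])  # ['A', 'C']
```

Record C, updated exactly 30 days before the reference date, is kept because the comparison is inclusive.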
Conclusion
- Summary: Data gathering is a critical step that involves exploring sources, selecting collection methods, and assessing data quality.
- Importance: Helps ensure reliable and accurate results for analysis and visualization.