Appendix: Data Cleaning
This appendix describes how we remove duplicates and clean titles to make the data suitable for further analysis. Data cleaning is a fundamental part of data analysis and is particularly important for real-world datasets, which often contain missing, duplicate, or inconsistent records. We describe each step in detail and give the rationale behind it.
The data-cleaning process is broken down into two major steps:
- Duplicate Removal: Removing duplicate entries based on multiple criteria.
- Title Cleaning: Removing conference proceedings information from the titles.
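
The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the exact procedure used here: the record fields (`title`, `year`), the choice of a normalized title plus year as the duplicate criteria, and the regular expression for proceedings text are all hypothetical and would need to be adapted to the actual dataset.

```python
import re

# Hypothetical pattern: a trailing "In Proceedings of ..." or "Proceedings of ..."
# fragment appended to a title, plus any punctuation that precedes it.
PROCEEDINGS_RE = re.compile(
    r"[\s.,;:\-]*\b(?:in\s+)?proceedings\s+of\b.*$", re.IGNORECASE
)


def normalize(title):
    """Lowercase a title and collapse punctuation/whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def remove_duplicates(records):
    """Step 1: keep the first record for each (normalized title, year) pair.

    The two criteria used here are assumptions made for illustration.
    """
    seen = set()
    unique = []
    for record in records:
        key = (normalize(record["title"]), record.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def clean_title(title):
    """Step 2: strip trailing proceedings information from a title."""
    stripped = PROCEEDINGS_RE.sub("", title)
    return " ".join(stripped.split())


def clean_records(records):
    """Run both steps in order and return new record dicts."""
    deduped = remove_duplicates(records)
    return [{**r, "title": clean_title(r["title"])} for r in deduped]


if __name__ == "__main__":
    sample = [
        {"title": "Deep Learning for Graphs. In Proceedings of ICML 2020", "year": 2020},
        {"title": "deep learning for  graphs. in proceedings of ICML 2020", "year": 2020},
        {"title": "A Survey of Data Cleaning", "year": 2019},
    ]
    for row in clean_records(sample):
        print(row)
```

Because this sketch removes duplicates before cleaning titles, two records that differ only in their proceedings suffix would both be kept; comparing on the cleaned title instead is one possible variation.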