Research on Organization, Analysis and Application of Software Repository Data

Duration: 2015--2019

Abstract: Over the past 20 years, software repositories have accumulated a huge amount of software lifecycle data, recording the history of what happened during development and maintenance. Leveraging these data efficiently and effectively to improve software productivity and quality remains very challenging. In this project, we aim to use this big data to derive measurable, verifiable, and reproducible software engineering practices. The primary research topics of this project include: (1) semantically consistent models for organizing software lifecycle data and big-data-oriented methods for user-customized data acquisition; (2) mechanisms underlying software lifecycle data quality and methods for handling low-quality software lifecycle data; (3) models and methods for measuring micro-processes across different development tasks and projects; and (4) methods and techniques for using the data to improve quality assurance. Based on the research results, we will build a platform for research validation, data sharing, and practice recommendation.

