Selected Projects

Programming Semantic Learning and Onsite Code Generation Technology Based on Coding Big Data

Date: 2020-03-07


Duration: 2018--2020

Abstract: The software industry is a leading, strategic industry in the new era. Modern software is growing ever more complex, characterized by deep industry penetration, wide-field integration, and highly interactive behavior. Routine large-scale, cross-regional collaborative development, rapidly expanding onsite programming big data, and rising demands for instant response pose new challenges to developing software quickly and with high quality. How to exploit massive onsite programming big data to build new intelligent software development methods and environments, and thereby improve the productivity and quality of software development, has therefore become an essential scientific issue for the software industry. The project focuses on two key technologies: 1) real-time perception of onsite programming big data and 2) intelligent human-machine pair programming and co-evolution. In particular, this project studies the composition and internal relationships of onsite programming big data; constructs a cross-regional onsite programming database, a developer profile database, and a standard source code sample database; and establishes a real-time dynamic big data environment and an intelligent abstract model based on program analysis, deep learning, and other technologies. This project also constructs and trains a virtual intelligent programming robot in the onsite programming environment. The robot supports automatic code generation and recommendation as well as defect detection and repair, and, by co-evolving with programmers onsite, improves the productivity and quality of software development.
Furthermore, this project studies representation and semantic learning methods for onsite program source code, establishes a code semantic learning model based on deep neural networks, and studies methods for real-time code generation and for quality and efficiency measurement based on the semantic model of that source code. Based on the above technologies, this project builds a virtual intelligent programming robot that supports dynamic interaction with real human programmers. On this basis, a cloud platform supporting human-computer pair programming and co-evolution is constructed in the onsite programming environment.


Copyright © Software Engineering Institute, Peking University

Room 1541, Science Building 1, No.5 Yiheyuan Road, Haidian District, Beijing, P.R.China 100871