
Current Issue

    15 January 2021, Volume 7 Issue 1
    Data-Driven Intelligent Software Development
    Big data based intelligent software development methodology and environment
    Bing XIE, Xin PENG, Gang YIN, Xuandong LI, Jun WEI, Hailong SUN
    2021, 7(1):  3-21.  doi:10.11959/j.issn.2096-0271.2021001

    A series of studies were conducted on the collection and organization of software engineering big data, the representation and extraction of software development knowledge, and intelligent software development tools and service platforms. The purpose is to establish a big data based intelligent software development technique system, develop supporting tools for intelligent software development, and form a next-generation intelligent software development environment and cloud-based platforms that incorporate humans, tools, and data. The outcomes of the project include a public service platform supporting mass innovation and a series of intelligent software development environments for enterprises.

    Software knowledge graph construction and Q&A technology based on big data
    Yanzhen ZOU, Min WANG, Bing XIE, Zeqi LIN
    2021, 7(1):  22-36.  doi:10.11959/j.issn.2096-0271.2021002

    With the growing scale and continuous evolution of software, constructing software project knowledge graphs has become increasingly important for software maintenance and development. Automatically constructing a software knowledge graph with a complex structure and rich semantic relations from the multi-source, heterogeneous, massive data generated during software project development, such as source code, mailing lists, issue reports and Q&A documents, is an urgent challenge in software engineering. A code-centric software knowledge model was proposed, and a two-layer plugin framework for knowledge graph construction and software Q&A was provided, which improves the efficiency of software understanding and software reuse. At present, the software project knowledge graph has been successfully deployed in the Apache open source community and in well-known domestic enterprises.
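
    As a rough illustration of the code-centric idea, the sketch below builds a tiny knowledge graph in which code entities (classes, methods) form the core and non-code artifacts such as issue reports link to them. The entity and relation names are hypothetical and only hint at the richer knowledge model described in the paper.

```python
# A minimal sketch of a code-centric software knowledge graph, using networkx.
# Entity and relation names are illustrative assumptions, not the paper's schema.
import networkx as nx

g = nx.MultiDiGraph()

# Code entities form the core layer of the graph.
g.add_node("org.apache.lucene.index.IndexWriter", kind="class")
g.add_node("IndexWriter.addDocument", kind="method")
g.add_edge("org.apache.lucene.index.IndexWriter", "IndexWriter.addDocument",
           relation="declares")

# Non-code artifacts (issues, mailing-list threads, Q&A posts) are linked to code.
g.add_node("LUCENE-1234", kind="issue")
g.add_edge("LUCENE-1234", "IndexWriter.addDocument", relation="mentions")

def related_documents(code_entity):
    """Answer a simple 'which documents discuss this API?' question."""
    return [src for src, _, data in g.in_edges(code_entity, data=True)
            if data.get("relation") == "mentions"]

print(related_documents("IndexWriter.addDocument"))  # ['LUCENE-1234']
```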

    Context-based intelligent recommendation for code reuse
    Xin PENG, Chi CHEN, Yun LIN
    2021, 7(1):  37-47.  doi:10.11959/j.issn.2096-0271.2021003

    Intelligent code reuse recommendation based on the analysis, mining, and learning of code-related big data can significantly improve the efficiency and quality of software reuse. The targets of reuse include both domain-specific and domain-independent common code units. Context-based intelligent recommendation for code reuse was the focus of this work, and two approaches were described: template mining based code reuse recommendation and deep learning based code reuse recommendation. Based on these two lines of work, the future trend of context-based intelligent recommendation for code reuse was further discussed.
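
    The following sketch conveys the general flavor of context-based recommendation with a deliberately simple baseline: mined templates are ranked by lexical similarity to the developer's current editing context. The template names and the similarity measure are assumptions for illustration; the actual approach combines template mining with deep learning.

```python
# A minimal sketch of context-based template recommendation: rank mined code
# templates by lexical similarity to the developer's editing context.
# Illustrative only; templates and scoring are stand-ins for the real method.
from collections import Counter
import math

templates = {
    "read-file-lines": "with open(path) as f: lines = f.readlines()",
    "http-get-json": "resp = requests.get(url); data = resp.json()",
    "sort-dict-by-value": "sorted(d.items(), key=lambda kv: kv[1])",
}

def cosine(a, b):
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(context_tokens, k=2):
    scored = [(cosine(context_tokens, body.split()), name)
              for name, body in templates.items()]
    return [name for score, name in sorted(scored, reverse=True)[:k] if score > 0]

# The context is taken from the code and comments around the cursor.
print(recommend("parse the json data returned by requests.get(url)".split()))
```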

    Big-data based intelligent bug triage techniques for open-source projects
    Shengqu XI, Feng XU, Xin CHEN, Xuandong LI
    2021, 7(1):  48-63.  doi:10.11959/j.issn.2096-0271.2021004

    Bug triage aims to determine the priority of bugs and the corresponding repair measures, and is critical to ensuring software trustworthiness. However, in increasingly popular open-source projects, the large number of defects and the lack of organization and management make it challenging to triage all bug reports by hand in time, which makes big-data based, automated and intelligent bug triage urgent. An intelligent bug triage technical framework grounded in the shared understanding of industry and academia was proposed, and three key tasks were identified comprehensively and systematically: bug priority classification, bug assignment, and bug reassignment. Techniques tailored to the characteristics of open-source projects were proposed for each task. Preliminary experimental results show the reasonableness and effectiveness of these techniques.
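
    As a hedged illustration of the bug priority classification task, the sketch below trains a TF-IDF plus logistic regression baseline on a handful of made-up bug report summaries. It is not the model used in the paper, only a minimal stand-in for the task setup.

```python
# A minimal sketch of bug priority classification from report text, using a
# TF-IDF + linear classifier baseline (not the paper's actual model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set: (bug report summary, priority label).
reports = [
    "crash on startup when config file is missing",
    "typo in documentation for install guide",
    "data loss after concurrent writes to the index",
    "button color slightly off in dark theme",
]
priorities = ["high", "low", "high", "low"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(reports, priorities)

print(clf.predict(["segmentation fault when saving the project"]))  # likely 'high'
```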

    An approach to automatically building Docker images by using domain knowledge
    Wei CHEN, Hongjie YE, Jiahong ZHOU, Jun WEI
    2021, 7(1):  64-75.  doi:10.11959/j.issn.2096-0271.2021005

    A Dockerfile builds a Docker image by specifying how to construct a software system through downloading, installing and configuring software packages and their dependencies. However, manually writing a Dockerfile can be error-prone, because system dependency resolution requires a lot of domain knowledge. Therefore, an approach to automating Dockerfile generation based on domain knowledge was proposed. The approach automatically parses existing Dockerfiles, extracts knowledge about building Docker images, and stores the knowledge in a graph database. When generating a new Dockerfile, the system dependencies of the designated software and their installation operations are inferred from the knowledge base. Experiments indicate that it is viable to automate Dockerfile generation for diverse software by inferring system dependencies and software package installations from the domain knowledge.
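
    The sketch below illustrates the inference step with a toy, dictionary-based knowledge base standing in for the graph database: the target package's transitive system dependencies are resolved and the corresponding installation commands are emitted as a Dockerfile. The package names and install commands here are illustrative assumptions, not the paper's knowledge base.

```python
# A minimal sketch of knowledge-based Dockerfile generation: look up the system
# dependencies and install commands of a target package in a toy knowledge base
# (a stand-in for the graph database) and emit a Dockerfile.
KNOWLEDGE_BASE = {
    "numpy":       {"deps": ["python3", "python3-pip"], "install": "pip3 install numpy"},
    "python3":     {"deps": [], "install": "apt-get install -y python3"},
    "python3-pip": {"deps": ["python3"], "install": "apt-get install -y python3-pip"},
}

def resolve(pkg, seen=None):
    """Order pkg and its transitive dependencies so that dependencies come first."""
    seen = seen if seen is not None else []
    for dep in KNOWLEDGE_BASE[pkg]["deps"]:
        resolve(dep, seen)
    if pkg not in seen:
        seen.append(pkg)
    return seen

def generate_dockerfile(target, base_image="ubuntu:20.04"):
    lines = [f"FROM {base_image}", "RUN apt-get update"]
    lines += [f"RUN {KNOWLEDGE_BASE[p]['install']}" for p in resolve(target)]
    return "\n".join(lines)

print(generate_dockerfile("numpy"))
```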

    Data driven intelligent collaboration of software developers
    Jian ZHANG, Xiangxin MENG, Hailong SUN, Xu WANG, Xudong LIU
    2021, 7(1):  76-93.  doi:10.11959/j.issn.2096-0271.2021006

    Mining big software data and using the knowledge it contains to explore intelligent methods for software development is an active research topic. However, existing research on software developer and crowd collaboration has not yet formed systematic methods. Therefore, key technologies for intelligent collaboration were studied through in-depth analysis of developer behavior, and a corresponding support environment was developed on the basis of these technologies to improve the efficiency and quality of software development. Firstly, a large amount of developer-related data was collected and analyzed. Secondly, a systematic approach to analyzing developers and their collaboration, called the developer knowledge graph, was proposed. Thirdly, supported by the developer knowledge graph, a collaborative development method based on intelligent recommendation was introduced in detail. Building on these technologies, the corresponding supporting tools were developed and an intelligent collaborative development environment was provided. Finally, future work was discussed.
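
    As a minimal illustration of recommendation on top of a developer knowledge graph, the sketch below suggests reviewers for a change by counting how often each developer has previously touched the affected files. The history data and the scoring rule are hypothetical simplifications of the graph-based method described above.

```python
# A minimal sketch of collaborator recommendation from developer-file history:
# suggest reviewers who have touched the changed files most often before.
# Data and scoring are illustrative assumptions only.
from collections import Counter

# Edges of a tiny developer-file graph: (developer, file touched in past commits).
history = [
    ("alice", "core/scheduler.py"), ("alice", "core/scheduler.py"),
    ("bob",   "core/scheduler.py"), ("bob",   "ui/panel.py"),
    ("carol", "ui/panel.py"),
]

def recommend_reviewers(changed_files, k=2):
    scores = Counter(dev for dev, f in history if f in changed_files)
    return [dev for dev, _ in scores.most_common(k)]

print(recommend_reviewers({"core/scheduler.py"}))  # ['alice', 'bob']
```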

    Big data of open source ecosystem for intelligent software development
    Yang ZHANG, Tao WANG, Gang YIN, Yue YU, Jingquan HUANG
    2021, 7(1):  94-106.  doi:10.11959/j.issn.2096-0271.2021007

    The open source software development process produces a large amount of valuable data, which is huge in scale, fragmented, and rapidly expanding. Aiming at these characteristics, the structure of open source ecosystem big data for software engineering was studied, and a self-growing collection and processing framework together with a convergence and sharing environment was proposed. Related research on intelligent software development based on open source software engineering big data, and typical applications based on its analysis and mining, were expounded, providing guidance for the research and application of open source ecosystem big data for intelligent software development.

    STUDY
    Travel time estimation based on urban traffic surveillance data
    Wenming LI, Fang LIU, Peng LYU, Yanwei YU
    2021, 7(1):  107-123.  doi:10.11959/j.issn.2096-0271.2021008

    With the development of intelligent transportation, more and more surveillance cameras are deployed at the intersections of urban roads, which makes it possible to use urban traffic surveillance data to estimate vehicle travel time and query routes. Aiming at the problem of urban travel time estimation, a travel time estimation method based on urban traffic surveillance data, called UTSD, was proposed. Firstly, the traffic surveillance cameras were mapped into the urban road network, and a directed weighted road network graph was constructed from the traffic surveillance records. Secondly, a spatio-temporal index and a reverse index were built for travel time estimation: the former is used to quickly search the camera records of all vehicles, and the latter to quickly obtain the travel time and the passing-camera trajectory of each vehicle. These two indexes significantly improve the efficiency of data query and travel time estimation. Finally, based on the constructed index structures, an effective travel time estimation and path query method was given: according to the departure time, origin and destination, vehicles with the same origin and destination are matched on the spatio-temporal index, and the reverse index is then used to quickly obtain the travel time estimate and vehicle route. In an experimental evaluation on real traffic surveillance big data from a provincial capital city, the accuracy of the proposed UTSD method is improved by 65.02% and 40.94% compared with the directed-graph based Dijkstra shortest path algorithm and the Baidu algorithm, respectively. In addition, the average query time of UTSD is less than 0.3 s when 7 days of surveillance data are used as historical data, which verifies the effectiveness and efficiency of the proposed method.
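
    The sketch below mimics, in highly simplified form, the two index structures described above: a spatio-temporal index from (camera, time bucket) to vehicles, and a reverse index from vehicle to its time-ordered camera records, used together to estimate travel time between two cameras. The record format, time bucketing and median aggregation are assumptions for illustration, not the UTSD implementation.

```python
# A minimal sketch of the two indexes used for travel time estimation (simplified).
from collections import defaultdict
from statistics import median

records = [  # (vehicle_id, camera_id, timestamp in seconds); illustrative data
    ("V1", "C_origin", 0), ("V1", "C_dest", 600),
    ("V2", "C_origin", 30), ("V2", "C_dest", 690),
]

st_index = defaultdict(set)        # (camera, hour bucket) -> vehicles seen there
reverse_index = defaultdict(list)  # vehicle -> [(timestamp, camera), ...]
for vid, cam, ts in records:
    st_index[(cam, ts // 3600)].add(vid)
    reverse_index[vid].append((ts, cam))
for trace in reverse_index.values():
    trace.sort()

def estimate_travel_time(origin, dest, depart_hour=0):
    """Median travel time of vehicles seen at origin (in the hour) and later at dest."""
    durations = []
    for vid in st_index[(origin, depart_hour)]:
        trace = reverse_index[vid]
        t_o = next((t for t, c in trace if c == origin), None)
        t_d = next((t for t, c in trace if c == dest and t_o is not None and t > t_o), None)
        if t_o is not None and t_d is not None:
            durations.append(t_d - t_o)
    return median(durations) if durations else None

print(estimate_travel_time("C_origin", "C_dest"))  # 630 seconds on this toy data
```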

    APPLICATION
    Application of big data technology in precise prevention and control of epidemic situation
    Gang LI, Xiangchun ZHENG, Huashan YIN, Wenchao HUANG
    2021, 7(1):  124-134.  doi:10.11959/j.issn.2096-0271.2021009

    Taking City X as an example, and based on the actual situation of a mega-city and on big data processing and analysis methods, a large database for epidemic prevention and control built on “four standards and four realities” data was constructed. With big data technology assisting epidemic prevention and control, a system providing real-time awareness of the epidemic situation, precise personnel control, and precise enterprise assistance was built. The specific technical methods were analyzed in detail, including the status of data construction in the system, the association rule mining algorithm adopted, the infection warning mechanism based on expectation-maximization probability clustering, and the strategy for utilizing unstructured data based on text mining. The system saved more than 100 000 working hours for grassroots cadres and precisely located and traced tens of thousands of susceptible people of concern, playing an important role in blocking epidemic transmission, raising the rate of work resumption, and reducing economic losses; it therefore has reference significance for other regions of the country.
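
    As a rough, hypothetical illustration of the kind of co-presence rule that can feed an infection warning mechanism, the sketch below flags people who visited the same place as a confirmed case within a short time window. The deployed system relies on association rule mining and expectation-maximization based clustering over far richer data.

```python
# A minimal sketch of a co-presence rule behind infection warning: flag people who
# appear at the same place within a short time window of a confirmed case.
# Illustrative only; not the deployed system's algorithm or data.
from datetime import datetime, timedelta

visits = [  # (person, place, check-in time)
    ("P1", "market_A", datetime(2020, 2, 1, 9, 0)),
    ("P2", "market_A", datetime(2020, 2, 1, 9, 20)),
    ("P3", "market_A", datetime(2020, 2, 1, 15, 0)),
]

def susceptible_contacts(confirmed, window=timedelta(hours=1)):
    flagged = set()
    case_visits = [(pl, t) for p, pl, t in visits if p == confirmed]
    for person, place, t in visits:
        if person == confirmed:
            continue
        if any(place == pl and abs(t - t0) <= window for pl, t0 in case_visits):
            flagged.add(person)
    return flagged

print(susceptible_contacts("P1"))  # {'P2'}
```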

    FORUM
    Primary exploration of transborder data flow supervision
    Yangyong ZHU, Yun XIONG
    2021, 7(1):  135-144.  doi:10.11959/j.issn.2096-0271.2021010

    With increasing awareness of the value of data, transborder data flow has attracted more and more attention. On the one hand, transborder data flow is necessary for economic globalization and the development of the digital economy; on the other hand, without effective supervision it may damage national data security. Therefore, it is necessary to distinguish reasonable transborder data flow from malicious flow and to formulate appropriate regulations. Based on this analysis, two types of current transborder data flow and four channels of transborder data flow were identified, and a classification-based supervision method for transborder data flow was proposed. This work provides support for the supervision of, and legislation on, transborder data flow.
