Scopus Harvesting Series

Towards Better ML-Based Software Services: An Investigation of Source Code Engineering Impact

Yanli Li, The University of Sydney
Chongbin Ye, The University of Sydney
Huaming Chen, The University of Sydney
Shiping Chen, CSIRO Data61
Minhui Xue, CSIRO Data61
Jun Shen, University of Wollongong

Publication Name

Proceedings - 2023 IEEE International Conference on Software Services Engineering, SSE 2023

Abstract

In recent years, machine learning-based solutions for software services, particularly those that operate on source code, have grown rapidly. Many of these models require source code snippets as input in the form of an abstract syntax tree (AST), which is usually generated by an external tool. However, this pre-processing can be done with different engineering tools, and the impact of the tool choice on the final model is often neglected. In this work, we investigate the impact of source code engineering on machine learning-based software services. We identify three types of parsing tools: parser generators, parsing libraries, and parsers developed for a specific purpose. We evaluate their impact on the Code2Vec model for the task of predicting method names in Java. The collective results on the Java-small dataset show that the generated ASTs differ considerably in source code structure and content depending on the parsing tool used, and that these differences can significantly influence the performance of the trained model. Our results suggest that when machine learning models are implemented for software services, especially for code-related tasks, the choice of parsing tool should be carefully considered during the data pre-processing stage. Alongside some interesting findings on Java-med and Java-small, we anticipate that this work will provide insights for better ML-based software service solutions from the perspective of source code engineering.

Open Access Status

This publication is not available as open access

First Page

128

Last Page

137

Link to publisher version (DOI)

http://dx.doi.org/10.1109/SSE60056.2023.00027
