Chinese Journal of Network and Information Security ›› 2022, Vol. 8 ›› Issue (5): 111-120. doi: 10.11959/j.issn.2096-109x.2022046

• Papers •

Bytecode-based approach for Ethereum smart contract classification

Dan LIN1, Kaixin LIN2, Jiajing WU2, Zibin ZHENG1   

  1. School of Software Engineering, Sun Yat-sen University, Zhuhai 519082, China
  2. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
  • Revised: 2022-09-01 Online: 2022-10-15 Published: 2022-10-01
  • Supported by:
    The National Key R&D Program of China (2020YFB1006005); The National Natural Science Foundation of China (61973325); The Natural Science Foundations of Guangdong Province (2021A1515011661); Guangzhou Basic and Applied Basic Research Project (202102020616)

Abstract:

In recent years, blockchain technology has been widely adopted and has attracted attention in many fields, including finance, medical care, and government affairs. However, because smart contracts are immutable and run in a distinctive operating environment, security issues occur frequently. On the one hand, contract developers introduce code security problems when writing contracts; on the other hand, Ethereum hosts many high-risk smart contracts, and ordinary users are easily attracted by the high returns these contracts promise without being able to assess their risks. Existing research on smart contract security focuses mainly on code security, and relatively little work addresses the identification of contract functions. Accurately classifying smart contracts by function would help people better understand contract behavior, protect the security of the smart contract ecosystem, and reduce or recover user losses. Existing smart contract classification methods often rely on analyzing contract source code, but Ethereum requires only bytecode to be deployed, and only a very small number of contracts publish their source code. Therefore, a bytecode-based Ethereum smart contract classification method is proposed. Ethereum smart contract bytecode and the corresponding category labels are collected, and opcode frequency features and control flow graph features are then extracted. Feature importance is analyzed experimentally to obtain an appropriate graph vector dimension and the optimal classification model. Finally, the method is verified experimentally on a multi-class task covering five categories of smart contracts (exchange, finance, gambling, game, and high risk), where the XGBoost classifier reaches an F1 score of 0.9138. The experimental results show that the method performs the classification of Ethereum smart contracts well and can be applied to predicting smart contract categories in practice.
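
As an illustration of the opcode frequency features mentioned in the abstract, the following Python sketch counts EVM opcodes in a bytecode string and normalizes the counts. It is a minimal example under stated assumptions, not the authors' implementation: the function names `opcode_name` and `opcode_frequencies` are hypothetical, the mnemonic table is deliberately partial, and the abstract does not say how the paper normalizes or selects its features. The only EVM-specific logic is that PUSH1 to PUSH32 (bytes 0x60 to 0x7f) carry 1 to 32 bytes of immediate data, which must be skipped rather than counted as instructions.

```python
from collections import Counter

# Partial mnemonic table (assumption: only a few opcodes are named here;
# the full EVM instruction set is considerably larger).
OPCODE_NAMES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB",
    0x35: "CALLDATALOAD", 0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE",
    0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0xF1: "CALL", 0xF3: "RETURN", 0xFD: "REVERT",
}


def opcode_name(byte):
    """Map one opcode byte to a mnemonic, or to a hex placeholder if unnamed."""
    if 0x60 <= byte <= 0x7F:
        return f"PUSH{byte - 0x5F}"
    if 0x80 <= byte <= 0x8F:
        return f"DUP{byte - 0x7F}"
    if 0x90 <= byte <= 0x9F:
        return f"SWAP{byte - 0x8F}"
    return OPCODE_NAMES.get(byte, f"UNKNOWN_0x{byte:02x}")


def opcode_frequencies(bytecode_hex):
    """Return {mnemonic: relative frequency} for a hex-encoded bytecode string."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)

    counts = Counter()
    pc = 0
    while pc < len(code):
        byte = code[pc]
        counts[opcode_name(byte)] += 1
        pc += 1
        # PUSHn is followed by n bytes of immediate data; skip them.
        if 0x60 <= byte <= 0x7F:
            pc += byte - 0x5F

    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Example: PUSH1 0x80, PUSH1 0x40, MSTORE, PUSH1 0x00, DUP1, REVERT
features = opcode_frequencies("0x6080604052600080fd")
```

A frequency dictionary of this kind could be turned into a fixed-length vector over a chosen opcode vocabulary and combined with a graph embedding of the control flow graph before being passed to a classifier such as XGBoost; how the paper combines the two feature sets is not specified in the abstract.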

Key words: blockchain, smart contract, bytecode, classification

