Chinese Journal of Network and Information Security, 2022, Vol. 8, Issue (5): 111-120. doi: 10.11959/j.issn.2096-109x.2022046

• Academic paper •

Bytecode-based approach for Ethereum smart contract classification

Dan LIN1, Kaixin LIN2, Jiajing WU2, Zibin ZHENG1

  1 School of Software Engineering, Sun Yat-sen University, Zhuhai 519082, China
  2 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
  • Revised: 2022-09-01 Online: 2022-10-15 Published: 2022-10-01
  • About the authors: Dan LIN (1996- ), female, born in Jieyang, Guangdong, is a Ph.D. candidate at Sun Yat-sen University. Her research interests include the theory and applications of blockchain, cryptocurrency, and network science.
    Kaixin LIN (2000- ), female, born in Guangzhou, Guangdong, is a master's student at Sun Yat-sen University. Her research interests include blockchain, smart contracts, and cryptocurrency.
    Jiajing WU (1989- ), female, born in Ji'an, Jiangxi, is an associate professor at Sun Yat-sen University. Her research interests include network science, blockchain transaction networks, and network representation learning.
    Zibin ZHENG (1982- ), male, born in Chaozhou, Guangdong, is a professor at Sun Yat-sen University. His research interests include blockchain, big data, services computing, machine learning, and software reliability.
  • Supported by:
    The National Key R&D Program of China (2020YFB1006005); The National Natural Science Foundation of China (61973325); The Natural Science Foundation of Guangdong Province (2021A1515011661); Guangzhou Basic and Applied Basic Research Project (202102020616)

Abstract:

In recent years, blockchain technology has been widely adopted and has attracted attention in many fields, including finance, healthcare, and government affairs. However, because smart contracts are difficult to modify once deployed and run in a special execution environment, security issues arise frequently. On the one hand, contract developers introduce code-level security flaws when writing contracts; on the other hand, Ethereum hosts many high-risk smart contracts, and ordinary users are easily attracted by the high returns these contracts promise while having no way to assess their risks. Yet research on smart contract security has focused mainly on code security, and relatively little work addresses identifying what a contract actually does. Accurately classifying smart contracts by function would help people better understand contract behavior, safeguard the smart contract ecosystem, and reduce or recover user losses. Existing classification methods typically rely on analyzing contract source code, but Ethereum only requires contracts to deploy bytecode, and very few contracts publish their source code. Therefore, a bytecode-based Ethereum smart contract classification method was proposed. Ethereum smart contract bytecode and the corresponding category labels were collected, and opcode frequency features and control flow graph features were extracted. Feature importance was analyzed experimentally to determine a suitable graph vector dimension and the best classification model. The method was then evaluated on a multi-class task covering five categories of smart contracts (exchange, finance, gambling, game, and high-risk), where the XGBoost classifier reached an F1 score of 0.9138. Experimental results show that the proposed method performs well on Ethereum smart contract classification and can be applied to predicting smart contract categories in practice.
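The two bytecode feature families summarized above (opcode frequencies and control flow graph features) can be illustrated with a minimal Python sketch. It assumes raw runtime bytecode given as a hex string and uses a deliberately partial opcode table. Because the abstract does not say how the control flow graph is turned into a vector, the sketch substitutes two plain statistics, the number of basic blocks and the number of edges; it is not the authors' embedding. All function names are hypothetical.

```python
from collections import Counter

# Partial EVM opcode table, assumed for illustration only; a full disassembler
# would cover every opcode in the Ethereum Yellow Paper.
NAMED = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB", 0x04: "DIV",
    0x10: "LT", 0x11: "GT", 0x14: "EQ", 0x15: "ISZERO", 0x16: "AND",
    0x1C: "SHR", 0x20: "SHA3", 0x33: "CALLER", 0x34: "CALLVALUE",
    0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE", 0x39: "CODECOPY",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD",
    0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0xF1: "CALL", 0xF3: "RETURN", 0xFD: "REVERT", 0xFE: "INVALID",
    0xFF: "SELFDESTRUCT",
}
# Instructions that end a basic block.
TERMINATORS = {"STOP", "JUMP", "JUMPI", "RETURN", "REVERT", "INVALID", "SELFDESTRUCT"}


def opcode_name(op):
    """Map an opcode byte to its mnemonic, covering the PUSH/DUP/SWAP/LOG ranges."""
    if 0x60 <= op <= 0x7F:
        return f"PUSH{op - 0x5F}"
    if 0x80 <= op <= 0x8F:
        return f"DUP{op - 0x7F}"
    if 0x90 <= op <= 0x9F:
        return f"SWAP{op - 0x8F}"
    if 0xA0 <= op <= 0xA4:
        return f"LOG{op - 0xA0}"
    return NAMED.get(op, f"UNKNOWN_{op:#04x}")


def disassemble(hex_code):
    """Return (offset, mnemonic, push_value) triples; PUSH immediates are skipped as data."""
    if hex_code.startswith("0x"):
        hex_code = hex_code[2:]
    code = bytes.fromhex(hex_code)
    instrs, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            size = op - 0x5F
            value = int.from_bytes(code[pc + 1:pc + 1 + size], "big")
            instrs.append((pc, opcode_name(op), value))
            pc += 1 + size
        else:
            instrs.append((pc, opcode_name(op), None))
            pc += 1
    return instrs


def opcode_frequencies(instrs):
    """Relative frequency of each mnemonic in the instruction stream."""
    counts = Counter(name for _, name, _ in instrs)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


def build_cfg(instrs):
    """Split instructions into basic blocks and add fall-through and static-jump edges."""
    blocks, current = [], []
    for ins in instrs:
        if ins[1] == "JUMPDEST" and current:  # a jump target starts a new block
            blocks.append(current)
            current = []
        current.append(ins)
        if ins[1] in TERMINATORS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    start_of = {block[0][0]: i for i, block in enumerate(blocks)}
    edges = set()
    for i, block in enumerate(blocks):
        last = block[-1][1]
        # Fall-through: the false branch of JUMPI, or a block cut by a JUMPDEST.
        if (last not in TERMINATORS or last == "JUMPI") and i + 1 < len(blocks):
            edges.add((i, i + 1))
        # Static jump: a PUSH right before JUMP/JUMPI supplies the destination.
        if last in ("JUMP", "JUMPI") and len(block) >= 2:
            target = block[-2][2]
            if target in start_of and blocks[start_of[target]][0][1] == "JUMPDEST":
                edges.add((i, start_of[target]))
    return blocks, edges


def feature_vector(hex_code, vocab):
    """Opcode frequencies over a fixed vocabulary plus two simple CFG statistics.

    The block/edge counts are a hypothetical stand-in for a learned graph vector.
    """
    instrs = disassemble(hex_code)
    freqs = opcode_frequencies(instrs)
    blocks, edges = build_cfg(instrs)
    return [freqs.get(op, 0.0) for op in vocab] + [len(blocks), len(edges)]
```

For example, `feature_vector("600456005b600160020100", ["PUSH1", "JUMP", "STOP"])` disassembles a tiny program that jumps over a `STOP` to a `JUMPDEST` and returns `[0.375, 0.125, 0.25, 3, 1]`. Vectors like this could then be passed to any multi-class classifier; the abstract reports XGBoost performing best, but the vocabulary and graph statistics here are illustrative choices only.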

Key words: blockchain, smart contract, bytecode, classification
