Machine learning algorithms, represented by artificial neural networks, are finding increasingly wide application. Mainstream field programmable gate array (FPGA) devices offer abundant logic resources and high-speed data interfaces, along with a large number of built-in computing units such as hardware multipliers. Compared with complex algorithm models deployed in software on CPUs or GPUs, FPGAs enable hardware deployment of small- and medium-scale neural networks with low power consumption, low latency, and high throughput. High-level synthesis (HLS), which has emerged in recent years, can compile software programs written in high-level languages into digital logic circuit descriptions, providing a convenient path for FPGA acceleration of neural network algorithms. This paper describes the workflow of compiling and synthesizing neural network algorithms, optimizing the model, and deploying it on an FPGA with the HLS tool hls4ml. Focusing on the z-vertex reconstruction of three-dimensional tracks in the main drift chamber (MDC), a requirement from the key-technology pre-research of the Super Tau-Charm Facility (STCF) trigger system, it then details the specific steps of hls4ml-based neural network design, optimization, and FPGA acceleration. Test results show that this method effectively improves the real-time processing capability of the trigger system and significantly reduces resource utilization while preserving algorithm accuracy, making lightweight hardware deployment of neural networks practical. This work provides a reference example for FPGA acceleration of machine learning algorithms in the trigger systems of large particle physics experiments.
Abstract: Machine learning algorithms, represented by artificial neural networks, have found increasingly wide applications. Modern FPGA (Field Programmable Gate Array) devices provide abundant logic resources, high-speed data interfaces, and a large number of hardware multipliers, making them well suited for hardware deployment (i.e., algorithm acceleration) of small- and medium-scale neural networks in nuclear electronics. Compared with CPU- and GPU-based software implementations, FPGA deployment offers advantages in power efficiency, latency, and throughput. The emerging high-level synthesis (HLS) technology enables the conversion of high-level software programs into digital circuit descriptions, providing an efficient path for deploying neural networks on FPGAs. This paper presents the implementation process of compiling neural networks using hls4ml, an HLS tool, including model optimization and FPGA deployment. Focusing on the z-vertex reconstruction task for the MDC (Main Drift Chamber) in the trigger system pre-research of the Super Tau-Charm Facility (STCF), we detail the algorithm design, optimization, and hardware deployment procedures using hls4ml. Experimental results show that the hls4ml-based neural network acceleration method can effectively enhance the real-time processing capability of the trigger system while maintaining algorithm accuracy, and significantly reduces resource utilization, enabling lightweight hardware deployment of neural networks. This study provides a reference case for FPGA acceleration of machine learning algorithms in the trigger systems of large-scale particle physics experiments.
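The hls4ml workflow summarized in the abstract (compile and synthesize a trained network, tune its precision and parallelism, and run FPGA synthesis) follows a fairly standard pattern. The sketch below is a minimal illustration assuming a generic Keras dense network, a placeholder FPGA part number, and example precision/reuse-factor settings; it is not the actual STCF MDC z-vertex model or the configuration used in the paper.

```python
# Minimal hls4ml flow sketch (placeholder network and FPGA part, not the paper's model).
import hls4ml
from tensorflow import keras

# Hypothetical dense network standing in for the z-vertex regression model.
model = keras.Sequential([
    keras.Input(shape=(27,)),                    # placeholder input size
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1),
])

# Build a per-layer hls4ml configuration with an example fixed-point precision,
# and set a reuse factor to trade latency against DSP/LUT usage.
config = hls4ml.utils.config_from_keras_model(
    model, granularity='name', default_precision='ap_fixed<16,6>'
)
config['Model']['ReuseFactor'] = 4

# Convert the Keras model into an HLS project targeting an AMD/Xilinx device.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Vitis',
    part='xcku115-flvb2104-2-e',                 # placeholder FPGA part
    output_dir='hls4ml_prj',
)

hls_model.compile()                              # C-simulation model for bit-accurate checks
hls_model.build(csim=False, synth=True)          # run HLS synthesis to get latency/resource reports
```

A reuse factor of 1 yields a fully parallel implementation with the lowest latency but the highest resource cost; larger values time-multiplex multipliers, cutting DSP usage at the expense of initiation interval.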
[1] Ding Wei. Tagging high transverse momentum Higgs bosons and searching for new physics with the ATLAS detector[D]. Beijing: Tsinghua University, 2021.
[2] Aad G, Abbott B, Abbott D C, et al. ATLAS flavour-tagging algorithms for the LHC Run 2 pp collision dataset[J]. Eur Phys J C, 2023, 83: 681. DOI: 10.1140/epjc/s10052-023-11699-1.
[3] Acciarri R, Adams C, An R, et al. Convolutional neural networks applied to neutrino events in a liquid argon time projection chamber[J]. Journal of Instrumentation, 2017, 12(3): P03011. DOI: 10.1088/1748-0221/12/03/P03011.
[4] Ha Minh M, IceCube Collaboration. Reconstruction of neutrino events in IceCube using graph neural networks[EB/OL]. arXiv: 2107.12187 [astro-ph.IM], 2021-07-24 [2025-09-05]. https://doi.org/10.48550/arXiv.2107.12187.
[5] Reuter L, De Pietro G, Stefkova S, et al. End-to-end multi-track reconstruction using graph neural networks at Belle II[J]. Computing and Software for Big Science, 2025, 9(1): 6. DOI: 10.1007/s41781-025-00135-6.
[6] Clerbaux B, Molla M C, Petitjean P A, Xu Y, Yang Y. Study of using machine learning for level 1 trigger decision in JUNO experiment[J]. IEEE Transactions on Nuclear Science, 2021, 68(8): 2187-2193. DOI: 10.1109/TNS.2021.3085428.
[7] Coussy P, Morawiec A. High-Level Synthesis: From Algorithm to Digital Circuit[M]. New York: Springer, 2008.
[8] FastML Team. fastmachinelearning/hls4ml[EB/OL]. Zenodo, 2024. https://github.com/fastmachinelearning/hls4ml. DOI: 10.5281/zenodo.1201549.
[9] Duarte J, et al. Fast inference of deep neural networks in FPGAs for particle physics[J/OL]. JINST, 2018, 13(07): P07027. DOI: 10.1088/1748-0221/13/07/P07027.
[10] Zhou Qidong. Belle-II trigger with ML[C]. Presented at the 6th FTCF International Workshop, Guangzhou, China, 2024.
[11] AMD Inc. AMD Vitis High-Level Synthesis (HLS)[EB/OL]. San Jose: AMD, 2025 [2025-09-05]. https://www.amd.com/en/products/software/adaptive-socs-and-fpgas/vitis/vitis-hls.html.
[12] Intel Corporation. Intel® High Level Synthesis Compiler: Pro Edition Reference Manual[EB/OL]. Santa Clara: Intel, 2025-01-23 [2025-09-05]. https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/hls-compiler.html.
[13] Cadence Design Systems. Stratus High-Level Synthesis[EB/OL]. San Jose: Cadence Design Systems [2025-09-05]. https://www.cadence.com/en_US/home/tools/digital-design-and-signoff/synthesis/stratus-high-level-synthesis.html.
[14] Jaic K, Smith M C. Enhancing hardware design flows with MyHDL[C]//Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. New York: Association for Computing Machinery, 2015: 28-31. DOI: 10.1145/2684746.2689092.
[15] Abadi M, et al. TensorFlow: large-scale machine learning on heterogeneous systems[EB/OL]. 2015 [2025-09-05]. https://www.tensorflow.org.
[16] Chollet F, et al. Keras[EB/OL]. 2015 [2025-09-05]. https://keras.io.
[17] Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library[EB/OL]. arXiv, 2019-12 [2025-09-05]. https://doi.org/10.48550/arXiv.1912.01703.
[18] Fahim F, Hawks B, Herwig C, et al. hls4ml: an open-source codesign workflow to empower scientific low-power machine learning devices[EB/OL]. arXiv, 2021-03-09. https://arxiv.org/abs/2103.05579. DOI: 10.48550/arXiv.2103.05579.
[19] Coelho C N, Kuusela A, Li S, et al. Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors[J]. Nature Machine Intelligence, 2021, 3: 675-686. DOI: 10.1038/s42256-021-00356-5.
[20] Baskin C, Liss N, Schwartz E, et al. UNIQ: uniform noise injection for non-uniform quantization of neural networks[J/OL]. ACM Transactions on Computer Systems, 2019 [2025-09-05]. https://doi.org/10.1145/3444943.
[21] Sun C, Årrestad T K, Loncar V, Ngadiuba J, Spiropulu M. Gradient-based automatic per-weight mixed precision quantization for neural networks on-chip[EB/OL]. arXiv, 2024-05. https://doi.org/10.48550/arXiv.2405.00645.
[22] Zhu M, Gupta S. To prune, or not to prune: exploring the efficacy of pruning for model compression[EB/OL]. OpenReview, 2018. https://openreview.net/forum?id=S1lN69AT-.
[23] Achasov M, Ai X C, An L P, et al. STCF conceptual design report (Volume 1): physics & detector[J]. Frontiers of Physics, 2024, 19: 14701. DOI: 10.1007/s11467-023-1333-z.
[24] Peng Hai-Ping, Zheng Yang-Heng, Zhou Xiao-Rong. Super Tau-Charm Facility of China[J]. Physics, 2020, 49(8): 513-524. DOI: 10.7693/wl20200803.
[25] Dong W, Feng C, Hao Y, et al. FPGA-based fast track reconstruction for the conceptual design of STCF MDC trigger[J]. Journal of Instrumentation, 2022, 17(10): P10027. DOI: 10.1088/1748-0221/17/10/P10027.
[26] Hao Y, et al. A 3-D track reconstruction algorithm for preresearch of the STCF MDC L1 trigger[J]. IEEE Transactions on Nuclear Science, 2025, 72(3): 429-437. DOI: 10.1109/TNS.2024.3503068.
[27] LeCun Y, Bengio Y, Hinton G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. DOI: 10.1038/nature14539.
[28] Bengio Y. Practical recommendations for gradient-based training of deep architectures[M]//Montavon G, Orr G B, Müller K R, eds. Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Berlin, Heidelberg: Springer, 2012: 437-478. DOI: 10.1007/978-3-642-35289-8_26.
[29] Zeng W, Urtasun R. MLPrune: multi-layer pruning for automated neural network compression[C]//Proceedings of the International Conference on Learning Representations (ICLR), 2019.
(1) This precision was not set higher to reject more background because offline physics analysis requires retaining all tracks with |z| < 10 cm, so that secondary vertices can be studied.
Basic information:
DOI: 10.20173/j.cnki.ned.20260127.001
CLC numbers: O572.2; TN791; TP181
Citation:
[1] Zhou Zixuan, Hao Yidi, Feng Changqing, et al. HLS-based FPGA acceleration of machine learning and its preliminary application in the pre-research of the STCF trigger system[J]. Nuclear Electronics & Detection Technology, 2026, 46(03): 297-308. DOI: 10.20173/j.cnki.ned.20260127.001.
Funding:
Supported by the National Natural Science Foundation of China (Grant No. 12341503)