飞扬围棋

Title: Setting up KataGo with TensorRT

Author: 大桥英雄    Time: 2024-7-30 12:58
Sharing two high-quality posts: https://www.reddit.com/r/baduk/c ... o_and_katrain_with/
https://blog.csdn.net/nirendao/article/details/135326597

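1. A guide to setting up KataGo and KaTrain with TensorRT

Someone asked for a guide to setting up KataGo/KaTrain with TensorRT, so I figured it was worth a post capturing my notes. KataGo is significantly faster via TensorRT than via OpenCL. Ask if you want me to run a quick benchmark.
To use TensorRT you need the CUDA Toolkit (sorry, this only applies to people with relatively modern Nvidia GPUs). Here are the install guides if you want to follow them: CUDA Toolkit, TensorRT. Sadly, you need to create a free Nvidia account.
The steps are:
CUDA_PATH_V12_4 set to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4

2. How to configure the TensorRT build of KataGo: the text could not be copied, so please follow the link above.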





Author: 大桥英雄    Time: 2024-7-30 13:04
If you get an error saying nvinfer.dll cannot be found, copy nvinfer_10.dll from NVIDIA GPU Computing Toolkit\CUDA\v12.5\lib\x64 and rename the copy to nvinfer.dll.
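The copy-and-rename workaround above could be scripted roughly as follows. This is only a sketch: the helper name `alias_dll` is hypothetical, and the demo runs in a throwaway directory so that nothing on a real installation is touched.

```python
import shutil
import tempfile
from pathlib import Path


def alias_dll(lib_dir: Path, source_name: str = "nvinfer_10.dll",
              alias_name: str = "nvinfer.dll") -> Path:
    """Copy source_name to alias_name inside lib_dir, keeping the original file."""
    source = lib_dir / source_name
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; check where the DLLs were installed")
    target = lib_dir / alias_name
    if not target.exists():  # never overwrite an nvinfer.dll that is already there
        shutil.copy2(source, target)
    return target


# Demo in a temporary directory so the sketch runs on any machine.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "nvinfer_10.dll").write_bytes(b"placeholder")
    print(alias_dll(demo_dir).name)  # nvinfer.dll

# Real use on Windows (directory taken from this thread; adjust to your install):
# alias_dll(Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\lib\x64"))
```

Writing into Program Files normally requires an elevated prompt, so the real call may need to be run as administrator.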
Author: 大桥英雄    Time: 2024-7-30 13:05
Add these three entries to the PATH environment variable manually:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\lib\x64
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\lib
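Before editing PATH by hand, it may help to see which of the three directories listed above are still missing. The sketch below assumes a Windows-style PATH separated by `;` and compared case-insensitively; `missing_entries` and the sample PATH value are hypothetical and only illustrate the check.

```python
import os
from typing import List, Optional

WANTED = [
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\lib\x64",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\lib",
]


def _norm(entry: str) -> str:
    """Normalise a PATH entry: ignore case, surrounding spaces and trailing slashes."""
    return entry.strip().rstrip("\\/").lower()


def missing_entries(wanted: List[str], path_value: Optional[str] = None,
                    sep: str = ";") -> List[str]:
    """Return the wanted directories that do not appear in path_value."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    present = {_norm(p) for p in path_value.split(sep) if p.strip()}
    return [w for w in wanted if _norm(w) not in present]


# Hypothetical PATH value in which only the lib\x64 directory is present.
sample = r"C:\Windows\system32;c:\program files\nvidia gpu computing toolkit\cuda\v12.5\lib\x64"
for entry in missing_entries(WANTED, sample):
    print("missing:", entry)
```

On a real Windows machine, calling `missing_entries(WANTED)` with no second argument checks the PATH of the current process.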
Author: 大桥英雄    Time: 2024-7-30 13:07
If cudnn64_8.dll is missing, see the second post.
Author: 大桥英雄    Time: 2024-7-30 13:08
Here is the original text of the first post as well.

A guide to setting up KataGo and KaTrain with TensorRT
Someone asked for a guide to setting up KataGo/KaTrain with TensorRT so I figured it was worth a post capturing my notes. KataGo is significantly faster via TensorRT than OpenCL. Ask if you want me to run a quick benchmark.

To use TensorRT you need the CUDA Toolkit (so yeah, sorry, this only applies to people with relatively modern Nvidia GPUs). Here are the install guides if you want to follow them: CUDA Toolkit, TensorRT. Sadly you need to make a free Nvidia account.

The steps are:

Install CUDA Toolkit. Technically you need 12.4 but they've gotten better about backward compatibility so I'm guessing all the newer versions will work.

You used to need to set environment variables but I think the installer now does it for you. I have

CUDA_PATH_V12_4 set to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
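A quick way to see whether the installer really set the variable is to list every `CUDA_PATH*` entry in the environment. The sketch below is illustrative only: `cuda_path_variables` is a hypothetical helper, and the example environment is a hand-written dictionary rather than output from a real machine.

```python
import os
from typing import Dict, Mapping, Optional


def cuda_path_variables(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return every variable whose name starts with CUDA_PATH (current process by default)."""
    if env is None:
        env = os.environ
    return {k: v for k, v in env.items() if k.upper().startswith("CUDA_PATH")}


# Hand-written example environment; the value is the one quoted in the post.
example_env = {
    "CUDA_PATH_V12_4": r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4",
    "TEMP": r"C:\Temp",
}
for name, value in sorted(cuda_path_variables(example_env).items()):
    print(f"{name} = {value}")
```

Calling `cuda_path_variables()` with no argument inspects the real environment; an empty result suggests the variables still need to be set by hand.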

Download and install TensorRT. Follow the Zip-file installation instructions for Windows.

Add TensorRT to your PATH through the same environment-variable settings as above. I have both

C:\Program Files\NVIDIA GPU Computing Toolkit\TensorRT-10.0.1.6\lib

C:\Program Files\NVIDIA GPU Computing Toolkit\TensorRT-10.0.1.6\include

added to my PATH
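To confirm that the TensorRT directories added to PATH actually contain the libraries, one could scan them for files matching `nvinfer*.dll`. This is a minimal sketch; the file-name pattern is an assumption about what the TensorRT zip ships, and `find_libraries` is a hypothetical helper demonstrated on a temporary directory.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_libraries(directories: Iterable[str], pattern: str = "nvinfer*.dll") -> List[Path]:
    """List files matching pattern in the given directories, skipping missing ones."""
    hits: List[Path] = []
    for entry in directories:
        folder = Path(entry)
        if folder.is_dir():
            hits.extend(sorted(folder.glob(pattern)))
    return hits


# Demo with a temporary "lib" directory and one directory that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "lib"
    lib.mkdir()
    (lib / "nvinfer_10.dll").write_bytes(b"")
    found = find_libraries([str(lib), str(Path(tmp) / "missing")])
    print([p.name for p in found])  # ['nvinfer_10.dll']

# Real use: scan every directory on the current PATH.
# print(find_libraries(os.environ.get("PATH", "").split(os.pathsep)))
```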

You are mostly done. Download the latest KataGo. Get something that looks like TRT8.6.1-CUDA12.1 (or higher numbers). Download it somewhere and rename the katago.exe file to something a bit more descriptive.

Download the latest stable model (check out the fancy new b28 models)

Run a benchmark of the TRT KataGo

./katago-TRT8.6.1-CUDA12.1.exe benchmark -model <NEURALNET>.bin.gz
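If you want to compare runs, the benchmark command above can be assembled in a script and the speed pulled out of its output. The sketch below makes two assumptions that are not from the post: the model file name `model.bin.gz` is a placeholder, and the sample output line only imitates the general `visits/s = ...` shape, so the pattern may need adjusting for your KataGo version.

```python
import re
from typing import List, Optional

VISITS_RE = re.compile(r"visits/s\s*=\s*([0-9]+(?:\.[0-9]+)?)")


def benchmark_command(exe: str, model: str) -> List[str]:
    """Build the argument list for the benchmark invocation shown above."""
    return [exe, "benchmark", "-model", model]


def parse_visits_per_second(line: str) -> Optional[float]:
    """Extract the visits/s figure from one line of output, or None if absent."""
    match = VISITS_RE.search(line)
    return float(match.group(1)) if match else None


cmd = benchmark_command("./katago-TRT8.6.1-CUDA12.1.exe", "model.bin.gz")
print(" ".join(cmd))

# Hypothetical output line, written by hand for illustration.
sample_line = "numSearchThreads = 8: 10 / 10 positions, visits/s = 673.00"
print(parse_visits_per_second(sample_line))  # 673.0

# To actually run the benchmark: import subprocess; subprocess.run(cmd, check=True)
```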

Only once the benchmark is running properly (I think it should say somewhere that it's using TensorRT), copy everything over to your KaTrain directory (it's usually in your user directory as .katrain).
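The copy step could look roughly like this. It is a sketch under assumptions: `copy_engine_files` is a hypothetical helper that copies only the top-level files of the download folder, the file names in the demo are placeholders, and the demo uses temporary directories instead of a real `.katrain` folder.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def copy_engine_files(source_dir: Path, katrain_dir: Path) -> List[str]:
    """Copy every top-level file from source_dir into katrain_dir; return the names copied."""
    katrain_dir.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for item in sorted(source_dir.iterdir()):
        if item.is_file():
            shutil.copy2(item, katrain_dir / item.name)
            copied.append(item.name)
    return copied


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "katago-trt"
    src.mkdir()
    (src / "katago-trt.exe").write_bytes(b"")
    (src / "model.bin.gz").write_bytes(b"")
    dest = Path(tmp) / ".katrain"
    print(copy_engine_files(src, dest))  # ['katago-trt.exe', 'model.bin.gz']

# Real use (source folder is wherever you unpacked KataGo):
# copy_engine_files(Path("path/to/katago-folder"), Path.home() / ".katrain")
```

After copying, KaTrain still has to be pointed at the new executable and model in its engine settings.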

Enjoy the firepower of your fully armed and operational battlestation

Lightvector/icosaplex is amazing and we all owe him!

Watch out - the initialization time with TensorRT is stupidly long vs CPU or even OpenCL (surely this is fixable somehow?)
Author: 大桥英雄    Time: 2024-7-30 13:10
Note: an Nvidia card is required. The two documents are already very clear, and they complement each other nicely.
Author: 大桥英雄    Time: 2024-7-30 13:24
KataGo is really way faster with TensorRT
I just figured out how to run KataGo with TensorRT instead of OpenCL, I expected slightly better performance but it turned out to be more than twice as fast: from 290 to 673 visits/s on my laptop (3050 RTX). I want to share in case someone else is missing out on this.
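For reference, the arithmetic behind "more than twice as fast" works out as follows, using only the two figures quoted above (the helper name `speedup` is just for illustration):

```python
def speedup(before: float, after: float) -> float:
    """Ratio of the new throughput to the old one."""
    if before <= 0:
        raise ValueError("before must be positive")
    return after / before


ratio = speedup(290, 673)          # visits/s with OpenCL vs with TensorRT
print(f"{ratio:.2f}x")             # 2.32x
print(f"{(ratio - 1) * 100:.0f}% more visits per second")  # 132% more visits per second
```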

This only works with nVidia graphics cards, and I only know Linux, not Windows or macOS.

So there are three so-called compute backends: OpenCL, CUDA and TensorRT. This is clearly explained in the KataGo repository, so it's nothing new. But there are two cruxes:

Our beloved KaTrain lets you choose between KataGo versions and then automatically downloads and configures them, but as far as I know it never shows the faster CUDA and TensorRT options, only OpenCL.

While KataGo provides precompiled binaries for CUDA and TensorRT, those are compiled for specific versions of CUDA/cuDNN/TensorRT which can make them difficult to run if your Linux distro provides newer versions.

I got hung up on KataGo's documentation saying that it has to be version this and that of the nVidia stuff, and didn't realize that if you just compile it yourself then it works fine with the latest versions of everything.

All I had to do differently was add add_compile_options(-fpermissive) to CMakeLists.txt, because the latest version of gcc is stricter and the build failed otherwise. So it was just cmake . -DUSE_BACKEND=TENSORRT followed by make -j 8
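The CMakeLists.txt change described above can be made by hand, but a small script makes it repeatable. The sketch below is only an illustration: the post does not say where in the file the line was added, so placing it right after the `project(...)` line is an assumption, and the sample file content is made up.

```python
import re

OPTION = "add_compile_options(-fpermissive)"
PROJECT_RE = re.compile(r"^\s*project\s*\(", re.IGNORECASE)


def add_fpermissive(cmake_text: str) -> str:
    """Insert OPTION after the first project(...) line, or at the top if there is none."""
    if OPTION in cmake_text:
        return cmake_text  # already patched, leave the file alone
    lines = cmake_text.splitlines()
    for index, line in enumerate(lines):
        if PROJECT_RE.match(line):
            lines.insert(index + 1, OPTION)
            break
    else:
        lines.insert(0, OPTION)
    return "\n".join(lines) + "\n"


# Made-up miniature CMakeLists.txt, not KataGo's real file.
sample = """cmake_minimum_required(VERSION 3.18)
project(katago)
add_executable(katago main.cpp)
"""
print(add_fpermissive(sample))

# Afterwards, build as described in the post:
#   cmake . -DUSE_BACKEND=TENSORRT
#   make -j 8
```

The option has to appear before the targets are defined in order to apply to them, which is why the sketch inserts it near the top rather than appending it to the end of the file.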
----------------------------------------------------------------------
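KataGo really is much faster with TensorRT
I just figured out how to run KataGo with TensorRT instead of OpenCL. I expected slightly better performance, but it turned out to be more than twice as fast: from 290 to 673 visits/s on my laptop (3050 RTX). I want to share this in case someone else is missing out.

This only works with nVidia graphics cards, and I only know Linux, not Windows or macOS.

So there are three so-called compute backends: OpenCL, CUDA and TensorRT. This is clearly explained in the KataGo repository, so it's nothing new. But there are two cruxes:

Our beloved KaTrain lets you choose between KataGo versions and then downloads and configures them automatically, but as far as I know it never shows the faster CUDA and TensorRT options, only OpenCL.

While KataGo provides precompiled binaries for CUDA and TensorRT, they are compiled for specific versions of CUDA/cuDNN/TensorRT, which can make them difficult to run if your Linux distro provides newer versions.

I got stuck on KataGo's documentation saying that specific versions of the nVidia components are required, and didn't realize that if you compile it yourself it works fine with the latest versions of everything.

The only thing I had to do differently was add add_compile_options(-fpermissive) to CMakeLists.txt, because the latest version of gcc is stricter and the build fails otherwise. So it was just cmake . -DUSE_BACKEND=TENSORRT followed by make -j 8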
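Author: syfy    Time: 2024-7-31 14:44
Thanks for sharing.
Author: hred9D    Time: 2024-8-2 09:40
Thanks, although I already knew how to set it up.
Author: 攀登11    Time: 2024-8-21 20:33
Great post. I had forgotten how I configured mine. A while back I fumbled through some online tutorials on my own computer and somehow got the environment working straight away, and the TRT engine just ran. It's a pity that TRT doesn't help much on 16-series cards.
Author: dongxiaohui    Time: 2024-9-28 17:26
The forum has a prepackaged build with one-click launch.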




Welcome to 飞扬围棋 (http://bbs.flygo.net/Bbs/) Powered by Discuz! X3.2