Summary: | clpeak OpenCL benchmark hangs during compilation on Clover RadeonSI | ||
---|---|---|---|
Product: | Mesa | Reporter: | Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | normal | ||
Priority: | medium | CC: | 0xe2.0x9a.0x9b, ricardo.ribalda, vedran, virtuousfox, znmeb |
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Bug Blocks: | 99553 | ||
Attachments: |
gdb backtrace
clinfo for the system |
Description
Jan Ziak (http://atom-symbol.net)
2016-07-12 11:10:26 UTC
Created attachment 125023 [details]
gdb backtrace
Looks like deep recursion in clover / LLVM code.

Interesting, I will look into this.

Not anymore on both LLVM 3.9.1 and LLVM git from today:

input.cl:34:106: error: call to 'mad' is ambiguous
input.cl:30:22: note: expanded from macro 'MAD_64'
input.cl:29:22: note: expanded from macro 'MAD_16'
input.cl:28:25: note: expanded from macro 'MAD_4'
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
input.cl:34:106: error: call to 'mad' is ambiguous

Did clpeak change or did we change? If we changed, did we regress?

With LLVM 4.0.0 I am getting the following results:

$ clinfo
Platform ID: 0x7ff6aaf2ed60
Name: AMD HAWAII (DRM 3.10.0 / 4.11.0-rc2+, LLVM 4.0.0)
Vendor: AMD
Device OpenCL C version: OpenCL C 1.1
Driver version: 17.1.0-devel
Profile: FULL_PROFILE
Version: OpenCL 1.1 Mesa 17.1.0-devel (git-ad13bd2)

$ ./clpeak
Platform: Clover
Device: AMD HAWAII (DRM 3.10.0 / 4.11.0-rc2+, LLVM 4.0.0)
Driver version : 17.1.0-devel (Linux x64)
Compute units : 40
Clock frequency : 1000 MHz
clpeak: /var/tmp/portage/sys-devel/clang-4.0.0/work/x/y/cfe-4.0.0.src/lib/Sema/Sema.cpp:317: clang::Sema::~Sema(): Assertion `DelayedTypos.empty() && "Uncorrected typos!"' failed.
Aborted (core dumped)

Same for me on tonga + git llvm/libclc/mesa/clpeak

Platform: Clover
Device: AMD TONGA (DRM 3.13.0 / 4.11.0-rc1-g00c1259, LLVM 5.0.0)
Driver version : 17.1.0-devel (Linux x64)
Compute units : 28
Clock frequency : 973 MHz
clpeak: /mnt/sdb1/Gits/llvm/tools/clang/lib/Sema/Sema.cpp:316: clang::Sema::~Sema(): Assertion `DelayedTypos.empty() && "Uncorrected typos!"' failed.
Aborted

(In reply to Andy Furniss from comment #6)
> Same for me on tonga + git llvm/libclc/mesa/clpeak
>
> Platform: Clover
> Device: AMD TONGA (DRM 3.13.0 / 4.11.0-rc1-g00c1259, LLVM 5.0.0)
> Driver version : 17.1.0-devel (Linux x64)
> Compute units : 28
> Clock frequency : 973 MHz
> clpeak: /mnt/sdb1/Gits/llvm/tools/clang/lib/Sema/Sema.cpp:316:
> clang::Sema::~Sema(): Assertion `DelayedTypos.empty() && "Uncorrected
> typos!"' failed.
> Aborted

This starts with clpeak commit -

16e1b207a4d4e81a0c48c77c950437dca1364cb6 is the first bad commit
commit 16e1b207a4d4e81a0c48c77c950437dca1364cb6
Author: espes <espes@pequalsnp.com>
Date: Mon Jul 18 17:06:15 2016 -0700

    Add support for halfs

Before this it completes OK, but there is some delay ~40 seconds, before results start appearing.

With:

Device: AMD CARRIZO (DRM 3.9.0 / 4.10.0-qtec-standard, LLVM 4.0.1)
Driver version : 17.0.3 (Linux x64)
Compute units : 8
Clock frequency : 800 MHz

I am getting the same error as Vedran:

error: call to 'mad' is ambiguous

After reverting:

16e1b207a4d4e81a0c48c77c950437dca1364cb6 is the first bad commit
commit 16e1b207a4d4e81a0c48c77c950437dca1364cb6
Author: espes <espes@pequalsnp.com>
Date: Mon Jul 18 17:06:15 2016 -0700

I am experiencing an endless loop as reported by Jan.

I get the same endless loop with:

Platform: Clover
Device: AMD PALM (DRM 2.49.0 / 4.10.0-qtec-standard, LLVM 4.0.1)
Driver version : 17.0.3 (Linux x64)
Compute units : 2
Clock frequency : 0 MHz

I have something like this on Fedora - both 25 (stable) and 26 (alpha). I type "clpeak" and the CPU goes to 100% and nothing else happens.
I'll attach a 'clinfo' printout.

Created attachment 131104 [details]
clinfo for the system

Note: this bug is in Fedora's bugzilla as well - https://bugzilla.redhat.com/show_bug.cgi?id=1433632

Linking to a clpeak GitHub issue: https://github.com/krrishnarraj/clpeak/issues/32

Note: I'm now on Arch Linux and I have the non-looping version of this.

> input.cl:34:106: error: call to 'mad' is ambiguous

This looks to be caused by the lack of half precision builtins in libclc. GCN+ GPUs advertise support for cl_khr_fp16 in CLC but libclc is not ready yet.
You can try my experimental cl_khr_fp16 branch: https://github.com/jvesely/libclc/tree/cl_khr_fp16

Initial support for cl_khr_fp16 builtins has been added to libclc in r332677.
It should be enough to run clpeak.
clpeak still takes few mins to compile the kernels (~7mins on my carrizo laptop)

(In reply to Jan Vesely from comment #13)
> Initial support for cl_khr_fp16 builtins has been added to libclc in r332677.
> It should be enough to run clpeak.
> clpeak still takes few mins to compile the kernels (~7mins on my carrizo
> laptop)

GREAT work Jan!

After 3 min and ~12 sec float start crunching on my X3470 Xeon (only one core would be used for kernel compile => 3.6 GHz turbo mode)
My desktop was frozen during float 'Global memory bandwidth (GBPS)' compute and partly frozen during 'Double-precision compute (GFLOPS)'.
Whole benchmark finished after 6 min and 17 secs.

/home/dieter> time clpeak

Platform: Clover
  Device: Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 4.16.9-1.g4f45b1e-default, LLVM 7.0.0)
    Driver version  : 18.2.0-devel (Linux x64)
    Compute units   : 36
    Clock frequency : 1411 MHz

    Global memory bandwidth (GBPS)
      float   : 2.64
      float2  : 2.64
      float4  : 2.64
      float8  : 2.54
      float16 : 1.45

    Single-precision compute (GFLOPS)
      float   : 6341.87
      float2  : 6131.34
      float4  : 6105.61
      float8  : 5933.91
      float16 : 5939.44

    half-precision compute (GFLOPS)
      half   : 6307.47
      half2  : 6193.25
      half4  : 6114.34
      half8  : 5729.57
      half16 : 6047.90

    Double-precision compute (GFLOPS)
      double   : 404.52
      double2  : 404.41
      double4  : 404.06
      double8  : 403.08
      double16 : 401.53

    Integer compute (GIOPS)
      int   : 1222.75
      int2  : 1213.90
      int4  : 1210.72
      int8  : 1208.57
      int16 : 1213.99

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 8.78
      enqueueReadBuffer          : 4.86
      enqueueMapBuffer(for read) : 4871.79
        memcpy from mapped ptr   : 4.94
      enqueueUnmap(after write)  : 3528.56
        memcpy to mapped ptr     : 4.94

    Kernel launch latency : 293.57 us

206.285u 3.765s 6:17.14 55.6% 0+0k 0+0io 0pf+0w

For reference AMD 17.40

/home/dieter> time clpeak

Platform: AMD Accelerated Parallel Processing
  Device: Ellesmere
    Driver version  : 2482.3 (Linux x64)
    Compute units   : 36
    Clock frequency : 1411 MHz

    Global memory bandwidth (GBPS)
      float   : 202.59
      float2  : 209.30
      float4  : 209.63
      float8  : 162.15
      float16 : 138.41

    Single-precision compute (GFLOPS)
      float   : 6342.71
      float2  : 6374.96
      float4  : 6178.29
      float8  : 5973.53
      float16 : 6018.79

    half-precision compute (GFLOPS)
      half   : 6306.97
      half2  : 6366.06
      half4  : 6350.41
      half8  : 6154.31
      half16 : 6280.47

    Double-precision compute (GFLOPS)
      double   : 404.64
      double2  : 404.38
      double4  : 398.54
      double8  : 403.25
      double16 : 401.53

    Integer compute (GIOPS)
      int   : 1206.77
      int2  : 1221.26
      int4  : 1225.83
      int8  : 1225.88
      int16 : 1227.35

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 9.03
      enqueueReadBuffer          : 5.08
      enqueueMapBuffer(for read) : 149130.81
        memcpy from mapped ptr   : 5.09
      enqueueUnmap(after write)  : 75882.81
        memcpy to mapped ptr     : 5.08

    Kernel launch latency : 93.33 us

23.056u 1.592s 1:08.29 36.0% 0+0k 0+0io 0pf+0w
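
For anyone comparing the two RX 580 runs above, here is a rough sketch (not part of the original report) that puts the Clover and AMD 17.40 figures side by side. The numbers are copied by hand from the two outputs; `parse_elapsed` and the variable names are made up for illustration, and the script only does arithmetic on the posted values, it does not run clpeak.

```python
# Figures copied from the two clpeak runs posted in this bug
# (same RX 580, Clover / Mesa 18.2.0-devel vs. AMD 17.40).
clover_gbps = {"float": 2.64, "float2": 2.64, "float4": 2.64,
               "float8": 2.54, "float16": 1.45}
amd_gbps = {"float": 202.59, "float2": 209.30, "float4": 209.63,
            "float8": 162.15, "float16": 138.41}


def parse_elapsed(text):
    """Convert the elapsed field of the `time` output, e.g. '6:17.14', to seconds.

    Hypothetical helper; assumes the 'minutes:seconds' form seen in this thread.
    """
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


# Per-type ratio of global memory bandwidth, AMD 17.40 over Clover.
ratios = {name: amd_gbps[name] / clover_gbps[name] for name in clover_gbps}

# Wall-clock time of the whole benchmark, including kernel compilation.
clover_wall = parse_elapsed("6:17.14")
amd_wall = parse_elapsed("1:08.29")

# Kernel launch latency in microseconds, as reported by clpeak.
latency_ratio = 293.57 / 93.33

for name, ratio in ratios.items():
    print(f"{name:8s} global memory bandwidth, AMD/Clover: {ratio:6.1f}x")
print(f"wall time, Clover/AMD: {clover_wall / amd_wall:.2f}x")
print(f"launch latency, Clover/AMD: {latency_ratio:.2f}x")
```

The compute figures (single, half, double, integer) are close between the two stacks in the posted runs, so the sketch only looks at the three places where they differ noticeably: global memory bandwidth, total wall time, and kernel launch latency.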