Bug 96881

Summary: ViennaCL fails dense_blas-bench-opencl benchmark with doubles on AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
Product: Mesa Reporter: ubizjak
Component: Drivers/Gallium/r600Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium CC: jv356, vedran
Version: 11.2   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Bug Depends on: 99510    
Bug Blocks: 99553    

Description ubizjak 2016-07-10 15:20:57 UTC
The dense_blas-bench-opencl benchmark from ViennaCL suite fails with doubles on AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0):

$ ./dense_blas-bench-opencl 

----------------------------------------------
               Device Info
----------------------------------------------

Name:                AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
Vendor:              AMD
Type:                GPU 
Available:           1
Max Compute Units:   10
Max Work Group Size: 256
Global Mem Size:     1073741824
Local Mem Size:      32768
Local Mem Type:      1
Host Unified Memory: 1


Benchmark : BLAS
----------------
sCOPY : 64.3 GB/s
sAXPY : 95.4 GB/s
sDOT : 85.3 GB/s
sGEMV-N : 20.8 GB/s
sGEMV-T : 44.3 GB/s
sGEMM-NN : 126 GFLOPs/s
sGEMM-NT : 87.6 GFLOPs/s
sGEMM-TN : 90.5 GFLOPs/s
sGEMM-TT : 72.3 GFLOPs/s
----
Build Status = -2 ( Err = -11 )
Log: unsupported call to function __subdf3 in av_cpu
Sources: #pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void av_cpu( 
  __global double * vec1, 
  uint4 size1, 
...

It looks like DFmode (double) instructions are not enabled correctly in LLVM for targets that report cl_khr_fp64 extension.

clinfo reports:

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 MESA 11.2.2
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 MESA 11.2.2
  Driver Version                                  11.2.2
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               10
  Max clock frequency                             850MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
    ...
Comment 1 Vedran Miletić 2016-07-11 09:54:51 UTC
Jan, does R600 backend in LLVM have relevant instructions implemented?
Comment 2 Jan Vesely 2016-07-11 16:40:52 UTC
(In reply to Vedran Miletić from comment #1)
> Jan, does R600 backend in LLVM have relevant instructions implemented?

I don't think they are.
Comment 3 GitLab Migration User 2019-09-18 19:21:52 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/589.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.