I've got a trivial kernel that draws a sphere in a voxel cube; each voxel should end up as 0 or 1. If I use global id 0 as z, 1 as y and 2 as x I get corruption where some voxels contain random junk; if I reverse the order so that global id 0 is x, 1 is y and 2 is z then it's happy. (Confirmed the code is clean with oclgrind and works on Intel.)

Versions:
  Number of devices                 1
  Device Name                       AMD TURKS (DRM 2.50.0 / 4.13.0-1-amd64, LLVM 5.0.0)
  Device Vendor                     AMD
  Device Vendor ID                  0x1002
  Device Version                    OpenCL 1.1 Mesa 17.2.4
  Driver Version                    17.2.4
  Device OpenCL C Version           OpenCL C 1.1

(on Debian testing, was on stable, but same behaviour)

01:00.0 0300: 1002:6841
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Thames [Radeon HD 7550M/7570M/7650M] (prog-if 00 [VGA controller])
        Subsystem: Hewlett-Packard Company Thames [Radeon HD 7550M/7570M/7650M]
        Flags: bus master, fast devsel, latency 0, IRQ 37
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d4300000 (64-bit, non-prefetchable) [size=128K]
        I/O ports at 4000 [size=256]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: radeon
        Kernel modules: radeon

in an HP EliteBook laptop.

Code that triggers this:
https://github.com/penguin42/opencl-play/commit/c98470685874769e4a59975791459180564b6f6e

Build and run with:
  g++ -O2 ocl.cpp -lOpenCL && ./a.out 2> z

then check the output with:
  tr '01' ' ' <z|grep -v '^ *$'|egrep -v 'got_dev|^Z'

which should be empty. (In some builds I've found I've had to increase the SIZE constant to 256 to trigger it.)

Then my commit e89fe62 fixes it with:

diff --git a/sphere.ocl b/sphere.ocl
index b4f23af..c89ecb9 100644
--- a/sphere.ocl
+++ b/sphere.ocl
@@ -1,10 +1,10 @@
 __kernel void hello(__global uint* o)
 {
-  int z = get_global_id(0);
+  int z = get_global_id(2);
   int y = get_global_id(1);
-  int x = get_global_id(2);
-  int zr = get_global_size(0);
+  int x = get_global_id(0);
+  int zr = get_global_size(2);
   int yr = get_global_size(1);
-  int xr = get_global_size(2);
+  int xr = get_global_size(0);
   float zf = ((float)z - ((float)zr)/2) / (float)zr;
   float yf = ((float)y - ((float)yr)/2) / (float)yr;
   float xf = ((float)x - ((float)xr)/2) / (float)xr;

i.e. just swapping z/x around, which should make no difference given it's a cube. But... hmm, I've since seen it fail in that direction as well.

The corruptions all seem to be near the maximum x/y/z values - almost like one small chunk in the max corner.

Here's the kernel:

__kernel void hello(__global uint* o)
{
  int z = get_global_id(0);
  int y = get_global_id(1);
  int x = get_global_id(2);
  int zr = get_global_size(0);
  int yr = get_global_size(1);
  int xr = get_global_size(2);
  float zf = ((float)z - ((float)zr)/2) / (float)zr;
  float yf = ((float)y - ((float)yr)/2) / (float)yr;
  float xf = ((float)x - ((float)xr)/2) / (float)xr;

  o[z*yr*xr + y*xr + x] = ((zf * zf) + (yf * yf) + (xf * xf)) < 0.25;
}
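For completeness, the host side (got_dev() in ocl.cpp) boils down to roughly the following; this is a trimmed sketch, not the exact lines from the file:

  // Simplified sketch of the launch/readback path; one work-item per voxel.
  // 'output' is the SIZE^3 cl_uint buffer the kernel writes into.
  kernel.setArg(0, output);

  cl::Event event;
  queue.enqueueNDRangeKernel(kernel, cl::NullRange,
                             cl::NDRange(SIZE, SIZE, SIZE),   // 3D global range
                             cl::NullRange, NULL, &event);

  // Blocking map for readback; this should not return until the data is ready.
  cl_uint *mapped = (cl_uint *)queue.enqueueMapBuffer(
      output, CL_TRUE /* blocking */, CL_MAP_READ,
      0 /* offset */, SIZE * SIZE * SIZE * sizeof(cl_uint) /* size */);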
Can you run using CLOVER_DEBUG=llvm,native CLOVER_DEBUG_FILE=foo and attach both the LLVM and ISA dumps?
Created attachment 135311 [details] foo.ll from debug run
Created attachment 135312 [details] foo.link-0.asm
Created attachment 135313 [details]
foo.link-0.ll

That's all three of the debug files it produced. (I wasn't sure which were the LLVM and which the ISA dumps; I guess the .asm is the ISA and the two .ll files are the LLVM dumps?)
(In reply to Dave Gilbert from comment #4)
> That's all three of the debug files it produced. (I wasn't sure which were
> the LLVM and which the ISA dumps; I guess the .asm is the ISA and the two
> .ll files are the LLVM dumps?)

Yes. The first .ll is from the compilation step, the other one is from the linking step.

The .ll dump looks correct.
The .asm also looks correct.

You could try producing multiple asm dumps for working and non-working runs, but I don't think the LLVM side is the culprit here.

Can you try waiting explicitly for the kernel execution to complete before mapping the buffer? Ideally, call clFinish() on line 63.
(In reply to Jan Vesely from comment #5)
> Can you try waiting explicitly for the kernel execution to complete before
> mapping the buffer? Ideally, call clFinish() on line 63.

Since I'm on the C++ binding (probably a mistake) I used:

  queue.finish();

and it seems to be working.

(This possibly also corresponds to what I'm seeing with a more complex kernel: there the last few Z slices come out as a whole pile of bogus data, suggesting the kernel hasn't finished.)

Dave
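For reference, the change amounts to roughly this - simplified from the surrounding code in ocl.cpp, not the exact lines:

  queue.enqueueNDRangeKernel(kernel, cl::NullRange,
                             cl::NDRange(SIZE, SIZE, SIZE),
                             cl::NullRange, NULL, &event);
  queue.finish();   // C++ binding's equivalent of clFinish(): block until all
                    // previously enqueued commands have completed
  mapped = (cl_uint *)queue.enqueueMapBuffer(output, CL_TRUE /* blocking */,
                                             CL_MAP_READ, 0 /* offset */,
                                             SIZE * SIZE * SIZE * sizeof(cl_uint));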
Created attachment 135318 [details] annotated asm dump
(In reply to Dave Gilbert from comment #6)
> Since I'm on the C++ binding (probably a mistake) I used:
>
>   queue.finish();
>
> and it seems to be working.

Thanks for testing. I see you are using Mesa 17.2.

There were a few changes to blocking-call synchronization that went into Mesa 17.3:

02f8ac6b70033a1b240d497c4664c359d2398cc3 (clover: Wrap event::wait_count in a method taking care of the required locking.)
bc4000ee40c78efe1e5e8a6244d4bb55389d8418 (clover: Run the associated action before an event is signalled.)
3a5b69c09ba355c616c274b0c7f5aba3bd21fd54 (clover: Wait for requested operation if blocking flag is set)

which might help address the issue. Can you test Mesa 17.3?
(In reply to Jan Vesely from comment #8)
> There were a few changes to blocking-call synchronization that went into
> Mesa 17.3 ... which might help address the issue. Can you test Mesa 17.3?

Yeh, I'll figure out how to get 17.3 built on this box.
I believe I'm still seeing this:

dg@hath:~/ocl2$ clinfo
Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 17.4.0-devel (git-a16dc04ad5)
....

dg@hath:~/ocl2$ echo $LD_LIBRARY_PATH
/home/dg/mesa/try/lib:

so I *think* it's using my build.

and I believe I'm still seeing it. Is my test valid, or do I really need that finish?
(In reply to Dave Gilbert from comment #10)
> dg@hath:~/ocl2$ echo $LD_LIBRARY_PATH
> /home/dg/mesa/try/lib:
>
> so I *think* it's using my build.

Yes, that looks OK.

> and I believe I'm still seeing it. Is my test valid, or do I really need that finish?

It should be OK. Can you replace the clFinish with clWaitForEvents (or the respective C++ method) to wait for the kernel execution?
It looks to me like clover creates the new map without waiting for all the dependency events.
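That is, something along these lines - just a sketch of the C++ equivalents; the variable names in your ocl.cpp will differ:

  #include <CL/cl.hpp>
  #include <vector>

  // Illustrative only: wait on the host for the kernel-execution event before
  // touching the output buffer.  'kernelDone' stands in for whatever event the
  // enqueueNDRangeKernel call returned.
  static void wait_for_kernel(const cl::Event &kernelDone)
  {
      std::vector<cl::Event> deps;
      deps.push_back(kernelDone);
      cl::Event::waitForEvents(deps);   // C++ equivalent of clWaitForEvents()
      // or, for a single event, simply: kernelDone.wait();
  }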
(In reply to Jan Vesely from comment #11)
> It should be OK. Can you replace the clFinish with clWaitForEvents (or the
> respective C++ method) to wait for the kernel execution?
> It looks to me like clover creates the new map without waiting for all the
> dependency events.

If I add:

--- a/ocl.cpp
+++ b/ocl.cpp
@@ -74,6 +74,7 @@ static int got_dev(cl::Platform &plat, std::vector<cl::Device> &devices, cl::Dev
     cl::Event eventBarrier2;
     queue.enqueueBarrierWithWaitList(NULL,&eventBarrier2);
     std::cerr << __func__ << "enqueueMapBuffer gave: " << err << std::endl;
+    event.wait();
     eventMap.wait();
     eventBarrier2.wait();

it doesn't seem to help, and I believe event is the event associated with the kernel enqueue.
(In reply to Dave Gilbert from comment #6)
> Since I'm on the C++ binding (probably a mistake) I used:
>
>   queue.finish();
>
> and it seems to be working.

This reminds me of a certain issue I experienced with OpenMM. Is it limited to Turks, or does it happen on SI+ cards as well?
(In reply to Dave Gilbert from comment #12)
> +    event.wait();
>      eventMap.wait();
>      eventBarrier2.wait();
>
> it doesn't seem to help, and I believe event is the event associated with
> the kernel enqueue.

Can you move it a few lines up (before the call to mapBuffer)?
Hi Jan,
  Yes, doing:

--- a/ocl.cpp
+++ b/ocl.cpp
@@ -65,6 +65,7 @@ static int got_dev(cl::Platform &plat, std::vector<cl::Device> &devices, cl::Dev
     events.push_back(event);
     cl::Event eventMap;
     queue.enqueueBarrierWithWaitList(&events);
+    event.wait();
     mapped = (cl_uint*)queue.enqueueMapBuffer(output, CL_TRUE /* blocking */, CL_MAP_READ,
                                               0 /* offset */,
                                               SIZE * SIZE * SIZE * sizeof(cl_uint) /* size */,

does seem to work.

Vedran: I've only got a Turks to play with; feel free to try my test on something else.
(In reply to Dave Gilbert from comment #15)
> Yes, doing:
> +    event.wait();
>      mapped = (cl_uint*)queue.enqueueMapBuffer(output, CL_TRUE /* blocking */, CL_MAP_READ,
> does seem to work.

Thanks, that means the kernel work event works correctly. I'll need to double-check the specs wrt synchronization points; we either miss a wait, or fail to update mapped buffers after the kernel finishes execution.
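For reference, what per the OpenCL 1.1 spec should already be enough is either relying on in-order queue ordering, or having the map enqueue carry the kernel event in its wait list. A sketch only, with illustrative variable names rather than the exact repo code:

  // The map command lists the kernel event as a dependency, and CL_TRUE makes
  // the call block until the mapped data is valid, so no extra host-side wait
  // should be needed on top of this.
  std::vector<cl::Event> deps;
  deps.push_back(kernelEvent);
  cl_uint *mapped = (cl_uint *)queue.enqueueMapBuffer(
      output, CL_TRUE /* blocking */, CL_MAP_READ,
      0 /* offset */, SIZE * SIZE * SIZE * sizeof(cl_uint) /* size */,
      &deps /* wait list the runtime has to honour */);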
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/615.