{"id":995,"date":"2012-04-29T22:44:44","date_gmt":"2012-04-29T14:44:44","guid":{"rendered":"http:\/\/www.magicandlove.com\/blog\/?p=995"},"modified":"2012-06-23T00:13:37","modified_gmt":"2012-06-22T16:13:37","slug":"opencl-particles-system-with-processing","status":"publish","type":"post","link":"http:\/\/www.magicandlove.com\/blog\/2012\/04\/29\/opencl-particles-system-with-processing\/","title":{"rendered":"OpenCL Particles System with Processing"},"content":{"rendered":"<p>I ported the particles demo program from <a href=\"http:\/\/code.google.com\/p\/javacl\/\">JavaCL<\/a> to Processing 2.0 alpha. On my iMac it runs reasonably well with up to 500,000 particles, at twenty-something frames per second. The video was captured with QuickTime screen recording; the recorded playback is much slower than the sketch running live on screen.<br \/>\n&nbsp;<br \/>\n<iframe loading=\"lazy\" width=\"560\" height=\"315\" src=\"http:\/\/www.youtube.com\/embed\/uWEprPigBNs?rel=0\" frameborder=\"0\" allowfullscreen><\/iframe><\/p>\n<pre lang=\"java\">\r\nimport processing.opengl.*;\r\nimport javax.media.opengl.*;\r\nimport javax.media.opengl.glu.GLU;\r\nimport java.util.Random;\r\n\r\nimport com.nativelibs4java.opencl.*;\r\nimport com.nativelibs4java.opencl.CLMem.Usage;\r\nimport org.bridj.Pointer;\r\n\r\nimport static org.bridj.Pointer.*;\r\n\r\nfinal int particlesCount = 200000;\r\n\r\nGL2 gl;\r\nPGL pgl;\r\n\r\nint [] vbo = new int[1];\r\n\r\nCLContext context;\r\nCLQueue queue;\r\n\r\nPointer<Float> velocities;\r\nCLKernel updateParticleKernel;\r\n\r\nCLBuffer<Float> massesMem, velocitiesMem;\r\nCLBuffer<Byte> interleavedColorAndPositionsMem;\r\nPointer<Byte> interleavedColorAndPositionsTemp;\r\n\r\nint elementSize = 4*4;\r\n\r\nvoid setup() {\r\n  size(800, 600, OPENGL);\r\n  background(0);\r\n  randomSeed(millis());\r\n\r\n  PGraphicsOpenGL pg = (PGraphicsOpenGL) g;\r\n  pgl = pg.beginPGL();\r\n  gl = pgl.gl.getGL().getGL2();\r\n  gl.glClearColor(0, 0, 0, 1);\r\n  
gl.glClear(GL.GL_COLOR_BUFFER_BIT);\r\n  gl.glEnable(GL.GL_BLEND);\r\n  gl.glEnable(GL2.GL_POINT_SMOOTH);\r\n  gl.glPointSize(1f);\r\n  initOpenCL();\r\n  pg.endPGL();\r\n}\r\n\r\nvoid initOpenCL() {\r\n  context = JavaCL.createContextFromCurrentGL();\r\n  queue = context.createDefaultQueue();\r\n\r\n  Pointer<Float> masses = allocateFloats(particlesCount).order(context.getByteOrder());\r\n  velocities = allocateFloats(2 * particlesCount).order(context.getByteOrder());\r\n  interleavedColorAndPositionsTemp = allocateBytes(elementSize * particlesCount).order(context.getByteOrder());\r\n\r\n  Pointer<Float> positionsView = interleavedColorAndPositionsTemp.as(Float.class);\r\n  for (int i = 0; i < particlesCount; i++) {\r\n    masses.set(i, 0.5f + 0.5f * random(1));\r\n    velocities.set(i * 2, random(-0.5, 0.5) * 0.2f);\r\n    velocities.set(i * 2 + 1, random(-0.5, 0.5) * 0.2f);\r\n    int colorOffset = i * elementSize;\r\n    int posOffset = i * (elementSize \/ 4) + 1;\r\n    byte r = (byte) 220, g = r, b = r, a = r;\r\n    interleavedColorAndPositionsTemp.set(colorOffset++, r);\r\n    interleavedColorAndPositionsTemp.set(colorOffset++, g);\r\n    interleavedColorAndPositionsTemp.set(colorOffset++, b);\r\n    interleavedColorAndPositionsTemp.set(colorOffset, a);\r\n    float x = random(-0.5, 0.5) * width\/2.0, \r\n    y = random(-0.5, 0.5) * height\/2.0;\r\n    positionsView.set(posOffset, (float) x);\r\n    positionsView.set(posOffset + 1, (float) y);\r\n  }\r\n  velocitiesMem = context.createBuffer(Usage.InputOutput, velocities, false);\r\n  massesMem = context.createBuffer(Usage.Input, masses, true);\r\n\r\n  gl.glGenBuffers(1, vbo, 0);\r\n  gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vbo[0]);\r\n  gl.glBufferData(GL.GL_ARRAY_BUFFER, (int) interleavedColorAndPositionsTemp.getValidBytes(), interleavedColorAndPositionsTemp.getByteBuffer(), GL2.GL_DYNAMIC_COPY);\r\n  gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);\r\n\r\n  interleavedColorAndPositionsMem = 
context.createBufferFromGLBuffer(Usage.InputOutput, vbo[0]);\r\n  String pgmSrc = join(loadStrings(dataPath(\"ParticlesDemoProgram.cl\")), \"\\n\");\r\n  CLProgram program = context.createProgram(pgmSrc);\r\n  updateParticleKernel = program.build().createKernel(\"updateParticle\");\r\n  callKernel();\r\n}\r\n\r\nvoid draw() {\r\n  queue.finish();\r\n  gl.glClear(GL.GL_COLOR_BUFFER_BIT);\r\n  gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_SRC_COLOR);\r\n  gl.glMatrixMode(GL2.GL_PROJECTION);\r\n  gl.glLoadIdentity();\r\n  pgl.glu.gluOrtho2D(-width\/2 - 1, width\/2 + 1, -height\/2 - 1, height\/2 + 1);\r\n  gl.glMatrixMode(GL2.GL_MODELVIEW);\r\n\r\n  gl.glBindBuffer(GL2.GL_ARRAY_BUFFER, vbo[0]);\r\n  gl.glInterleavedArrays(GL2.GL_C4UB_V2F, elementSize, 0);\r\n  gl.glDrawArrays(GL.GL_POINTS, 0, particlesCount);\r\n\r\n  gl.glBindBuffer(GL2.GL_ARRAY_BUFFER, 0);\r\n  callKernel();\r\n}\r\n\r\nvoid callKernel() {\r\n  CLEvent kernelCompletion;\r\n  synchronized(updateParticleKernel) {\r\n    interleavedColorAndPositionsMem.acquireGLObject(queue);\r\n    updateParticleKernel.setArgs(massesMem, \r\n    velocitiesMem, \r\n    interleavedColorAndPositionsMem.as(Float.class), \r\n    new float[] {\r\n      mouseX-width\/2, height\/2-mouseY\r\n    }\r\n    , \r\n    new float[] {\r\n      width, height\r\n    }\r\n    , \r\n    2.0, \r\n    2.0, \r\n    0.9, \r\n    0.8, \r\n    (byte) 0);\r\n\r\n    int [] globalSizes = new int[] {\r\n      particlesCount\r\n    };\r\n    kernelCompletion = updateParticleKernel.enqueueNDRange(queue, globalSizes);\r\n    interleavedColorAndPositionsMem.releaseGLObject(queue);\r\n  }\r\n}\r\n<\/pre>\n<p>&nbsp;<br \/>\nHere is the OpenCL source.<\/p>\n<pre lang=\"cpp\">\r\n\/\/ Ported to JavaCL\/OpenCL4Java (+ added colors) by Olivier Chafik\r\n\r\n#define REPULSION_FORCE 4.0f\r\n#define CENTER_FORCE2 0.0005f\r\n\r\n#define PI 3.1416f\r\n\r\n\/\/#pragma OpenCL cl_khr_byte_addressable_store : enable\r\n\r\nuchar4 HSVAtoRGBA(float4 hsva)\r\n{\r\n    float h 
= hsva.x, s = hsva.y, v = hsva.z, a = hsva.w;\r\n    float r, g, b;\r\n\r\n        int i;\r\n        float f, p, q, t;\r\n        if (s == 0) {\r\n                \/\/ achromatic (grey)\r\n                r = g = b = v;\r\n                return (uchar4)(r * 255, g * 255, b * 255, a * 255);\r\n        }\r\n        h \/= 60;                        \/\/ sector 0 to 5\r\n        i = floor( h );\r\n        f = h - i;                      \/\/ fractional part of h\r\n        p = v * ( 1 - s );\r\n        q = v * ( 1 - s * f );\r\n        t = v * ( 1 - s * ( 1 - f ) );\r\n        switch( i ) {\r\n                case 0:\r\n                        r = v;\r\n                        g = t;\r\n                        b = p;\r\n                        break;\r\n                case 1:\r\n                        r = q;\r\n                        g = v;\r\n                        b = p;\r\n                        break;\r\n                case 2:\r\n                        r = p;\r\n                        g = v;\r\n                        b = t;\r\n                        break;\r\n                case 3:\r\n                        r = p;\r\n                        g = q;\r\n                        b = v;\r\n                        break;\r\n                case 4:\r\n                        r = t;\r\n                        g = p;\r\n                        b = v;\r\n                        break;\r\n                default:                \/\/ case 5:\r\n                        r = v;\r\n                        g = p;\r\n                        b = q;\r\n                        break;\r\n        }\r\n    return (uchar4)(r * 255, g * 255, b * 255, a * 255);\r\n}\r\n\r\n__kernel void updateParticle(\r\n        __global float* masses,\r\n        __global float2* velocities,\r\n        \/\/__global Particle* particles,\r\n        __global float4* particles,\r\n        \/\/__global char* pParticles,\r\n        const float2 mousePos,\r\n        const float2 dimensions,\r\n        
float massFactor,\r\n        float speedFactor,\r\n        float slowDownFactor,\r\n        float mouseWeight,\r\n        char limitToScreen\r\n) {\r\n\tint id = get_global_id(0);\r\n\r\n        float4 particle = particles[id];\r\n\r\n        uchar4 color = as_uchar4(particle.x);\r\n\r\n        float2 position = particle.yz;\r\n    \tfloat2 diff = mousePos - position;\r\n\r\n        float invDistSQ = 1.0f \/ dot(diff, diff);\r\n\tfloat2 halfD = dimensions \/ 2.0f;\r\n        diff *= (halfD).y * invDistSQ;\r\n\r\n        float mass = massFactor * masses[id];\r\n        float2 velocity = velocities[id];\r\n        velocity -= mass * position * CENTER_FORCE2 - diff * mass * mouseWeight;\r\n        position += speedFactor * velocities[id];\r\n        \r\n        if (limitToScreen) {\r\n            float2 halfDims = dimensions \/ 2.0f;\r\n            position = clamp(position, -halfDims, halfDims);\r\n        }\r\n\r\n        float dirDot = cross((float4)(diff, (float2)0), (float4)(velocity, (float2)0)).z;\r\n        float speed = length(velocity);\r\n\r\n        float f = speed \/ 4 \/ mass;\r\n        float hue = (dirDot < 0 ? 
f : f + 1) \/ 2;\r\n        hue = clamp(hue, 0.0f, 1.0f) * 360;\r\n\r\n        float opacity = clamp(0.1f + f, 0.0f, 1.0f);\r\n        float saturation = mass \/ 2;\r\n        float brightness = 0.6f + opacity * 0.3f;\r\n\r\n        uchar4 targetColor = HSVAtoRGBA((float4)(hue, saturation, brightness, opacity));\r\n        \r\n        float colorSpeedFactor = min(0.01f * speedFactor, 1.0f), otherColorSpeedFactor = 1 - colorSpeedFactor;\r\n        color = (uchar4)(\r\n            (uchar)(targetColor.x * colorSpeedFactor + color.x * otherColorSpeedFactor),\r\n            (uchar)(targetColor.y * colorSpeedFactor + color.y * otherColorSpeedFactor),\r\n            (uchar)(targetColor.z * colorSpeedFactor + color.z * otherColorSpeedFactor),\r\n            (uchar)(targetColor.w * colorSpeedFactor + color.w * otherColorSpeedFactor)\r\n        );\r\n\r\n        particle.x = as_float(color);\r\n        particle.yz = position;\r\n\r\n    \tparticles[id] = particle;\r\n\r\n        velocity *= slowDownFactor;\r\n        velocities[id] = velocity;\r\n}\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I ported the particles demo program from JavaCL to Processing 2.0 alpha. On my iMac it runs reasonably well with up to 500,000 particles, at twenty-something frames per second. The video was captured with QuickTime screen recording; the recorded playback is much slower than the sketch running live on screen. 
&nbsp; import processing.opengl.*; import javax.media.opengl.*; import javax.media.opengl.glu.GLU; [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[79,66],"tags":[106,62],"class_list":["post-995","post","type-post","status-publish","format-standard","hentry","category-software-2","category-testing","tag-opencl","tag-processing-org"],"_links":{"self":[{"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/posts\/995","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/comments?post=995"}],"version-history":[{"count":6,"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/posts\/995\/revisions"}],"predecessor-version":[{"id":1034,"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/posts\/995\/revisions\/1034"}],"wp:attachment":[{"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/media?parent=995"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/categories?post=995"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/tags?post=995"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}