Currently, the CUDA ConvNet is not necessarily faster than the version running on the CPU. This is due to the limited breadth of parallelization in the computations. I expect to improve this with a better implementation as I learn more about parallel computing in the Udacity Parallel Programming Intro Course.
Error Over Training:
Result: