Matrix calculation with cuBLAS

Hello,
I have started a new project.
It uses cuBLAS to calculate matrix products, and it calls into the native code through NIFs.
I want to apply it to deep learning.

I wrote addition, subtraction, and multiplication code in CUDA and cuBLAS, then measured it against the Matrix library using 1000x1000 random matrices.
Multiplication is much faster (about 34 ms after the first calls, versus about 30 s with Matrix), but addition and subtraction are not much different.

------ cuMatrix ----------------
iex(1)> a = Cumatrix.new(1000,1000,:rand);0
0
iex(2)> b = Cumatrix.new(1000,1000,:rand);0
0
iex(3)> require(Time)
nil
iex(4)> Time.time(Cumatrix.mult(a,b));0
"time: 226896 micro second"
"-------------"
0
iex(5)> Time.time(Cumatrix.mult(a,b));0
"time: 67511 micro second"
"-------------"
0
iex(6)> Time.time(Cumatrix.mult(a,b));0
"time: 33814 micro second"
"-------------"
0
iex(7)> Time.time(Cumatrix.add(a,b));0
"time: 24058 micro second"
"-------------"
0
iex(8)> Time.time(Cumatrix.add(a,b));0
"time: 31787 micro second"
"-------------"
0
iex(9)> Time.time(Cumatrix.add(a,b));0
"time: 28815 micro second"
"-------------"
0
iex(10)> Time.time(Cumatrix.sub(a,b));0
"time: 27208 micro second"
"-------------"
0
iex(11)> Time.time(Cumatrix.sub(a,b));0
"time: 25860 micro second"
"-------------"
0
iex(12)> Time.time(Cumatrix.sub(a,b));0
"time: 24574 micro second"
"-------------"
0
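
A note on measurement: `Time.time` is not a function in Elixir's standard `Time` module, so it is presumably a macro defined by the project. The first `mult` call (226896 micro second) is much slower than the third (33814 micro second), which suggests a one-time warm-up cost, although the thread does not confirm the cause. A minimal sketch of a comparable measurement using Erlang's `:timer.tc/1` follows. The `Bench` module is hypothetical, and the workload is a plain-Elixir stand-in so the sketch runs without a GPU.

```elixir
defmodule Bench do
  # Hypothetical helper for illustration; not part of cuMatrix or Matrix.
  # Runs `fun` `runs` times and returns each elapsed time in microseconds.
  def measure(fun, runs \\ 3) when is_function(fun, 0) do
    for _ <- 1..runs do
      {micros, _result} = :timer.tc(fun)
      micros
    end
  end

  # Drops the first (warm-up) run and averages the remaining ones.
  def warm_average([_warmup | rest]) when rest != [] do
    Enum.sum(rest) / length(rest)
  end
end

# Stand-in workload; with cuMatrix loaded this could be
# fn -> Cumatrix.mult(a, b) end instead.
times = Bench.measure(fn -> Enum.sum(1..100_000) end, 3)
IO.inspect(times, label: "micro seconds per run")
IO.inspect(Bench.warm_average(times), label: "average after warm-up")
```

Reporting the warm-up run separately keeps a one-time setup cost from distorting the steady-state figure.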

------ Matrix ---------------------
iex(1)> a = Matrix.rand(1000,1000);0
0
iex(2)> b = Matrix.rand(1000,1000);0
0
iex(3)> require(Time)
nil
iex(4)> Time.time(Matrix.mult(a,b));0
"time: 29913660 micro second"
"-------------"
0
iex(5)> Time.time(Matrix.mult(a,b));0
"time: 30599437 micro second"
"-------------"
0
iex(6)> Time.time(Matrix.mult(a,b));0
"time: 30030455 micro second"
"-------------"
0
iex(7)> Time.time(Matrix.add(a,b));0
"time: 21835 micro second"
"-------------"
0
iex(8)> Time.time(Matrix.add(a,b));0
"time: 84429 micro second"
"-------------"
0
iex(9)> Time.time(Matrix.add(a,b));0
"time: 76458 micro second"
"-------------"
0
iex(10)> Time.time(Matrix.sub(a,b));0
"time: 22082 micro second"
"-------------"
0
iex(11)> Time.time(Matrix.sub(a,b));0
"time: 25556 micro second"
"-------------"
0
iex(12)> Time.time(Matrix.sub(a,b));0
"time: 25680 micro second"
"-------------"
0
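
To put the two sessions side by side, here is a small sketch that computes the ratio of Matrix time to cuMatrix time for each operation. It uses only the third run of each operation as copied from the output above, so it is a rough comparison rather than a proper benchmark. A likely explanation for the difference (my assumption, not something measured here) is that a 1000x1000 multiplication needs about 10^9 multiply-adds while an addition needs only 10^6 additions, so fixed per-call costs dominate add and sub.

```elixir
# Third-run timings copied from the two sessions above, in microseconds.
cumatrix = %{mult: 33_814, add: 28_815, sub: 24_574}
matrix = %{mult: 30_030_455, add: 76_458, sub: 25_680}

# Ratio of Matrix time to cuMatrix time for each operation.
speedup =
  for {op, gpu_time} <- cumatrix, into: %{} do
    {op, Float.round(matrix[op] / gpu_time, 1)}
  end

IO.inspect(speedup, label: "Matrix time / cuMatrix time")
```

The Matrix add timings vary a lot between runs (21835 to 84429 micro second), so the add ratio in particular should be read with caution.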

You should try out Matrex as well: https://github.com/versilov/matrex

IIRC, you can pass cuBLAS at build time as the target BLAS library. With the noblas option, I was getting matrix dot products about 10% faster than NumPy on my Mac. I'd be curious to learn how it compares with cuBLAS.

I use Matrex in my own deep learning library, Deep Pipe. Matrex is great.

Nice! I saw Deep Pipe earlier but didn't realize it used Matrex. I'll have to take a look at it! Is it a layer for building simple neural networks, for example for porting (simple) TensorFlow models over?

Out of curiosity, are you experimenting with cuBLAS for speed, or do you have to manually write .cu code to wrap cuBLAS? I think you could write a more efficient gradient descent using that approach.

Thank you.
I wrote Deep Pipe to understand deep learning for myself. So far it can only run MNIST.

A GPU is indispensable for running deep learning in practical use, but I could not find a suitable library, so I decided to make one myself. cuMatrix is still experimental. I plan to improve it as the engine for Deep Pipe.

Sasagawa-san, I think you are doing some of the most interesting work in the Elixir ecosystem. I hope one day to understand it all, but I know it's very valuable to our community. It looks like a potential path to making Elixir a more valuable platform for ML.

Please keep going!

Thank you very much.
It is very encouraging.

I was encouraged.
I have also added the loss functions, the activation functions, and the derivative calculations. I will now start on the GPU version of Deep Pipe2.

I have updated to the latest version of Elixir, 1.10, and modified cuMatrix to work with it.

I improved the data structure with reference to the Matrex code, and it is faster now: multiplying two 1000x1000 matrices takes about 12 ms.
Thanks to Mr. versilov.

iex(1)> m = Cumatrix.new([[1.0,2.0],[3.0,4.0]])
{2, 2, <<0, 0, 128, 63, 0, 0, 64, 64, 0, 0, 0, 64, 0, 0, 128, 64>>}
iex(2)> Cumatrix.print(Cumatrix.mult(m,m))
7.000000 10.000000
15.000000 22.000000

true
iex(3)> a = Cumatrix.rand(1000,1000);0
0
iex(4)> b = Cumatrix.rand(1000,1000);0
0
iex(5)> require(Time)
Time
iex(6)> Time.time(Cumatrix.mult(a,b));0
"time: 14599 micro second"
"-------------"
0
iex(7)> Time.time(Cumatrix.mult(a,b));0
"time: 12263 micro second"
"-------------"
0
iex(8)> Time.time(Cumatrix.mult(a,b));0
"time: 11781 micro second"