Cblas

This article is for those who love programming in C and look for a tool to perform linear algebra operations with the best performance.

As its creators say ‘The BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks to perform basic vector and matrix operations’. The routines are split in 3 levels where level 1 operations involve vector only operations, level 2 matrix vector operations and level 3 matrix matrix operations. The Library was originally developed on fortran, but there are implementations on C that can be used including the header file #include <cblas.h>

Notation Notes

The first character on the name of all functions specify the type of values contained on the vectors or matrices.

D: double
S: float
Z: complex16
C: complex

for example the function cblas_ddot, computes the dot product of two vectors with double values.

Level 1 (Vector only operations)

On level 1 operations we will be dealing with vector only operations the following examples will show some how to perform the most common operations.

Scale a vector: x <- c x

To scale a vector you must provide the coefficient variable (0.5) the size of the vector (4), the vector (x) and the most confusing one the increment (1), it indicates cblas how to increment the index, in this case one by element on the array.

 1#include <stdio.h>
 2#include <cblas.h>
 3
 4int main(void)
 5{
 6    double z;
 7    double x[] = {1, 2, 2, 3};
 8    double x_scaled[4];
 9
10    cblas_dcopy(4, x, 1, x_scaled, 1);
11
12    // cblas_dscal(dim, scale_factor, vector_x, inc_x)
13    cblas_dscal(4, 0.5, x_scaled, 1);
14
15    printf("x  ->  0.5 x\n");
16    for (int i = 0; i < 4; i++) {
17        printf("%2.2f -> %2.2f\n", x[i], x_scaled[i]);
18    }
19    return 0;
20}

Dot product: <x.y>

For the dot product you need the size of both vectors (4), the vectors (x, y), and the increment of each one;

 1#include <stdio.h>
 2#include <cblas.h>
 3
 4int main(void)
 5{
 6    double z;
 7    double x[] = {1, 2, 2, 3};
 8    double y[] = {2, 1, 1, 1};
 9
10    // cblas_ddot(dim, vector_x, inc_x, vector_y, inc_y)
11    z = cblas_ddot(4, x, 1, y, 1);
12
13    for (int i = 0; i < 4; i++) printf("%2.2f %2.2f\n", x[i], y[i]);
14    printf("\ndot: %f\n", z);
15    return 0;
16}

Vector Sum: y <- c x + y

 1#include <stdio.h>
 2#include <cblas.h>
 3
 4int main(void)
 5{
 6    double z[4];
 7    double x[] = {1, 2, 2, 3};
 8    double y[] = {3, 1, 2, 2};
 9
10    cblas_dcopy(4, y, 1, z, 1);
11
12    // cblas_daxpy(dim, coeff, vector_x, inc_x, vector_y, inc_y);
13    cblas_daxpy(4, 0.5, x, 1, z, 1);
14
15    printf("x  ->  0.5 x\n");
16    for (int i = 0; i < 4; i++) {
17        printf("0.5 %2.2f + %2.2f -> %2.2f\n", x[i], y[i], z[i]);
18    }
19    return 0;
20}

As you may noticed cblas modify the value provided as input, that’s the reason on the scale and sum examples above there is the function cblas_dcopy(), but if you don’t need the original input vector(s) copy them is not necessary.

Note on Level 2 and 3 cblas routines

To use Level 2 and 3 subroutines you have to take into account how the library handles matrices, the reason is that fortran, the original language where blas was developed, use column major order to store values into an array while C use row major order. Cblas you to determine which order to use, however the use of row major order means a decrease of speed on the routines that handle matrices that way.

Level 2 (Matrix Vector operations)

On level 2 subroutines we deal with operations between matrices and vectors.

General matrix vector multiplication: y <- a A x + b y

Lets explain how to work with these kind of operations.

In the following example CblasRowMajor, is a Cblas parameter that specify how to handle the array, because of the way Fortran handles arrays, if you want more performance follow the column major order with the parameter CblasColMajor, and change the A variable putting rows as columns as.

1double A[2][2] = {
2    { 1, 1},
3    { 2, 3},
4};

Next CblasNoTrans is used to not transpose the matrix A,
The two parameters that go after CBlasNoTrans tells cblas the dimensions of A matrix (in this case 2 rows and 2 cols).
The 1 is the a scalar factor a that multiplies Ax.
A[0] is the pointer to the first element of the array, Cblas actually expects 1 dimensional arrays from C, so putting the matrix A as A[4] = {1, 2, 1, 3} is valid, the next parameter help cblas to handle the matrix A.
LDA defines the stride of the leading dimension which in this case is 2 because to go to another row we need to pass through 2 elements, in other words it tells to cblas to iterate the matrix as A[row * LDA + col], in the case of specifying CblasColMajor the matrix will be iterated as A[row + col*LDA].
The 1 parameter next to x, 1 pair correspond to the b scale factor of the y vector.
And finally the pairs x, 1 and z, 1 correspond to the vectors x and y with its respective way of passing through its elements (which in this example it just iterates element by element on the array).

 1#include <stdio.h>
 2#include <cblas.h>
 3
 4int main()
 5{
 6    double A[2][2] = {
 7        { 1, 2},
 8        { 1, 3},
 9    };
10
11    double x[] = {1, 3};
12    double y[] = {1, 2};
13    double z[2];
14
15    cblas_dcopy(2, y, 1, z, 1);
16    //          Lead dim       TRANS         DIM   a  Mat   LDA vect  b  vect
17    cblas_dgemv(CblasRowMajor, CblasNoTrans, 2, 2, 1, A[0], 2,  x, 1, 1, z, 1);
18
19    int i, j;
20    printf("A\n");
21    for (i = 0; i < 2; i++) {
22        for(j = 0; j < 2; j++) {
23            printf("%2.2lf\t", A[i][j]);
24        }
25        printf("\n");
26    }
27
28    printf("\nz\ty\tx\n");
29    for (i = 0; i < 2; ++i) {
30        printf("%2.2lf\t%2.2lf\t%2.2lf\n", z[i], y[i], x[i]);
31    }
32    return 0;
33}

Level 3 (Matrix only operations)

If you understood the previous example working with matrices at this point will be easy, for the sake of completeness in this post, lets put one last example that illustrates how to implement a matrix by matrix multiplication.

General Matrix Matrix multiplication: C <- a A transpose(B) + b C

For the following example consider that:

A is a 2 by 3 matrix
B is a 2 by 3 matrix
C is a 2 by 2 matrix

 1#include <stdio.h>
 2#include <cblas.h>
 3
 4int main()
 5{
 6    double A[] = {
 7        1, 2, 2,
 8        1, 3, 1
 9    };
10
11    double B[] = {
12        1, 3, 2,
13        1, 2, 1
14    };
15
16    double C[4] = {
17        2, 1,
18       -2, 1,
19    };
20
21    double Z[4];
22
23    //for (int i; i < 4; i++) Z[i] = C[i];
24    cblas_dcopy(4, C, 1, Z, 1);
25    //          Lead dim       TRANSA        TRANSB      m  n  k  a    Mat   Mat   b    Mat
26    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasTrans, 2, 2, 3, 2.0, A, 3, B, 3, 1.0, Z, 2);
27
28    printf("C <= 2.0 A B + C:\n");
29    for (int i = 0; i < 2; i++) {
30        for (int j = 0; j < 2; j++) {
31            if (!j) printf("  ");
32            printf("%2.2lf\t", Z[i*2 + j]);
33        }
34        printf("\n");
35    }
36
37    return 0;
38}

References

netlib