Cblas

This article is for those who love programming in C and look for a tool to perform linear algebra operations with the best performance.

As its creators say ‘The BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks to perform basic vector and matrix operations’. The routines are split in 3 levels where level 1 operations involve vector only operations, level 2 matrix vector operations and level 3 matrix matrix operations. The Library was originally developed on fortran, but there are implementations on C that can be used including the header file #include <cblas.h>

Notation Notes

The first character on the name of all functions specify the type of values contained on the vectors or matrices.

for example the function cblas_ddot, computes the dot product of two vectors with double values.

Level 1 (Vector only operations)

On level 1 operations we will be dealing with vector only operations the following examples will show some how to perform the most common operations.

Scale a vector: x <- c x

To scale a vector you must provide the coefficient variable (0.5) the size of the vector (4), the vector (x) and the most confusing one the increment (1), it indicates cblas how to increment the index, in this case one by element on the array.

 1#include <stdio.h>
 2#include <cblas.h>
 3
 4int main(void)
 5{
 6    double z;
 7    double x[] = {1, 2, 2, 3};
 8    double x_scaled[4];
 9
10    cblas_dcopy(4, x, 1, x_scaled, 1);
11
12    // cblas_dscal(dim, scale_factor, vector_x, inc_x)
13    cblas_dscal(4, 0.5, x_scaled, 1);
14
15    printf("x  ->  0.5 x\n");
16    for (int i = 0; i < 4; i++) {
17        printf("%2.2f -> %2.2f\n", x[i], x_scaled[i]);
18    }
19    return 0;
20}

Dot product: <x.y>

For the dot product you need the size of both vectors (4), the vectors (x, y), and the increment of each one;

 1#include <stdio.h>
 2#include <cblas.h>
 3
 4int main(void)
 5{
 6    double z;
 7    double x[] = {1, 2, 2, 3};
 8    double y[] = {2, 1, 1, 1};
 9
10    // cblas_ddot(dim, vector_x, inc_x, vector_y, inc_y)
11    z = cblas_ddot(4, x, 1, y, 1);
12
13    for (int i = 0; i < 4; i++) printf("%2.2f %2.2f\n", x[i], y[i]);
14    printf("\ndot: %f\n", z);
15    return 0;
16}

Vector Sum: y <- c x + y

 1#include <stdio.h>
 2#include <cblas.h>
 3
 4int main(void)
 5{
 6    double z[4];
 7    double x[] = {1, 2, 2, 3};
 8    double y[] = {3, 1, 2, 2};
 9
10    cblas_dcopy(4, y, 1, z, 1);
11
12    // cblas_daxpy(dim, coeff, vector_x, inc_x, vector_y, inc_y);
13    cblas_daxpy(4, 0.5, x, 1, z, 1);
14
15    printf("x  ->  0.5 x\n");
16    for (int i = 0; i < 4; i++) {
17        printf("0.5 %2.2f + %2.2f -> %2.2f\n", x[i], y[i], z[i]);
18    }
19    return 0;
20}

As you may noticed cblas modify the value provided as input, that’s the reason on the scale and sum examples above there is the function cblas_dcopy(), but if you don’t need the original input vector(s) copy them is not necessary.

Note on Level 2 and 3 cblas routines

To use Level 2 and 3 subroutines you have to take into account how the library handles matrices, the reason is that fortran, the original language where blas was developed, use column major order to store values into an array while C use row major order. Cblas you to determine which order to use, however the use of row major order means a decrease of speed on the routines that handle matrices that way.

Level 2 (Matrix Vector operations)

On level 2 subroutines we deal with operations between matrices and vectors.

General matrix vector multiplication: y <- a A x + b y

Lets explain how to work with these kind of operations.

1double A[2][2] = {
2    { 1, 1},
3    { 2, 3},
4};
 1#include <stdio.h>
 2#include <cblas.h>
 3
 4int main()
 5{
 6    double A[2][2] = {
 7        { 1, 2},
 8        { 1, 3},
 9    };
10
11    double x[] = {1, 3};
12    double y[] = {1, 2};
13    double z[2];
14
15    cblas_dcopy(2, y, 1, z, 1);
16    //          Lead dim       TRANS         DIM   a  Mat   LDA vect  b  vect
17    cblas_dgemv(CblasRowMajor, CblasNoTrans, 2, 2, 1, A[0], 2,  x, 1, 1, z, 1);
18
19    int i, j;
20    printf("A\n");
21    for (i = 0; i < 2; i++) {
22        for(j = 0; j < 2; j++) {
23            printf("%2.2lf\t", A[i][j]);
24        }
25        printf("\n");
26    }
27
28    printf("\nz\ty\tx\n");
29    for (i = 0; i < 2; ++i) {
30        printf("%2.2lf\t%2.2lf\t%2.2lf\n", z[i], y[i], x[i]);
31    }
32    return 0;
33}

Level 3 (Matrix only operations)

If you understood the previous example working with matrices at this point will be easy, for the sake of completeness in this post, lets put one last example that illustrates how to implement a matrix by matrix multiplication.

General Matrix Matrix multiplication: C <- a A transpose(B) + b C

For the following example consider that:

 1#include <stdio.h>
 2#include <cblas.h>
 3
 4int main()
 5{
 6    double A[] = {
 7        1, 2, 2,
 8        1, 3, 1
 9    };
10
11    double B[] = {
12        1, 3, 2,
13        1, 2, 1
14    };
15
16    double C[4] = {
17        2, 1,
18       -2, 1,
19    };
20
21    double Z[4];
22
23    //for (int i; i < 4; i++) Z[i] = C[i];
24    cblas_dcopy(4, C, 1, Z, 1);
25    //          Lead dim       TRANSA        TRANSB      m  n  k  a    Mat   Mat   b    Mat
26    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasTrans, 2, 2, 3, 2.0, A, 3, B, 3, 1.0, Z, 2);
27
28    printf("C <= 2.0 A B + C:\n");
29    for (int i = 0; i < 2; i++) {
30        for (int j = 0; j < 2; j++) {
31            if (!j) printf("  ");
32            printf("%2.2lf\t", Z[i*2 + j]);
33        }
34        printf("\n");
35    }
36
37    return 0;
38}

References

netlib