Imagine you have a large \(m\times n\) matrix \(D\) of nonnegative real numbers, and you would like to decompose it as \(D = WH\), where \(W\) is an \(m\times k\) matrix, \(H\) is a \(k\times n\) matrix, \(k\) is relatively small, and both factors again consist only of nonnegative numbers. This is called non-negative matrix factorization. I am going to use the algorithms described in the paper Algorithms for Non-negative Matrix Factorization by D. D. Lee and H. S. Seung, and implement them in Scala.
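The paper derives multiplicative update rules for two cost functions. The one relevant to the implementation below is the divergence

\[ \operatorname{Div}(D\,\|\,WH)=\sum_{i\mu}\Bigl(D_{i\mu}\log\frac{D_{i\mu}}{(WH)_{i\mu}}-D_{i\mu}+(WH)_{i\mu}\Bigr), \]

and its update rules read (my transcription into this post's notation, with \(D\) in place of the paper's \(V\)):

\[ H_{a\mu}\leftarrow H_{a\mu}\,\frac{\sum_i W_{ia}\,D_{i\mu}/(WH)_{i\mu}}{\sum_i W_{ia}},\qquad W_{ia}\leftarrow W_{ia}\,\frac{\sum_\mu H_{a\mu}\,D_{i\mu}/(WH)_{i\mu}}{\sum_\nu H_{a\nu}}. \]

The loop in the code below computes these two updates, both from the current product \(WH\), written in an additive form.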
I am using the Almond Scala kernel for Jupyter to write this blog post.
import $ivy.`org.scalanlp:breeze_2.13:1.0`
import breeze.linalg.{DenseVector,DenseMatrix}
def nnmd(D: DenseMatrix[Double],
         k: Int,
         costFn: (DenseMatrix[Double], DenseMatrix[Double]) => Double,
         epochs: Int = 500,
         tol: Double = 1e-2): (DenseMatrix[Double], DenseMatrix[Double], Int) = {
  // Random nonnegative initialization of the factors.
  val w = DenseMatrix.rand[Double](D.rows, k)
  val h = DenseMatrix.rand[Double](k, D.cols)
  // All-ones matrix, used below to form row and column sums.
  val u = DenseMatrix.fill[Double](D.rows, D.cols)(1.0)
  var i = 0
  // Iterate until the per-entry cost drops below tol or the epoch budget runs out.
  while (i < epochs && tol < costFn(D, w * h) / D.size) {
    val wt = w.t
    val ht = h.t
    val tm = D /:/ (w * h) - u   // elementwise D / (WH) - 1
    val et = h /:/ (wt * u)      // h divided by the column sums of W
    val mu = w /:/ (u * ht)      // w divided by the row sums of H
    // Compute both increments from the current W and H before mutating either.
    val dh = et *:* (wt * tm)
    val dw = mu *:* (tm * ht)
    h += dh
    w += dw
    i += 1
  }
  (w, h, i)
}
For this implementation, you can pass in your own cost function; it is only used in the stopping criterion. (The divergence above would be another natural choice; see the sketch after the next definition.) Today, I am going to use the total squared error:
def totSq(A: DenseMatrix[Double], B: DenseMatrix[Double]): Double =
  (A - B).map(x => x * x).sum
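As mentioned, the divergence cost could be passed instead. A minimal sketch, assuming the first argument has no zero entries so that the logarithm is defined (the name kldiv is mine, not from the paper or from Breeze):

def kldiv(A: DenseMatrix[Double], B: DenseMatrix[Double]): Double =
  // Generalized KL divergence: sum of A*log(A/B) - A + B over all entries.
  // Assumes A has no zero entries.
  (A *:* (A /:/ B).map(math.log) - A + B).sum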
Now, let us test the code on a random matrix of size \(300\times 100\):
val a = DenseMatrix.rand[Double](300,100)
val b = nnmd(a,30,totSq)
println("result=%f\n".format(totSq(a, b._1*b._2)/a.size))
result=0.049115
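Since this average error is still above the default tolerance of \(10^{-2}\), the loop must have used all 500 epochs; the third component of the returned tuple lets you check that. A small usage sketch (the parameter values below are just illustrative choices, not something tuned for this example):

// How many epochs were actually used; if this equals the epochs argument,
// the tolerance was never reached.
println("epochs used = %d".format(b._3))

// The same factorization with a larger epoch budget and a looser tolerance.
val b2 = nnmd(a, 30, totSq, epochs = 1000, tol = 5e-2)
println("result=%f\n".format(totSq(a, b2._1 * b2._2) / a.size))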