I am trying to build a recommendation system using non-negative matrix factorization. Using scikit-learn's NMF as the model, I fit my data, resulting in a certain loss (i.e., reconstruction error). Then I generate recommendations for new data using the inverse_transform method.
Now I do the same using another model I built in TensorFlow. The reconstruction error after training is close to the one obtained with sklearn's approach. However, neither the latent factors nor the final recommendations are similar between the two models.
One difference between the two approaches that I am aware of is the solver: in sklearn I am using Coordinate Descent, whereas in TensorFlow I am using AdamOptimizer, which is based on gradient descent. Everything else seems to be the same:
- Loss function: the Frobenius norm
- No regularization in either case
- Tested on the same data, using the same number of latent dimensions
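As far as I can tell, both setups should therefore be minimizing the same objective (written in LaTeX notation; the non-negativity constraints come from NMF itself and from the constraint lambdas in the TensorFlow code below):

\min_{W \ge 0,\, H \ge 0} \lVert X - WH \rVert_F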
Relevant code that I am using:
1. scikit-learn approach:
from sklearn.decomposition import NMF

model = NMF(alpha=0.0, init='random', l1_ratio=0.0, max_iter=200,
            n_components=2, random_state=0, shuffle=False, solver='cd',
            tol=0.0001, verbose=0)
model.fit(data)
result = model.inverse_transform(model.transform(data))
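For reference, the latent factors can be read off the fitted sklearn model like this, to compare them directly with the TensorFlow version (W and H are just local names for the two factor matrices):

W = model.transform(data)   # (n_samples, 2): per-sample latent weights
H = model.components_       # (2, n_features): latent factors
reconstruction = W @ H      # same result as model.inverse_transform(model.transform(data))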
2. TensorFlow approach:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=data.shape)  # input matrix to factorize
w = tf.get_variable('w', initializer=tf.abs(tf.random_normal((data.shape[0], 2))),
                    constraint=lambda p: tf.maximum(0., p))  # non-negative latent weights
h = tf.get_variable('h', initializer=tf.abs(tf.random_normal((2, data.shape[1]))),
                    constraint=lambda p: tf.maximum(0., p))  # non-negative latent factors
loss = tf.sqrt(tf.reduce_sum(tf.squared_difference(x, tf.matmul(w, h))))  # Frobenius norm of X - WH
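For completeness, the training loop around this is roughly the following (a minimal sketch in TF 1.x style; the learning rate and step count are illustrative values I picked, not tuned ones):

train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(5000):
        _, cur_loss = sess.run([train_op, loss], feed_dict={x: data})
    w_val, h_val = sess.run([w, h])  # learned factors, the counterparts of transform()/components_ in sklearn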
My question is: if the recommendations generated by these two approaches do not match, how can I determine which ones are right? Based on my use case, sklearn's NMF gives me good results, but the TensorFlow implementation does not. How can I achieve the same results with my custom implementation?