I have been learning for myself for several months artificial intelligence through a project of character recognition and transcription of handwriting. Until now I have successfully used Keras, Theano and Tensorflow by implementing CNN, CTC neural networks.

Today, I try to use Gaussian mixture models, the first step towards hidden markov models with Gaussian emission. To do so, I used the sklearn mixture with pca reduction to select the best model with Akaike and Bayesian information criterion. With type of covariance Full for Aic which provides a nice U-curve and Tied for Bic, because with Full covariance Bic gives just a linear curve. With 12.000 samples, I get the best model at 60 n-components for Aic and 120 n-components for Bic.

My input images have 64 pixels aside which represent only the capital letters of the English alphabet, 26 categories numbered from 0 to 25.

The fit method of Sklearn GaussianMixture ignore labels and the predict method returns the position of the component (0 to 59 or 0 to 119) into the n-components regarding the probabilities.

How to retrieve the original label the position of the character in a list using sklearn GaussianMixture ?