| # | Experiment | Accuracy | Correctness | Notes |
|---|---|---|---|---|
| 1 | Baseline | 72.12 | 75.71 | Iteration 48 |
| 2 | SVD_0 | 71.55 | 73.75 | Reduction to 50 dimensions |
| 3 | Baseline + SVD_0 | 72.11 | 75.88 | Using 50 dim SVD |
| 4 | SVD_1 | 71.63 | 73.75 | 50 dimensions |
| 5 | SVD_2 | 71.62 | 73.69 | 40 dimensions |
| 6 | SVD_3 | 71.56 | 73.73 | 50 dimensions |
| 7 | SVD_1_Kmeans_sub-frames | 71.65 | 73.80 | 50 dims, iteration 44 |
| 8 | SVD_1_Kmeans_sub-frames_sub-phones | 71.66 | 73.82 | 50 dims, iteration 29 |
| 9 | Kmeans all, preSVD | 71.52 | 73.53 | 7 clusters, 50 dims, iteration 27 |
| 10 | Kmeans all, postSVD | 71.34 | 73.30 | 9 clusters, 50 dims, iteration 46 |
| 11 | Baseline + Kmeans all | 72.36 | 76.11 | 105 features (exp 1) + 336 features (exp 9) |
| 12 | SVD_0 + Kmeans all | 72.07 | 74.31 | 145 features (exp 2) + 336 features (exp 9) |
| 13 | Posteriors + Linear + LinKLT | 70.31 | 72.15 | 315 features reduced to SVD50, 145 cos |
| 14 | FeatureSpace to CRF | 71.18 | 73.00 | 50 features, 50 cosines |
| 15 | FeatureSpace to MLP to CRF | 72.99 | 75.23 | SVD50, MLP48, 48 feats to CRF |
| 16 | FeatureSpace-MLP + Baseline | 73.46 | 77.32 | 153 features |
| 17 | Baseline + FeatureSpace-MLP + FeatureSpace | 73.46 | 77.37 | 203 features |
| 18 | FeatureSpace, Scaled | 72.05 | 74.56 | 50 features |
| 18.A | FeatureSpace, Scaled + MLP48 (scaled) | 73.39 | 76.39 | 98 features |
| 18.B | FeatureSpace, Scaled + Baseline | 71.76 | 76.21 | 155 features |
| 18.C | FeatureSpace, Scaled + Baseline + MLP48 | 73.19 | 77.75 | 203 features |
| 19 | FeatureSpace, Scaled, to MLP to CRF | 72.99 | 75.26 | 48 features |
| 20 | FeatureSpace, Full SVD to 50 dims, to CRF | 6.91 | 6.91 | 50 features |
| 20.A | Fspace, Full SVD to 300 dims, to CRF | crashed on svd, blue | ? | 300 features |
| 20.B | Fspace, Full SVD to 50 dims, scaled by 100, to CRF | 71.03 | 72.88 | 50 features |
| 21 | FeatureSpace, Phon.Feat. MLP to CRF | 71.93 | 75.20 | 44 features |
| 22 | FeatureSpace, Full SVD, to MLP(48) to CRF | 72.82 | 74.95 | 48 features |
| 23 | FspaceMLP48 + FspaceMLP44 to CRF | 73.24 | 77.16 | 92 features |
| 24 | FspaceMLP48+FspaceMLP44+Baseline | 72.91 | 78.20 | 197 features |
| 25 | Fspace, triphone SVD | 70.61 | 72.32 | 50 features |
| 25.A | Fspace, triphone SVD, 100 dims | 71.18 | 72.99 | 100 features |
| 25.B | Fspace, triphone SVD, 100 dims, scaled; stopped training at 22 | 71.51 | 76.02 | 100 features |
| 26 | MLP48+MLP44+Fspace50 | 73.29 | 77.15 | 142 features |
| 27 | Fspace, triphone SVD to 100, + baseline | 72.07 | 75.62 | 205 features |
| 27.B | Fspace, triphone SVD to 100, Scaled + baseline; stopped training at 14 | 69.77 | 77.17 | 205 features |
| 28 | 4space - 44pos,61pos,44lin,61lin | 3.32 | 9.65 | 200 features |
| 29 | Fspace + baseline | 72.15 | 75.77 | 155 features |
| 30 | Fspace from triphone SVD to 100, to MLP48 | 72.66 | 75.88 | 100 -> 48 features |
| 30.A | Triphone SVD to 100 to MLP48 + Baseline | 71.76 | 77.38 | 153 features |
| 30.B | Triphone SVD to 100 to MLP48 + Baseline + Triphone Fspace, Scaled; stopped training at 25; wrong number of states also | 70.46 | 78.68 | 203 features |
| 31 | Fspace from triphone SVD to 100, to MLP44 | 70.68 | 74.92 | 100 -> 44 features |
| 32 | Posteriors to KLT to CRF | 70.89 | 76.34 | 50 features, trained only to it 39 |
| 32.a | Avg Posteriors to KLT to CRF | 69.98 | 76.39 | 50 features, trained only to it 39 |
| 33 | Posteriors to KLT to MLP to CRF | 72.77 | 75.19 | 48 features |
| 33.a | Avg Posteriors to KLT to MLP to CRF | 73.12 | 75.48 | 48 features |

Explanation of Experiments:
1. Baseline: The MLP activation of each of 105 phonetic and sub-phonetic features for each frame of data. 105 state feature functions as input to the CRF. Labels are 48 phones, reduced to 39 for testing.
2. SVD_0: Calculate the average activation of each of the 105 features over each of 145 phone states (automatically aligned to 3 states per phone, one for silence). The 105x145 matrix undergoes SVD. Reduce to various sizes (50 is best). Calculate the cosine of each frame of data to each new "phone state" column in the P matrix. 145 state feature functions (cosines per frame) given to CRF to train. (Further tuning showed that 51 dimensions is ever-so-slightly better than 50 dimensions.)
3. Baseline + SVD_0: For each frame, concatenate the Baseline 105 MLP outputs and the SVD_0 145 cosines. Train the CRF on 250 state feature functions.
4. SVD_1: Look at the cosines generated by SVD_0. For each set of frames corresponding to a single phone, mark the frames for which the highest cosine is less than some threshold. Set the threshold so that, overall, 25% of frames are marked. For each phone, calculate the centroid of the MLP activations for the frames that are marked. These centroids are the correct size to be appended to the original 105x145 matrix. Recalculate SVD on the augmented matrix, reduce the matrix to 50 dimensions, and recalculate the cosine of each frame to each of the (now) 193 phone states. Give 193 state feature functions for the CRF to train on.
5. SVD_2: Do exactly as above, starting with 193 cosines, finding a new threshold to hold out 25% of frames, calculate new centroids to append to SVD matrix, etc. 241 state feature functions.
6. SVD_3: And again. 289 state feature functions.
7. SVD_1_Kmeans_sub-frames: Instead of just finding the average of the sub-threshold frames, use k-means clustering to determine the new features. That is, extract the frames that are below a threshold, calculate k-means k=2 on the MLP activations of those frames, find the centroids of those two clusters, and add them to the original SVD matrix. ReSVD, ReReduce, ReCosine, ReTrain on 241 state feature functions.
8. SVD_1_Kmeans_sub-frames_sub-phones: As in experiment 7, but only calculate clusters and centroids for the sub-threshold frames of those phones whose recognition accuracy score was below the accuracy score of the whole data set (8 phones). Append 16 new centroids to the original SVD, ReSVD, ReReduce, ReCosine, ReTrain on 161 state feature functions.
9. Kmeans all, pre-SVD: Start with the 105 feature frames (the MLP activations). Group the frames by phone label. For each group, run the k-means calculation over the frames for k=2 through k=10. The centroids of the resulting clusters form the matrix that then undergoes SVD and reduction to 50 dimensions. The number of state feature functions depends on how many clusters there were: there are (48*clusters) cosines/state feature functions. The best result was for 7 clusters.
10. Kmeans all, post-SVD: Start with the 145 state feature functions (cosines) resulting from SVD_0 (experiment 2). Group the frames by phone label, and run the k-means calculation for k=2 through k=10. Centroids become the SVD matrix, cosines of frames to SVD columns become the new state feature functions. Again, the number of state feature functions varies by how many clusters were made per phone. Best results were with 9 clusters.
11. Baseline + Kmeans all: Append the 105 features of original MLP data to the 336 features resulting from the best run of experiment 9. Train the CRF (no extra SVD required here, just mashing feature functions together). Results are non-significantly greater than the baseline alone.
12. SVD_0 + Kmeans all: Append the 145 features resulting from experiment 2 to the 336 features of experiment 9.
13. Posteriors+Linear+LinKLT: Concatenate the pfiles containing the output of the 105 MLP classifiers as softmax posteriors, linear outputs, and linear transformed outputs. This is a pfile with 315 features. Calculate the average value of each of the features over each of 145 phone states. This is the original SVD matrix. Perform SVD and reduce to various sizes. Use the frames of the pfile to get cosines to the 145 new phone vectors. Train on 145 cosine features, test on devtest.
14. FeatureSpace to CRF: Start with the 105x145 matrix (MLP features by phone states). Run the SVD and reduce to 50 features. Multiply the left (feature) matrix by the inverse diagonal matrix. Convert each frame of the original MLP data into this feature space, so now each frame has 50 features instead of 105. Train the CRF on this data. Tested dimensionalities 47 through 53; the best was 53 features, at 71.25/73.05 (iteration 47).
15. FeatureSpace to MLP to CRF: Start as above. Train an MLP with the 50 features as input, 48 labels as output. Calculate the posteriors of all training and testing data for those 48 labels. Train the CRF on those 48 features.
16. Baseline + FSpace-MLP: Append the 48 Features resulting from the MLP to the 105 original features in the baseline experiment. Retrain CRF.
17. Baseline + FSpace-MLP + Fspace: To experiment 16, append the features from experiment 14 (50dims).
18. Fspace, scaled: Normalize the variation among the features derived from the SVD fspace. Train CRF and decode.
19. Fspace, scaled to MLP: Scale the F-space output, then retrain the phone MLP, then train CRF and decode.
20. Fspace, full svd: Instead of averaging the 105 features over the phones, take the SVD of the entire space - all MLP derived frames. Again, reduce the resulting matrices to 50 dimensions. The left matrix is 50x105, so do the fspace (non-cosine) experiments with this new SVD matrix. Train CRF and decode.
21. Fspace, phonological feature MLPs: Using the 105x145 matrix for SVD, derive the fspace version of the data. Retrain each of the 8 phon. feature class MLPs, combine for 44 features, train CRF and decode.
22. FSpace, Full SVD, to MLP to CRF: Using the features generated by reducing the SVD on the entire data matrix, train the 48-phone MLP. Use the output to train a CRF.
23. FspaceMLP48 + FspaceMLP44 to CRF: Combine the 48 phone features from experiment 15 with the 44 phonological features from experiment 21, retrain CRF and decode.
24. FspaceMLP48 + FspaceMLP44 + Baseline: Combine the 105 posteriors of the baseline with the features in experiment 23. Retrain CRF and decode.
25. Fspace, triphone SVD: For each triphone exhibited in the training set, if it shows up in more than 100 frames, get the average activation for each feature over all frames labeled with that triphone. This gives a 2598x105 matrix. Transpose, do SVD, use Fspace projection of 105 posteriors as input to CRF.
26. MLP48+MLP44+Fspace50: Take the features that result from training the phone-MLP (48) and the phone-feature-MLP (44) on the fspace features. Append them to the 50 fspace features. Train the CRF and decode.
27. Triphone Fspace to 100 + Baseline: Take the features derived from reducing the triphone fspace to 100 dimensions. Append to these features the 105 (non-SVD) baseline features.
28. 4space - 44pos,61pos,44lin,61lin: Use the PARAFAC2 algorithm. Start with 4 matrices representing the average activations from posterior or linear outputs over 145 phone states. Each type of activation goes in its own matrix, a slab of a 3D matrix. Use PARAFAC2 to reduce the 145 to 50 (this is opposite how the other experiments worked, but makes sense in this context (?)). Each slab is reduced separately, but in a way contingent upon the other slabs. Multiply each slab by H, which is like S. Project the frames of the original data onto the appropriate slab to get 50 features per frame. Append the projections to get 200 features per frame. Use norm2vars.pl to scale the features to mean 0, std 1. Train a CRF and decode.
29. Fspace+baseline: Reduce the 105x145 SVD to 105x50. Project the frames into the 50dim Fspace (exp 14). Append to these features the 105 baseline features. Train CRF and decode.
32. Posteriors to KLT to CRF. Do a KL transform over the training data. Warp the training and test pfiles by the klt stats. Train the CRF and decode.
32a. Avg Posteriors to KLT to CRF. Since we took SVD over the averages, not the whole, do KLT over the averages, then warp the train and test data, then train the CRF.
33. Posteriors to KLT to MLP to CRF. Do a KL transform over the training data. Warp the training and test pfiles by the klt stats. Train a 48-output MLP over the KLT data. Train the CRF on the MLP outputs.
33a. Avg Posteriors to KLT to MLP to CRF. Do a KL transform over the averages of the training data. Warp the training and test pfiles by the klt stats. Train a 48-output MLP over the KLT data. Train the CRF on the MLP outputs.
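
The core steps shared by experiments 2, 4, and 14 can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the code that produced the results above: the function names are hypothetical, the data is random, and the "P matrix" of experiment 2 is assumed here to be the rank-k reconstruction of the 105x145 average matrix (other readings, such as cosines taken inside the reduced space, are possible). The 25% threshold of experiment 4 is applied over all frames at once, which is one reading of "overall".

```python
import numpy as np


def phone_state_averages(frames, state_ids, n_states):
    """Average each feature over the frames of each phone state.

    frames: (n_frames, n_features); state_ids: (n_frames,) integer labels.
    Returns an (n_features, n_states) matrix (105x145 in the experiments).
    """
    avg = np.zeros((frames.shape[1], n_states))
    for s in range(n_states):
        members = frames[state_ids == s]
        if len(members):
            avg[:, s] = members.mean(axis=0)
    return avg


def truncated_svd(matrix, k):
    """Return the top-k left vectors, singular values, and right vectors."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def cosine_features(frames, u_k, s_k, vt_k, eps=1e-12):
    """Cosine of each frame to each column of the rank-k reconstruction.

    ASSUMPTION: the reconstruction U_k S_k V_k^T stands in for the
    "P matrix" of experiment 2.
    """
    states = (u_k * s_k) @ vt_k  # (n_features, n_states)
    dots = frames @ states       # (n_frames, n_states)
    norms = (np.linalg.norm(frames, axis=1, keepdims=True)
             * np.linalg.norm(states, axis=0, keepdims=True))
    return dots / np.maximum(norms, eps)


def fspace_features(frames, u_k, s_k):
    """Experiment 14: left matrix times inverse diagonal, applied per frame."""
    return frames @ (u_k / s_k)  # (n_frames, k)


def low_confidence_mask(cosines, fraction=0.25):
    """Experiment 4: mark frames whose best cosine falls below a threshold
    chosen so that roughly `fraction` of all frames are marked."""
    best = cosines.max(axis=1)
    return best < np.quantile(best, fraction)


# Random stand-in data: 2900 frames, 105 features, 145 phone states.
rng = np.random.default_rng(0)
demo_frames = rng.random((2900, 105))
demo_states = np.arange(2900) % 145

demo_avg = phone_state_averages(demo_frames, demo_states, 145)
demo_u, demo_s, demo_vt = truncated_svd(demo_avg, 50)
demo_cos = cosine_features(demo_frames, demo_u, demo_s, demo_vt)
demo_fspace = fspace_features(demo_frames, demo_u, demo_s)
demo_marked = low_confidence_mask(demo_cos)
print(demo_cos.shape, demo_fspace.shape, int(demo_marked.sum()))
```

In this sketch the cosine features keep one column per phone state regardless of the reduced dimension, while the feature-space projection yields one column per retained dimension, which matches the 145 versus 50 feature counts reported for experiments 2 and 14.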