| # | algorithm | #slabs | start columns | reduced columns | total CRF features | accuracy | correctness |
| 1 | parafac2 | 4 | 145 | 50 | 200 | 70.95 | 77.81 |
| 2 | parafac2 | 4 | 145 | 50 | 200 | 71.51 | 77.55 |
| 2a | parafac2 | 4 | 145 | 45 | 180 | 71.51 | 77.67 |
| 2b | parafac2 | 4 | 145 | 55 | 220 | 71.38 | 77.54 |
| 2c | parafac2 | 4 | 145 | 60 | 240 | 71.45 | 77.81 |
| 2d | parafac2 | 4 | 145 | 65 | 260 | 71.39 | 77.67 |
| 2e | parafac2 | 4 | 145 | 25 | 100 | 71.56 | 77.75 |
| 3 | parafac2 | 4 | 145 | 50 | 200 | 71.45 | 77.54 |
| 4 | parafac2 | 4 | 145 | 50 | 200 | 71.34 | 77.53 |
| 5 | parafac2 | 4 | 145 | 50 | 200 | 70.70 | 77.75 |
| 6 | parafac2/mlp | 4 | 145 | 25 | 48 | 73.74 | 76.16 |
| 7 | parafac2/mlp + parafac2 | 4 | 145 | 25 | 148 | 73.25 | 79.61 |
| 8 | parafac2/mlp+105feats | 4 | 145 | 25 | 153 | 74.17 | 77.89 |
| 9 | parafac2/mlp+fspace/mlp | 4 | 145 | 50 | 96 | 74.16 | 76.52 |
| 10 | parafac2/mlp+fspace/mlp+fspace50 | 4 | 145 | 50 | 96 | 74.42 | 77.47 |
| 11 | KLT | -- | 145 | 50 | 50 | 70.89 | 76.34 |
| 12 | klt to MLP | -- | 145 | 50 | 48 | 72.77 | 75.19 |
| 13 | klt over avgs | -- | 145 | 50 | 50 | 69.98 | 76.39 |
| 14 | klt over averages to mlp | -- | 145 | 50 | 48 | 73.12 | 75.48 |
| 15 | parafac2 - PLP | 5 | 145 | 25 | 125 | 71.90 | 77.84 |
| 16 | parafac2 - PLP,PLP2 | 6 | 145 | 25 | 150 | 72.64 | 76.48 |
| 20 | parafac2 - AFRL | 5 | 145 | 25 | 125 | 72.73 | 78.60 |
| 21 | parafac2/mlp - AFRL | 5 | 145 | 25 | 48 | 75.48 | 77.99 |
| 22 | parafac2 + parafac2/mlp - AFRL | 5 | 145 | 25 | 173 | 74.00 | 80.75 |
Experiment 1:
4 slabs are: 44pos,61pos,44lin,61lin.
Calculate train & test w/in same script.
Matrix weighting: P2{i} = P{i}*H
Scale both train & test using qnnorm, norms2vars, qncopy.
Experiment 2:
4 slabs are: 44pos,61pos,44lin,61lin.
Matrix weighting: P2{i} = P{i}*Cc{i}; (where Cc has 4 slabs, each
with a diagonal corresponding to that slab in P)
Experiments 2a-e:
As experiment 2, with different reductions (see the "reduced columns" entries in the table).
Experiment 3:
Same 4 slabs, same scaling
Matrix weighting: transposed P:
P2{i}=Cc{i}*P{i}';
Experiment 4:
Same 4 slabs, same scaling
Matrix weighting: inverse C and transposed P
Cc{i}(j,j) = 1/(C(i,j));
P2{i}=Cc{i}*P{i}';
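
The weighting variants of experiments 1-4 can be sketched in NumPy as below. This is an illustration only, not the original script: it assumes P{i} is a (features x components) matrix, C is a (slabs x components) matrix whose row i supplies the diagonal of Cc{i}, and H is (components x components). The function name slab_weightings and these shapes are assumptions.

```python
import numpy as np

def slab_weightings(P_i, C, i, H=None):
    """Weighted matrices for slab i, following the formulas in experiments 1-4.

    Assumed shapes (not stated in the notes): P_i is (n_features, R),
    C is (n_slabs, R), H is (R, R).
    """
    Cc_i = np.diag(C[i])            # Cc{i}: diagonal taken from row i of C
    Cc_inv_i = np.diag(1.0 / C[i])  # experiment 4: Cc{i}(j,j) = 1/C(i,j)
    out = {
        "exp2": P_i @ Cc_i,        # P2{i} = P{i}*Cc{i}
        "exp3": Cc_i @ P_i.T,      # P2{i} = Cc{i}*P{i}'
        "exp4": Cc_inv_i @ P_i.T,  # inverse C, transposed P
    }
    if H is not None:
        out["exp1"] = P_i @ H      # P2{i} = P{i}*H
    return out
```

Note that under these assumptions the experiment 3 matrix is just the transpose of the experiment 2 matrix, so the two differ only in which side the frame features are multiplied on.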
Experiment 5:
Same 4 slabs
Matrix weighting as in experiment 4
Use train.4space.red50.norm instead of varnorm
Experiment 6:
Start with the features from experiment 2e (25 features per slab, 100
total), unnormalized.
Train 48-phone MLP, then CRF.
Experiment 7:
Concatenate the 48 features from Experiment 6, unnormalized, with the 100 features
from experiment 2e, normalized and retrain CRF.
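
A rough sketch of the concatenation step, assuming "normalized" means per-feature zero mean and unit variance with statistics taken from the training set (the notes do this with qnnorm/qncopy; the NumPy version and the names fit_varnorm and build_crf_input are hypothetical stand-ins):

```python
import numpy as np

def fit_varnorm(train_feats):
    """Per-feature mean and standard deviation from the training set."""
    mean = train_feats.mean(axis=0)
    std = train_feats.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # leave constant features unscaled
    return mean, std

def build_crf_input(mlp_feats, slab_feats, mean, std):
    """Concatenate unnormalized MLP outputs with normalized slab features."""
    normalized = (slab_feats - mean) / std
    return np.hstack([mlp_feats, normalized])
```

With 48 MLP outputs and 100 slab features per frame this gives the 148 CRF features listed in the table. The same mean and std from the training set would be reused for the test set.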
Experiment 8:
Reduce the 4space matrix to 25 dimensions and train an MLP (as in experiment 6).
Concatenate the 48 new MLP features to the 105 old MLP features.
Train the CRF on 153 features.
Experiment 9:
Build an MLP on 100 4space features (exp 6)
Build an MLP on 50 SVD fspace features (previous)
Combine for 96 features, train CRF
Experiment 10:
Concatenate the 96 features of experiment 9 with the 50 features from
SVD reduction, scaled.
Experiment 11:
Do a K-L transform over the original posteriors, and keep only the
first 50 features. Train the CRF. Transform the test set using the
same statistics, again keeping only the first 50 features for
decoding.
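
A minimal sketch of the K-L transform step, assuming the usual construction (eigenvectors of the training covariance, sorted by decreasing eigenvalue). The function names fit_klt and apply_klt are made up for illustration; the original tooling is not shown in these notes.

```python
import numpy as np

def fit_klt(train, n_keep=50):
    """Estimate KLT statistics (mean and leading eigenvectors) on the training set."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_keep]    # indices of the n_keep largest
    return mean, eigvecs[:, order]

def apply_klt(data, mean, basis):
    """Project data using statistics estimated elsewhere (e.g. on the training set)."""
    return (data - mean) @ basis
```

Usage: call fit_klt on the training posteriors only, then call apply_klt with the returned mean and basis on both the training and test sets, so the test set is transformed with the same statistics.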
Experiment 12:
Do the K-L transform as in 11, then train an MLP over those features,
then the CRF.
Experiment 13:
Because we do the SVD/parafac over the average values rather than all of
the values, derive the KLT statistics over only the posterior average of
each feature. Then transform the data, keeping only the first 50
features, and train the CRF.
Experiment 14:
As before, but train an MLP on the 50 features.
Experiment 15:
Add PLP features as a 5th slab.
First, subtract the mean over the PLP features (train & test):
use qnnorm to get the means of the first 13 features, set the second half
(the variances) to 1, use qncopy to subtract the means, then recalculate
the deltas (which turn out to be the same anyway, since the variances don't
affect the derivatives).
Then, using the 3-state labels, get the average
value of each of the 39 features for each of the 145 states.
Add the 145x39 matrix as a 5th slab to the parafac2 algorithm.
Reduce to 25 dimensions, then calculate 'fspace' for each of the 5 slabs, giving
125 dimensions for each frame.
Do variance normalization over the output features.
Train and test CRF.
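
The per-state averaging that builds the 145x39 slab could look roughly like this. It is a sketch under assumptions: frames are rows of a NumPy array, labels are integer state indices in 0..144, and states that never occur get a row of zeros (the notes do not say how unseen states are handled). The name state_average_matrix is hypothetical.

```python
import numpy as np

def state_average_matrix(feats, labels, n_states=145):
    """Average each feature over the frames assigned to each state.

    feats:  (n_frames, n_feats) array, e.g. 39 mean-subtracted PLP features
    labels: (n_frames,) integer state label per frame
    Returns an (n_states, n_feats) matrix; unseen states stay at zero (assumption).
    """
    sums = np.zeros((n_states, feats.shape[1]))
    np.add.at(sums, labels, feats)                    # accumulate frames per state
    counts = np.bincount(labels, minlength=n_states)  # frames seen per state
    avg = np.zeros_like(sums)
    seen = counts > 0
    avg[seen] = sums[seen] / counts[seen][:, None]
    return avg
```

The resulting matrix would then be handed to the parafac2 routine as the 5th slab alongside the four existing ones.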
Experiment 16:
Add the squares of the PLP features as a 6th slab, as above.
Reduce the matrices to 25 dimensions.
Use the 39 plp features to project onto both the plp and plpsquared slabs.
Experiment 17:
Use the 125 features of exp15 to train a 48-output MLP, train a CRF.
Experiment 18:
Use the 150 features of exp16 to train a 48-output MLP, train a CRF.
Experiment 19:
Take the better of 17 or 18 and combine with the non-MLP features to
train a CRF.
Experiment 20:
Use the averages of the 36 AFRL features over the 145 phone states as
the 5th slab. The first four slabs are as before (the 44 and 61 linear and
posterior average activations). Reduce each slab to 25 dimensions.
Do variance normalization on output before giving 125 features to CRF.
Experiment 21:
Take the 125 features from experiment 20, and train a 48-label MLP.
Train the CRF on the 48 features.
Experiment 22:
Combine the varnorm AFRL features of experiment 20 with the mlpfwd
AFRL features of experiment 21 and train the CRF.
More combination experiments to come.