EXPERIMENTS

#    algorithm                         #slabs  start columns  reduced columns  total CRF features  accuracy  correctness
1    parafac2                          4       145            50               200                 70.95     77.81
2    parafac2                          4       145            50               200                 71.51     77.55
2a   parafac2                          4       145            45               180                 71.51     77.67
2b   parafac2                          4       145            55               220                 71.38     77.54
2c   parafac2                          4       145            60               240                 71.45     77.81
2d   parafac2                          4       145            65               260                 71.39     77.67
2e   parafac2                          4       145            25               100                 71.56     77.75
3    parafac2                          4       145            50               200                 71.45     77.54
4    parafac2                          4       145            50               200                 71.34     77.53
5    parafac2                          4       145            50               200                 70.70     77.75
6    parafac2/mlp                      4       145            25               48                  73.74     76.16
7    parafac2/mlp + parafac2           4       145            25               148                 73.25     79.61
8    parafac2/mlp+105feats             4       145            25               153                 74.17     77.89
9    parafac2/mlp+fspace/mlp           4       145            50               96                  74.16     76.52
10   parafac2/mlp+fspace/mlp+fspace50  4       145            50               96                  74.42     77.47
11   KLT                               --      145            50               50                  70.89     76.34
12   klt to MLP                        --      145            50               48                  72.77     75.19
13   klt over avgs                     --      145            50               50                  69.98     76.39
14   klt over averages to mlp          --      145            50               48                  73.12     75.48
15   parafac2 - PLP                    5       145            25               125                 71.90     77.84
16   parafac2 - PLP,PLP2               6       145            25               150                 72.64     76.48
20   parafac2 - AFRL                   5       145            25               125                 72.73     78.60
21   parafac2/mlp - AFRL               5       145            25               48                  75.48     77.99
22   parafac2 + parafac2/mlp - AFRL    5       145            25               173                 74.00     80.75

EXPERIMENT INFO

Experiment 1:
4 slabs are: 44pos,61pos,44lin,61lin.
Calculate train & test w/in same script.
Matrix weighting: P2{i} = P{i}*H
Scale both train & test using qnnorm, norms2vars, qncopy.

Experiment 2:
4 slabs are: 44pos,61pos,44lin,61lin.
Matrix weighting: P2{i} = P{i}*Cc{i}; (where Cc has 4 slabs, each with a diagonal corresponding to that slab in P)
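
The per-slab weighting above can be sketched in a few lines. This is only an illustration in plain Python, not the original MATLAB script: it assumes that Cc{i} is the diagonal matrix built from row i of the parafac2 weight matrix C (the notes only say each Cc slab has "a diagonal corresponding to that slab"), and the names diag_from_row, matmul and weight_slabs, as well as the toy numbers, are hypothetical.

```python
def diag_from_row(row):
    """Build a square diagonal matrix (list of lists) from one row of C."""
    n = len(row)
    return [[row[j] if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def weight_slabs(P, C):
    """Return P2 with P2[i] = P[i] * Cc[i], assuming Cc[i] = diag(C[i])."""
    return [matmul(P_i, diag_from_row(C[i])) for i, P_i in enumerate(P)]


# Toy data (made up): 2 slabs, 3 rows standing in for states, 2 components.
P = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
]
C = [[2.0, 0.5], [10.0, 3.0]]
P2 = weight_slabs(P, C)
```

Right-multiplying by a diagonal matrix scales column j of P[i] by C[i][j], so each component is weighted separately for each slab.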

Experiments 2a-e:
As experiment 2, with different reductions per slab (2a: 45, 2b: 55, 2c: 60, 2d: 65, 2e: 25 columns).

Experiment 3:
Same 4 slabs, same scaling
Matrix weighting: transposed P:
    P2{i}=Cc{i}*P{i}';

Experiment 4:
Same 4 slabs, same scaling
Matrix weighting: inverse C and transposed P
    Cc{i}(j,j) = 1/(C(i,j));
    P2{i}=Cc{i}*P{i}';
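
A minimal sketch of this inverse, transposed weighting for a single slab, written in plain Python rather than MATLAB. It assumes every entry of the relevant row of C is nonzero; the function names and toy values are hypothetical.

```python
def transpose(m):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def inverse_weight_transposed(P_i, c_row):
    """Return Cc * P_i' where Cc is diagonal with Cc[j][j] = 1 / c_row[j].

    Left-multiplying by a diagonal matrix scales row j of P_i' by its
    j-th diagonal entry, so the full diagonal matrix is never built.
    Assumes no entry of c_row is zero.
    """
    return [[v / c_row[j] for v in row] for j, row in enumerate(transpose(P_i))]


# Toy slab (made up): 3 rows standing in for states, 2 components.
P_i = [[2.0, 1.0], [4.0, 3.0], [6.0, 5.0]]
c_row = [2.0, 0.5]
P2_i = inverse_weight_transposed(P_i, c_row)
```

The result has one row per component rather than one row per state, which is the shape change introduced by the transpose in experiments 3 and 4.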

Experiment 5:
Same 4 slabs
Matrix weighting as in experiment 4
Use train.4space.red50.norm instead of varnorm

Experiment 6:
Start with features from experiment 2e (25 features per slab, 100 total), unnormalized.
Train 48-phone MLP, then CRF.

Experiment 7:
Concatenate the 48 features from experiment 6 (unnormalized) with the 100 features from experiment 2e (normalized), and retrain the CRF.
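
The combination step can be sketched as below. This is an illustration only: it assumes "normalized" means per-feature zero mean and unit variance with statistics taken from the training frames, which the notes do not spell out, and it stands in for the qnnorm/qncopy tooling rather than reproducing it. All function names and numbers are hypothetical.

```python
import math


def fit_scaling(rows):
    """Per-column mean and (population) standard deviation of the rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    return means, stds


def apply_scaling(rows, means, stds):
    """Center each column and divide by its standard deviation.

    Columns with zero spread are only centered, to avoid dividing by zero.
    """
    return [
        [(v - m) / s if s > 0 else v - m for v, m, s in zip(r, means, stds)]
        for r in rows
    ]


def concat_streams(a_rows, b_rows):
    """Frame-by-frame concatenation of two feature streams."""
    if len(a_rows) != len(b_rows):
        raise ValueError("both streams need the same number of frames")
    return [a + b for a, b in zip(a_rows, b_rows)]


# Toy frames (made up): 2 MLP outputs and 2 reduced features per frame.
mlp_feats = [[0.1, 0.9], [0.8, 0.2]]
reduced_feats = [[1.0, 4.0], [3.0, 8.0]]
means, stds = fit_scaling(reduced_feats)
combined = concat_streams(mlp_feats, apply_scaling(reduced_feats, means, stds))
```

In the experiment the two streams have 48 and 100 columns, giving the 148 CRF features listed in the table.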

Experiment 8:
Reduce the 4space matrix to 25 dimensions, train an MLP. (Experiment 6)
Concatenate the 48 new MLP features to the 105 old MLP features.
Train the CRF on 153 features.

Experiment 9:
Build an MLP on 100 4space features (exp 6)
Build an MLP on 50 SVD fspace features (previous)
Combine for 96 features, train CRF

Experiment 10:
Concatenate the 96 features of experiment 9 with the 50 features from SVD reduction, scaled.

Experiment 11:
Do a K-L transform over the original posteriors, and keep only the first 50 features. Train the CRF. Transform the test set using the same statistics, again keeping only the first 50 features for decoding.
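
As a rough sketch of this procedure: the K-L transform amounts to projecting mean-centered data onto the leading eigenvectors of the training covariance, and the test set reuses the training mean and basis. The code below is a plain-Python illustration under that reading, using power iteration with deflation instead of whatever eigen-solver was actually used; all names and the toy data are hypothetical, and it keeps 2 of 3 dimensions where the experiment keeps 50 of 145.

```python
import math


def column_means(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows, means):
    """Sample covariance matrix of the rows (divides by n - 1)."""
    n, d = len(rows), len(means)
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


def top_eigenvectors(cov, k, iters=500):
    """Leading k eigenvectors of a symmetric PSD matrix.

    Uses power iteration with deflation; this assumes the start vector is
    not orthogonal to the eigenvector being sought.
    """
    d = len(cov)
    work = [row[:] for row in cov]
    vectors = []
    for _ in range(k):
        start = [float(i + 1) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in start))
        v = [x / norm for x in start]
        for _ in range(iters):
            w = [sum(work[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[a] * sum(work[a][b] * v[b] for b in range(d)) for a in range(d))
        vectors.append(v)
        # Deflate: remove the found component before the next search.
        work = [[work[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return vectors


def fit_klt(train, k):
    """Training statistics: the mean vector and the first k basis vectors."""
    means = column_means(train)
    return means, top_eigenvectors(covariance(train, means), k)


def apply_klt(rows, means, basis):
    """Project rows onto the basis using the stored training mean."""
    d = len(means)
    return [[sum((r[j] - means[j]) * b[j] for j in range(d)) for b in basis] for r in rows]


# Toy data (made up): most variance lies along the (1, 1, 0) direction.
train = [
    [1.0, 1.0, 5.0], [-1.0, -1.0, 5.0], [2.0, 2.0, 5.0],
    [-2.0, -2.0, 5.0], [0.5, -0.5, 5.0], [-0.5, 0.5, 5.0],
]
test = [[3.0, 3.0, 5.0]]
means, basis = fit_klt(train, k=2)
train_klt = apply_klt(train, means, basis)
test_klt = apply_klt(test, means, basis)  # same statistics as training
```

The point of fit_klt returning its statistics is that the test set is never used to estimate the mean or the basis.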

Experiment 12:
Do the K-L transform as in 11, then train an MLP over those features, then the CRF.

Experiment 13:
Because we do the SVD/parafac over the average values, not all of the values, derive the KLT statistics over only the posterior average of each feature. Then transform the data, keeping only the first 50 features, and train the CRF.

Experiment 14:
As before, but train an MLP on the 50 features.

Experiment 15:
Add PLP features as a 5th slab.
First, subtract the mean over plp features (train & test).
That is: use qnnorm to get the means of the first 13 features, change the second half (the variances) to 1, use qncopy to subtract the means, then recalculate the deltas (which turn out to be the same anyway; the variances don't affect the derivatives).
Then, using 3-state labels, get average value for each of the 39 features for the 145 states.
Add the 145x39 matrix as a 5th slab to the parafac2 algorithm.
Reduce to 25 dimensions, then calculate 'fspace' for each of the 5 slabs, giving 125 dimensions for each frame.
Do variance normalization over the output features.
Train and test CRF.
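
The first two steps (mean subtraction, then averaging each feature over the frames of each state) can be sketched as follows. This is a plain-Python illustration, not the qnnorm/qncopy pipeline: it takes the means from whatever frames it is given, the handling of unseen states is an arbitrary choice, and all names and numbers are hypothetical. In the experiment the resulting matrix is 145x39.

```python
def subtract_means(frames):
    """Subtract the per-feature mean, computed over the given frames."""
    n = len(frames)
    means = [sum(f[j] for f in frames) / n for j in range(len(frames[0]))]
    return [[v - m for v, m in zip(f, means)] for f in frames]


def state_averages(frames, labels, n_states):
    """Average every feature over the frames labeled with each state.

    Returns an n_states x n_features matrix. States that never occur keep
    an all-zero row (an arbitrary choice for this sketch).
    """
    n_feats = len(frames[0])
    sums = [[0.0] * n_feats for _ in range(n_states)]
    counts = [0] * n_states
    for frame, state in zip(frames, labels):
        counts[state] += 1
        for j, value in enumerate(frame):
            sums[state][j] += value
    return [
        [s / counts[i] if counts[i] else 0.0 for s in row]
        for i, row in enumerate(sums)
    ]


# Toy data (made up): 4 frames, 2 features, 3 states.
frames = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0], [7.0, 20.0]]
labels = [0, 0, 1, 2]
slab = state_averages(subtract_means(frames), labels, n_states=3)
```

The slab then has one row per state, matching the row dimension of the other slabs, which is what lets it be added to the parafac2 input.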

Experiment 16:
Add the squares of the PLP features as a 6th slab, as above.
Reduce the matrices to 25 dimensions. Use the 39 plp features to project onto both the plp and plpsquared slabs.

Experiment 17:
Use the 125 features of exp15 to train a 48-output MLP, train a CRF.

Experiment 18:
Use the 150 features of exp16 to train a 48-output MLP, train a CRF.

Experiment 19:
Take the better of 17 or 18 and combine with the non-MLP features to train a CRF.

Experiment 20:
Use the averages of the 36 AFRL features over the 145 phone states as the 5th slab (the first four slabs are as before: the 61 and 44 linear and posterior average activations). Reduce each slab to 25 dimensions.
Do variance normalization on output before giving 125 features to CRF.

Experiment 21:
Take the 125 features from experiment 20, and train a 48-label MLP. Train the CRF on the 48 features.

Experiment 22:
Combine the varnorm AFRL features of experiment 20 with the mlpfwd AFRL features of experiment 21 and train the CRF.

More combination experiments to come.