##################################################################################
# 33.phrasal.category.penultimate.of.+1.following.token tier extractor for Wagon:
# Written by Kyuchul Yoon ( kyoon@ling.osu.edu )
# Extracts from a set of .TextGrid.lab files the 33.phrasal.category.penultimate.of.+1.following.token data field for Wagon training
# The script assumes that you already have the TextGrid files labelled by professional K-ToBI labelers.
# The script will read in all the TextGrid.lab files one by one from the directory 32.phrasal.category.penultimate.of.-1.previous.token
# and write the output files into 10.wagon-features\33.phrasal.category.penultimate.of.+1.following.token
# The output files have the extension .wagon.33
# Assumes that you have a token/POS column in a file 056.PhrCat-big-original\ejk-tok.+1.penult.PhrCat
# This script splits the second column of ultimate PhrasalCategories into PhrCats of each sentence separated by 0 (S)
##################################################################################

form Select files
    word tokenFile_(with_path) 056.PhrCat-big-original\ejk-tok.+1.penult.PhrCat
    word subFolderToProcess 10.wagon-features\32.phrasal.category.penultimate.of.-1.previous.token
    word fileExtOfDoneFiles wagon.32
    word outputSubFolder 10.wagon-features\33.phrasal.category.penultimate.of.+1.following.token
    word tierNameToAdd PhrCat.penult.of.+1.previous.token
    choice outputFileExt: 1
        button wagon.33
endform

Read Strings from raw text file... 'tokenFile$'
Rename... tokenFile

# Get the list of filenames of the TextGrid files
Create Strings as file list... fileList 'subFolderToProcess$'\*.'fileExtOfDoneFiles$'
Sort
numFiles = Get number of strings
pause 'numFiles' labeled textgrids identified. Continue?

# Initialize the line number for the .PhrCat file. Will be used for giving info when a discrepancy exists between
# the numbers of PhrCats and intervals. And iLineNumPhrCat for extracting one token at a time from the .PhrCat file.
# And numPhrCat for number of PhrCats in one sentence in the .PhrCat file
lineNumPhrCat = 1
iLineNumPhrCat = 1
numPhrCat = 0

# Loop through each file
for iFile to numFiles
    select Strings fileList
    # Get the name for a TextGrid file
    doneFile$ = Get string... iFile
    filePrefix$ = doneFile$ - fileExtOfDoneFiles$
    Read from file... 'subFolderToProcess$'\'doneFile$'
    Rename... textGrid
    numIntervals = Get number of intervals... 1
    # Get the number of tiers so that you can add an additional tier at the end
    numTiers = Get number of tiers
    Duplicate tier... 1 (numTiers+1) 'tierNameToAdd$'
    # Set the first/last interval text to naught
    Set interval text... (numTiers+1) 1
    Set interval text... (numTiers+1) numIntervals
    for iToken from 2 to (numIntervals-1)
        select Strings tokenFile
        #### Word extractor (separated by either spaces or tabs in a line) ####
        #### Extract the first column. Easier than extracting the second column ####
        #####################################################
        token$ = Get string... iLineNumPhrCat
        # To jump to the first PhrCat, i.e. penultimate PhrCat, identify the position of the first white space
        # Check for either spaces or tabs; index() returns 0 when the character is absent
        iSpaces = index(token$, " ")
        iTabs = index(token$, tab$)
        # Use whichever separator occurs first, so tab-separated lines work too
        if iSpaces = 0 or (iTabs > 0 and iTabs < iSpaces)
            iSep = iTabs
        else
            iSep = iSpaces
        endif
        if iSep > 0
            penultPhrCat$ = left$(token$, (iSep-1))
        else
            # No separator on this line: treat the whole line as the first column
            penultPhrCat$ = token$
        endif
        # Put the PhrCat label into each interval
        select TextGrid textGrid
        # The last labelled interval should always be zero
        if iToken < (numIntervals-1)
            Set interval text... (numTiers+1) iToken 'penultPhrCat$'
        else
            Set interval text... (numTiers+1) iToken 0
        endif
        iLineNumPhrCat = iLineNumPhrCat + 1
        lineNumPhrCat = iLineNumPhrCat
    endfor
    Edit
    pause
    Write to text file... 'outputSubFolder$'\'filePrefix$''outputFileExt$'
    Remove
endfor

select Strings fileList
plus Strings tokenFile
Remove
#### END OF SCRIPT ####