##########################################################################
# 01.morpheme.identity tier extractor for Wagon: Written by Kyuchul Yoon ( kyoon@ling.osu.edu )
# Extracts the 01.morpheme.identity data field for Wagon training from a set of .TextGrid.lab files.
# The script assumes that you already have the TextGrid files labelled by professional K-ToBI labelers.
# The script will read in all the TextGrid.lab files one by one from the directory 085.TextGrid-segmented-labeled2
# and write the output files into the 10.wagon-features\01.morpheme.identity subdirectory.
# The output filenames end in .wagon.01
##########################################################################

form Select files
    word subFolderToProcess 085.TextGrid-segmented-labeled2
    word fileExtOfDoneFiles TextGrid.lab
    word outputSubFolder 10.wagon-features\01.morpheme.identity
    word tierNameToAdd morph.identity
    choice outputFileExt: 1
        button wagon.01
endform

#################### Define morphemes to be identified in the feature columns #############
# Topic markers "eun/PAU", "do/PAU"
# Subject markers "i/PCA"
# etc. "go/PAD(ECS)", "eu-myeo(n)/ECS", "eo/ECS", "myeon-seo/ECS", "neun-de/ECS"
#################### End of morpheme definition block #############################

# Get the list of filenames of TextGrid.lab files
Create Strings as file list... fileList 'subFolderToProcess$'\*.'fileExtOfDoneFiles$'
Sort
numFiles = Get number of strings
pause 'numFiles' labeled textgrids identified. Continue?

# Loop through each file
for iFile to numFiles
    select Strings fileList
    # Get the name for a TextGrid.lab file
    doneFile$ = Get string... iFile
    Read from file... 'subFolderToProcess$'\'doneFile$'
    Rename... textGrid
    numIntervals = Get number of intervals... 1
    # Get the number of tiers so that you can add an additional tier at the end
    numTiers = Get number of tiers
    Duplicate tier... 1 (numTiers+1) 'tierNameToAdd$'
    # Set the first/last interval text to naught
    Set interval text... (numTiers+1) 1
    Set interval text... (numTiers+1) numIntervals

    ############# Loop through each interval (eojeol) and extract info ###############
    for iToken from 2 to (numIntervals-1)
        tempIntervalText$ = Get label of interval... (numTiers+1) iToken

        # ############### Extract "morpheme identity" tier #########################
        # #### This script block can be used later when you extract Wagon info from textgrids ####
        # #### That way, you have more control later on over which morphemes to use ####
        # ##############################################################
        # # If the intervalText$ is one of those morphemes defined above, then set it, otherwise, put "0"
        # if (intervalText$ = "eun/PAU#" or intervalText$ = "do/PAU#" or intervalText$ = "i/PCA#" or intervalText$ = "go/ECS#"
        # ... or intervalText$ = "go/PAD#" or intervalText$ = "eu-myeo/ECS#" or intervalText$ = "eu-myeon/ECS#"
        # ... or intervalText$ = "eo/ECS#" or intervalText$ = "myeon-seo/ECS#" or intervalText$ = "neun-de/ECS#")
        #     newIntervalText$ = intervalText$ - "#"
        #     Set interval text... (numTiers+1) iToken 'newIntervalText$'
        # else
        #     Set interval text... (numTiers+1) iToken 0
        # endif

        ############### Version 2 of "morpheme identity" tier #########################
        #### This revised block will just copy (minus "#" symbol) token/POS pairs to the new tier ####
        #################################################################
        indexOfSharp = index(tempIntervalText$, "#")
        if indexOfSharp <> 0
            intervalText$ = left$(tempIntervalText$, (indexOfSharp-1))
        else
            intervalText$ = tempIntervalText$
        endif
        Set interval text... (numTiers+1) iToken 'intervalText$'
    endfor

    Edit
    pause textgrid to be saved. Continue?
    Write to text file... 'outputSubFolder$'\'doneFile$'.'outputFileExt$'
    Remove
endfor

select Strings fileList
Remove
#### END OF SCRIPT ####