###########################################################################
# Removes two sentences out of every eight, for all the .group files in the folder.
# Written by Kyuchul Yoon ( kyoon@ling.osu.edu )
# If you've got 50 files, each containing 8 sentences, you choose two sentence numbers.
# The script then splits each file into 6 sentences (selected.txt) and 2 sentences (omitted.txt).
###########################################################################

# Specify files and folders
form Select files
    comment Choose one from each set. These will be omitted from training sets
    choice SentenceNum1: 7
        button 1
        button 2
        button 3
        button 4
        button 5
        button 6
        button 7
        button 8
    choice SentenceNum2: 8
        button 1
        button 2
        button 3
        button 4
        button 5
        button 6
        button 7
        button 8
    word inputFileExt group
    word sentSeparator SFN
    integer colNumForSentSeparator 16
    word outputFileExt .cross-validate
endform

#outFile$ = "selected.txt"

Create Strings as file list... fileList *.'inputFileExt$'
Sort
numFiles = Get number of strings
pause 'numFiles' files identified. Continue?

# Loop through each file
for iFile to numFiles
    # Get the input filename
    select Strings fileList
    doneFile$ = Get string... iFile
    Read Strings from raw text file... 'doneFile$'
    Rename... myStrings
    numLines = Get number of strings

    # Initialize the sentence counter (sentences are numbered from 1 within each file)
    iSentence = 1

    # Loop through each line
    for iLine to numLines
        ##### By examining the sentence numbers to omit, decide which output file to use
        if (sentenceNum1 <> iSentence) and (sentenceNum2 <> iSentence)
            outFileToUse$ = "selected.txt"
        else
            outFileToUse$ = "omitted.txt"
        endif

        currentLine$ = Get string... iLine
        tempString$ = currentLine$

        ####### Get the column string that the user designated as separating sentences #######
        # Each pass strips everything up to and including the next tab,
        # so tempString$ ends up starting at column colNumForSentSeparator
        for iColumn to (colNumForSentSeparator-1)
            lenCurrentLine = length(tempString$)
            iTab = index(tempString$, tab$)
            tempString$ = right$(tempString$, (lenCurrentLine-iTab))
        endfor
        ##################### End of "getting to" the column string ################

        #### Now extract the leading characters, which could be the separator (default "SFN")
        sentSep$ = left$(tempString$, length(sentSeparator$))

        # If the line is indeed the last token of a sentence (its POS being the separator),
        # then increase the sentence number
        if (sentSep$ = sentSeparator$)
            iSentence = iSentence + 1
        endif

        # Write out the line according to the filename determined above
        fileappend 'outFileToUse$' 'currentLine$''newline$'
    endfor

    select Strings myStrings
    Remove
endfor

select Strings fileList
Remove

#### END OF SCRIPT ####