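Processing a Hangul .txt file:

1. Get rid of the within-file file IDs:
   cat 302000.txt | sed 's/[0-9].*\.txt##//g' > tmp1

2. Delete blank lines by squeezing each run of newlines into a single newline. Write to a new file and rename it. Redirecting straight back onto tmp1 truncates it before it is read, which leaves it empty:
   cat tmp1 | tr -s '\012' > tmp1b && mv tmp1b tmp1

3. Delete the first line. It is left blank when the file starts with a newline. wc prints lines, words and bytes:
   cat tmp1 | wc
   46 968 6849
   cat tmp1 | tail -n 45 > tmp2
   cat tmp2 | wc
   45 968 6848
   One line and one byte fewer with the same word count, so only the empty first line was removed. (tail -n +2 tmp1 > tmp2 does the same without counting lines first.)

4. Split the file into one file per line:
   split -l 1 tmp2 302000--

Replace step 4 with the following, which names the pieces with two-digit numeric suffixes instead of split's letter suffixes (aa, ab, ...):
   csplit -zsk -f 302000- tmp2 1 '{*}'
   -z removes empty pieces, -s suppresses the byte counts, -k keeps the pieces already written if csplit stops on an error, and -f sets the file-name prefix. The pattern 1 cuts before line 1 (an empty piece, dropped by -z), and '{*}' repeats it until the input runs out, so each line lands in its own file.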
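The four steps can be chained into a single function. This is a minimal sketch, assuming GNU coreutils and GNU sed (csplit's -z is a GNU option) and IDs that match the step 1 regex. `clean_and_split` is a hypothetical name. One deliberate change from the notes: step 3 here deletes line 1 only when it is empty (`sed '1{/^$/d;}'`) instead of counting lines for tail, so a file that does not start with a blank line keeps its first line. Intermediate files go in a mktemp directory, so no existing tmp1/tmp2 is overwritten.

```shell
#!/bin/sh
# Hypothetical helper (not from the original notes): clean_and_split INPUT PREFIX
# Assumes GNU coreutils/sed; IDs are removed with the same regex as step 1.
clean_and_split() {
    src=$1          # e.g. 302000.txt
    prefix=$2       # e.g. 302000-
    work=$(mktemp -d) || return 1

    # 1. Strip the within-file IDs (same sed expression as step 1).
    sed 's/[0-9].*\.txt##//g' "$src" > "$work/tmp1"

    # 2. Squeeze runs of newlines so blank lines disappear. Output goes to a
    #    separate file because "> tmp1" would truncate the input first.
    tr -s '\n' < "$work/tmp1" > "$work/tmp1b"

    # 3. Drop line 1 only if it is empty (the one blank line tr can leave).
    sed '1{/^$/d;}' "$work/tmp1b" > "$work/tmp2"

    # 4. One output file per line. csplit may report "line number out of
    #    range" once '{*}' runs past the last line; -k keeps the pieces
    #    already written, so the error message and exit status are ignored.
    csplit -zsk -f "$prefix" "$work/tmp2" 1 '{*}' 2>/dev/null || true

    rm -rf "$work"
}
```

With the file from these notes the call would be `clean_and_split 302000.txt 302000-`.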