linux:scripts:remove_lines_not_matching_a_pattern

linux:scripts:remove_lines_not_matching_a_pattern [2012/08/20 14:18] (current)
amon created
Line 1: Line 1:
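 +A database loading problem: an incoming CSV file contains some rows that don't match a required pattern (for example, a value in one column must end with an underscore followed by a number). The rows that don't match are saved to a separate file and then removed from the import file.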
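
To make the rule concrete, here is a minimal sketch that checks a few rows one at a time. The helper name `third_column_ok` and the sample rows are made up for illustration, and the regular expression is an assumed, anchored form of the rule (third column ends in an underscore plus at least one digit); it is not necessarily the exact pattern used on the real data.

```shell
# Hypothetical helper (illustration only): succeeds when the third column
# of a single CSV row ends with "_<digits>".
third_column_ok() {
    # '[^,]*' cannot cross a comma, so each piece stays inside one column.
    printf '%s\n' "$1" | grep -q '^[^,]*,[^,]*,[^,]*_[0-9][0-9]*,'
}

# Made-up sample rows: one good, one without the suffix, one without digits.
for row in 'a,b,item_12,x,' 'a,b,item,x,' 'a,b,item_,x,'; do
    if third_column_ok "$row"; then
        echo "ok:  $row"
    else
        echo "bad: $row"
    fi
done
```

Only the first row is reported as ok; the other two lack either the underscore or the digits.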
  
 +<code bash>
 +importFile="importFile.csv"
 +badRowFile="errorRows_file.txt"
 +
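 +#Pattern we want for correct rows:
 +## 1: Two columns with any data '[^,]*,[^,]*,' ('[^,]*' cannot run past a comma, so each one stays inside a single column)
 +## 2: A third column where:
 +## 2.1: The text content (anything except a comma '[^,]*')
 +## 2.2: Ends with an underscore followed by a number '_[0-9][0-9]*,' (at least one digit)
 +## Note: The comma on the end makes sure this pattern is at the end of the text field (the comma signals the end of the column)
 +## 3: At least one more column after that, itself followed by a comma '.*,'
 +## Note: The leading '^' anchors the match to the start of the row; quoted fields that contain commas are not handled
 +correctPattern="^[^,]*,[^,]*,[^,]*_[0-9][0-9]*,.*,"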
 +
 +#Output anything that DOESN'T match the pattern ( '-v' to negate grep).
 +grep -v "${correctPattern}" "${importFile}" > "${badRowFile}"
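 +#If the output file is not empty (one or more rows did not match the pattern).
 +#Note: the redirect above always creates the file, so test for content with -s rather than for existence with -f.
 +if [ -s "${badRowFile}" ]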
 +then
 +    echo "badRowFile Found: ${badRowFile}"
 +    ######################################################
 +    #Do whatever notification you require
 +    ######################################################
 +
 +    ######################################################
 +    #Clean dirty data from import file
 +    ######################################################
 +    # sed
 +    ## 1: -i - In place: write the changes back to the file rather than printing the result (GNU sed syntax)
 +    ## 2: -e - specifies an expression (mostly handy with multiple expressions): "/<pattern>/! d"
 +    ## 2.1: Double quotes so the variable for the pattern is expanded
 +    ## 2.2: Normally 'd' deletes every line that matches, '!d' deletes lines that DON'T match
 +    ## 2.3: Written as "! d": inside double quotes an interactive bash shell would treat "!d" as a history expansion, and the space prevents that
 +    sed -i -e "/${correctPattern}/! d" "${importFile}"
 +fi
 +
 +#Remove ${badRowFile}
 +rm -f "${badRowFile}"
 +</code>
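
The same split can be tried safely on throwaway data before pointing it at a real import file. The sketch below is only an illustration: the sample rows, the `demo*` variable names and the anchored pattern are assumptions, and it relies on GNU sed for `-i` without a backup suffix.

```shell
# Throwaway working area; nothing here touches a real import file.
workDir=$(mktemp -d)
demoFile="${workDir}/demo.csv"
demoBad="${workDir}/demo_bad.txt"

# Assumed pattern: third column ends in "_<digits>", then at least one
# further column followed by a comma.
demoPattern='^[^,]*,[^,]*,[^,]*_[0-9][0-9]*,.*,'

# Made-up sample rows: two valid, two invalid.
printf '%s\n' \
    'id1,alpha,widget_1,red,' \
    'id2,beta,widget,blue,' \
    'id3,gamma,gadget_27,green,' \
    'id4,delta,gadget_,grey,' > "${demoFile}"

# Save the rows that do NOT match, then delete them from the file in place.
grep -v "${demoPattern}" "${demoFile}" > "${demoBad}"
sed -i -e "/${demoPattern}/! d" "${demoFile}"

# Count lines in each file (grep -c '' counts every line).
keptCount=$(grep -c '' "${demoFile}")
badCount=$(grep -c '' "${demoBad}")
echo "kept ${keptCount} row(s), rejected ${badCount} row(s)"

echo "rejected rows:"
cat "${demoBad}"

rm -rf "${workDir}"
```

Checking that the kept and rejected counts add up to the original row count is a cheap sanity test that no row was lost or duplicated by the split.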
linux/scripts/remove_lines_not_matching_a_pattern.txt · Last modified: 2012/08/20 14:18 by amon