Xan
Show only the rows whose ScientificName is "NA" or empty (blank after trimming whitespace)
xan filter '(ScientificName eq "NA") or (len(trim(ScientificName)) eq 0)' taxon_name_gnverified_col.csv
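For illustration, the awk function below approximates the filter above on a tiny made-up sample. It is not xan: the name keep_na_or_empty is hypothetical, and it assumes a plain comma-separated file with no quoted fields. As in the xan expression, the raw value is compared with "NA", while the empty check happens after trimming (approximated here as stripping spaces and tabs only).

```shell
#!/usr/bin/env bash
# Rough awk approximation of the xan filter above (illustration only, not xan).
# Assumes a simple comma-separated file with no quoted fields.
# Keeps the header plus rows whose column $1 is exactly "NA" or blank after trimming.
keep_na_or_empty() {
  awk -F',' -v col="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      print
      next
    }
    {
      v = $c
      gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding spaces and tabs
      if ($c == "NA" || v == "") print
    }'
}

# Hypothetical sample: only the "NA" row and the whitespace-only row should survive.
printf 'id,ScientificName\n1,Quercus robur\n2,NA\n3,   \n' | keep_na_or_empty ScientificName
```

The header is printed along with rows 2 and 3; the Quercus robur row is filtered out.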
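Drop rows with an empty TaxonomicName from kew-species-list.csv and save the rest to a new file (the leading ! negates the test; keeping it inside the single quotes means the shell passes it to xan untouched)
xan filter '!(len(trim(TaxonomicName)) eq 0)' kew-species-list.csv > kew-species-list-nona.csv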
Keep only the TaxonomicName column, to use as input for gnverifier
xan select TaxonomicName kew-species-list-nona.csv > taxon_names.csv
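Verify the names with gnverifier (a separate tool, not part of xan): -f csv selects CSV output and -s 1 restricts matching to data source 1, the Catalogue of Life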
gnverifier -f csv -s 1 taxon_names.csv > taxon_name_gnverified_col.csv
View summary statistics for each column (xan v, i.e. xan view, displays the result as a readable table)
xan stats taxon_name_gnverified_col.csv | xan v
Deduplicate rows on ScientificName so each verified name appears only once before the join
xan dedup -s ScientificName taxon_name_gnverified_col.csv > taxon_name_gnverified_col_deduped.csv
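Deduplicating before the join matters: with the usual join semantics, each left-hand row is emitted once per matching right-hand row, so a name that appears twice in the verification results would duplicate its species rows. The awk sketch below (hypothetical function dedup_first, simple CSV with no quoted fields) keeps the first row seen for each key; that keep-first behaviour is assumed here to match xan dedup's default.

```shell
#!/usr/bin/env bash
# awk approximation of keep-first deduplication on one column (illustration only).
# Assumes a simple comma-separated file with no quoted fields.
dedup_first() {
  awk -F',' -v col="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      print
      next
    }
    !seen[$c]++   # print a row only the first time its key is seen
  '
}

# Hypothetical sample: the second "Quercus robur" row is dropped.
printf 'ScientificName,Row\nQuercus robur,1\nQuercus robur,2\nFagus sylvatica,3\n' | dedup_first ScientificName
```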
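Left join the verification results onto the species list, matching TaxonomicName (first file) to ScientificName (second file); species without a match are kept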
xan join --left TaxonomicName kew-species-list-nona.csv ScientificName taxon_name_gnverified_col_deduped.csv > kew-species-list-nona-gnverified-col.csv
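A left join keeps every row of the first file and fills the right-hand columns with empty values when no key matches. The awk sketch below shows that behaviour; it is not xan's implementation, left_join is a hypothetical name whose arguments mirror the order of the xan join command above, and it assumes simple CSV files with no quoted fields and unique keys in the right-hand file.

```shell
#!/usr/bin/env bash
# awk approximation of a left join (illustration only, not xan).
# Assumes simple comma-separated files with no quoted fields and unique keys
# in the right-hand file. Reads the right-hand file first to index it by key.
# Usage: left_join LEFT_COL LEFT_FILE RIGHT_COL RIGHT_FILE
left_join() {
  awk -F',' -v lcol="$1" -v rcol="$3" '
    NR == 1 {                           # header of the right-hand file
      for (i = 1; i <= NF; i++) if ($i == rcol) rc = i
      rhead = $0; rnf = NF; next
    }
    FNR == NR { right[$rc] = $0; next } # right-hand data rows, indexed by key
    FNR == 1 {                          # header of the left-hand file
      for (i = 1; i <= NF; i++) if ($i == lcol) lc = i
      print $0 "," rhead; next
    }
    {
      key = $lc
      if (key in right) print $0 "," right[key]
      else {                            # no match: pad with empty right-hand fields
        pad = ""
        for (i = 1; i < rnf; i++) pad = pad ","
        print $0 "," pad
      }
    }' "$4" "$2"
}
```

With a hypothetical left file holding Quercus robur and Fagus sylvatica and a right file matching only Quercus robur, the Fagus sylvatica row is still printed, followed by empty right-hand fields. Real CSV files often contain quoted fields, so on the actual data use xan join rather than this sketch.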
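Standardize a date field: parse "Last Seen On" as DD-MM-YYYY and write it as ISO 8601 (YYYY-MM-DD) in a new LastSeenISO column
xan map 'strftime(datetime(col("Last Seen On"), "%d-%m-%Y"), "%Y-%m-%d")' LastSeenISO joined.csv > joined_standardized.csv
The same step applied in place to the joined species list, via a temporary file (mv only runs if xan map succeeds)
xan map 'strftime(datetime(col("Last Seen On"), "%d-%m-%Y"), "%Y-%m-%d")' \
  LastSeenISO kew-species-list-nona-gnverified-col.csv > joined.tmp && mv joined.tmp kew-species-list-nona-gnverified-col.csv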
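The date expression parses "Last Seen On" with the pattern %d-%m-%Y and re-emits it with %Y-%m-%d, which amounts to reversing the three dash-separated parts. The awk sketch below performs just that rearrangement; add_iso_date is a hypothetical name, it assumes simple CSV with no quoted fields, and unlike a real date parser it does not check that the parts form a valid date.

```shell
#!/usr/bin/env bash
# awk approximation of the date step (illustration only, not xan).
# Assumes comma-separated input with no quoted fields and dates written DD-MM-YYYY.
# Appends a column named $2 holding column $1 rewritten as YYYY-MM-DD;
# the date is not validated, and values that do not split into three parts stay empty.
add_iso_date() {
  awk -F',' -v OFS=',' -v col="$1" -v newcol="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      print $0, newcol
      next
    }
    {
      iso = ""
      if (split($c, d, "-") == 3) iso = d[3] "-" d[2] "-" d[1]   # DD-MM-YYYY to YYYY-MM-DD
      print $0, iso
    }'
}

# Hypothetical sample with one filled and one missing date.
printf 'Species,Last Seen On\nQuercus robur,07-03-2021\nFagus sylvatica,\n' | add_iso_date "Last Seen On" LastSeenISO
```

Here 07-03-2021 (7 March 2021) becomes 2021-03-07, and the row with no date gets an empty LastSeenISO.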
Script
#!/usr/bin/env bash
set -euo pipefail
# Drop rows with an empty TaxonomicName
xan filter '!(len(trim(TaxonomicName)) eq 0)' kew-species-list.csv > kew-species-list-nona.csv
# Verify the names against data source 1 (Catalogue of Life)
xan select TaxonomicName kew-species-list-nona.csv > taxon_names.csv
gnverifier -f csv -s 1 taxon_names.csv > taxon_name_gnverified_col.csv
# One row per verified name, then left-join the results onto the species list
xan dedup -s ScientificName taxon_name_gnverified_col.csv > taxon_name_gnverified_col_deduped.csv
xan join --left TaxonomicName kew-species-list-nona.csv ScientificName taxon_name_gnverified_col_deduped.csv > kew-species-list-nona-gnverified-col.csv
# Add LastSeenISO (YYYY-MM-DD) and overwrite the joined file
xan map 'strftime(datetime(col("Last Seen On"), "%d-%m-%Y"), "%Y-%m-%d")' LastSeenISO kew-species-list-nona-gnverified-col.csv > joined.tmp && mv joined.tmp kew-species-list-nona-gnverified-col.csv