How to skip headers when reading a CSV file in S3 and creating a table in AWS Athena

Question

I am trying to read a CSV file from an S3 bucket and create a table in AWS Athena. The table, once created, does not skip the header row of my CSV file.

Query example:

    CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
      `event_type_id` string,
      `customer_id` string,
      `date` string,
      `email` string
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
      "separatorChar" = "|",
      "quoteChar" = "\""
    )
    LOCATION 's3://location/'
    TBLPROPERTIES ("skip.header.line.count"="1");

This doesn't seem to work. Is there another way to skip the header?

Archana · Answer 1 · Sep 4, 2018

This is a known deficiency. The best method I've seen was tweeted by Eric Hammond:

...WHERE date NOT LIKE '#%'

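This skips header lines at query time rather than when the table is created. The filter drops every row whose date value begins with '#', so it only helps when the header line starts with that character. Rows where date is NULL are dropped as well, because the comparison does not evaluate to true for them.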
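As a rough sketch, the filter could be applied to the table from the question in either of the two ways below. The table and column names are taken from the question. The second variant is an assumption rather than part of the tweeted tip: it presumes the header row repeats the column names, so the date field of that row contains the literal text 'date'. Neither variant has been checked against the asker's data.

```sql
-- Variant 1: header lines start with '#', as in the tweeted filter.
-- "date" is double-quoted because identifiers in Athena queries are
-- quoted with double quotes (backticks are used only in DDL).
SELECT *
FROM table_name
WHERE "date" NOT LIKE '#%';

-- Variant 2 (assumption): the header row repeats the column names,
-- so its date field holds the string 'date'.
SELECT *
FROM table_name
WHERE "date" <> 'date';
```

Both variants also exclude rows whose date is NULL. If those rows matter, adding OR "date" IS NULL to the condition would keep them.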
