Databricks - incorrect header check

In this post I would like to show how to fix the "incorrect header check" error received while reading data from a Hive table.


Actual message:

SparkException: Job aborted due to stage failure: Task 0 in stage 63.0 failed 4 times, most recent failure: Lost task 0.3 in stage 63.0 (TID 3506, 10.1.1.7, executor 28): java.io.IOException: incorrect header check.
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
    at java.io.InputStream.read(InputStream.java:101)

To understand the scenario, let me first explain how the data is loaded into the Hive table. We store the data from the source systems in a RAW layer. The RAW layer is a landing area: we pick up each file from the source system and load it as is. The only difference between the source and RAW is that files in RAW are stored in compressed (gzip) format.
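As a rough illustration of this landing step, here is a minimal Python sketch that copies a source CSV into the RAW directory while gzip-compressing it. The function name `land_to_raw` and its arguments are hypothetical, and the sketch assumes the RAW location is reachable as a local filesystem path. The point is that the `.gz` suffix is added only when the bytes really are gzip.

```python
import gzip
import shutil
from pathlib import Path


def land_to_raw(source_file: str, raw_dir: str) -> Path:
    """Copy a source CSV into the RAW directory, gzip-compressing its bytes.

    Hypothetical helper: the target gets a .gz suffix only because its
    content really is gzip, so the extension and the data always agree.
    """
    src = Path(source_file)
    target = Path(raw_dir) / (src.name + ".gz")
    target.parent.mkdir(parents=True, exist_ok=True)
    with src.open("rb") as f_in, gzip.open(target, "wb") as f_out:
        # Stream the copy so large source files are not loaded into memory.
        shutil.copyfileobj(f_in, f_out)
    return target
```

Streaming through `shutil.copyfileobj` keeps memory use flat regardless of the source file size.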

Here is the table script:

drop table if exists raw_data.rw_addisplay;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
create table raw_data.rw_addisplay
(
    day          string,
    url          string,
    url_clean    string,
    page_country string
)
PARTITIONED BY (file_dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\""
)
LOCATION
  '/rawlocation/';


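A Hive table can read CSV files from this location even when the files are gzip-compressed: Hadoop picks the decompression codec from the file extension, so any file ending in .gz is decompressed as gzip.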

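Here is where the issue came from. While loading the file from the source to the RAW location, the job pulled the file and stored it as a plain CSV file with a .gz extension. The file was not actually gzip-compressed; only the extension said so. Because the codec is chosen from the extension, Hive tried to decompress plain CSV text as gzip, and zlib rejected the data with the "incorrect header check" error shown above.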
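To see why a mislabeled file fails, the Python sketch below checks the gzip magic number (the first two bytes `1f 8b`, defined by RFC 1952) and decodes bytes with `zlib` in gzip mode, the way a reader that trusts the `.gz` extension would. The helper names `is_really_gzip` and `decode_as_gzip` are hypothetical, and local file access is assumed. Plain CSV bytes produce the same zlib "incorrect header check" message text that appears in the Spark trace.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # every gzip file starts with these two bytes (RFC 1952)


def is_really_gzip(path: str) -> bool:
    """Return True only if the file content starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def decode_as_gzip(data: bytes) -> bytes:
    """Decode bytes the way a reader trusting the .gz extension would.

    wbits=31 (16 + 15) tells zlib to expect a gzip header; plain CSV bytes
    make zlib raise zlib.error with "incorrect header check".
    """
    return zlib.decompress(data, 31)
```

Calling `decode_as_gzip(b"day,url\n")` raises `zlib.error`, while bytes produced by `gzip.compress` decode normally, which is exactly the difference between the mislabeled file and a properly compressed one.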

Once the file was actually gzip-compressed and stored back in the location, the issue was resolved.
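If many files were landed this way, a clean-up pass along the lines of the sketch below could find and repair them. It is only a sketch under stated assumptions: `fix_mislabeled_gz` is a hypothetical helper, the RAW location is assumed to be reachable as a local filesystem path, and each file is read fully into memory. It leaves real gzip files alone and compresses the rest in place.

```python
import gzip
import os
import stat
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file (RFC 1952)


def fix_mislabeled_gz(raw_dir: str) -> list:
    """Gzip-compress every *.gz file under raw_dir whose content is not gzip.

    Hypothetical helper. Returns the list of paths that were rewritten.
    """
    fixed = []
    # sorted() materializes the listing before any temp files are created.
    for path in sorted(Path(raw_dir).rglob("*.gz")):
        data = path.read_bytes()  # whole file in memory: acceptable for a sketch
        if data[:2] == GZIP_MAGIC:
            continue  # already real gzip, leave it alone
        # Temp file in the same directory, with a name starting with "."
        # so Hadoop input formats treat it as hidden while it is written.
        fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".", suffix=".tmp")
        with os.fdopen(fd, "wb") as raw_out:
            with gzip.GzipFile(fileobj=raw_out, mode="wb") as gz_out:
                gz_out.write(data)
        os.chmod(tmp, stat.S_IMODE(path.stat().st_mode))  # keep original permissions
        os.replace(tmp, path)  # atomic swap within the same filesystem
        fixed.append(path)
    return fixed
```

Writing to a dot-prefixed temporary file and then calling `os.replace` avoids exposing a half-written file, since Hadoop's `FileInputFormat` skips files whose names start with `.` or `_`.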



