Databricks - incorrect header check
In this post I would like to show how to fix the "incorrect header check" error received while fetching data from a Hive table.
Actual message:
SparkException: Job aborted due to stage failure: Task 0 in stage 63.0 failed 4 times, most recent failure: Lost task 0.3 in stage 63.0 (TID 3506, 10.1.1.7, executor 28): java.io.IOException: incorrect header check.
To better understand the scenario, let me explain how the data got loaded into the Hive table. We store the data from the source systems in the RAW layer. The RAW layer is a landing area: we pick up each file from the source system and load it as it is. The only difference between source and RAW is that the files in RAW are stored in compressed format.
Now look at the table script below.
-- Drop any earlier definition so the script can be re-run
drop table if exists raw_data.rw_addisplay;
-- Allow partitions to be created dynamically on insert
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
create table raw_data.rw_addisplay
(
day string,
url string,
url_clean string,
page_country string
)
PARTITIONED BY (file_dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "\""
)
LOCATION
'/rawlocation/';
A Hive table can read CSV files from this location even if the files are compressed in gzip format.
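Here comes the issue: while loading the file from source to the raw location, the file was pulled and stored as a plain CSV file, but with the extension .gz, which indicates gzip format. The file was not actually gzip-compressed; only the extension said so. Hadoop picks the decompression codec from the file extension, so Hive tried to decompress plain text as gzip, got confused, and raised the error above.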
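One way to spot such files before Hive reads them is to look at the first two bytes: every real gzip stream starts with the magic number 1f 8b. The sketch below is only an illustration and assumes the raw files are reachable through an ordinary filesystem path. The function names `is_really_gzip` and `find_mislabeled` are made up for this example and are not part of the original load process.

```python
import os

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_really_gzip(path):
    """Return True if the file content starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def find_mislabeled(directory):
    """Return *.gz files under `directory` whose content is not gzip."""
    bad = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full = os.path.join(root, name)
            # Extension claims gzip, content says otherwise.
            if name.endswith(".gz") and not is_really_gzip(full):
                bad.append(full)
    return sorted(bad)
```

Calling `find_mislabeled` on a local copy of the raw location would list the files that are likely to trigger the "incorrect header check" error.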
Once the file was actually compressed with gzip and stored in the location, the issue was resolved.
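The fix described above, actually compressing the file, could be scripted roughly as follows. This is a minimal sketch, assuming the mislabeled file sits on a normal filesystem path and is small enough to rewrite in place. `recompress_as_gzip` is a hypothetical helper name, not something from the original pipeline.

```python
import gzip
import os
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def recompress_as_gzip(path):
    """Rewrite a plain file that carries a .gz name as real gzip.

    Returns True if the file was compressed, False if it was already gzip.
    """
    with open(path, "rb") as f:
        if f.read(2) == GZIP_MAGIC:
            return False  # content already matches the extension

    # Write the compressed copy next to the original under a temporary name.
    tmp = os.path.join(os.path.dirname(path),
                       "." + os.path.basename(path) + ".tmp")
    with open(path, "rb") as src, gzip.open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Swap the compressed copy into place under the original name.
    os.replace(tmp, path)
    return True
```

After this runs, the file content matches what the .gz extension promises, so the table can be queried again.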
Full stack trace of the error:
SparkException: Job aborted due to stage failure: Task 0 in stage 63.0 failed 4 times, most recent failure: Lost task 0.3 in stage 63.0 (TID 3506, 10.1.1.7, executor 28): java.io.IOException: incorrect header check.
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
at java.io.InputStream.read(InputStream.java:101)