Databricks - incorrect header check
In this post I'd like to show how to fix the "incorrect header check" error raised while fetching data from a Hive table.
The actual message:

SparkException: Job aborted due to stage failure: Task 0 in stage 63.0 failed 4 times, most recent failure: Lost task 0.3 in stage 63.0 (TID 3506, 10.1.1.7, executor 28): java.io.IOException: incorrect header check
To better understand the scenario, let me explain how the data got loaded into the Hive table. We store data from the source systems in a RAW layer. The RAW layer is a landing area: we pick the file up from the source system and load it as-is. The only difference between source and RAW is that the files in RAW are stored compressed.
Now look at the table script below.
drop table if exists raw_data.rw_addisplay;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

create table raw_data.rw_addisplay
(
  day string,
  url string,
  url_clean string,
  page_country string
)
PARTITIONED BY (file_dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = "\""
)
LOCATION '/rawlocation/';
The Hive table can read CSV files from this location even when the files are gzip-compressed, because the decompression codec is chosen from the file extension.
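That extension-based codec choice is exactly where things can go wrong. The failure can be reproduced outside Hadoop with Python's standard zlib module, which raises the same "incorrect header check" message when bytes lacking the gzip magic header are fed to a gzip decompressor (a minimal sketch; the CSV content is made up):

```python
import zlib

# Plain CSV bytes -- exactly what a file mislabelled as .gz would contain.
csv_bytes = b"day,url,url_clean,page_country\n2020-01-01,a.com,a.com,US\n"

try:
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header,
    # just as Hadoop's GzipCodec does for files ending in .gz.
    zlib.decompress(csv_bytes, 16 + zlib.MAX_WBITS)
except zlib.error as e:
    print(e)  # Error -3 while decompressing data: incorrect header check
```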
Here comes the issue: while loading the file from source to the raw location, the file was pulled and stored as a plain CSV but with a .gz extension, which implies gzip. The file was not actually gzip-compressed; only the extension said so. This confused Hive, which tried to decompress plain text as gzip and produced the error above.
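A quick way to catch such files before they reach Hive is to check the gzip magic number (the first two bytes, 0x1f 0x8b) instead of trusting the extension. A sketch, using a throwaway temp file to stand in for the mislabelled landing file:

```python
import tempfile

def is_really_gzip(path):
    """True only if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"

# A plain CSV saved with a misleading .gz extension, as in the incident above.
with tempfile.NamedTemporaryFile(suffix=".gz", delete=False) as f:
    f.write(b"day,url\n2020-01-01,a.com\n")
    fake = f.name

print(is_really_gzip(fake))  # False -- the extension lies, content is plain text
```

Running this check in the ingestion job would have flagged the bad file at landing time rather than at query time.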
Once the file was actually gzip-compressed and stored in the location, the issue was resolved.
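The fix, then, is to genuinely gzip-compress the file on its way into the raw location (or, alternatively, drop the misleading .gz extension). A minimal sketch of compressing at landing time, with hypothetical paths standing in for the real source drop and /rawlocation/:

```python
import gzip
import os
import shutil
import tempfile

def land_as_gzip(src_csv, dst_gz):
    """Gzip-compress src_csv into dst_gz so the .gz extension is truthful."""
    with open(src_csv, "rb") as fin, gzip.open(dst_gz, "wb") as fout:
        shutil.copyfileobj(fin, fout)

# Demo with throwaway paths; a real job would write into /rawlocation/.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "addisplay.csv")
dst = os.path.join(tmp, "addisplay.csv.gz")
with open(src, "wb") as f:
    f.write(b"day,url,url_clean,page_country\n2020-01-01,a.com,a.com,US\n")

land_as_gzip(src, dst)

with open(dst, "rb") as f:
    print(f.read(2) == b"\x1f\x8b")  # True -- the file now has a real gzip header
```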
For reference, here is the full stack trace:

SparkException: Job aborted due to stage failure: Task 0 in stage 63.0 failed 4 times, most recent failure: Lost task 0.3 in stage 63.0 (TID 3506, 10.1.1.7, executor 28): java.io.IOException: incorrect header check
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
    at java.io.InputStream.read(InputStream.java:101)