Databricks - Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
We were trying to load data from Azure data lake store Raw layer to Azure data lake store analytics layer using hive query. Raw layer is having tables with text format whereas in analytics layer it is ORC format. This code is written inside the notebook and the notebook is called from Azure data factory.
Raw layer gets the data from FTP and this layer is like landing area and hence all the data coming in from source systems are stored as it is without any modification.
But at the time of loading the data, databricks throws the error and it says the error is caused by java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
after some research we fond that this error is due to the difference in serede properties of source and destination tables.
before
CREATE TABLE analytics_data.ay_contact
(
contactsystemnumber bigint,
currencycode string,
mailingaddresscountrycode string,
mailingadpoliticaldivision1cd string,
file_load_date date
)
PARTITIONED BY (
file_dt date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'/mnt/analytics/contact'
TBLPROPERTIES (
error went away once we changed the row format to serede from delimited.
after
CREATE TABLE analytics_data.ay_contact
(
contactsystemnumber bigint,
currencycode string,
mailingaddresscountrycode string,
mailingadpoliticaldivision1cd string,
file_load_date date
)
PARTITIONED BY (
file_dt date)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'/mnt/analytics/contact'
TBLPROPERTIES (
Raw layer gets the data from FTP and this layer is like landing area and hence all the data coming in from source systems are stored as it is without any modification.
But at the time of loading the data, databricks throws the error and it says the error is caused by java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
after some research we fond that this error is due to the difference in serede properties of source and destination tables.
before
CREATE TABLE analytics_data.ay_contact
(
contactsystemnumber bigint,
currencycode string,
mailingaddresscountrycode string,
mailingadpoliticaldivision1cd string,
file_load_date date
)
PARTITIONED BY (
file_dt date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'/mnt/analytics/contact'
TBLPROPERTIES (
'transient_lastDdlTime'='1527633997');
after
CREATE TABLE analytics_data.ay_contact
(
contactsystemnumber bigint,
currencycode string,
mailingaddresscountrycode string,
mailingadpoliticaldivision1cd string,
file_load_date date
)
PARTITIONED BY (
file_dt date)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'/mnt/analytics/contact'
TBLPROPERTIES (
'transient_lastDdlTime'='1527633997');
Excellent and very cool idea and great content of different kinds of the valuable information's.
ReplyDeleteHadoop Admin Training in Chennai
Hadoop Administration Course
Informatica MDM Training in Chennai
IELTS Training in Chennai
Japanese Language Course in Chennai
Spoken English in Chennai
Hadoop Admin Training in Porur
Hadoop Admin Training in Adyar
This is really nice post, I found and love this content. I will prefer this, thanks for sharing. Data Analytics For Business.
ReplyDeleteYou are providing a post that is very useful for developing my knowledge and I learn more info from your blog.
ReplyDeleteHadoop Training In Anna Nagar
Ethical Hacking Course in Tambaram
Digital Marketing Course in OMR
AWS Training in T Nagar
German Classes in OMR
SEO Training in T Nagar
Spoken English Classes in Velachery
Python Training in Chennai
Artificial Intelligence Course in Tambaram
Graphic Design Courses in Porur
Great Article Artificial Intelligence Projects
ReplyDeleteProject Center in Chennai
JavaScript Training in Chennai
JavaScript Training in Chennai Project Centers in Chennai
Wonderful Blog.... Thanks for sharing with us...
ReplyDeleteHadoop Training in Chennai
Hadoop Training in Bangalore
Big Data Online Training
Big Data Training in Coimbatore