sample xml file.
Source file: emp.xml
<xml>
<employee>
<id>123</id>
<Name>Test1</Name>
<Branch>ECE</Branch>
</employee>
<employee>
<id>453</id>
<Name>Test2</Name>
<Branch>EEE</Branch>
</employee>
<employee>
<id>789</id>
<Name>Test3</Name>
<Branch>CSE</Branch>
</employee>
<employee>
<id>100</id>
<Name>Test4</Name>
<Branch>IT</Branch>
</employee>
</xml>
1. download hivexmlserde-1.0.5.3. jar
2. copy above jar file in lib of apache home folder in Hadoop.
3. Copy the given xml file to hdfs location. Here i Copied emp.xml file to /hive/externaldata/xml folder.
hdfs dfs -put /home/emp.xml /hive/externaldata/xml
4. Login Hive, and run following command.
hive> create external table xml_file (id int,
Name string,
Branch string)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml. XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.id"="/employee/ id/text()",
"column.xpath.Name"="/ employee/Name/text()",
"column.xpath.Branch"="/ employee/Branch/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml. XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io. IgnoreKeyTextOutputFormat'
location '/hive/externaldata/xml/'
TBLPROPERTIES (
"xmlinput.start"="<employee",
"xmlinput.end"="</employee>"
);
5. hive>select * from xml_file;
No comments:
Post a Comment