Convert HIVE table to AVRO format and export as AVRO file

Step 1:

Create an new table using AVRO SERDE based off the original table in HIVE. You can do it in HUE data browser:

CREATE TABLE avro_test_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
    'avro.schema.literal'='{
      "namespace": "testnamespace.avro",
      "name": "testavro",
      "type": "record",
      "fields": [ {"name":"strt_tstmp","type":"string"},{"name":"end_tstmp","type":"string"},{"name":"stts_cd","type":"int"}]
    }');

This will create a new table in AVRO compatible format in HIVE.

Step 2: Load data from the original table

INSERT OVERWRITE TABLE avro_test_table SELECT *  FROM tablename;

Step 3: Go to HUE file browser and see if the file is created. It might be created with file names as some zeroes without any extension. You should look for this file in one of the sub-folders of your application directory for ex: /data/apps/appname/appname_db/avro_test_table/xxx

Step 4:
Run the following command from command line in Hadoop

hadoop fs -cat /data/apps/appname/appname_db/avro_test_table/xxx > testdata.avro

Step 5: Look for this avro file in your home directory in edge node.


Comments

Popular posts from this blog

Setting property 'keystoreFile' did not find a matching property. No Certificate file specified or invalid file format

Load data from CSV into HIVE table using HUE browser

Gitlab change project visibility from private to internal