Showing posts from September, 2017

Convert HIVE table to AVRO format and export as AVRO file

Step 1: Create a new table that uses the AVRO SerDe, based on the original table in HIVE. You can do this in the HUE data browser:

CREATE TABLE avro_test_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
  'avro.schema.literal'='{
    "namespace": "testnamespace.avro",
    "name": "testavro",
    "type": "record",
    "fields": [
      {"name":"strt_tstmp","type":"string"},
      {"name":"end_tstmp","type":"string"},
      {"name":"stts_cd","type":"int"}
    ]
  }');

This creates a new table in an AVRO-compatible format in HIVE.

Step 2: Load data from the original table ...

Load data from CSV into HIVE table using HUE browser

It can be a little tricky to load data from a CSV file into a HIVE table. Here is a quick command that can be run from the HUE editor.

Steps:

1. Upload your CSV file, containing column data only (no header row), into the use case directory or application directory in HDFS.

2. Run the following command in the HIVE data browser:

LOAD DATA INPATH "/data/applications/appname/table_test_data/testdata.csv" OVERWRITE INTO TABLE testschema.tablename;

3. This overwrites all the contents of the table with the data from the CSV file, so any existing data in the table will be lost.

Make sure the table already exists in HIVE before running the command. You can create the table as follows:

CREATE TABLE tablename (
  strt_tstmp string,
  end_tstmp string,
  stts_cd int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
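Step 1 expects a CSV file without a header row. If the file you have still carries one, it can be dropped locally before the upload. The following is a minimal sketch in Scala, not part of the original workflow: `CsvHeaderStripper` and `stripHeader` are hypothetical names, and the code assumes a UTF-8 file on the local filesystem (not HDFS) whose first line is the header and whose fields contain no embedded newlines.

```scala
import java.io.PrintWriter
import scala.io.Source

// Hypothetical helper: copies a local CSV file to a new file,
// skipping the first line (assumed to be the header row).
object CsvHeaderStripper {
  def stripHeader(srcPath: String, dstPath: String): Unit = {
    val source = Source.fromFile(srcPath, "UTF-8")
    try {
      val writer = new PrintWriter(dstPath, "UTF-8")
      try {
        // drop(1) skips the header; every remaining line is written unchanged
        source.getLines().drop(1).foreach(line => writer.println(line))
      } finally {
        writer.close()
      }
    } finally {
      source.close()
    }
  }
}
```

Calling `CsvHeaderStripper.stripHeader("testdata_with_header.csv", "testdata.csv")` (both file names are placeholders) would produce a header-free file that can then be uploaded to HDFS as described in step 1.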

java.nio.file.NoSuchFileException: hdfs:/nameservice1/user HDFS Scala program

At the time of writing, I could not find an effective native Scala API for copying and moving files. The most common recommendation was to use the java.nio.* package.

UPDATE: The java.nio.* approach does not always work on HDFS, so I found the following solution, which does.

Move files from one directory to another using the org.apache.hadoop.fs.FileUtil.copy API:

val fs = FileSystem.get(new Configuration())
val conf = new org.apache.hadoop.conf.Configuration()
val srcFs = FileSystem.get(new org.apache.hadoop.conf.Configuration())
val dstFs = FileSystem.get(new org.apache.hadoop.conf.Configuration())
val dstPath = new org.apache.hadoop.fs.Path(DEST_FILE_DIR)
for (file <- fileList) {
  // The 5th parameter indicates whether the source should be deleted or not
  FileUtil.co...