My World of experiments with Technology

Showing posts from September, 2017

Convert HIVE table to AVRO format and export as AVRO file

September 22, 2017

Step 1: Create a new table using the AVRO SerDe, based on the original table in HIVE. You can do it in the HUE data browser:

    CREATE TABLE avro_test_table
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS
      INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    TBLPROPERTIES (
      'avro.schema.literal'='{
        "namespace": "testnamespace.avro",
        "name": "testavro",
        "type": "record",
        "fields": [
          {"name":"strt_tstmp","type":"string"},
          {"name":"end_tstmp","type":"string"},
          {"name":"stts_cd","type":"int"}
        ]
      }');

This will create a new table in an AVRO-compatible format in HIVE.

Step 2: Load data from the original table
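One common way to copy rows into the new table is an INSERT ... SELECT. This is a minimal sketch, not necessarily the statement used here: it assumes the original table is named original_table (a hypothetical name) and has the same three columns as the AVRO schema above.

```sql
-- Sketch only: original_table is a hypothetical name for the source table.
-- The selected columns must line up with the field order in avro.schema.literal.
INSERT OVERWRITE TABLE avro_test_table
SELECT strt_tstmp, end_tstmp, stts_cd
FROM original_table;
```

INSERT OVERWRITE replaces whatever is already in avro_test_table. To append instead, INSERT INTO TABLE can be used.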

Load data from CSV into HIVE table using HUE browser

September 22, 2017

It can be a little tricky to load data from a CSV file into a HIVE table. Here is a quick command that can be run from the HUE editor.

Steps:

1. Upload your CSV file, containing column data only (no headers), into the use case directory or application directory in HDFS.
2. Run the following command in the HIVE data browser:

    LOAD DATA INPATH "/data/applications/appname/table_test_data/testdata.csv" OVERWRITE INTO TABLE testschema.tablename;

3. This will overwrite all the contents of the table with the data from the CSV file, so any existing data in the table will be lost.

Make sure the table has already been created in HIVE. You can create the table as follows:

    CREATE TABLE tablename (
      strt_tstmp string,
      end_tstmp string,
      stts_cd int
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

java.nio.file.NoSuchFileException: hdfs:/nameservice1/user HDFS Scala program

September 19, 2017

At the time of writing this, I could not find an effective native Scala API to copy and move files. The most common recommendation was to use the java.nio.* package.

UPDATE: The java.nio.* approach may not always work on HDFS, so I found the following solution, which works.

Move files from one directory to another using the org.apache.hadoop.fs.FileUtil.copy API:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

    // fileList (a collection of Path) and DEST_FILE_DIR are defined elsewhere
    val conf = new Configuration()
    val srcFs = FileSystem.get(conf)
    val dstFs = FileSystem.get(conf)
    val dstPath = new Path(DEST_FILE_DIR)
    for (file <- fileList) {
      // The 5th parameter indicates whether the source should be deleted or not;
      // true turns the copy into a move
      FileUtil.copy(srcFs, file, dstFs, dstPath, true, conf)
    }

Old solution using java.nio.* APIs, ex:

    //correct way
    Path s = new File("C:\\test\\input\\FlumeDa