java.nio.file.NoSuchFileException: hdfs:/nameservice1/user HDFS Scala program



At the time of writing this, I could not find an effective native Scala API to copy and move files. The most common recommendation was to use the java.nio.* package.

UPDATE: The java.nio.* approach may not always work on HDFS, so I found the following solution that does.

Move files from one directory to another using the org.apache.hadoop.fs.FileUtil.copy API

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val conf = new Configuration()
val srcFs = FileSystem.get(conf)
val dstFs = FileSystem.get(conf)
val dstPath = new Path(DEST_FILE_DIR)

for (file <- fileList) {
  // The 5th parameter indicates whether the source should be deleted (true turns the copy into a move)
  FileUtil.copy(srcFs, file, dstFs, dstPath, true, conf)
}
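
For context, here is a minimal self-contained sketch of the same approach. The HdfsMover object name and the directory paths are illustrative assumptions, and it reuses one FileSystem handle for both source and destination since both sides live on the same cluster:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

object HdfsMover {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    // Illustrative placeholder directories
    val srcDir = new Path("/user/someuser/inputDir")
    val dstDir = new Path("/user/someuser/archiveDir")

    // listStatus returns one FileStatus per entry in the directory
    for (status <- fs.listStatus(srcDir)) {
      // deleteSource = true removes the source file after a successful copy
      FileUtil.copy(fs, status.getPath, fs, dstDir, true, conf)
    }
  }
}

Note that FileUtil.copy performs a copy followed by a delete, which works across filesystems but is heavier than a same-filesystem rename.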



Old solution using java.nio.* APIs

Example:

// correct way: destination includes the file name
Path s = new File("C:\\test\\input\\FlumeData.123.avro").toPath();
Path d = new File("C:\\test\\output\\FlumeData.123.avro").toPath();
Files.move(s, d, StandardCopyOption.REPLACE_EXISTING);

// incorrect way: destination is only the directory
Path s = new File("C:\\test\\input\\FlumeData.123.avro").toPath();
Path d = new File("C:\\test\\output").toPath();
Files.move(s, d, StandardCopyOption.REPLACE_EXISTING);

So the essence is that Files.move() requires the complete path of the file (including the file name), not just the destination directory.
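
A common way to build that complete path is to append the source file name to the destination directory. A minimal Scala sketch, assuming the same illustrative local paths as above:

import java.nio.file.{Files, Path, Paths, StandardCopyOption}

// Illustrative local paths
val src: Path = Paths.get("C:\\test\\input\\FlumeData.123.avro")
val dstDir: Path = Paths.get("C:\\test\\output")

// resolve() appends the source file name to the destination directory,
// giving Files.move the complete target path it requires
val dst: Path = dstDir.resolve(src.getFileName)
Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING)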

The exception below can occur when you don't pass the entire path (including the file name) to the Files.move method (the same applies to Files.copy).

Exception:

java.nio.file.NoSuchFileException: hdfs:/nameservice1/user/xxxxx/inputDir/FlumeData.xxx.avro -> hdfs:/nameservice1/user/xxxx/output
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:390)
        at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1347)
        at hadoop.scala.FileSorter$$anonfun$moveProcessedFilesToArchiveDir$1.apply(FileSorter.scala:114)
        at hadoop.scala.FileSorter$$anonfun$moveProcessedFilesToArchiveDir$1.apply(FileSorter.scala:112)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at hadoop.scala.FileSorter.moveProcessedFilesToArchiveDir(FileSorter.scala:112)
        at hadoop.scala.FileSorter.processFile(FileSorter.scala:78)
        at hadoop.scala.FileSorter.init(FileSorter.scala:37)
        at hadoop.scala.Driver$.main(Driver.scala:32)
        at hadoop.scala.Driver.main(Driver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
