Steps to get data from Amazon S3 using Spark (Scala)
For retrieving the root access keys (accessKeyId, secretAccessKey) for Amazon S3, please click here.
After getting the accessKeyId and secretAccessKey for S3, write the code below to read or write data from Amazon S3.
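The examples below use the older s3n connector. On newer Hadoop/Spark builds the s3a connector is the supported replacement; an equivalent setup would look roughly like this (a sketch, assuming the hadoop-aws module is on the classpath):

```scala
// Sketch only: assumes Spark with the hadoop-aws module, which provides s3a
val accessKeyId = "AKIAI6OWVYUVEEXAMPLE1"                       // placeholder key
val secretAccessKey = "u2TF/Byp3oDeJo4MnFsx5xw3HKz7zVbOdEghteb" // placeholder secret

// s3a uses different property names than s3n
sc.hadoopConfiguration.set("fs.s3a.access.key", accessKeyId)
sc.hadoopConfiguration.set("fs.s3a.secret.key", secretAccessKey)

// Paths then use the s3a:// scheme
val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("s3a://testing/first_test.csv")
```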
--------------Example of reading file from S3--------------------------------------
// Placeholder credentials -- replace with your own keys
val accessKeyId = "AKIAI6OWVYUVEEXAMPLE1"
val secretAccessKey = "u2TF/Byp3oDeJo4MnFsx5xw3HKz7zVbOdEghteb"
// Register the credentials with Hadoop's S3 connector
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKeyId)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretAccessKey)
val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("s3n://testing/first_test.csv")
df.show()
//Output
+----------+--------------+------------+-----------+
| id | area | name | zip |
+----------+--------------+------------+-----------+
| 1 |BLOCK A | RAM |560097 |
| 2 |BLOCK B | RAJ |560091 |
| 3 |BLOCK C | ROHAN |560092 |
| 4 |BLOCK D |RAMESH |560092 |
| 5 |BLOCK E | RAMU |560098 |
+----------+--------------+------------+-----------+
----------------------------------------------------------------------------------------------
-----------------------Example to write csv file in S3-------------------------------
val accessKeyId = "AKIAI6OWVYUVEEXAMPLE1"
val secretAccessKey = "u2TF/Byp3oDeJo4MnFsx5xw3HKz7zVbOdEghteb"
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKeyId)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretAccessKey)
val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("s3n://testing/first_test.csv")
df.write.format("csv").option("header", "true").save("s3n://testingpython7878/write_today_UmeshDemo_01012019")
println("Written Successfully")
//Output
Written Successfully
--------------------------Once the code executes successfully.--------------------------
-----------------------------------------------------------------------------------------------
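Note that save() writes a directory of part files to S3, not a single CSV file. If one output file is needed (and the data fits comfortably in a single task), the DataFrame can be coalesced first -- a sketch, with a hypothetical output path:

```scala
// Coalesce to one partition so Spark emits a single part file
// (suitable only for small data; this removes write parallelism)
df.coalesce(1)
  .write
  .format("csv")
  .option("header", "true")
  .save("s3n://testingpython7878/write_today_single_file")
```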
--------------Example to write json file in S3----------------------------------------
val accessKeyId = "AKIAI6OWVYUVEEXAMPLE1"
val secretAccessKey = "u2TF/Byp3oDeJo4MnFsx5xw3HKz7zVbOdEghteb"
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKeyId)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretAccessKey)
val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/FileStore/tables/second.csv")
df.write.json("s3n://testing/write_JSONFolder_10052019")
println("Written Successfully")
//Output
Written Successfully
-----------------------Once the code executes successfully.----------------------------
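To verify the write, the JSON output can be read back into a DataFrame with spark.read.json -- a quick sketch against the same path used above:

```scala
// Read the JSON folder written above back into a DataFrame
val jsonDf = spark.read.json("s3n://testing/write_JSONFolder_10052019")
jsonDf.show()
```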