Steps to get data from Amazon S3 using Spark (Scala)
For retrieving the root access keys (accessKeyId, secretAccessKey) for Amazon S3, please click here.
After getting the accessKeyId and secretAccessKey for S3, write the code below to read or write data from Amazon S3.
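The examples below use the older s3n connector. On newer Hadoop/Spark builds the s3a connector is the supported replacement; an equivalent setup would look roughly like this (a sketch, assuming the hadoop-aws module is on the classpath):

```scala
// Sketch only: assumes Spark with the hadoop-aws module, which provides s3a
val accessKeyId = "AKIAI6OWVYUVEEXAMPLE1"                       // placeholder key
val secretAccessKey = "u2TF/Byp3oDeJo4MnFsx5xw3HKz7zVbOdEghteb" // placeholder secret

// s3a uses different property names than s3n
sc.hadoopConfiguration.set("fs.s3a.access.key", accessKeyId)
sc.hadoopConfiguration.set("fs.s3a.secret.key", secretAccessKey)

// Paths then use the s3a:// scheme
val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("s3a://testing/first_test.csv")
```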
--------------Example of reading file from S3--------------------------------------
// Placeholder credentials -- replace with your own keys
val accessKeyId = "AKIAI6OWVYUVEEXAMPLE1"
val secretAccessKey = "u2TF/Byp3oDeJo4MnFsx5xw3HKz7zVbOdEghteb"
// Register the credentials with Hadoop's S3 connector
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKeyId)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretAccessKey)
val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("s3n://testing/first_test.csv")
df.show()
//Output
+----------+--------------+------------+-----------+
| id | area | name | zip |
+----------+--------------+------------+-----------+
| 1 |BLOCK A | RAM |560097 |
| 2 |BLOCK B | RAJ |560091 |
| 3 |BLOCK C | ROHAN |560092 |
| 4 |BLOCK D |RAMESH |560092 |
| 5 |BLOCK E | RAMU |560098 |
+----------+--------------+------------+-----------+
----------------------------------------------------------------------------------------------
-----------------------Example to write csv file in S3-------------------------------
val accessKeyId = "AKIAI6OWVYUVEEXAMPLE1"
val secretAccessKey = "u2TF/Byp3oDeJo4MnFsx5xw3HKz7zVbOdEghteb"
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKeyId)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretAccessKey)
val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("s3n://testing/first_test.csv")
df.write.format("csv").option("header", "true").save("s3n://testingpython7878/write_today_UmeshDemo_01012019")
println("Written Successfully")
//Output
Written Successfully
--------------------------Once the code executes successfully.--------------------------
-----------------------------------------------------------------------------------------------
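Note that save() writes a directory of part files to S3, not a single CSV file. If one output file is needed (and the data fits comfortably in a single task), the DataFrame can be coalesced first -- a sketch, with a hypothetical output path:

```scala
// Coalesce to one partition so Spark emits a single part file
// (suitable only for small data; this removes write parallelism)
df.coalesce(1)
  .write
  .format("csv")
  .option("header", "true")
  .save("s3n://testingpython7878/write_today_single_file")
```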
--------------Example to write json file in S3----------------------------------------
val accessKeyId = "AKIAI6OWVYUVEEXAMPLE1"
val secretAccessKey = "u2TF/Byp3oDeJo4MnFsx5xw3HKz7zVbOdEghteb"
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKeyId)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretAccessKey)
val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/FileStore/tables/second.csv")
df.write.json("s3n://testing/write_JSONFolder_10052019")
println("Written Successfully")
//Output
Written Successfully
-----------------------Once the code executes successfully.----------------------------
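To verify the write, the JSON output can be read back into a DataFrame with spark.read.json -- a quick sketch against the same path used above:

```scala
// Read the JSON folder written above back into a DataFrame
val jsonDf = spark.read.json("s3n://testing/write_JSONFolder_10052019")
jsonDf.show()
```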