import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

## Read data from Salesforce into a DataFrame using the DataDirect JDBC driver
source_df = spark.read.format("jdbc").option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>").option("dbtable", "SFORCE.OPPORTUNITY").option("driver", "com.ddtek.jdbc.sforce.SForceDriver").option("user", "[email protected]").option("password", "pass123").load()

job.init(args['JOB_NAME'], args)

## Convert the DataFrame to AWS Glue's DynamicFrame
dynamic_dframe = DynamicFrame.fromDF(source_df, glueContext, "dynamic_df")

## Write the DynamicFrame to S3 in CSV format. You can also write it to RDS/Redshift using a connection you have defined previously in Glue (see the sketch after this script).
datasink4 = glueContext.write_dynamic_frame.from_options(frame = dynamic_dframe, connection_type = "s3", connection_options = {"path": "s3://glueuserdata"}, format = "csv", transformation_ctx = "datasink4")

job.commit()
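As the comment above notes, the same DynamicFrame can be written to Redshift instead of S3 through a JDBC connection defined in the Glue console. A minimal sketch, assuming a Glue connection named "my-redshift-connection", placeholder table and database names, and that "TempDir" is added to the getResolvedOptions list so the job's --TempDir argument is available:

datasink5 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame = dynamic_dframe,
    catalog_connection = "my-redshift-connection",   # Glue connection defined beforehand (placeholder name)
    connection_options = {"dbtable": "public.opportunity", "database": "dev"},   # placeholder table/database
    redshift_tmp_dir = args["TempDir"],               # S3 staging path used by the Redshift COPY
    transformation_ctx = "datasink5"
)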
Rudresh,
Unfortunately, I haven't tried it with Scala.
Hi, I'm new to Glue and Spark. Where is the actual query in the code? If I want to download the last month's opportunities, how can I change this code? Is it possible to sync Salesforce data using this approach?
I need to keep the log of the above job in another file and a database. How can I keep the log in a CSV file? Can anyone please help me with this?
Regarding where the actual query goes: from what I have read, you can put your query in the 'dbtable' option instead of a table name:
source_df = spark.read.format("jdbc").option("url","jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=").option("dbtable", "your query").option("driver", "com.ddtek.jdbc.sforce.SForceDriver").option("user", "[email protected]").option("password", "pass123").load()
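For example, to pull only opportunities created in the last month, the 'dbtable' option can carry a parenthesized subquery with an alias, the way standard Spark JDBC sources expect. A minimal sketch, assuming the DataDirect SForce driver accepts this form and that CREATEDDATE and the date literal format fit your org (both worth verifying against the driver's SQL reference):

# Hypothetical subquery; adjust the field name and date literal to your data
last_month_query = "(SELECT * FROM SFORCE.OPPORTUNITY WHERE CREATEDDATE >= '2019-01-01') opp"
source_df = spark.read.format("jdbc") \
    .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>") \
    .option("dbtable", last_month_query) \
    .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
    .option("user", "[email protected]") \
    .option("password", "pass123") \
    .load()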
Hi,
How can I copy all the tables in Salesforce to S3 through this script?
Thanks in advance!
Technically, you can. You would have to iterate through all the tables and load each one. You might have to change the script a bit though.
Can you please help me with that, since I am very new to AWS Glue.
I tried:
query = "show tables"
for i in query:
    source_df = spark.read.format("jdbc").option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=xxxl").option("StmtCallLimit", "0").option("dbtable", i)......load()
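Iterating over the string "show tables" will loop over its characters, not table names. One way around it is to keep an explicit list of the Salesforce objects you want to copy and loop over that. A minimal sketch, assuming the same connection options as the script above; the object names and bucket path are placeholders:

# Hypothetical list of Salesforce objects to copy; extend as needed
tables = ["SFORCE.OPPORTUNITY", "SFORCE.ACCOUNT", "SFORCE.CONTACT"]

for table in tables:
    table_df = spark.read.format("jdbc") \
        .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>") \
        .option("dbtable", table) \
        .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
        .option("user", "[email protected]") \
        .option("password", "pass123") \
        .load()
    table_dyf = DynamicFrame.fromDF(table_df, glueContext, table)
    # Write each object to its own S3 prefix so the CSV files do not mix
    glueContext.write_dynamic_frame.from_options(
        frame = table_dyf,
        connection_type = "s3",
        connection_options = {"path": "s3://glueuserdata/" + table},
        format = "csv",
        transformation_ctx = "sink_" + table
    )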
Can you please help me read the data from CSV and write the DataFrame to a Salesforce table?
I tried this code:
val df = sparkSession.read.format("com.databricks.spark.csv").option("header", "true").load("your bucket location")
df.printSchema()
df.write.format("com.springml.spark.salesforce").option("login","https://test.salesforce.com/").option("username", "username").option("password","password+token").option("datasetName", "tableName").save()
I got an invalidSfObject fault error.
Hello,
Have you ever tried this with a custom OpenEdge DB?
What part of the script should change in this case?
Progress provides no information.
Thanks
Hey, did you try to read the database via Scala in AWS Glue? If yes, it would be great if you could share a code sample.