Created November 6, 2017 16:17
Glue job script for reading data from Salesforce through the DataDirect JDBC driver and writing it to S3
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
## Read data from Salesforce into a Spark DataFrame using the DataDirect JDBC driver
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:datadirect:sforce://;SecurityToken=<token>")
    .option("dbtable", "SFORCE.OPPORTUNITY")
    .option("driver", "com.ddtek.jdbc.sforce.SForceDriver")
    .option("user", "[email protected]")
    .option("password", "pass123")
    .load()
)
job.init(args['JOB_NAME'], args)
## Convert the DataFrame to an AWS Glue DynamicFrame
dynamic_dframe = DynamicFrame.fromDF(source_df, glueContext, "dynamic_df")
## Write the DynamicFrame to S3 in CSV format. You can also write it to any RDS/Redshift target by using a connection that you have previously defined in Glue
datasink4 = glueContext.write_dynamic_frame.from_options(frame = dynamic_dframe, connection_type = "s3", connection_options = {"path": "s3://glueuserdata"}, format = "csv", transformation_ctx = "datasink4")
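
The comment about writing to RDS/Redshift can be sketched as follows. This is only an illustration and not part of the original script: the connection name, target table, database, and temp directory are hypothetical placeholders, and the helper names are made up for this example. `write_dynamic_frame.from_jdbc_conf` is the Glue call that writes through a connection defined in the Glue catalog.

```python
def redshift_sink_args(frame, connection_name, table, database, tmp_dir):
    """Collect keyword arguments for glueContext.write_dynamic_frame.from_jdbc_conf."""
    return {
        "frame": frame,
        "catalog_connection": connection_name,  # name of the Glue connection
        "connection_options": {"dbtable": table, "database": database},
        "redshift_tmp_dir": tmp_dir,  # S3 staging area used for Redshift loads
        "transformation_ctx": "datasink_jdbc",
    }


def write_via_connection(glue_context, frame):
    """Write `frame` through a previously defined Glue connection."""
    kwargs = redshift_sink_args(
        frame,
        connection_name="my-redshift-connection",  # hypothetical
        table="public.opportunity",                # hypothetical
        database="dev",                            # hypothetical
        tmp_dir="s3://glueuserdata/tmp/",          # hypothetical
    )
    return glue_context.write_dynamic_frame.from_jdbc_conf(**kwargs)
```

Inside the job, `write_via_connection(glueContext, dynamic_dframe)` would take the place of the S3 write.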

Unfortunately, I haven't tried it with Scala.

Hi, I'm new to Glue and Spark. Where is the actual query in the code? If I want to download only the last month's opportunities, how should I change this code? Is it possible to sync Salesforce data using this approach?

SachinThanas commented Oct 16, 2018

I need to keep the log of the above job in another file and in a database. How can I write the log to a CSV file? Can anyone please help me with this?

> Hi, new to Glue and Spark. Where is the actual query in the code? If I want to download last one month opportunities, how can I change this code? Is it possible to sync Salesforce data using this approach?

From what I have read, you can put your query in the 'dbtable' option:
source_df = spark.read.format("jdbc").option("url","jdbc:datadirect:sforce://;SecurityToken=").option("dbtable", "your query").option("driver", "com.ddtek.jdbc.sforce.SForceDriver").option("user", "[email protected]").option("password", "pass123").load()
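
To illustrate the reply above: Spark's JDBC reader accepts either a table name or a parenthesized subquery with an alias in `dbtable`, so a filter can be sent to the source that way. The sketch below is an assumption-laden example rather than a tested recipe: the column name `CLOSEDATE`, the alias `opp`, the 30-day window, and both helper names are made up for illustration, and whether the DataDirect driver accepts this exact SQL has not been verified here.

```python
from datetime import date, timedelta


def last_month_dbtable(today, table="SFORCE.OPPORTUNITY", date_column="CLOSEDATE"):
    """Return a parenthesized subquery usable as the JDBC `dbtable` option.

    `date_column` is an assumed column name; adjust it to your schema.
    """
    cutoff = today - timedelta(days=30)  # "last month" approximated as 30 days
    # {d 'yyyy-mm-dd'} is the standard JDBC escape syntax for a date literal.
    return (
        f"(SELECT * FROM {table} "
        f"WHERE {date_column} >= {{d '{cutoff.isoformat()}'}}) opp"
    )


def read_last_month(spark, url, user, password, today=None):
    """Load only the filtered rows; `spark` is an existing SparkSession."""
    dbtable = last_month_dbtable(today or date.today())
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", dbtable)
        .option("driver", "com.ddtek.jdbc.sforce.SForceDriver")
        .option("user", user)
        .option("password", password)
        .load()
    )


print(last_month_dbtable(date(2019, 10, 31)))
```

Building the subquery in a separate function keeps the date logic checkable without a Spark session or a Salesforce login.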

Sid-19 commented Oct 3, 2019

How can I copy all the tables in Salesforce to S3 with this script?
Thanks in advance!

Technically, you can. You would have to iterate through all the tables and load each one in turn. You might have to change the script a bit, though.

Sid-19 commented Oct 4, 2019

Can you please help me with that, since I am very new to AWS Glue?

I tried:

query = "show tables"

for i in query:
    source_df = spark.read.format("jdbc").option("url","jdbc:datadirect:sforce://;SecurityToken=xxxl").option("StmtCallLimit", "0").option("dbtable", i)......load()
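
One reason the loop above cannot work: `query` is a plain Python string, so `for i in query` iterates over its characters (`s`, `h`, `o`, ...) rather than over table names, and nothing ever sends `show tables` to Salesforce. Below is a minimal sketch of the iteration suggested earlier, assuming you list the table names yourself. The entries in `TABLES`, the one-prefix-per-table S3 layout, and the helper names are hypothetical, and the Glue import is deferred because `awsglue` is only available inside a Glue job.

```python
TABLES = ["SFORCE.OPPORTUNITY", "SFORCE.ACCOUNT"]  # hypothetical list; fill in your own


def s3_path_for(table, bucket="s3://glueuserdata"):
    """Give each table its own prefix, e.g. s3://glueuserdata/opportunity."""
    return f"{bucket}/{table.split('.')[-1].lower()}"


def copy_tables(spark, glue_context, url, user, password, tables=TABLES):
    """Read each table over JDBC and write it to S3 as CSV."""
    # Imported here because awsglue is only available inside a Glue job.
    from awsglue.dynamicframe import DynamicFrame

    for table in tables:
        df = (
            spark.read.format("jdbc")
            .option("url", url)
            .option("dbtable", table)
            .option("driver", "com.ddtek.jdbc.sforce.SForceDriver")
            .option("user", user)
            .option("password", password)
            .load()
        )
        frame = DynamicFrame.fromDF(df, glue_context, table)
        glue_context.write_dynamic_frame.from_options(
            frame=frame,
            connection_type="s3",
            connection_options={"path": s3_path_for(table)},
            format="csv",
        )


# What the original loop actually iterated over:
print(list("show tables")[:4])
```

Writing each table under its own prefix keeps the CSV files of different tables from mixing in the bucket root.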

Can you please help me read data from a CSV file and write the DataFrame to a Salesforce table?

I tried this code:
val df = spark.read.format("com.databricks.spark.csv").option("header", "true").load("your bucket location")

df.write.format("com.springml.spark.salesforce").option("login","").option("username", "username").option("password","password+token").option("datasetName", "tableName").save()

I got an InvalidSObjectFault error.

Have you ever tried this with a custom OpenEdge DB?
Which part of the script should change in that case?
Progress provides no information.
