@argenisleon
Last active August 22, 2018 16:05
| Description | Pandas | Spark | Optimus |
|---|---|---|---|
| Read a CSV file | `pd.read_csv()` | `spark.read.csv()` | `op.read.csv()` |
| Create a DataFrame | `pd.DataFrame()` | `spark.createDataFrame()` | `op.create.df()` |
| Append a row | `df.append()` (removed in pandas 2.0; use `pd.concat()`) | `df.union()` | `df.row().append()` |
| Mean of a column | `df.mean()` | `df.agg({"x": "avg"})` | `df.cols().mean()` |
| Show rows from a DataFrame | `df.head()` | `df.show()` | `df.show()` |
| Drop columns | `df.drop()` | `df.drop()` | `df.cols().drop()` |
| Sum all values in a column | `df.sum()` | `df.agg({"x": "sum"})` | `df.cols().sum()` |
| Save a DataFrame to CSV | `df.to_csv()` | `df.write.csv()` | `df.save().csv()` |
| Get a value by index | `df.loc[]` / `df.iloc[]` | NA | NA |
| Mode of a column | `df.mode()` | NI | `df.cols().mode()` |
| Cast a column | `df.astype()` | `df["x"].cast()` | `df.cols().cast()` (`astype()` as alias) |
| Subtract two DataFrames | `df.sub()` | NI | NI |
| Concatenate two DataFrames | `pd.concat()` | `df.union()` | `optimus.concat()` |
| Apply a user-defined function to a column | `df.apply(func)` | `fn = F.udf(lambda x: x + 1, DoubleType())`, then `df.withColumn("disp1", fn(df.disp))` | `df.cols().apply(func)` |
| Group rows | `df.groupby()` | `df.groupby()` | `df.groupby()` |
| Join two DataFrames | `df.join()` | `df.join()` | `df.join()` |
| Fill null values with x | `df.fillna()` | `df.fillna()` | `df.fillna()` |
| Max of a column | `df.max()` | `df.agg({"x": "max"})` | `df.cols().max()` |
| Reset the index | `df.reset_index()` | NA | NA |

NA = not available; NI = not implemented.
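As a quick reference, the Pandas column of the table can be exercised in one small script. This is a sketch that assumes a recent pandas (1.x or 2.x). The sample data, the column names `key`, `x` and `label`, and all variable names are hypothetical. `DataFrame.append()` no longer exists in pandas 2.x, so the append step uses `pd.concat()` instead.

```python
import io

import pandas as pd

# Hypothetical sample data: an in-memory CSV with a missing value in column "x".
csv_text = "key,x\na,1\nb,2\na,\n"

# Read a CSV file (any file-like object works, not just a path).
df = pd.read_csv(io.StringIO(csv_text))

# Create a DataFrame from a dict of columns (used below for the join).
labels = pd.DataFrame({"key": ["a", "b"], "label": ["alpha", "beta"]})

# Append a row: DataFrame.append() was removed in pandas 2.0, so use concat.
new_row = pd.DataFrame({"key": ["a"], "x": [3.0]})
df = pd.concat([df, new_row], ignore_index=True)

# Show the first rows.
preview = df.head(2)

# Column aggregations; NaN values are skipped by default.
mean_x = df["x"].mean()
sum_x = df["x"].sum()
max_x = df["x"].max()
mode_key = df["key"].mode().iloc[0]

# Fill nulls, then cast the now NaN-free float column to integers.
filled = df.fillna({"x": 0})
cast = filled.astype({"x": "int64"})

# Apply a user-defined function to a single column.
plus_one = cast["x"].apply(lambda v: v + 1)

# Drop columns, group rows, and join on a key column.
dropped = cast.drop(columns=["x"])
grouped = cast.groupby("key")["x"].sum()
joined = cast.join(labels.set_index("key"), on="key")

# Get a single value by index label and column name.
value = cast.loc[1, "x"]

# Reset the index after a sort so it runs 0..n-1 again.
reset = cast.sort_values("x").reset_index(drop=True)

# Save to CSV; with no path argument, to_csv() returns the text.
csv_out = cast.to_csv(index=False)
```

The Spark entries map onto the same steps, with one difference. The dict form of `df.agg({"x": "avg"})` returns a one-row DataFrame, not a scalar as the pandas `mean()` does.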