| Description | Pandas | Spark | Optimus |
|---|---|---|---|
| Read csv file | pd.read_csv() | spark.read.csv() | op.read.csv() |
| Create Dataframe | pd.DataFrame() | spark.createDataFrame() | op.create.df() |
| Append Row | df.append() | df.union() | df.row().append() |
| Column Mean | df.mean() | df1.agg({"x": "mean"}) | df.cols().mean() |
| Show Rows from Dataframe | df.head() | df.show() | df.show() |
| Drop Columns | df.drop() | df.drop() | df.cols().drop() |
| Sum all values in a Column | df.sum() | df1.agg({"x": "sum"}) | df.cols().sum() |
| Save Dataframe to csv | df.to_csv() | df.write.csv() | df.save().csv() |
| Get a value by index | df.get() | NA | NA |
| Get the mode of a column | df.mode() | NI | df.cols().mode() |
| Cast a Column | df.astype() | df.column.cast() | df.cols().cast(), astype() as alias |
| Subtract 2 dataframes | df.sub() | NI | NI |
| Merge two dataframes | pd.concat() | df.union() | optimus.concat() |
| Apply a user defined function to a column | df.apply(func) | fn = F.udf(lambda x: x + 1, DoubleType()); df.withColumn('disp1', fn(df.disp)) | df.cols().apply(func) |
| Group rows | df.groupby() | df.groupby() | df.groupby() |
| Join operation between two dataframes | df.join() | df.join() | df.join() |
| Fill Null values with x | df.fillna() | df.fillna() | df.fillna() |
| Get the max value of a Column | df.max() | df1.agg({"x": "max"}) | df.cols().max() |
| Reset index | reset_index() | NA | NA |
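
A few rows from the Pandas column can be sketched as runnable code. This is a minimal illustration, assuming pandas is installed; the column names `x` and `y` and the sample values are made up for the example. Note that `DataFrame.append` was removed in pandas 2.0, so the sketch uses `pd.concat` to stack rows instead.

```python
import pandas as pd

# Create Dataframe (hypothetical sample data)
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10, 20, 30]})

# Column mean, sum, and max for a single column
mean_x = df["x"].mean()
sum_x = df["x"].sum()
max_x = df["x"].max()

# Cast a column
df["y"] = df["y"].astype("float64")

# Apply a user defined function to a column
df["x_plus_1"] = df["x"].apply(lambda v: v + 1)

# Drop columns (returns a new DataFrame, df is unchanged)
trimmed = df.drop(columns=["y"])

# Merge two dataframes row-wise; also the replacement for df.append()
stacked = pd.concat([df, df], ignore_index=True)

# Fill null values with 0
filled = pd.DataFrame({"x": [1.0, None]}).fillna(0)
```

The Spark and Optimus columns follow the same pattern but need a running Spark session, so they are not sketched here.
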
Last active
August 22, 2018 16:05