@nrchandan
Created April 29, 2014 08:54
import pyspark


class Point(object):
    '''this class is being used as a container'''
    pass


def to_point_obj(point_as_dict):
    '''convert a dict representation of a point to a Point object'''
    p = Point()
    p.x = point_as_dict['x']
    p.y = point_as_dict['y']
    return p


def add_two_points(point_obj1, point_obj2):
    '''add point_obj2 to point_obj1 in place and return point_obj1'''
    print(type(point_obj1), type(point_obj2))
    point_obj1.x += point_obj2.x
    point_obj1.y += point_obj2.y
    return point_obj1


def zero_point():
    '''return the zero element used as the initial value for fold'''
    p = Point()
    p.x = p.y = 0
    return p


sc = pyspark.SparkContext('local', 'test_app')
a = sc.parallelize([{'x': 1, 'y': 1}, {'x': 2, 'y': 2}, {'x': 3, 'y': 3}])
b = a.map(to_point_obj)  # convert to an RDD of Point objects
c = b.fold(zero_point(), add_two_points)
@nrchandan
Author

The above code fails with a Python PicklingError:

PicklingError: Can't pickle <class '__main__.Point'>: attribute lookup __main__.Point failed
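
pickle stores an instance by reference to its class (module name plus class name), so the message suggests that `__main__.Point` could not be resolved in the process doing the pickling. One possible workaround, assuming a point only needs `x` and `y`, is to keep the elements as built-in tuples, which pickle handles without any class lookup. The sketch below is not the gist's code: `to_point`, `ZERO_POINT` and `local_fold` are hypothetical names, and `local_fold` is only a local stand-in for `fold` so the example runs without Spark.

```python
import pickle
from functools import reduce


def to_point(point_as_dict):
    '''convert a dict representation of a point to an (x, y) tuple'''
    return (point_as_dict['x'], point_as_dict['y'])


def add_two_points(p1, p2):
    '''add two (x, y) tuples component-wise and return a new tuple'''
    return (p1[0] + p2[0], p1[1] + p2[1])


# zero element for the fold (hypothetical name)
ZERO_POINT = (0, 0)


def local_fold(partitions, zero, op):
    '''hypothetical stand-in for RDD.fold, not Spark's implementation:
    fold each partition, round-trip the partial result through pickle as if
    it were shipped between processes, then fold the partial results'''
    partials = []
    for part in partitions:
        partial = reduce(op, part, zero)
        partials.append(pickle.loads(pickle.dumps(partial)))
    return reduce(op, partials, zero)


dicts = [{'x': 1, 'y': 1}, {'x': 2, 'y': 2}, {'x': 3, 'y': 3}]
points = [to_point(d) for d in dicts]

# two made-up partitions, only to exercise the merge step
total = local_fold([points[:2], points[2:]], ZERO_POINT, add_two_points)
print(total)
```

With Spark, the same two functions should plug into the original pipeline as `a.map(to_point).fold(ZERO_POINT, add_two_points)`. If a real `Point` class is needed, another commonly suggested route is to define it in its own module and ship that module to the workers, for example with `sc.addPyFile`, so that it is importable by name instead of living in `__main__`.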
