@rjurney
Created December 16, 2025 02:25
Some commands to examine the data after you run the pipeline
threads = spark.read.parquet("data/threaded_emails.parquet")
threads.select("jwz_thread_id", "id", "date", "thread_depth", "subject").orderBy("jwz_thread_id","date").limit(20).show(20, False)
In [17]: threads.select("jwz_thread_id", "id", "date", "thread_depth", "subject").orderBy("jwz_thread_id","date").limit(20).show(20, False)
+---------------------------------------------+---------------------------------------------+-------------------+------------+---------------------------------+
|jwz_thread_id                                |id                                           |date               |thread_depth|subject                          |
+---------------------------------------------+---------------------------------------------+-------------------+------------+---------------------------------+
|<1000099.1075858574579.JavaMail.evans@thyme> |<1000099.1075858574579.JavaMail.evans@thyme> |2001-06-25 16:34:13|1           |Oakhill Attorney Contacts        |
|<1000099.1075858574579.JavaMail.evans@thyme> |<32912150.1075858554251.JavaMail.evans@thyme>|2001-06-25 16:36:41|1           |FW: Oakhill Attorney Contacts    |
|<1000099.1075858574579.JavaMail.evans@thyme> |<30443262.1075858574720.JavaMail.evans@thyme>|2001-06-25 20:09:52|1           |Re: FW: Oakhill Attorney Contacts|
|<1000099.1075858574579.JavaMail.evans@thyme> |<23981452.1075858554340.JavaMail.evans@thyme>|2001-06-25 21:06:59|1           |RE: FW: Oakhill Attorney Contacts|
|<10001276.1075862177804.JavaMail.evans@thyme>|<10001276.1075862177804.JavaMail.evans@thyme>|2001-11-08 16:49:34|0           |RE: East Gas Hierarchy Structure |
|<1000142.1075852789603.JavaMail.evans@thyme> |<1000142.1075852789603.JavaMail.evans@thyme> |2001-07-19 13:09:48|0           |FW: PJM Day ahead product        |
|<1000163.1075859635769.JavaMail.evans@thyme> |<15355978.1075859637169.JavaMail.evans@thyme>|2000-12-07 14:40:00|1           |JDI Master                       |
|<1000163.1075859635769.JavaMail.evans@thyme> |<23055388.1075842307522.JavaMail.evans@thyme>|2000-12-07 14:40:00|1           |JDI Master                       |
|<1000163.1075859635769.JavaMail.evans@thyme> |<1000163.1075859635769.JavaMail.evans@thyme> |2000-12-07 14:40:00|1           |JDI Master                       |
|<1000163.1075859635769.JavaMail.evans@thyme> |<31007029.1075842329969.JavaMail.evans@thyme>|2000-12-07 14:40:00|1           |JDI Master                       |
|<1000163.1075859635769.JavaMail.evans@thyme> |<5326348.1075842307544.JavaMail.evans@thyme> |2000-12-07 14:46:00|1           |Re: JDI Master                   |
|<1000163.1075859635769.JavaMail.evans@thyme> |<17451267.1075842307823.JavaMail.evans@thyme>|2000-12-08 11:55:00|1           |Re: JDI Master                   |
|<1000163.1075859635769.JavaMail.evans@thyme> |<7267032.1075842330108.JavaMail.evans@thyme> |2000-12-08 11:55:00|1           |Re: JDI Master                   |
|<10001823.1075845003199.JavaMail.evans@thyme>|<10001823.1075845003199.JavaMail.evans@thyme>|2000-05-17 16:05:00|1           |Re: Rob McGrory                  |
|<10001823.1075845003199.JavaMail.evans@thyme>|<10941170.1075845047261.JavaMail.evans@thyme>|2000-05-17 16:05:00|1           |Re: Rob McGrory                  |
|<10001823.1075845003199.JavaMail.evans@thyme>|<24820373.1075844265302.JavaMail.evans@thyme>|2001-04-26 19:28:00|1           |Rob McGrory                      |
|<10001823.1075845003199.JavaMail.evans@thyme>|<5929173.1075844201345.JavaMail.evans@thyme> |2001-04-26 19:28:00|1           |Rob McGrory                      |
|<10001823.1075845003199.JavaMail.evans@thyme>|<28423122.1075844235358.JavaMail.evans@thyme>|2001-04-26 19:28:00|1           |Rob McGrory                      |
|<10001987.1075860874313.JavaMail.evans@thyme>|<1449535.1075860769362.JavaMail.evans@thyme> |2002-02-19 18:32:21|1           |MMBtu content                    |
|<10001987.1075860874313.JavaMail.evans@thyme>|<10001987.1075860874313.JavaMail.evans@thyme>|2002-02-19 18:32:21|1           |MMBtu content                    |
+---------------------------------------------+---------------------------------------------+-------------------+------------+---------------------------------+
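A natural next check on output like this is how many messages land in each thread, and which messages in a thread share the same timestamp and subject (several rows above do, which may indicate copies of one email stored in different mailboxes). The sketch below is plain Python over a few rows copied from the output, so it runs without Spark. The names `ROWS`, `thread_sizes`, and `possible_duplicates` are made up for illustration and are not part of the pipeline. On the real DataFrame the first count would probably be written as `threads.groupBy("jwz_thread_id").count()`.

```python
from collections import Counter, defaultdict

# Thread ids copied from the output above (constant names are illustrative only).
OAKHILL = "<1000099.1075858574579.JavaMail.evans@thyme>"
EAST_GAS = "<10001276.1075862177804.JavaMail.evans@thyme>"
MMBTU = "<10001987.1075860874313.JavaMail.evans@thyme>"

# Sample rows copied from the output above: (jwz_thread_id, id, date, subject).
ROWS = [
    (OAKHILL, OAKHILL, "2001-06-25 16:34:13", "Oakhill Attorney Contacts"),
    (OAKHILL, "<32912150.1075858554251.JavaMail.evans@thyme>",
     "2001-06-25 16:36:41", "FW: Oakhill Attorney Contacts"),
    (OAKHILL, "<30443262.1075858574720.JavaMail.evans@thyme>",
     "2001-06-25 20:09:52", "Re: FW: Oakhill Attorney Contacts"),
    (OAKHILL, "<23981452.1075858554340.JavaMail.evans@thyme>",
     "2001-06-25 21:06:59", "RE: FW: Oakhill Attorney Contacts"),
    (EAST_GAS, EAST_GAS, "2001-11-08 16:49:34", "RE: East Gas Hierarchy Structure"),
    (MMBTU, "<1449535.1075860769362.JavaMail.evans@thyme>",
     "2002-02-19 18:32:21", "MMBtu content"),
    (MMBTU, MMBTU, "2002-02-19 18:32:21", "MMBtu content"),
]


def thread_sizes(rows):
    """Count how many messages were assigned to each jwz_thread_id."""
    return Counter(thread_id for thread_id, _id, _date, _subject in rows)


def possible_duplicates(rows):
    """Return message ids that share thread, date, and subject with another message."""
    groups = defaultdict(list)
    for thread_id, message_id, date, subject in rows:
        groups[(thread_id, date, subject)].append(message_id)
    # Keep only groups holding more than one message id.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


for thread_id, size in thread_sizes(ROWS).most_common():
    print(size, thread_id)

for (thread_id, date, subject), ids in possible_duplicates(ROWS).items():
    print(f"{len(ids)} messages at {date} with subject {subject!r} in {thread_id}")
```

Matching on date and subject is only a heuristic: it flags candidates for inspection and does not prove that two messages are the same email.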