```python
from graphframes.lib import AggregateMessages as AM
from graphframes.examples import Graphs
from pyspark.sql.functions import sum as sqlsum

g = Graphs(spark).friends()  # Get example graph (assumes an active SparkSession named `spark`)

# For each user, sum the ages of the adjacent users
msgToSrc = AM.dst["age"]
msgToDst = AM.src["age"]
agg = g.aggregateMessages(
    sqlsum(AM.msg).alias("summedAges"),
    sendToSrc=msgToSrc,
    sendToDst=msgToDst)
agg.show()
```
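The messages above send each edge's destination age to its source (`msgToSrc`) and each edge's source age to its destination (`msgToDst`), and the per-vertex aggregate is a sum. As a rough mental model only, here is a plain-Python sketch of the same computation on a small hypothetical graph; `sum_neighbor_ages`, the vertex ids, and the ages are illustrative assumptions, not part of GraphFrames.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sum_neighbor_ages(ages: Dict[str, int], edges: List[Tuple[str, str]]) -> Dict[str, int]:
    """Plain-Python sketch of the sendToSrc/sendToDst + sum pattern.

    ages maps vertex id -> age; edges is a list of directed (src, dst) pairs.
    Every edge contributes one message in each direction, so a pair of
    mutual edges counts the neighbor twice. Vertices that receive no
    messages do not appear in the result.
    """
    summed: Dict[str, int] = defaultdict(int)
    for src, dst in edges:
        summed[src] += ages[dst]  # message to the source: the destination's age
        summed[dst] += ages[src]  # message to the destination: the source's age
    return dict(summed)


# Hypothetical example graph
ages = {"a": 34, "b": 36, "c": 30, "d": 29}
edges = [("a", "b"), ("b", "c"), ("c", "b")]
print(sum_neighbor_ages(ages, edges))
```

Vertex `d` has no edges, so it receives no messages and is absent from the result.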
```python
import os
import glob
import shutil
import uuid
from pyspark.sql.readwriter import DataFrameWriter

def csv_single(self, path, **options):
    """
    Write the DataFrame as a single CSV file at the specified path.
```
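The preview stops at the docstring, so the body of `csv_single` is not shown. Judging only from the imports, one plausible approach is to write with `coalesce(1)` into a temporary directory and then move the lone part file to the target path. Below is a minimal sketch of just that file-moving step using the standard library. `move_single_part` is a hypothetical name, and the file names it expects (`part-*.csv` plus side files such as `_SUCCESS`) are an assumption about Spark's default output layout.

```python
import glob
import os
import shutil


def move_single_part(tmp_dir: str, path: str) -> str:
    """Hypothetical helper: move the single part-*.csv file that Spark wrote
    into tmp_dir (e.g. via df.coalesce(1).write.csv(tmp_dir)) to `path`,
    then delete tmp_dir along with its _SUCCESS and .crc side files."""
    parts = glob.glob(os.path.join(tmp_dir, "part-*.csv"))
    if len(parts) != 1:
        raise RuntimeError(
            f"expected exactly one part file in {tmp_dir}, found {len(parts)}"
        )
    shutil.move(parts[0], path)  # relocate the data file to its final name
    shutil.rmtree(tmp_dir)       # drop the now-unneeded Spark output directory
    return path
```

A unique temporary directory name (for example one built from `uuid.uuid4().hex`, which would explain the `uuid` import) keeps concurrent writes from colliding.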
```python
"""Script that tests and times Relik's relation extraction and entity linking on the GraphFrames Paper: https://people.eecs.berkeley.edu/~matei/papers/2016/grades_graphframes.pdf"""
import timeit
import warnings
from pprint import pprint

from relik import Relik  # type: ignore
from relik.inference.data.objects import RelikOutput  # type: ignore

# Squash Relik's warnings for prettier screenshots
warnings.simplefilter("ignore")
```
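The script imports `timeit` to time Relik's extraction. A minimal way to time a single call is `timeit.default_timer`, sketched below. `time_call` is a hypothetical helper, and `str.upper` is only a stand-in for the real model call, since the preview does not show how Relik is loaded or invoked.

```python
import timeit
from typing import Any, Callable, Tuple


def time_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = timeit.default_timer()
    result = fn(*args, **kwargs)
    elapsed = timeit.default_timer() - start
    return result, elapsed


# Stand-in for the model call; swap in the real Relik pipeline here.
result, seconds = time_call(str.upper, "graphframes")
print(f"{result} in {seconds:.6f}s")
```

A single timing includes one-off costs such as lazy initialization, so for a fairer number you may want to run the call once to warm up before measuring.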
```python
from pypdf import PdfReader

# Load the PDF. The GraphFrames paper normally resides at
# https://people.eecs.berkeley.edu/~matei/papers/2016/grades_graphframes.pdf
reader = PdfReader("data/grades_graphframes.pdf")

# Extract text from all pages, calling extract_text() once per page and skipping empty pages
page_texts = (page.extract_text() for page in reader.pages)
text = "\n".join(t for t in page_texts if t)

# Write it to a text file
```
You can see it linked many related topics, such as Apache Phoenix, that are not actually mentioned in the text...
Prompt: Find all instances of files containing the term 'motif' or 'graphlet' in this folder or any below it. List the filenames, then print the total count of unique files.
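For comparison, the task in the prompt can be done without an assistant; on the command line `grep -rlE 'motif|graphlet' .` lists matching files once each. Below is a plain-Python sketch of the same search. The function names are illustrative, and the match is case-sensitive like `grep`'s default (add `re.IGNORECASE` to change that).

```python
import os
import re
from typing import List

PATTERN = re.compile(r"motif|graphlet")  # case-sensitive, like grep's default


def find_matching_files(root: str) -> List[str]:
    """Return sorted paths under root whose contents mention 'motif' or 'graphlet'.

    os.walk visits each file once (it does not follow directory symlinks by
    default), so every path in the result is unique.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if PATTERN.search(f.read()):
                        matches.append(path)
            except OSError:
                continue  # skip unreadable files
    return sorted(matches)


def report(root: str = ".") -> int:
    """Print each matching file, then the total count of unique files."""
    files = find_matching_files(root)
    for path in files:
        print(path)
    print(f"Total unique files: {len(files)}")
    return len(files)
```

Calling `report(".")` from the folder in question prints the filenames followed by the count.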
I can't figure out why this unit test is failing with this error:

> [error] Uncaught exception when running org.graphframes.lib.ConnectedComponentsSuite: java.lang.OutOfMemoryError: Java heap space sbt.ForkMain$ForkError: java.lang.OutOfMemoryError: Java heap space

The test is an 8-node, 6-edge graph with two components and two dangling vertices. WTF, heap space? I cleaned up the `Dockerfile` below because it was on wonky versions and tried the same commands there... no go. Same exception. The weird thing is that CI does pass these tests, so I don't get what is going wrong.

HOW YOU CAN HELP: Please run this command and tell me if the tests pass:
We can also look for a more complex motif: a directed square, a cycle of four edges `a -> b -> c -> d -> a`. The query below finds all instances of it in the graph.

<div data-lang="python" markdown="1">
{% highlight python %}
# G8: Directed Square
paths = g.find("(a)-[e]->(b); (b)-[e2]->(c); (c)-[e3]->(d); (d)-[e4]->(a)")
four_edge_count(paths).show()
{% endhighlight %}
</div>
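The pattern matches every assignment of vertices `a`, `b`, `c`, `d` for which all four edges exist. The plain-Python sketch below applies the same pattern to a hypothetical edge list to show two things worth keeping in mind when counting squares: here each distinct square is matched once per rotation (four times), and nothing in the pattern itself forces the four vertices to differ, so a pair of mutual edges also closes a length-four cycle unless filtered out. `directed_squares` and its `distinct` flag are illustrative and not part of GraphFrames.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def directed_squares(edges: List[Edge], distinct: bool = True) -> List[Tuple[str, str, str, str]]:
    """Enumerate (a, b, c, d) with edges a->b, b->c, c->d and d->a.

    Mirrors "(a)-[e]->(b); (b)-[e2]->(c); (c)-[e3]->(d); (d)-[e4]->(a)".
    With distinct=True, only assignments using four different vertices are kept.
    """
    out_edges: Dict[str, Set[str]] = {}
    for src, dst in edges:
        out_edges.setdefault(src, set()).add(dst)
    edge_set: Set[Edge] = set(edges)

    squares = []
    for a, bs in out_edges.items():
        for b in bs:
            for c in out_edges.get(b, set()):
                for d in out_edges.get(c, set()):
                    if (d, a) not in edge_set:
                        continue  # the closing edge d->a is missing
                    if distinct and len({a, b, c, d}) < 4:
                        continue  # a vertex is reused
                    squares.append((a, b, c, d))
    return sorted(squares)


# Hypothetical graph: one square 1->2->3->4->1, a chord 1->3, and mutual edges 5<->6
edges = [("1", "2"), ("2", "3"), ("3", "4"), ("4", "1"), ("1", "3"), ("5", "6"), ("6", "5")]
for row in directed_squares(edges):
    print(row)
```

With `distinct=True` the single square appears as its four rotations; with `distinct=False` the walks `5 -> 6 -> 5 -> 6 -> 5` and `6 -> 5 -> 6 -> 5 -> 6` also appear. Dividing the rotation-inclusive count by four gives the number of distinct squares.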
```python
import kuzu
import kuzu.connection
import kuzu.database

def create_tables(conn: kuzu.connection.Connection) -> None:
    try:
        # Create a Person node table
        conn.execute(
```
```dockerfile
# Dockerfile for a Spark environment with Python 3.10. The image is based on the miniconda3 image
# and installs OpenJDK 17, Spark 3.5.1 (built for Hadoop 3 and Scala 2.13), Poetry, and the
# Python packages specified in the pyproject.toml file.
FROM continuumio/miniconda3

RUN apt-get update && \
    apt-get install -y curl apt-transport-https openjdk-17-jdk-headless wget build-essential git \
    autoconf automake libtool pkg-config libpq5 libpq-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```