Skip to content

Instantly share code, notes, and snippets.

View saulius's full-sized avatar
:octocat:
java.lang.OutOfMemoryError

Saulius Grigaliunas saulius

:octocat:
java.lang.OutOfMemoryError
View GitHub Profile

FWIW: I (@rondy) am not the creator of the content shared here, which is an excerpt from Edmond Lau's book. I simply copied and pasted it from another location and saved it as a personal note, before it gained popularity on news.ycombinator.com. Unfortunately, I cannot recall the exact origin of the original source, nor was I able to find the author's name, so I am can't provide the appropriate credits.


Effective Engineer - Notes

What's an Effective Engineer?

This document has moved!

It's now here, in The Programmer's Compendium. The content is the same as before, but being part of the compendium means that it's actively maintained.

@longcao
longcao / SparkCopyPostgres.scala
Last active September 11, 2024 18:55
COPY Spark DataFrame rows to PostgreSQL (via JDBC)
import java.io.InputStream
import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.{ DataFrame, Row }
import org.postgresql.copy.CopyManager
import org.postgresql.core.BaseConnection
val jdbcUrl = s"jdbc:postgresql://..." // db credentials elided
val connectionProperties = {
@joepie91
joepie91 / vpn.md
Last active May 17, 2025 02:36
Don't use VPN services.

Don't use VPN services.

No, seriously, don't. You're probably reading this because you've asked what VPN service to use, and this is the answer.

Note: The content in this post does not apply to using VPN for their intended purpose; that is, as a virtual private (internal) network. It only applies to using it as a glorified proxy, which is what every third-party "VPN provider" does.

  • A Russian translation of this article can be found here, contributed by Timur Demin.
  • A Turkish translation can be found here, contributed by agyild.
  • There's also this article about VPN services, which is honestly better written (and has more cat pictures!) than my article.
import scalaz._
import scalaz.std.list._
import scalaz.syntax.monad._
import scalaz.syntax.monoid._
import scalaz.syntax.traverse.{ToFunctorOps => _, _}
class Foo[F[+_] : Monad, A, B](val execute: Foo.Request[A] => F[B], val joins: Foo.Request[A] => B => List[Foo.Request[A]])(implicit J: Foo.Join[A, B]) {
def bar: Foo[({type l[+a]=WriterT[F, Log[A, B], a]})#l, A, B] = {
type TraceW[FF[+_], +AA] = WriterT[FF, Log[A, B], AA]
@epiphani
epiphani / CDHTez.md
Last active February 14, 2024 08:03
Getting Tez enabled on CDH5.4+

So Hive in CDH is horribly, painfully slow. Cloudera ships Hive 1.1, which is actually moderately modern. It is, however, very badly configured out of the box and patched with custom code from Cloudera. With a bit of effort, we managed to improve hive performance considerably. We really shouldn't have to do this, but Cloudera is actively working against supporting a performant Hive.

First, building Tez was fairly straightforward. Using the instructions at https://github.com/apache/tez/blob/master/docs/src/site/markdown/install.md, the only change was to use the version string "2.6.0" for the build. I believe that was the default. Don't use the CDH string, it won't work.

At the bottom of the installation instructions, there's mention of the fact that to use the local hadoop jars (rather than those packaged with tez) you must unpack the jars in HDFS rather than using the tarball. In this case, unpack the tez-minimal tarball and upload the contents to /apps/tez-0.7.0 (or whatever you prefer). Don't fo

@non
non / answer.md
Last active February 28, 2025 11:46
answer @nuttycom

What is the appeal of dynamically-typed languages?

Kris Nuttycombe asks:

I genuinely wish I understood the appeal of unityped languages better. Can someone who really knows both well-typed and unityped explain?

I think the terms well-typed and unityped are a bit of question-begging here (you might as well say good-typed versus bad-typed), so instead I will say statically-typed and dynamically-typed.

I'm going to approach this article using Scala to stand-in for static typing and Python for dynamic typing. I feel like I am credibly proficient both languages: I don't currently write a lot of Python, but I still have affection for the language, and have probably written hundreds of thousands of lines of Python code over the years.

@tsiege
tsiege / The Technical Interview Cheat Sheet.md
Last active May 14, 2025 04:34
This is my technical interview cheat sheet. Feel free to fork it or do whatever you want with it. PLEASE let me know if there are any errors or if anything crucial is missing. I will add more links soon.

ANNOUNCEMENT

I have moved this over to the Tech Interview Cheat Sheet Repo and has been expanded and even has code challenges you can run and practice against!






\

@candera
candera / ssh-repl.org
Last active February 9, 2019 07:50
ssh-repl

Embedding an SSH-accessible REPL in a Clojure process

N.B. This is now a library, thanks to the efforts of the wonderful @mtnygard. And the README does a good job of making clear just how terrible an idea it is to actually do this. :)

As any Clojurist knows, the REPL is an incredibly handy development tool. It can also be useful as a tool for debugging running programs. Of course, this raises the question of how to limit access to the REPL to authorized parties. With the Apache SSHD library, you can embed an SSH server in any JVM process. It takes only a little code to hook this up to a REPL, and to limit access either by public key or

@subelsky
subelsky / large_redshift_tables.sql
Created April 18, 2014 17:39
Quick SQL command to find large tables in redshift
-- based on http://stackoverflow.com/questions/21767780/how-to-find-size-of-database-schema-table-in-redshift
SELECT name AS table_name, ROUND((COUNT(*) / 1024.0),2) as "Size in Gigabytes"
FROM stv_blocklist
INNER JOIN
(SELECT DISTINCT id, name FROM stv_tbl_perm) names
ON names.id = stv_blocklist.tbl
GROUP BY name
ORDER BY "Size in Gigabytes" DESC