Doug Turnbull softwaredoug

@softwaredoug
softwaredoug / i-dont-like-nbdev.md
Last active December 29, 2023 14:32
These are just notes from learning nbdev. They may turn out to be false, and I encourage feedback.

I am working on a project contemplating the best use of notebooks in our search relevance workflow. We're a cross-disciplinary team of software engineers and data scientists. Recently, to decide on best practices, I watched two famous talks: "I don't like Notebooks" by Joel Grus and "I like notebooks" by nbdev creator Jeremy Howard. As a senior dev, I want to have opinions on how my team should develop both the notebooks and any underlying libraries.

Positive things about nbdev and notebooks

  • Writing docs leads to better code - I have written better code when I know it's being consumed as documentation by others and needs to be read. I fully agree with the amazing feedback loop between writing and coding that creates much better libraries.
  • Jupyter as a dev env - For some people, Jupyter is their preferred dev environment, and should be supported as such.
  • Philosophy - I generally agree with the philo
softwaredoug / TimestampPolicy.java
Created November 11, 2020 15:32
timestamp policy for Kafka
new CustomTimestampPolicyWithLimitedDelay<Long, SearchQueryEvent>(
    new GetSearchQueryTimestampFunction(),
    Duration.standardMinutes(5),
    previousWatermark
);
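
To make the idea behind a "limited delay" timestamp policy concrete, here is a minimal sketch in plain Java. It is not Beam's `CustomTimestampPolicyWithLimitedDelay`; the class `LimitedDelayWatermark` and its methods are hypothetical names used only for illustration, and the sketch assumes a simplified model in which the watermark trails the newest event timestamp by a fixed delay and never moves backwards. It ignores details such as idle partitions and capping the watermark to the current time.

```java
import java.time.Duration;
import java.time.Instant;

// Simplified, stdlib-only model of a "limited delay" watermark.
// Hypothetical illustration; this is NOT the Beam implementation.
class LimitedDelayWatermark {
    private final Duration maxDelay;
    private Instant maxEventTime; // newest event timestamp seen so far
    private Instant watermark;    // never moves backwards

    LimitedDelayWatermark(Duration maxDelay, Instant previousWatermark) {
        this.maxDelay = maxDelay;
        this.maxEventTime = previousWatermark;
        this.watermark = previousWatermark;
    }

    // Record an event's timestamp and return the (possibly advanced) watermark.
    Instant observe(Instant eventTime) {
        if (eventTime.isAfter(maxEventTime)) {
            maxEventTime = eventTime;
        }
        Instant candidate = maxEventTime.minus(maxDelay);
        if (candidate.isAfter(watermark)) {
            watermark = candidate;
        }
        return watermark;
    }

    // In this model, an event older than the current watermark counts as late.
    boolean isLate(Instant eventTime) {
        return eventTime.isBefore(watermark);
    }
}
```

With a five-minute delay, as in the snippet above, an event stamped ten minutes after the previous watermark would move the watermark forward by five minutes; an out-of-order event arriving afterwards would leave it unchanged.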
softwaredoug / SearchTimestampFunction.java
Created November 11, 2020 15:32
Search timestamp function
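import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.joda.time.Instant;

// Extracts the event time from each Kafka record's SearchQueryEvent payload.
public class GetSearchQueryTimestampFunction implements SerializableFunction<KafkaRecord<Long, SearchQueryEvent>, Instant> {
    @Override
    public Instant apply(KafkaRecord<Long, SearchQueryEvent> input) {
        return input.getKV().getValue().searchTimestamp;
    }
}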
softwaredoug / KafkaSearchSource.java
Created November 11, 2020 15:30
Creating a Kafka Source
KafkaIO.<Long, SearchQueryEvent>read()
    .withBootstrapServers("broker_1:9092,broker_2:9092")
    .withTopic("search_queries_topic")
    .withKeyDeserializer(LongDeserializer.class)
    .withValueDeserializer(SearchQueryEventDeserializer.class)
softwaredoug / KafkaSearchSource.java
Last active October 28, 2020 14:59
Timestamp policy example with Kafka source with Apache BEAM
import com.google.api.services.cloudresourcemanager.CloudResourceManager;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.AtomicCoder;
import org.apache.beam.sdk.coders.CoderException;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.kafka.*;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.schemas.annotations.SchemaCreate;
softwaredoug / kotlin-command-line.md
Created September 16, 2020 15:39
Kotlin gradle command line

Getting started with Kotlin? I had to string together the steps to get everything I needed from the command line. Thought I'd document the steps I went through w/ gradle, etc.

I prefer having everything working on the command line and treating the IDE as secondary. So gradle first, IntelliJ second.

First, make sure you have gradle & a jdk installed.

Second, create a directory for your project and initialize a Gradle project

mkdir kotlin-hello-world
softwaredoug / solr-opinons.md
Last active March 6, 2021 20:04
Opinions on using Solr effectively from Doug Turnbull

Solr needs opinions, because Solr documentation gives you way too many options. It's hard to navigate the best practices for using Solr. Some of my opinions based on dozens of Solr projects :)

Prefer preloading static, classic config files over managed schema, config API, or schemaless

Schema files are a good thing. They are declarative, and not letting them change at runtime prevents all kinds of security issues. Further, classic schema / solrconfig support all of Solr's functionality and are well documented with tons of examples online in blog articles and on Stack Overflow. Using managed schema or the config API takes a lot of experimentation.

Static configurations can also be easily version controlled. As I've learned as a longtime Elasticsearch user, this is one of Solr's advantages. Having an API for changing every underlying config option of your index means finding the code that made the change is rather time-consuming.

Static configuration is also good separation of concerns. You cleanly separate

softwaredoug / elasticsearch-debug-intellij.md
Last active April 12, 2020 15:34
Debug Elasticsearch in IntelliJ (in 2020)

You used to be able to debug Elasticsearch (and plugins) by doing

./gradlew run --debug-jvm

This would launch Elasticsearch in debug mode, whereby you'd subsequently connect IntelliJ to the running Elasticsearch JVM. However, if you try this after Elasticsearch 7.5, you'll get the error:

Running elasticsearch in debug mode, node{::integTest-0} suspending until connected on debugPort 5005
softwaredoug / plural_misstems.txt
Last active February 21, 2020 13:13
Curated list of badly stemmed plurals as a Solr / Elasticsearch synonyms file (as initially gathered by Mark Harwood, see linked issue in file)
# English minimal stemmer Plural misstems as a synonyms file
# NOT attempting to cover irregular plurals (feet/foot...)
#
# Gathered by Mark Harwood at Elastic
# https://github.com/elastic/elasticsearch/issues/42892
#
# Observation in curating these: some are both plurals and verbs
# such as 'harnesses' - Henry harnesses horses with harnesses
employees => employee
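
As a rough illustration of how explicit mappings like the one above behave, here is a minimal Java sketch that parses `plural => singular` lines and rewrites matching tokens. The class `PluralSynonyms` and its methods are hypothetical, and the parser assumes only the single-term-to-single-term form shown here; it does not attempt the full Solr / Elasticsearch synonym syntax (for example, comma-separated term lists).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a simplified reader for "plural => singular" lines.
class PluralSynonyms {
    // Build a lookup table, skipping blank lines and '#' comments.
    static Map<String, String> parse(Iterable<String> lines) {
        Map<String, String> mapping = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("=>", 2);
            if (parts.length != 2) {
                continue; // not an explicit mapping; ignored in this sketch
            }
            mapping.put(parts[0].trim(), parts[1].trim());
        }
        return mapping;
    }

    // Replace a token with its mapped form, or return it unchanged.
    static String normalize(Map<String, String> mapping, String token) {
        return mapping.getOrDefault(token, token);
    }
}
```

Tokens without an entry pass through untouched, which mirrors the intent of the file: override only the plurals the stemmer gets wrong.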

Write for yourself, not the audience.

Who is the audience? They're the people likely to read your writing. The audience is a useful tool. A means, not an end. Having your writing read and scrutinized gives you healthy skepticism towards your ideas. A counterbalance to ego. That's why it's so important to write to an audience, not write for them.

Don't aim to do the audience a "service" by writing. The audience can be wrong! For a variety of reasons, groups of people are not always so smart compared to individuals. The dance between how individuals can be right/wrong and how groups can be right/wrong is what makes writing so thrilling. Truth lies in some tension between the two, not in one or the other. Write to be understood by the audience, but not liked by them.

I'm not just talking about socially or politically controversial topics. Mundane prose about boring, cut-and-dried topics can be fraught with this tension between t