GitHub is in the business of collaboration and connecting people to projects they are interested in. Search plays a key role in discovering new projects, and we are looking for someone to join us in making GitHub search an outstanding experience.
At the heart of the GitHub search experience is ElasticSearch, a Lucene-based distributed search and analytics engine running on the Java virtual machine. As we have expanded the number and type of things that can be searched, managing the search cluster has become a full-time job.
Currently we have over two billion documents stored in our search clusters, and we are serving six hundred thousand search requests per day. We are looking for someone to scale our infrastructure to handle growth over the next several years. Capacity planning and performance tuning of the JVM and the ElasticSearch cluster are important parts of this role. You will also work to improve search results by refining queries and text analysis.
Some of the qualities we are looking for are:
- A good understanding of information retrieval concepts.
- Experience working with large JVM systems.
- Desire to streamline search operations.
"Wow! I'm excited about changing the world through collaboration, and I love working with Lucene and the JVM." If that is your response after reading through all of this, then we would be ecstatic to hear from you!
Drop us a line at [email protected] and tell us a little about yourself and how you want to make GitHub search a better experience. We would like to hear about the following items:
- What is an example of how you have used Lucene (Solr, ElasticSearch)?
- How have you tackled system performance problems?
- Why do you think you would be great at this position?
We're excited to hear from you.
See also the GitHub Jobs posting. It has the same content as this gist, but it looks much more official!