@jhyland87
Created May 31, 2017 17:03
---- / both DE and DA candidates must have basic Python
*Basic Python:
different data structures and various operations on them
iteration and traversing data structures
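As a rough illustration of what the "basic Python" screening above might cover, here is a hypothetical snippet (not from the job post) showing the core data structures and how to iterate over them:

```python
# Hypothetical screening-level examples: core data structures and iteration.

# list: ordered, mutable sequence
nums = [3, 1, 4, 1, 5]
nums.append(9)

# dict: key/value mapping, built by iterating over the list
counts = {}
for n in nums:
    counts[n] = counts.get(n, 0) + 1

# set: unique values
unique = set(nums)

# tuple: immutable record
point = (2, 7)

# traversing a nested structure with a comprehension
rows = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": ["c"]}]
all_tags = [t for row in rows for t in row["tags"]]

print(counts)           # frequency of each number
print(sorted(unique))   # [1, 3, 4, 5, 9]
print(all_tags)         # ['a', 'b', 'c']
```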
Data Engineer job id 7122
• We are looking for a senior Data Integration Engineer who has relevant experience working in a highly scalable environment.
• Strong, hands-on SQL experience: must be able to write SQL queries and work with various SQL databases.
• Must have experience with SQL, Hive/Presto, Big Data, Hadoop, and MySQL.
• The ETL tool is similar to Airbnb's Airflow, so familiarity with it is useful.
• We are not using any off-the-shelf tools; we are building our own customized framework called Dataswarm, based on SQL, Python, & PHP.
• Should be able to create customized, efficient, & reliable data pipelines from scratch across our ridiculously large Data Warehouse
• Design & develop insightful reports, analyze data, and develop meaningful custom reports
• Must be able to write code & build custom data pipelines
• Should have a good understanding of NoSQL databases like MongoDB, Redis, Cassandra, etc.
• Should have good Python experience
• Should be familiar with working in Unix environment.
• TOP 3 skills: SQL, Python, Unix
• Location: Menlo Park, CA
• Duration: until end of the year
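The pipeline-building requirement above might look something like the following in plain Python. This is only a minimal extract-transform-load sketch with hypothetical table and field names; Dataswarm itself is proprietary, so its actual API is not shown here.

```python
# Minimal ETL pipeline sketch in plain Python (standard library only).
# All table/field names are hypothetical; this only illustrates the
# extract -> transform -> load shape of a custom data pipeline.
import sqlite3

def extract(conn):
    """Pull raw rows from a (hypothetical) source table."""
    return conn.execute("SELECT user_id, amount FROM raw_events").fetchall()

def transform(rows):
    """Aggregate amounts per user, skipping bad records."""
    totals = {}
    for user_id, amount in rows:
        if amount is None:
            continue  # drop anomalous records
        totals[user_id] = totals.get(user_id, 0) + amount
    return sorted(totals.items())

def load(conn, totals):
    """Write the aggregates to a (hypothetical) target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS user_totals (user_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO user_totals VALUES (?, ?)", totals)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_events (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?)",
                     [(1, 10.0), (2, 5.0), (1, None), (2, 2.5)])
    conn.commit()
    load(conn, transform(extract(conn)))
    print(conn.execute("SELECT * FROM user_totals ORDER BY user_id").fetchall())
```

In a real pipeline the SQLite calls would be replaced by Hive/Presto or MySQL queries, but the extract/transform/load structure is the part candidates are expected to be able to build from scratch.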
Internal description
What kind of Data Integration method are they using?
• They are using "Dataswarm". Dataswarm is a framework implemented by Facebook, similar to Airbnb's Airflow. So if you find someone who has worked at Airbnb, you can present them.
What is the technology stack they are using?
• The technology stack is just Python and SQL. For SQL, they are using Hive/Presto SQL (80%) & MySQL (20%).
What are the Data Integration tools they are using, and what are they looking for in a candidate? (I believe they are using SAS/ETL but I am not sure.)
• They are not using any data integration tools available on the market. They have their own Data Integration framework, which they are building themselves. The framework is built for ETL with Python & SQL: they are building their own data pipelines and data converters, so it is not related to any Data Integration tool on the market; it is their own and they own it. They want candidates who are very strong in Python, PHP, and SQL (all types of SQL: NoSQL, Big Data, Hive, Presto SQL, PostgreSQL).
Python (8/10), SQL (9/10), PHP (5-6/10).
Data Engineer – they are looking for someone with in-depth SQL experience. A minimum of 3 years is okay if they have been working with SQL consistently for those three years.
If we check for that and make sure we ask the screening questions that Amit provided, we should be able to identify good candidates. Regarding the 2nd technology, Python: Facebook uses proprietary code that is similar to both Airflow by Airbnb and Python, so if candidates understand one of those, they will be able to pick up this team's code quickly.
--------
Data analyst job id 7396
• Excellent SQL skills (80% we use Hive/ Presto SQL. 20% MySQL)
• Able to think through business requirements and translate them into data.
• Need to have an eye on data anomalies.
• Preferred: Development experience with any scripting language (Python, JavaScript)
• Strong communication skills
• Analytical skills; able to work with large volumes of data
• Critical thinking and a strong analytical approach for various use cases
Data Analyst – For this role the analyst must also know SQL very well, but they specifically want someone who has experience creating pipelines from the starting point (source data) and can think through how to execute the pipeline to reach the end point: someone who can look at what is put in front of them and know how to transform the pipeline. They should have been doing something similar in recent previous roles. They are also looking for someone who knows the basics of Python: how to compute with and use data structures.
-----