@prasku5
Last active May 4, 2018 05:43
sqoop import \
--connect jdbc:mysql://localhost/source_database_name \ (JDBC connection string used to reach the source database)
--username <username> \ (Source database username)
--password <password> \ (Source database password)
--database source_database_name \ (The database name becomes the folder name in the target HDFS)
--target-dir <path of the directory> \
--hive-import \
--hive-table query_import \
--boundary-query 'SELECT 0, MAX(id) FROM a' \ (The boundary query returns the minimum and maximum values of the split column, so Sqoop knows the range of records to divide among the mappers)
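--query 'SELECT a.id, a.name, b.id AS b_id, b.name AS b_name FROM a, b WHERE a.id = b.id AND $CONDITIONS' \ (Free-form query; the aliases avoid ambiguous column names, and Sqoop replaces $CONDITIONS with each mapper's range condition)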
--num-mappers 3 \ (Number of parallel map tasks; choose this value with care)
--split-by a.id \ (we are splitting some column for our performance and it