Created
August 10, 2015 22:04
Reading Parquet files and inspecting their schema and metadata with parquet-tools
# Build parquet-tools
git clone https://github.com/Parquet/parquet-mr.git
cd parquet-mr/parquet-tools/
mvn clean package -Plocal
# Print the schema of the Parquet file
java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar schema sample.parquet
# Print the full contents of the Parquet file
java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar cat sample.parquet
# Print the first 5 records of the Parquet file
java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar head -n5 sample.parquet
# Print the metadata of the Parquet file
java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar meta sample.parquet
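If you run these inspections often, a small wrapper can save some typing. The following is only a sketch: `pq_inspect`, `PARQUET_TOOLS_JAR`, and `DRY_RUN` are hypothetical names introduced here, not part of parquet-tools, and the default jar name is simply the one used in the commands above. It runs `schema`, `meta`, and `head` on one file, or just prints the commands when `DRY_RUN` is set.

```shell
# Hypothetical helper; assumes the jar built above is in the current directory
# unless PARQUET_TOOLS_JAR points elsewhere.
PARQUET_TOOLS_JAR="${PARQUET_TOOLS_JAR:-parquet-tools-1.6.0rc3-SNAPSHOT.jar}"

pq_inspect() {
  file="$1"        # Parquet file to inspect
  rows="${2:-5}"   # number of records for `head` (defaults to 5)
  for args in "schema" "meta" "head -n$rows"; do
    if [ -n "${DRY_RUN:-}" ]; then
      # Dry run: print the command instead of executing it
      echo "java -jar $PARQUET_TOOLS_JAR $args $file"
    else
      # $args is intentionally unquoted so "head -n5" splits into two words
      java -jar "$PARQUET_TOOLS_JAR" $args "$file"
    fi
  done
}
```

For example, `DRY_RUN=1 pq_inspect sample.parquet` prints the three commands without running them, and `pq_inspect sample.parquet 10` runs them with `head -n10`. `cat` is left out on purpose, since it dumps the whole file.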
It seems like there is a resolution issue for one of the dependencies.
[hdfs@master parquet-tools]$ mvn clean package -Plocal
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Parquet Tools (Incubating) 1.6.0rc3-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for com.twitter:parquet-hadoop:jar:1.6.0rc3-SNAPSHOT is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.677 s
[INFO] Finished at: 2016-10-05T17:57:24+00:00
[INFO] Final Memory: 7M/150M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project parquet-tools: Could not resolve dependencies for project com.twitter:parquet-tools:jar:1.6.0rc3-SNAPSHOT: Failure to find com.twitter:parquet-hadoop:jar:1.6.0rc3-SNAPSHOT in https://oss.sonatype.org/content/repositories/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of sonatype-nexus-snapshots has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
Fixed this by substituting the variable with 1.6.0 here