Created
December 20, 2020 03:58
## Beginner's Guide to `pd.read_clipboard`
[`read_clipboard`](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#id45) is truly a saving grace for anyone starting out answering questions in the [tag:pandas] tag. Unfortunately, pandas veterans also know that the data provided in questions isn't always easy to load into a terminal session, owing to complications such as MultiIndexes, spaces in header names, datetimes, and Python objects.
Thankfully, `read_clipboard` has arguments that make handling most of these cases possible (and easy). The purpose of this answer is to document some of those cases in finer detail.
---
### Spaces in column headers
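One way to handle headers that contain spaces, assuming the columns are separated by at least two spaces, is to split on runs of two or more whitespace characters. The sketch below uses `read_csv` on an in-memory string as a stand-in for the clipboard so that it runs anywhere; the sample data and column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: single spaces appear inside headers and values,
# while columns are separated by two or more spaces.
text = """\
first name  last name  age
John        Smith      31
Mary Ann    Lee        27
"""

# A regex separator needs the Python parsing engine.
people = pd.read_csv(io.StringIO(text), sep=r'\s{2,}', engine='python')
print(people)

# With the same text copied to the clipboard, the equivalent call would be:
# people = pd.read_clipboard(sep=r'\s{2,}', engine='python')
```

This only works when no column is separated from its neighbour by a single space; otherwise the data has to be edited by hand first.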
---
### Read a Series instead of a DataFrame
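A minimal sketch of reading a two-column paste as a Series, again using `read_csv` on a string in place of the clipboard. It reads a one-column DataFrame and then calls `DataFrame.squeeze("columns")`, which should behave the same whether or not the installed pandas version still accepts the `squeeze` argument. The sample data is invented.

```python
import io

import pandas as pd

# Hypothetical sample: a printed Series with its index in the first column.
text = """\
a    1
b    2
c    3
"""

# No header row; use the first column as the index.
frame = pd.read_csv(io.StringIO(text), sep=r'\s+', header=None, index_col=0)

# Collapse the single remaining column into a Series.
ser = frame.squeeze("columns")
print(ser)

# Clipboard equivalent:
# ser = pd.read_clipboard(header=None, index_col=0).squeeze("columns")
```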
---
### Python objects
- Numeric data: simpler
- String data: may need YAML
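For the numeric case, one possible approach (an assumption, not necessarily what the author had in mind) is to read the column as text and convert each cell with `ast.literal_eval`. The sketch uses `read_csv` on a string as a stand-in for the clipboard, and the column names are hypothetical.

```python
import ast
import io

import pandas as pd

# Hypothetical sample: the second column holds Python lists of numbers.
text = """\
id  nums
1   [1, 2, 3]
2   [4, 5]
"""

# Split on two or more spaces so the spaces inside the lists are preserved.
objs = pd.read_csv(io.StringIO(text), sep=r'\s{2,}', engine='python')

# Each cell arrives as a string such as "[1, 2, 3]"; parse it into a list.
objs['nums'] = objs['nums'].apply(ast.literal_eval)
print(objs)
```

`ast.literal_eval` rejects unquoted strings such as `[a, b, c]`, which is presumably where a YAML parser comes in for string data.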
---
### Other considerations
`read_clipboard` uses `read_csv` under the hood, so many of the principles for loading data from CSV apply here, such as:
- parsing datetimes (use `parse_dates`)
- no headers (use `header=None`)
- custom names (use `names=[...]`)
- set a column as the index (use `index_col=[...]`)
- read a Series instead of a DataFrame (use `squeeze=True`; the argument was removed in pandas 2.0, where you call `.squeeze("columns")` on the result instead)
- specify a custom separator (use `sep='...'`; if it is multi-character or a regex, use `engine='python'`)

And so on. See [here](https://stackoverflow.com/a/56231664/4909087) for a more comprehensive list.
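Several of these options can be combined in one call. The following is a small sketch on invented data, using `read_csv` with a string in place of the clipboard; the same keyword arguments can be passed to `read_clipboard`.

```python
import io

import pandas as pd

# Hypothetical sample: comma-separated rows with no header line.
text = """\
2020-01-01,1.5,x
2020-01-02,2.5,y
"""

readings = pd.read_csv(
    io.StringIO(text),
    sep=',',                           # custom separator
    header=None,                       # the data has no header row
    names=['date', 'value', 'label'],  # supply column names (made up here)
    parse_dates=['date'],              # parse this column as datetimes
    index_col='date',                  # and use it as the index
)
print(readings)
```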
---
### Limitations of `read_clipboard`
- Cannot parse prettytable/tabulate output (in other words, borders make it harder). Check out some homemade attempts at tackling this.
- Cannot ignore ellipses in data (you'll need to manually remove them)
- Cannot load data from images (if you're up to the task, you can make a Tesseract extension that does)
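If removing ellipses by hand gets tedious, the step can be scripted. Here is a rough sketch, assuming the pasted text is already available as a string and that truncation shows up as whole lines made only of dots and whitespace. `read_without_ellipses` is a hypothetical helper, not part of pandas.

```python
import io
import re

import pandas as pd


def read_without_ellipses(text, **kwargs):
    """Drop lines made up only of dots/whitespace, then parse the rest."""
    kept = [
        line for line in text.splitlines()
        if not re.fullmatch(r'[\s.]*', line)  # also drops blank lines
    ]
    return pd.read_csv(io.StringIO("\n".join(kept)), **kwargs)


# Hypothetical sample: a truncated DataFrame as printed by pandas.
text = """\
    A   B
0   1  10
1   2  20
..  ..  ..
98  3  30
99  4  40
"""

trimmed = read_without_ellipses(text, sep=r'\s+')
print(trimmed)
```

This does not handle ellipsis columns (a `...` in the middle of every row); those still need manual cleanup.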
---
### Other useful `pd.read_clipboard` questions for unconventionally formatted data