Created
October 8, 2014 18:18
blaze-webinar-continuum-analytics-2014-10-08-QA
Q&A Session for Getting Started with Blaze | |
Session number: 665410874 | |
Date: Wednesday 8 October 2014 | |
Starting time: 18:48 | |
________________________________________________________________ | |
Flemming Stark - 19:12 | |
Q: earlier this year blz grew apart from blaze and i don't see it here or anywhere. is blz dead? | |
Phillip Cloud - 19:15 | |
A: i believe blz is replaced by bcolz https://github.com/ContinuumIO/blz/issues/16 | |
Travis Oliphant - 19:45 | |
A: Hi Flemming. Blz was primarily Francesc's work. He has moved forward with bcolz and we are now supporting bcolz and blz is deprecated. | |
________________________________________________________________ | |
Michael Sterling - 19:13 | |
Q: Sorry joined in late. Is there somewhere to get the notebook? | |
Matt T. - 19:16 | |
A: The powerpoint is available, but I'll have to let Matt post the link again. It was on the first slide. I'm not sure if the notebook is available. | |
________________________________________________________________ | |
Philip Branning - 19:21 | |
Q: Does Blaze have a caching layer? | |
Phillip Cloud - 19:22 | |
A: No there's no caching, but specific backends might have this. | |
________________________________________________________________ | |
Guillaume Gay - 19:22 | |
Q: Hi, looks like my OS won't let me use sound with the cisco webex java app, so I don't hear anything you guys are saying. Thanks for the event anyway, I'll resort to the doc and so on to learn about Blaze! | |
Matt T. - 19:23 | |
A: We're sorry to hear that, we will have the recording up on youtube, with a link from our site, later today | |
________________________________________________________________ | |
Michael Sterling - 19:23 | |
Q: Is the plan to keep the syntax of blaze aligned with pandas or will they just diverge over time? | |
Phillip Cloud - 19:26 | |
A: This isn't a specific goal, but pandas' APIs are excellent so we'll probably keep many things. So, mostly aligned, but possibly a few differences | |
Phillip Cloud - 19:27 | |
A: Here's a link to dplyr: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html | |
________________________________________________________________ | |
Vishal Soni - 19:27 | |
Q: does blaze have delayed / 'lazy' computation? For example, stringing together queries before sending them to the backend engine? | |
Phillip Cloud - 19:29 | |
A: Yes, a blaze expression does no backend computation. It simply sits around in memory waiting to be translated to a backend. | |
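To make the idea of a deferred expression concrete, here is a minimal sketch. The `Expr`, `Column`, and `BinOp` classes and the `to_sql` and `evaluate` helpers are hypothetical names invented for this illustration; they are not Blaze's API. The point is only that chaining operations builds a tree in memory, and a separate step translates that tree for a particular backend.

```python
# Toy illustration of deferred computation; these classes are hypothetical,
# not part of Blaze.

class Expr:
    """A node in an expression tree. Building one does no computation."""

    def __gt__(self, other):
        return BinOp(">", self, other)

    def __add__(self, other):
        return BinOp("+", self, other)


class Column(Expr):
    def __init__(self, name):
        self.name = name


class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right


def to_sql(expr):
    """Translate the tree into a SQL fragment (one possible backend)."""
    if isinstance(expr, Column):
        return expr.name
    if isinstance(expr, BinOp):
        return f"({to_sql(expr.left)} {expr.op} {to_sql(expr.right)})"
    return repr(expr)  # plain Python literal


def evaluate(expr, row):
    """Evaluate the same tree directly against an in-memory row (a dict)."""
    if isinstance(expr, Column):
        return row[expr.name]
    if isinstance(expr, BinOp):
        left, right = evaluate(expr.left, row), evaluate(expr.right, row)
        return left > right if expr.op == ">" else left + right
    return expr


# Stringing operations together only builds a tree in memory.
query = (Column("amount") + 10) > 100
sql_text = to_sql(query)                       # translated for a SQL backend
in_memory = evaluate(query, {"amount": 95})    # same tree, Python backend
```

The same `query` object is handed to two different translators, which is roughly the separation between expression and backend described in the answer above.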
________________________________________________________________ | |
Alberto Andrade-Fraga - 19:27 | |
Q: Does blaze assume the DB schema has been created or can it also migrate a DB from one implementation to another? | |
Phillip Cloud - 19:30 | |
A: Do you mean say from postgres to sql? | |
Phillip Cloud - 19:30 | |
A: sorry postgres to mysql | |
________________________________________________________________ | |
Alberto Andrade-Fraga - 19:31 | |
Q: Yes, let's assume you have a DB in MSSQL and want to try creating a copy of the DB in MongoDB? | |
Phillip Cloud - 19:32 | |
A: Yes, I believe we have support for this and if we don't this is definitely a goal. This would be done with the into function. | |
Phillip Cloud - 19:39 | |
A: I should note that I don't think we migrate things like indexes. | |
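As a rough picture of why a row-level copy does not bring indexes along, the sketch below uses only the standard library: an in-memory SQLite table stands in for the SQL source, and a list of dicts stands in for a document store. It does not use Blaze's `into` function, and the table, column, and index names are made up for illustration.

```python
import sqlite3

# Source: a relational table with an index (in-memory SQLite stands in for
# the SQL database; table and column names are made up for illustration).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE accounts (id INTEGER, name TEXT, balance REAL)")
src.execute("CREATE INDEX idx_accounts_name ON accounts (name)")
src.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "alice", 100.0), (2, "bob", 250.5)],
)

# Copy: read rows and reshape each one into a document (a plain dict here,
# standing in for a record in a document store).
cursor = src.execute("SELECT id, name, balance FROM accounts ORDER BY id")
columns = [description[0] for description in cursor.description]
documents = [dict(zip(columns, row)) for row in cursor]

# The index lives in the source's catalog, not in the rows, so a row-level
# copy does not carry it over; it would have to be recreated on the target.
source_indexes = [
    name for (name,) in src.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
]
```

After this runs, `documents` holds the data but nothing in it records that `idx_accounts_name` existed, so an index would need to be declared again on the destination.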
________________________________________________________________ | |
Vishal Soni - 19:32 | |
Q: is there a blaze api & backend for dense array data with metadata? Like larry or x-ray (i.e. ndarrays with axis labels)? | |
Phillip Cloud - 19:34 | |
A: As of now, no, though this is a medium term goal. We have a PR up for SciDB and we talk to Stephan Hoyer (of xray) fairly regularly. We've mostly been focused on Table objects but Arrays are on the roadmap | |
________________________________________________________________ | |
Drew Newman - 19:33 | |
Q: Are there plans to support Cassandra as a back end? | |
Phillip Cloud - 19:35 | |
A: I don't think we will implement this ourselves, but if a person wanted to write a backend for cassandra, then we would happily review and most likely accept a pr. | |
Phillip Cloud - 19:36 | |
A: i believe cassandra uses a variant of sql, so if you had a separate python package that implements a sqlalchemy dialect, we could very easily help you plug this in to blaze | |
Phillip Cloud - 19:39 | |
A: There's a package called impyla that implements a sqlalchemy dialect for the cloudera impala package. | |
________________________________________________________________ | |
Philip Branning - 19:36 | |
Q: What does blaze do when you try to pull down something that's too big to fit in memory? | |
Phillip Cloud - 19:37 | |
A: Whatever Python itself does. There's no inspection of memory capacity or leveraging of that information as of now | |
________________________________________________________________ | |
Philip Branning - 19:44 | |
Q: Yeah, is there some idea of a streaming iterator type of thing for into? | |
Phillip Cloud - 19:44 | |
A: Yes there's a "chunked" interface, that does exactly this | |
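The general idea behind a chunked, streaming iterator can be sketched with the standard library alone. The `chunks` and `running_total` helpers below are hypothetical names for this illustration and are not Blaze's "chunked" interface; they only show how a computation can proceed while holding one bounded piece of the data at a time.

```python
from itertools import islice


def chunks(rows, size):
    """Yield lists of at most `size` items from any iterable, lazily."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def running_total(rows, size):
    """Sum a stream while holding at most one chunk in memory at a time."""
    total = 0
    for chunk in chunks(rows, size):
        total += sum(chunk)
    return total


# A generator stands in for a source too large to materialize at once.
total = running_total((n for n in range(10)), size=4)
sizes = [len(c) for c in chunks(range(10), 4)]
```

Peak memory here is bounded by the chunk size rather than by the size of the source, which is the property the question is asking about.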
________________________________________________________________ | |
Flemming Stark - 19:27 | |
Q: Some features of blaze are now similar to features of IOPro. Is that the approach, or are there differences? | |
Travis Oliphant - 19:47 | |
A: IOPro is an optimized interface for bringing data into memory. Blaze could *use* IOPro to quickly load data into NumPy arrays or Pandas DataFrames. | |
________________________________________________________________ | |
Kerry Oliphant - 19:41 | |
Q: Do you have some time today to talk? His presentation has spurred some thinking | |
Travis Oliphant - 19:48 | |
A: Yes, give me a call | |
________________________________________________________________ | |
Oleg Mürk - 19:49 | |
Q: Can Blaze do predicate push-down and map pruning (chunk value range filtering) with SparkSQL? | |
Phillip Cloud - 19:50 | |
A: Is this something like filter(f, map(g, sequence))? | |
________________________________________________________________ | |
rich fernandez - 19:49 | |
Q: Is blaze trying to be a superset of sqlalchemy, to some extent? | |
Travis Oliphant - 19:50 | |
A: Blaze is similar in spirit to sqlalchemy, but calling it TableAlchemy and ArrayAlchemy would better capture the spirit. | |
Travis Oliphant - 19:50 | |
A: Also, Blaze *uses* sqlalchemy under the hood for its SQL interface. | |
________________________________________________________________ | |
Stephen Larroque - 19:52 | |
Q: How does (or will) blaze manage complex computations on big data, like matrix multiplication, that aren't necessarily implemented in the database? Can it transparently stream them to numpy in an out-of-core fashion? | |
Phillip Cloud - 19:55 | |
A: We don't have a way to choose a particular backend for streaming operations, but there's an open discussion about this https://github.com/ContinuumIO/blaze/issues/698 | |
________________________________________________________________ | |
Oleg Mürk - 19:54 | |
Q: Does Blaze/Blosc do something similar to PyTables' OPSI indexes? | |
Phillip Cloud - 19:57 | |
A: blaze indexes are dependent on whether the backend supports it, so eg you can do create_index(tables.Table) to create a fully sorted index | |
________________________________________________________________ | |
Vishal Soni - 19:59 | |
Q: How does the developer of a data library create a blaze backend? Do they commit it to the blaze codebase, or can they just package it in a standalone manner with their library? | |
Phillip Cloud - 20:00 | |
A: Generally a more obscure backend would probably be better off as a separate package, but of course obscure is open to discussion | |
________________________________________________________________ | |
Vishal Soni - 20:00 | |
Q: In the future, how do you envision maintenance of various backends? Centralized at continuum vs. distributed across libraries? | |
Phillip Cloud - 20:01 | |
A: Good question, I don't think we have a solid plan for how to deal with this yet, though we've discussed a little bit | |
________________________________________________________________ | |
Philip Branning - 20:00 | |
Q: I understand query optimization would generally be a backend-specific concern. But some query optimization should make sense at the level of Blaze. Is there any query optimization currently? | |
Phillip Cloud - 20:04 | |
A: This hasn't received a ton of attention, in favor of getting a core api and infrastructure in place, but there are some old issues discussing things like constant folding and other optimizations. we are thinking about these but they are low priority | |
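For readers unfamiliar with constant folding, here is a small sketch of the technique applied to Python expressions using the standard `ast` module. It is not Blaze's optimizer; `ConstantFolder` and `fold` are hypothetical names for this example, and only `+`, `-`, and `*` on numeric literals are handled.

```python
import ast
import operator

# Only these operators are folded; anything else is left untouched.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on two literal numbers with its result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first (bottom-up)
        op = _OPS.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold(source):
    """Parse an expression, fold constants, and return the new source."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
    return ast.unparse(tree)  # requires Python 3.9+


folded = fold("amount * (2 + 3) + 4 * 5")
```

Subexpressions made only of literals are computed once, before any backend sees the query, while anything involving a column name such as `amount` is left for the backend to evaluate.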
________________________________________________________________ | |
Drew Newman - 20:04 | |
Q: Will this chat transcript be available after the presentation ends today? There is a lot of good information here. | |
Phillip Cloud - 20:06 | |
A: If you're on the Event Center application you can save the chat | |
Phillip Cloud - 20:06 | |
A: There's a recording that I think will have this as well, maybe Lila or MattT can comment | |
________________________________________________________________ | |
Oleg Mürk - 20:07 | |
Q: Do you currently integrate with SciDB? | |
Phillip Cloud - 20:07 | |
A: Nope, though Chris Beaumont has a PR up https://github.com/ContinuumIO/blaze/pull/681 | |
________________________________________________________________ | |
Oleg Mürk - 20:11 | |
Q: In the PySpark integration, what exactly is the function of Blaze/Blosc? | |
Phillip Cloud - 20:13 | |
A: Blaze drives PySpark (blaze provides an API to call the API of PySpark) and Blosc is an algorithm orthogonal to the goals of blaze. |