@akutz
Created August 24, 2016 18:54
----- Today August 24th, 2016 -----
akutz [12:12 PM]
Hi John, I'm about to eat lunch, but I can answer some questions if you have any.
jameyers14 [12:38 PM]
Sure. Still on?
akutz [12:40 PM]
Yep.
jameyers14 [12:40 PM]
OK. Thanks so much for your time. Matt says you are very generous with it.
akutz [12:40 PM]
He's a filthy liar. I hate helping people :slightly_smiling_face:
jameyers14 [12:41 PM]
Perfect.
[12:41]
I do too.
akutz [12:41 PM]
No, I don't mind. I find it's often better to review interfaces via a conversation than documentation. Plus, I'm the only one writing documentation, and dev docs are always getting the short end of the stick since user docs are required.
jameyers14 [12:42 PM]
OK, before we get into the weeds, I have a higher-level "is RexRay/libstorage the right way to go" question.
akutz [12:42 PM]
Sure
jameyers14 [12:42 PM]
Did Matt give you the background on our company (Actifio)?
akutz [12:42 PM]
No
jameyers14 [12:45 PM]
OK. We make appliances (physical and virtual) that perform copy data virtualization. In a nutshell, we do incremental-forever ingest of production data from many platforms and then act as virtual storage arrays and can present any point in time as native storage (FC or iSCSI), replaying the whole "chain of changes" at wire speed. People use us both for data protection and test/dev because when you access data from us, we don't have to move bits of data up front - we're acting as a storage array, so access to the data is almost immediate regardless of whether we're talking about 1GB or 1PB. Make sense?
akutz [12:48 PM]
So if I understand it, you can basically take any level-1 snapshot point-in-time and present it back to the consumer as the storage device with a full view of the data from that point-in-time?
jameyers14 [12:49 PM]
Yes. A synthetic full. And that source data could be a VM, it could be a physical server, it could be Oracle, it could be data captured on-the-wire over FC, etc.
akutz [12:49 PM]
Synthetic full meaning you compose a "full" view that includes the delta from the request point as well as any missing blocks from the last time they were recorded?
jameyers14 [12:50 PM]
correct. All at wire-speed. As native storage.
akutz [12:50 PM]
Pretty neat.
[12:50]
Sounds like you and ZFS would go hand-in-hand :slightly_smiling_face:
jameyers14 [12:50 PM]
We do a lot with ZFS.
[12:51]
So from a test/dev standpoint, the way people use the product today is you'd ingest incremental block updates of your production data -- let's say it's a Postgres database (as it's something people would want to use with containers)... You're already doing this for data protection (storing it in our dedup pool, replicating to a remote site/cloud, etc). So you have the data in our system. We call it a Gold Copy.
akutz [12:53 PM]
Crash consistent then I imagine. Unless the app is aware and has some hook?
jameyers14 [12:53 PM]
App consistent.
akutz [12:53 PM]
So the app has some API it hooks into? Is it OS level? Your API?
jameyers14 [12:54 PM]
Now you decide you want to use the same data for test/dev. So you provision what we call a workflow. You tell our product you want to take whatever latest copy of this database is, prep mount it through a data masking system (say to strip credit card/PCI data), and then mount the sanitized copy to 50 development machines. Each development machine will then get the mount, and at time 0 will have consumed almost no disk space. Each mount represents a fork from the original data and we'll write to disk only the data that each of the developers changes on their mount. And then we'll refresh that data according to your schedule.
[12:54]
There are multiple hooks at the app and OS level.
akutz [12:55 PM]
You're not the masking system then, since I imagine you don't know/care about the file system?
jameyers14 [12:55 PM]
That's right. We work at the block level. So we mount to a host that understands the data and how to manipulate it.
[12:55]
But we orchestrate it.
[12:56]
So that's great. One of our large banks actually does this to 1000 developers across their enterprise. Saves them 8 figures every year. Works great. But now let's say they want to use Docker instead of mounting to VMs or physical hosts. You see where this is going...
akutz [12:56 PM]
Yep
jameyers14 [12:57 PM]
If you want to see this, I can do a WebEx BTW.
akutz [12:57 PM]
Would you want to present a unique device to each container?
[12:57]
No, I think I have it.
[12:57]
I was actually taking the afternoon off to catch up on some paperwork and other items, so I'd rather not get pulled into too long of a demo at the moment. I was just hoping to answer some basic questions for you to get you moving in the right direction.
jameyers14 [12:59 PM]
But now let's assume we have a sanitized version of the database sitting ready to be consumed on our appliances. A simple REST API call from a Docker host can instruct the appliance to present the LUN to the host and our Connector (a small daemon running on the host) will take care of all of the host-specific tasks like importing LVM groups or Zpools and mounting the drive. So basically with a single API call to request a copy of this dataset, it will just appear moments later on the host mounted and ready to go.
[1:00]
Here's the question, now that you have the background. I want people to be able to run docker and request via a volume driver that the container gets a copy of some pre-defined data set that is ready to be consumed. Is RexRay the way to go here, or would it be easier and simpler to just implement the volume driver directly?
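A minimal sketch of the appliance call being described here, in Go. The base URL, endpoint path, and JSON fields are hypothetical placeholders, not Actifio's actual REST API:

```go
// Illustrative only: a volume driver asking the appliance to present a
// pre-provisioned dataset to a Docker host. The endpoint and fields are
// hypothetical; the Connector still does the host-side work afterwards.
package example

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type presentRequest struct {
	Dataset string `json:"dataset"` // e.g. the sanitized Gold Copy name
	Host    string `json:"host"`    // the Docker host that should receive it
}

func requestDataset(applianceURL, dataset, host string) error {
	body, err := json.Marshal(presentRequest{Dataset: dataset, Host: host})
	if err != nil {
		return err
	}
	// Hypothetical endpoint; the real appliance API will differ.
	resp, err := http.Post(applianceURL+"/api/mounts", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("appliance returned %s", resp.Status)
	}
	// On success the appliance drives the Connector, which handles bus
	// rescans, LVM/ZFS import, and mounting; the data then appears on the host.
	return nil
}
```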
akutz [1:01 PM]
The host daemon is the piece of this that may be the issue. We don't delegate the client-side process to anything driver-specific except for 1) retrieving a client-instance ID and 2) getting a list of the client's local devices in order to detect when a new device is presented, as well as the next device ID.
[1:02]
We actually manage the process of mounting the device, formatting it, etc.
[1:02]
But it sounds like you need to be injected into that process.
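The driver-specific, client-side surface akutz is describing could be sketched roughly as the interface below. The names are hypothetical and do not match libStorage's actual executor types:

```go
// Hypothetical shape of the driver-specific client-side work described
// above; not libStorage's real interfaces.
package example

import "context"

type ClientExecutor interface {
	// InstanceID returns something the storage platform can use to identify
	// this host (a GUID, IQN, WWN, etc.).
	InstanceID(ctx context.Context) (string, error)

	// LocalDevices lists the block devices currently visible to the host so
	// the client can detect when a newly presented device appears.
	LocalDevices(ctx context.Context) (map[string]string, error)

	// NextDevice suggests the next free device path (e.g. /dev/xvdb).
	NextDevice(ctx context.Context) (string, error)
}
```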
jameyers14 [1:02 PM]
The host daemon does not interact with any local process. The appliance talks to it. If you make a mount API call to our appliance to mount this data at /media/pgsql1, it will just show up there moments later.
akutz [1:03 PM]
I understand that. That's kind of my point :slightly_smiling_face:
[1:03]
I'm saying the current libStorage workflow expects the libStorage client to handle the mounting and formatting of a device.
[1:03]
And that piece *isn't* storage driver specific.
jameyers14 [1:04 PM]
OK. Also, in this case we're talking about existing data, so you would only want a container to consume it, not format it.
akutz [1:04 PM]
The storage driver code is primarily for telling a remote platform to create/delete/attach/detach volumes and devices.
[1:04]
We don't format anything if we detect an existing file system.
[1:04]
But it sounds like your daemon is presenting not a device (well, I guess it is, but that part is masked), but rather the eventual FS mount point of the device.
jameyers14 [1:05 PM]
OK. The Connector is optional. If you don't have it installed, we'll simply expose the data over the fabric and leave it to something else to deal with bus rescans, LVM stuff, mounting, etc.
akutz [1:05 PM]
You know, we *might* be able to work with you if I finished my work on bind mounts. Because then I could simply "remount" "/media/pgsql" as if *it* was a device.
[1:06]
Yep, the workflow today is such that the storage driver dev handles the bus scans via a piece called the _Executor_ binary that all storage drivers contribute to for their pieces of code that must execute client-side.
[1:06]
We'd have to extend libStorage support to include your FS types, but that's not really an issue.
[1:07]
We'd also have to add LVM support, which we do not have today. Well, if the device being presented expects to be a part of an LVM volume, that is.
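The bind-mount idea mentioned above boils down to a single Linux mount call: re-expose the path the Connector already populated at whatever path Docker expects. A minimal sketch, with illustrative paths:

```go
// Minimal sketch of the bind-mount idea: treat an existing mount point
// (e.g. one the Connector populated) as if it were the volume's device.
// Linux-only; paths are illustrative.
package example

import "syscall"

func bindMount(connectorMount, volumePath string) error {
	// Equivalent to: mount --bind /media/pgsql1 /path/docker/expects
	return syscall.Mount(connectorMount, volumePath, "", syscall.MS_BIND, "")
}
```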
jameyers14 [1:07 PM]
The appliance just exposes data as a LUN over FC or iSCSI. If libStorage can handle all of the client side tasks, we don't have to be involved with them. But keep in mind we are talking about mounting raw block data and our Connector's strength is that it will handle all kinds of FSes, LVMs, ZFS, etc.
akutz [1:09 PM]
I'm not discounting what your connector can do. I'd rather be able to rely on that honestly. I need to consider that I either have to extend libStorage to delegate the client-side work to some new type of driver that is part of the workflow *or* we'd end up having to recreate much of what the connector does as part of the libStorage package.
jameyers14 [1:09 PM]
It also greatly helps the appliance discover details about the host that are needed to mount data to it - e.g. WWN on FC or IQN on iSCSI.
[1:10]
Yes. So my take is I'd rather have libStorage consume a mount and do a bind mount as you described above.
akutz [1:11 PM]
Sure, but at that point the only value we add is providing a bridge between you and Docker.
[1:11]
Because you *still* have to install your connector on each client
jameyers14 [1:11 PM]
And that is what I've been wrestling with.
akutz [1:11 PM]
That's the issue I'm seeing. The value of libStorage is the client piece is storage-agnostic. We simply fetch the executor if we need it.
[1:12]
So yeah, it seems to me either you/I need to recreate much of what your connector does OR REX-Ray/libStorage adds minimal value to you.
[1:12]
You could simply extend the connector daemon to expose a docker volume driver endpoint and be done with it.
[1:12]
IFF the connector has to be installed locally.
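For context, "expose a docker volume driver endpoint" means implementing Docker's (v1) volume plugin protocol, which is just JSON over HTTP, typically on a unix socket under /run/docker/plugins/. A rough sketch, with the appliance/Connector interaction stubbed out and the plugin name invented:

```go
// Rough sketch of a Docker volume plugin endpoint. Only the mount path is
// shown; the appliance call is a stub, and the socket name is made up.
package main

import (
	"encoding/json"
	"log"
	"net"
	"net/http"
)

type mountReq struct{ Name string }
type mountResp struct {
	Mountpoint string `json:"Mountpoint,omitempty"`
	Err        string `json:"Err,omitempty"`
}

func main() {
	mux := http.NewServeMux()

	mux.HandleFunc("/Plugin.Activate", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string][]string{"Implements": {"VolumeDriver"}})
	})

	mux.HandleFunc("/VolumeDriver.Mount", func(w http.ResponseWriter, r *http.Request) {
		var req mountReq
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			json.NewEncoder(w).Encode(mountResp{Err: err.Error()})
			return
		}
		// Here the driver would call the appliance's REST API (hypothetical),
		// the appliance would drive the Connector, and the driver would wait
		// for the mount point to appear before answering Docker.
		json.NewEncoder(w).Encode(mountResp{Mountpoint: "/media/" + req.Name})
	})

	// Docker discovers plugins via a socket (or .spec/.json file) here.
	l, err := net.Listen("unix", "/run/docker/plugins/actifio.sock")
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(http.Serve(l, mux))
}
```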
jameyers14 [1:12 PM]
And that is my question.
akutz [1:13 PM]
That's the most straight-forward approach today in my estimation.
jameyers14 [1:13 PM]
And I value your and Matt's input here as I am at a major decision point.
akutz [1:13 PM]
Unless you wanted to fork libStorage and take a look at the client-side OS and Integration drivers in order to figure out how you would augment them for your needs.
jameyers14 [1:14 PM]
I don't even need to extend the Connector. There is a reference implementation of a volume driver for GlusterFS that uses REST. I could basically change the API calls and have a working volume driver.
akutz [1:14 PM]
Not saying you would do the work -- I could / you could / we could together. But at this point I think you should look there to see what we lack in order to determine how much work libStorage requires to make it work for your needs.
[1:14]
Ah.
[1:14]
https://github.com/emccode/libstorage/blob/master/drivers/integration/docker/docker.go
jameyers14 [1:14 PM]
And therefore keep the whole thing open source and not hidden in the Connector.
akutz [1:14 PM]
This is the `integration` driver.
jameyers14 [1:14 PM]
Yes, I've looked at it.
akutz [1:14 PM]
Ah
[1:15]
So you know you *could* implement this (https://github.com/emccode/libstorage/blob/master/api/types/types_drivers_integration.go) and basically replace the Docker integration driver with your own.
[1:15]
I don't even like that it's called the Docker integration driver.
[1:15]
It should really be called Linux or something.
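Replacing that integration driver would mean supplying your own implementation of its interface, with the mount step delegating to the appliance/Connector instead of formatting and mounting a raw device. The sketch below only illustrates the idea; the method names and signatures are placeholders, not libStorage's actual types:

```go
// Hypothetical illustration of swapping out the Docker integration driver:
// a mount step that delegates to the appliance/Connector rather than
// formatting and mounting a device itself. Names are placeholders.
package example

import "context"

type applianceClient interface {
	// PresentAndMount asks the appliance to expose the dataset and have the
	// Connector mount it on this host, returning the resulting mount point.
	PresentAndMount(ctx context.Context, volumeName string) (string, error)
}

type actifioIntegration struct{ api applianceClient }

// Mount mirrors the role of the integration driver's mount step: given a
// volume name from Docker, return a path that can be handed back to Docker.
func (d *actifioIntegration) Mount(ctx context.Context, volumeName string) (string, error) {
	return d.api.PresentAndMount(ctx, volumeName)
}
```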
jameyers14 [1:17 PM]
Sure. So I guess we're just at the point of what's best for the community and what people would use. I could hack up the GlusterFS reference driver and probably get it to work with minimal fuss. When you mount, the driver sends an API call to the appliance, the appliance exposes the data and talks to the local Connector which does all the host tasks and returns status to the appliance, and the appliance returns ultimate status back to the docker driver along with a mount point. Done.
[1:18]
Or we can try to get libStorage to handle the whole stack. I've never written anything in or used GO before. :disappointed:
akutz [1:18 PM]
But again, if you *have* to install the local connector I see very little value in you utilizing libStorage
jameyers14 [1:19 PM]
Precisely. Which is why, if we used libStorage, what I meant by handling the whole stack is libStorage doing what the Connector does.
akutz [1:19 PM]
ScaleIO requires clients to have the SIO tools, but they are not services. You literally have a running process that can respond to Docker's requests already.
jameyers14 [1:20 PM]
Not today. The Connector does not talk to local services. Today it only accepts a mutually authenticated and secure connection from the appliance. When you use the API you talk to the appliance, which talks to the Connector. Nothing but the appliances talks to the Connector directly.
akutz [1:21 PM]
No, I know. I'm saying it's there to be extended.
jameyers14 [1:21 PM]
That's true.
[1:22]
However, I'd be more a fan of just publishing an open source docker driver with its own daemon rather than extending the Connector, which is closed source.
akutz [1:23 PM]
I'm not trying to dissuade you; I'm just trying to describe what there is today and what needs to be done to complete the ask.
jameyers14 [1:24 PM]
But to get back to the heart of the discussion, I suspect there already is a lot of overlap between what the Connector does today and what libStorage does. Regardless of whether storage comes from a VMAX or an Actifio appliance, libStorage already has to have basic functionality to rescan HBAs, find the storage, figure out what kind of filesystem it is, and mount it up.
akutz [1:25 PM]
That's true. And it does.
[1:25]
The workflow is there.
[1:25]
The storage driver provides the scanning piece, and we provide the mounting / formatting piece.
[1:26]
Adding support for `mount.zfs` is no biggie. However, LVM support is another thing as it may require a separate workflow / discovery process.
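The `mount.zfs` piece amounts to branching on the probed filesystem type at mount time. A rough sketch that shells out to standard tools; this is not libStorage's actual mount code:

```go
// Rough sketch only: probe the filesystem type, then mount accordingly.
package example

import (
	"errors"
	"os/exec"
	"strings"
)

func mountDevice(device, target string) error {
	// blkid -o value -s TYPE prints e.g. "ext4", "xfs", or "zfs_member".
	out, err := exec.Command("blkid", "-o", "value", "-s", "TYPE", device).Output()
	if err != nil {
		return err
	}
	fsType := strings.TrimSpace(string(out))
	if fsType == "zfs_member" {
		// ZFS mounts by pool/dataset name rather than device path, so this
		// branch would first import the pool; details omitted in this sketch.
		return errors.New("zfs: pool import required before mounting")
	}
	// e.g. mount -t ext4 /dev/xvdb /media/pgsql1
	return exec.Command("mount", "-t", fsType, device, target).Run()
}
```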
jameyers14 [1:27 PM]
Agreed. So as I understand the differences are: the Connector provides discovery services - it tells the appliance what the host's WWN or IQN is, it handles LVM and ZFS, and it supports non-Linux and non-Intel platforms (Solaris on SPARC, HPUX, AIX, etc).
akutz [1:28 PM]
Okay. The nice bit about the last piece is you're only concerned about supporting what Docker supports in terms of this new work.
jameyers14 [1:28 PM]
And I doubt a lot of people want to run Docker on AIX, but you probably know better than me.
akutz [1:28 PM]
I'm not sure about non-x86 instruction sets, but Docker is currently Linux-only.
[1:29]
There *is* some C-code in the server, but that doesn't have to concern you. The executor would have to be built for new ARCHs if we have to support them.
[1:29]
But the executor is pure-Go I believe (I wrote it all, but even I'm not positive anymore), so no worries there.
jameyers14 [1:29 PM]
Yeah, I'm not too worried about that.
akutz [1:30 PM]
Look, my suggestion is this -- look harder at the Docker integration driver and you tell me what we're missing.
jameyers14 [1:30 PM]
So libStorage would have to do a registration function with our appliances. It would have to check to see if the host is known to the appliance. If it's not it would have to create the host object over our REST API with the relevant WWN/IQN before it can request a mount.
akutz [1:31 PM]
I believe that is the *key* component that has to be replaced/augmented to support your platform.
jameyers14 [1:31 PM]
The only thing other than that is LVM support. From what I can tell (I looked at your driver earlier -- Matt sent the link), the rest is there.
akutz [1:32 PM]
Re: registration - that's fine. We can do that via the remote InstanceInspect method. You just need to provide a light-weight mechanism to gather client-instance info that gets built with the executor. That code needs to be dependency-light to keep the executor binary small.
[1:34]
For example, the ScaleIO instanceID code invokes the local SDC tool to get a GUID. That GUID is then sent to the remote libStorage server which routes it to the requested service (which *should* be a service configured to use the ScaleIO driver), and the Instance ID is passed to the InstanceInspect method which asks the ScaleIO platform to identify this client GUID.
[1:34]
Same basic principle for your registration issue.
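A hypothetical sketch of that registration flow, with invented helper names: the client-side piece stays dependency-light and just reports an identity (here an iSCSI IQN), and the server-side piece makes sure the appliance knows the host before any mount is requested. This is not libStorage's real API:

```go
// Hypothetical sketch of the registration flow described above.
package example

import (
	"context"
	"os"
	"strings"
)

// instanceID runs client-side inside the executor: keep it dependency-light.
// Here it just reads the host's iSCSI initiator name (IQN).
func instanceID() (string, error) {
	b, err := os.ReadFile("/etc/iscsi/initiatorname.iscsi")
	if err != nil {
		return "", err
	}
	// File looks like: InitiatorName=iqn.1994-05.com.redhat:abcdef123456
	return strings.TrimPrefix(strings.TrimSpace(string(b)), "InitiatorName="), nil
}

// appliance is an invented stand-in for whatever REST client talks to the
// Actifio appliance.
type appliance interface {
	LookupHost(ctx context.Context, iqn string) (hostID string, found bool, err error)
	CreateHost(ctx context.Context, iqn string) (hostID string, err error)
}

// inspectInstance runs server-side: given the IQN sent by the client, make
// sure the appliance knows about this host before any mount is requested.
func inspectInstance(ctx context.Context, a appliance, iqn string) (string, error) {
	if id, ok, err := a.LookupHost(ctx, iqn); err != nil || ok {
		return id, err
	}
	return a.CreateHost(ctx, iqn)
}
```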
jameyers14 [1:34 PM]
That makes sense.
akutz [1:35 PM]
I need to run, but I'll be on later this evening as well as tomorrow and Friday. Those days I'll have more time and we can jump on a Hangout so I can go over some of this visually with you. For now just look at the existing drivers. Their layout is fairly consistent. If you want to take the leap with the assumption we can add support for ZFS and LVM, and I think we can, then maybe fork libStorage and take a stab at your driver.
[1:36]
I'm going to discuss the above conversation with one of my colleagues and solicit their input in case I missed anything or forgot something obvious.
jameyers14 [1:36 PM]
One last question...
akutz [1:36 PM]
Sure.
jameyers14 [1:39 PM]
Remember the source data is block level. We define it at the level of an "application" in our system. That could be an entire VMDK of original data that comprises a boot partition for /boot, and an LVM volume group with / and /home logical volumes. So you request "postgres1" and you actually end up with 3 mountpoints. How do we handle that?
akutz [1:40 PM]
Currently we do not. We deal with devices. You mount a device to a single mount point. We don't deal with devices that contain anything other than a single, mountable file system at the moment.
[1:40]
I mean, if it's raw we'll format it, but beyond that...
[1:41]
Again, I think we'll have to extend the workflow to accommodate some of your needs. We *do* currently examine the _superblock_ as it were to check for the existing file system. I imagine we can check some similar data in order to determine if we need to perform some edge workflow.
[1:43]
I have some ideas on how to handle it. We could extend the executor so that the mount workflow is handled externally, into code to which you could contribute.
[1:43]
Or again, you could simply extend/replace the existing integration driver.
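One way the "one device, several mount points" case could be probed, again as a rough illustration using standard LVM tools rather than anything libStorage does today:

```go
// Rough sketch: probe the presented device; if it is an LVM physical volume,
// activate its volume group and report each logical volume as its own
// mountable device. Partition-table handling is omitted.
package example

import (
	"os/exec"
	"strings"
)

func logicalVolumes(device string) ([]string, error) {
	out, err := exec.Command("blkid", "-o", "value", "-s", "TYPE", device).Output()
	if err != nil {
		return nil, err
	}
	if strings.TrimSpace(string(out)) != "LVM2_member" {
		// Plain filesystem: a single mount point, the existing workflow.
		return []string{device}, nil
	}
	// Find the volume group backing this PV, activate it, and list its LVs.
	vg, err := exec.Command("pvs", "--noheadings", "-o", "vg_name", device).Output()
	if err != nil {
		return nil, err
	}
	vgName := strings.TrimSpace(string(vg))
	if err := exec.Command("vgchange", "-ay", vgName).Run(); err != nil {
		return nil, err
	}
	lvs, err := exec.Command("lvs", "--noheadings", "-o", "lv_path", vgName).Output()
	if err != nil {
		return nil, err
	}
	return strings.Fields(string(lvs)), nil
}
```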
jameyers14 [1:45 PM]
OK. Is there value to you in handling a block device that contains more than a single mount point, either multiple partitions and/or LVM? In our experience, just about every Linux system we ingest uses LVM.
akutz [1:45 PM]
Oh sure, there definitely is value. I'm not denying that. I'm approaching this all from a time-to-resolution stand-point from your point-of-view.
[1:46]
Regardless of whether it's valuable for us to do something, if it's not in the product already it delays your time-to-resolution.
jameyers14 [1:47 PM]
Of course. So it sounds like my best path forward right now is to hack up the GlusterFS driver to at least get something out there. And then if there is demonstrable interest from our customers, I can assign real engineers to work on the libStorage part with you and your colleagues.
akutz [1:48 PM]
That would likely be the path of least resistance. I'll take all this back to my PM and discuss the LVM support. This could end up being a chicken and egg scenario. If you aren't actively pursuing it, it may not be a priority. If it's not a priority, you may not pursue it.