@callmehiphop
Last active August 29, 2015 14:24

BigTable

var bigtable = gcloud.bigtable();

getZones

bigtable.getZones(function(err, zones, apiResponse) {});

Zone


getClusters

var zone = bigtable.zone('my-zone');

zone.getClusters(function(err, clusters, apiResponse) {});

Create a cluster

zone.createCluster(clusterOptions, function(err, cluster, apiResponse) {});

Cluster


Get Cluster Metadata

cluster.getMetadata(function (err, metadata, apiResponse) {});

Update Cluster Metadata

cluster.setMetadata(metaData, function(err, metadata, apiResponse) {});

Delete Cluster

cluster.delete(function (err, apiResponse) {});

Undelete Cluster

cluster.restore(function (err, apiResponse) {});

Create a Table

cluster.createTable('my-table', function(err, table, apiResponse) {});
// or
cluster.createTable(tableOptions, function(err, table, apiResponse) {});

Get Pre-existing Table (half initialized)

var myTable = cluster.table('my-table');

Get All Tables on Cluster

cluster.getTables(function (err, tables, apiResponse) {});

Table


Get Table Schema

table.getMetadata(function (err, tableSchema, apiResponse) {});

Delete a Table

table.delete(function (err, table, apiResponse) {});

Rename a Table

table.rename('awesome-table', function (err, table, apiResponse) {});

Create a Column Family

table.createFamily('user', function (err, family, apiResponse) {});

Get Pre-existing Family (half initialized)

var userFamily = table.family('user');

Update a Family

userFamily.setMetadata(metaData, function(err, family, apiResponse) {});

Delete a Family

userFamily.delete(function(err, apiResponse) {});

Get Sample Keys

If a callback is not provided, a stream is returned

table.getSampleKeys(function (err, keys, apiResponse) {});
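A minimal sketch of the callback-or-stream behaviour described above, assuming an object-mode Node stream. `SAMPLE_KEYS` and this standalone `getSampleKeys` are hypothetical stand-ins; no real Bigtable request is made.

```javascript
var Readable = require('stream').Readable;

// Hypothetical stand-in for the keys a real API call would return.
var SAMPLE_KEYS = ['com.google.a', 'com.google.m', 'com.google.z'];

// With a callback, all keys are delivered at once.
// Without one, an object-mode stream that emits one key at a time is returned.
function getSampleKeys(callback) {
  if (typeof callback === 'function') {
    process.nextTick(function() {
      callback(null, SAMPLE_KEYS.slice());
    });
    return;
  }

  var stream = new Readable({ objectMode: true, read: function() {} });
  SAMPLE_KEYS.forEach(function(key) {
    stream.push(key);
  });
  stream.push(null); // signal the end of the stream
  return stream;
}

// Usage (either style):
//   getSampleKeys(function(err, keys) { /* keys is an array */ });
//   getSampleKeys().on('data', function(key) { /* one key per event */ });
```
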

Get Table Rows

If a callback is not provided, a stream is returned

var rowOptions = {
  prefix: 'com.google.'
};

table.getRows(rowOptions, function(err, rows, apiResponse) {});

Delete Table Rows

rowOptions would be filters used to determine which rows to delete

table.deleteRows(rowOptions, function(err, apiResponse) {});

Create a Row

table.createRow(rowData, function(err, row, apiResponse) {});

Create Multiple Rows

table.createRows([rowData], function(err, rows, apiResponse) {});

Get Specific Row (half initialized)

var myRow = table.row('my-row');

Update a Row

myRow.set('user:name', 'stephen', function(err, row, apiResponse) {});

// or for multiple columns
var rowData = {
  user: { // family
    name: 'stephen', // column
    age: 99 // column
  }
};
myRow.set(rowData, function(err, row, apiResponse) {});
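One way both call styles could share a code path is to normalize the 'family:column' string into the nested object form. This is only a sketch; `toRowData` is a hypothetical helper, not part of any existing API.

```javascript
// Hypothetical helper: turn ('user:name', 'stephen') into
// { user: { name: 'stephen' } }; objects are passed through unchanged.
function toRowData(key, value) {
  if (typeof key === 'object' && key !== null) {
    return key; // already in { family: { column: value } } form
  }

  var separator = key.indexOf(':');
  if (separator === -1) {
    throw new Error('Expected a "family:column" string, got "' + key + '".');
  }

  var family = key.slice(0, separator);
  var column = key.slice(separator + 1);

  var rowData = {};
  rowData[family] = {};
  rowData[family][column] = value;
  return rowData;
}
```
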

Get Columns From Row

myRow.get(['user:name'], function(err, columns, apiResponse) {});

Delete Columns From Row

myRow.delete(['user:name'], function(err, columns, apiResponse) {});

Delete An Entire Row

myRow.delete(function(err, columns, apiResponse) {});

Target Specific Family From Row

This should not be confused with the Table#family which allows you to update/delete families for the entire Table.

var myFamily = myRow.family('user');

// get column
myFamily.get('name', function(err, name, apiResponse) {});

// set column(s)
myFamily.set('name', 'peter', function(err, family, apiResponse) {});
// or
myFamily.set({ name: 'peter' }, function (err, family, apiResponse) {});

// delete column(s)
myFamily.delete(['name'], function(err, family, apiResponse) {});

// delete all columns associated with family
myFamily.delete(function(err, apiResponse) {});
@stephenplusplus

bigtable.renameTable('presidents', 'prezzy', function (err, prezzy, apiResponse) {
  // prezzy instanceof Table
});
bigtable.removeTable('presidents', function (err, prezzy, apiResponse) {
  console.log('presidents table has been deleted');
});

These should be:

var table = bigtable.table('prezzy');
table.rename('prezzydense', function(err, apiResponse) {});
table.delete(function(err, apiResponse) {});

prezzy.createColumnFamily('user', function (err, apiResponse) {
  console.log('user family created');
});
prezzy.removeColumnFamily('user', function (err, apiResponse) {
  console.log('user family deleted');
});

Could be:

prezzy.createFamily('user', function(err, family, apiResponse) {});

var family = prezzy.family('user');
family.delete(function(err, apiResponse) {});

var filters = [
  bigtable.prefixRange('com.google.'),
  bigtable.familyFilter('user') // only return user family columns
];

// maybe add a next() function for next set of rows..?
prezzy.getRows(filters, function (err, rows, apiResponse) {
  // rows = collection of Row instances
});

var deleteFilters = [
  bigtable.prefixRange('com.prezzy.')
];

prezzy.deleteRows(deleteFilters, function (err, apiResponse) {
  // delete rows using filters
});

We could probably just do:

var options = {
  prefix: 'com.google.',
  family: 'user'
};

prezzy.getRows(options, function(err, rows, apiResponse) {});
prezzy.deleteRows(options, function(err, apiResponse) {});
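To make the options-hash idea concrete, here is a rough in-memory sketch of how `prefix` and `family` might be interpreted. The row shape (`{ key, data }`) and `applyRowOptions` are assumptions for illustration; a real implementation would translate the options into request filters rather than filter locally. Dropping rows that have no cells left in the requested family is also an assumption.

```javascript
// Hypothetical: rows are modelled as { key: string, data: { family: { column: value } } }.
function applyRowOptions(rows, options) {
  options = options || {};
  var result = [];

  rows.forEach(function(row) {
    // prefix: keep only rows whose key starts with the given string
    if (options.prefix && row.key.indexOf(options.prefix) !== 0) {
      return;
    }

    var data = row.data;

    // family: return only that family's columns
    if (options.family) {
      if (!Object.prototype.hasOwnProperty.call(row.data, options.family)) {
        return; // assumption: rows with nothing left are omitted
      }
      data = {};
      data[options.family] = row.data[options.family];
    }

    result.push({ key: row.key, data: data });
  });

  return result;
}
```
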

// multiple dynamic rows
var filteredRows = prezzy.rows([
  bigtable.prefixRange('com.google.')
]);

This is interesting, but possibly confusing. If we do go with this though, maybe:

var filteredRows = prezzy.rows({ prefix: 'com.google.' });

// maybe rename delete?
alincoln.remove(function (err, apiResponse) {
  console.log('alincoln deleted from prezzy table!');
});

Yes :)


alincoln.remove(['user:tagged'], function (err, apiResponse) {
  console.log('user:tagged was deleted');
});

I don't mind this if we choose to go this route ('user:tagged' style) over the other, but we should only support one. For the sake of comparison, the other style looks like:

var family  = alincoln.family('user');
family.delete('tagged', function(err) {});

// or get certain columns via filters
var filters = [
  bigtable.columnFilter('tagged')
];

alincoln.get(filters, function (err, rowData, apiResponse) {
  // rowData = { user: { tagged: 'bfranklin' } }
});

Like the other place above, this could be easier:

var options = {
  column: 'tagged'
};

alincoln.get(options, function(err, rowData, apiResponse) {});

@callmehiphop
Author

Do you think rather than having a createTable method it would be better to have something like..

bigtable.table('prezzy').create(function (err, prezzy, apiResponse) {});

I sort of disagree with moving deleteFamily to the Family object, mostly because getting a family object via table (opposed to a row) seems kinda overkill since all you'd probably do with it is delete it. I imagine if you wanted to query via families you'd use a family filter on getRows.

prezzy.getRows({ family: 'user' }, function (err, rows, apiResponse) {});

I also like the idea of using a hash over an array of objects. I haven't spent a lot of time looking into all the different filters usually associated with HBase, though, so I'd like to dig into that a little more before going with it.

@stephenplusplus

Do you think rather than having a createTable method it would be better to have something like..

We've run into this before with the other apis that generally work the same. We thought it was a bit confusing to have create inside of the thing that it's referencing. We've kept references like that (storage.bucket, pubsub.topic, etc) only for "things that already exist" to avoid the confusion.


getting a family object via table (opposed to a row) seems kinda overkill since all you'd probably do with it is delete it.

Oh, so you can delete a column family from all of the rows in one API call? We definitely need to support that, then.

@callmehiphop
Author

Yep, I imagine if we wanted to delete a family at the row level we could just do something like

var gwash = prezzy.row('gwashington');
gwash.family('user').delete(function (err, apiResponse) {});

@stephenplusplus

I'm not a big fan of the "reverse hierarchy" (hope there's not a word for this that I don't know) and defaults. It's easy enough to get a zone from a cluster directly:

var gcloud = require('gcloud')({ /* auth conf */ });
var bigtable = gcloud.bigtable();

var zone = bigtable.zone('my-zone');
var cluster = zone.cluster('my-cluster');

// if someone doesn't care about the Zone, shortened:
var cluster = bigtable.zone('my-zone').cluster('my-cluster');

That makes the most sense to me ¯_(ツ)_/¯


bigtable.getClusters(function(err, clusters, apiResponse) {});

Can you get clusters without specifying a zone?


bigtable.getTables(function(err, tables, apiResponse) {});

Can you list tables without specifying a cluster and zone?


zone.getCluster('my-cluster', function(err, cluster, apiResponse) {});

Our usual way of handling this is having a getMetadata method on Cluster:

var cluster = zone.cluster('my-cluster');
cluster.getMetadata(function(err, metadata) {});

Most of the time, a user might want to "do something" with the cluster, so making an extra API request up front to download details that will be discarded isn't helpful or cost-effective.
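A small sketch of the "half-initialized" reference pattern described above: creating the reference costs nothing, and a request is only made when metadata is explicitly asked for. `fakeRequest`, `requestCount`, and the path format are hypothetical placeholders for a real API call.

```javascript
// Hypothetical request layer that only counts calls and echoes the path back.
var requestCount = 0;
function fakeRequest(path, callback) {
  requestCount++;
  process.nextTick(function() {
    callback(null, { name: path });
  });
}

function Cluster(zoneName, name) {
  // No API call here: the object is just a reference to a cluster.
  this.path = 'zones/' + zoneName + '/clusters/' + name;
}

// The only place a request is made for cluster details.
Cluster.prototype.getMetadata = function(callback) {
  fakeRequest(this.path, callback);
};

function Zone(name) {
  this.name = name;
}

Zone.prototype.cluster = function(name) {
  return new Cluster(this.name, name);
};

// Usage:
//   var cluster = new Zone('my-zone').cluster('my-cluster'); // 0 requests
//   cluster.getMetadata(function(err, metadata) {});         // 1 request
```
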


cluster.undelete(function (err, apiResponse) {});

😮 That's a thing? Maybe restore for a name, unless undelete is a convention?


table.getSchema(function (err, tableSchema, apiResponse) {});

Does an endpoint exist for this, or do we parse out a part of the response from the resource (i.e. what would come back from getMetadata)?


rowOptions would be filters used to determine which rows to delete

(For table.getRows, table.deleteRows)

What would rowOptions look like?


General notes:

  • For getMetaData and setMetaData, just rename them to getMetadata and setMetadata :)
  • I'm only 98
  • Does this idea have a place here? Mainly, how table.rows([rowObj1, rowObj2]) would make it easier to modify family values at once.

@callmehiphop
Author

I'm not a big fan of the "reverse hierarchy" (hope there's not a word for this that I don't know) and defaults. It's easy enough to get a zone from a cluster directly

The closest thing to API documentation I can find is the .proto files and the Go client. As far as I can tell the zone and cluster are required, and I wanted to supply defaults since I imagine some people will go directly for the table.

var bigtable = gcloud.bigtable({
  zone: 'my-zone',
  cluster: 'my-cluster'
});

var myTable = bigtable.table('my-table');
var myOtherTable = bigtable.table('my-othertable');

vs.

var bigtable = gcloud.bigtable();

var myTable = bigtable
  .zone('my-zone')
  .cluster('my-cluster')
  .table('my-table');

It might also be worth noting that all APIs involving zones/clusters use an entirely different gRPC client anyway.
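For comparison, the defaults could be implemented as sugar over the explicit chain, so both styles resolve to the same table reference. This is only a sketch: `createBigtable` and the path layout are illustrative assumptions, not the actual gcloud API.

```javascript
// Hypothetical factory standing in for gcloud.bigtable(options).
function createBigtable(defaults) {
  defaults = defaults || {};

  function zone(zoneName) {
    return {
      cluster: function(clusterName) {
        return {
          table: function(tableName) {
            // Illustrative path only; the real resource name may differ.
            return {
              path: [
                'zones', zoneName,
                'clusters', clusterName,
                'tables', tableName
              ].join('/')
            };
          }
        };
      }
    };
  }

  return {
    zone: zone,
    // Shortcut that only works when default zone and cluster were supplied.
    table: function(tableName) {
      if (!defaults.zone || !defaults.cluster) {
        throw new Error('A default zone and cluster are required.');
      }
      return zone(defaults.zone).cluster(defaults.cluster).table(tableName);
    }
  };
}
```
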


Can you get clusters without specifying a zone?

Per bigtable_cluster_service.proto it looks like all that is needed is your project id.


Can you list tables without specifying a cluster and zone?

Nope, which is sort of why I wanted to be able to set a default zone and cluster. We could easily just pass them in, though, or users could chain .zone().cluster().getTables()


😮 That's a thing? Maybe restore for a name, unless undelete is a convention?

That's a thing! I'm cool with changing it to restore, it was just noted as undelete within the proto file.


Does an endpoint exist for this, or do we parse out a part of the response from the resource (i.e. what would come back from getMetadata)?

There's a GetTable rpc defined and the comments around it state that it returns the schema of the table.


What would rowOptions look like?

Still need to work out whether or not this is feasible and what filters are available, but previously we talked about allowing an option to filter the results like..

var rowOptions = {
  prefix: 'com.google.'
};

I'm only 98

b.s.

@stephenplusplus

var bigtable = gcloud.bigtable({
  zone: 'my-zone',
  cluster: 'my-cluster'
});

var myTable = bigtable.table('my-table');
var myOtherTable = bigtable.table('my-othertable');

// vs.

var bigtable = gcloud.bigtable();

var myTable = bigtable
  .zone('my-zone')
  .cluster('my-cluster')
  .table('my-table');

I still choose the second personally. I see the convenience in the first, but it doesn't follow suit with our other APIs, which could be confusing:

var gcs = gcloud.storage({
  bucket: 'my-cool-bucket'
});

gcs.getFiles();

We could try to go back over our APIs to support a similar pattern, but even looking at the example above, I'm not sure it's preferred. More specifically, .bigtable() should return a reference to a Bigtable object, and .storage() should return a reference to a Storage object. "gcs.getFiles()" would mean "get all the files from GCS" and "bigtable.table()" (from your example) would mean "get this table from bigtable". The solution to that is obviously naming the vars differently:

var cluster = gcloud.bigtable({
  zone: 'my-zone',
  cluster: 'my-cluster'
});

var myTable = cluster.table('my-table');
var myOtherTable = cluster.table('my-othertable');

// and...

var bucket = gcloud.storage({
  bucket: 'my-cool-bucket'
});

bucket.getFiles();

And with those name changes, we have perspective for how they're supposed to behave. But, it's unexpected to get a cluster back from "gcloud.bigtable" and a bucket back from "gcloud.storage" directly. Having them act as a child class is misleading to their "parent-scope" capabilities.

@stephenplusplus

Does an endpoint exist for this, or do we parse out a part of the response from the resource (i.e. what would come back from getMetadata)?

There's a GetTable rpc defined and the comments around it state that it returns the schema of the table.

I think that would just be table.getMetadata() then.

@callmehiphop
Author

That's cool with me, I just thought I'd throw it out there. 😄
