rgrove/gist:2025242

Created March 12, 2012 22:56

Star (2) You must be signed in to star a gist
Fork (0) You must be signed in to fork a gist

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/rgrove/2025242.js"></script>
Save rgrove/2025242 to your computer and use it in GitHub Desktop.

Download ZIP

YUI Attribute Value Filters proposal

Raw

gistfile1.md

Attribute Value Filters

With more and more application logic moving to the client, and with YUI becoming more popular on the server, it's increasingly important to design APIs that handle user input safely. Currently, YUI modules that store user input in attributes must do one of two things: either escape user strings before setting an attribute, or escape them manually before using them.

Escaping automatically before storing the value is safest, but also inconvenient if you sometimes need the unescaped value, since you must then store two versions (probably in two different attributes). This can lead to API clutter and confusion. Escaping manually before use avoids API clutter but increases the likelihood of mistakes, and also clutters up the codebase in general. It significantly increases the chances that another developer who is unaware of the need to escape the value will inadvertently introduce a security vulnerability.

Proposal

Attribute should provide a consistent, pluggable API for retrieving a filtered string value. Internally, string attributes would be stored as raw strings, and would be filtered on demand using the specified filter when the attribute is retrieved. This avoids the need for module developers to write custom getter functions or store both filtered and unfiltered values, and allows for flexible and safe usage in a variety of scenarios.

Filters should be pluggable via a static API on Y.Attribute. This will allow the escape module to provide a set of default filters, and custom modules can provide their own filters to meet custom needs.

Getting and Setting Values

Setting a value continues to work the same as it does today:

klass.set('username', '<b>joe</b>'); // stores "<b>joe</b>" as the raw attr value

Getting a value also works the same:

klass.get('username'); // => "<b>joe</b>"

To get a filtered version of a value, specify the name of the desired filter as the second argument to get():

klass.get('username', 'html'); // => "&lt;b&gt;joe&lt;/b&gt;"
klass.get('username', 'url');  // => "%3Cb%3Ejoe%3C%2Fb%3E"

Adding a Filter

Filters are registered statically on Y.Attribute. Once registered, a filter is available for use on any class instance that uses Attribute, even if it was instantiated before the filter was registered.

To register a filter:

// Registers a new "html" filter unless one already exists.
Y.Attribute.addFilter('html', Y.Escape.html);

// Arbitrary filter.
Y.Attribute.addFilter('disemvowel', function (value) {
    return value.replace(/[aeiou]+/g, '');
});

Internally, Attribute should store the filter function in a static object hash, with the name as the key.

If addFilter() is called with the name of a filter that already exists, it should log an error and refuse to overwrite the existing filter.

Removing a Filter

To remove a previously added filter:

// Removes the "html" filter if it exists.
Y.Attribute.removeFilter('html');

When removeFilter() is called with the name of a filter that doesn't exist, it should simply do nothing.

Default Filters

A new attribute config property named filter would allow module developers to specify a default filter to be used for an attribute. For example, I could define an attribute that should always be filtered as HTML by default:

// ...
ATTRS: {
    username: {
        filter: 'html'
    }
}
// ...

This would cause get('username') to run the "html" filter. I could still specify another filter if desired, or get('username', 'raw') to get the raw, unfiltered value.

Caching

To improve performance, Attribute could cache filtered values internally, clearing the cached value whenever an attribute's raw value is updated. There may be dragons here.

get() Logic

Internally, get() or its underyling implementation should take the following steps:

Let attrName be the value of the first argument to get().
Let filterName be the value of the second argument to get().
If attrName refers to a nonexistent attribute, return undefined.
Let rawValue be the raw value of the attribute or sub-attribute named by attrName, after passing through the attribute's getter function if one is set.
If filterName is undefined or null, then
1. If a default filter has been configured for the attribute, then
  1. Let filterName be the name of the attribute's default filter.
  2. If filterName refers to a nonexistant filter, return undefined.
  3. Execute the default filter function, passing rawValue as the only argument.
  4. Return the filter function's return value.
2. Otherwise, return rawValue.
Otherwise:
1. If filterName equals "raw", return rawValue.
2. Otherwise, if filterName refers to a nonexistent filter, return undefined.
3. Execute the filter function, passing rawValue as the only argument.
4. Return the filter function's return value.

lsmith commented Mar 21, 2012

Ok, I've read through the proposal and the comments.

What it seems to me you are describing is similar to what exists today in DataTable as column formatters. In fact, it's probably more accurate to s/filter/formatter/g because "filter" suggests the removal of some part of the thing rather than reformatting it.

Registered formatters were in YUI 2, and are one of the todo items for 3.6.0, allowing

var table = new Y.DataTable({
    columns: [{ key: 'price', formatter: 'currency' }, ... ]
    ...
});

I think the idea is good, though it might complicate DT's use of formatters.

That said, I am very uncomfortable with the string parsing approach. Attribute is upstream of a lot of code, including most customer class implementations. Adding more work to get() will slow the library and implementation code down across the board. And, more recently relevant to DataTable, the new delimiter will decrease the usable characters in attribute names. This might break existing code. From 3.4.1 to 3.5.0, the move from Record to Model breaks implementations that use periods in their column keys because the Model instances think they are trying to retrieve subattributes. I don't want this to happen again.

Related aside: the implementation steps for get() are absent consideration of subattributes.

This also seems related to the notion of attribute types. This has come up a number of times in the past, and I am certainly in favor of it. The premise being that attributes can be configured with a type (sibling of value, setter, getter, etc), and other config settings would be applied based on that type.

The parallel for mosen's suggested setter filter is DataSchema's parsers, or the type config hinted above. Those are both static configurations as opposed to per-operation configurations which is the focus of this proposal.

So, I'm sticking with my original suggestion for getAs(name, formatter), though I think get(name, formatter) will be sufficient, since AttributeCore's implementation of get() only passes through one argument to _getAttr() anyway.

lsmith commented Mar 21, 2012

To be clear, I recognize that the proposal is per-operation functionality, and that differs from column configuration and the ability to configure attribute types. The similarity is in registered name => common formatting function. I like that pattern, so I like the proposal, save the string parsing.

Author

rgrove commented Mar 21, 2012

@lsmith:

What it seems to me you are describing is similar to what exists today in DataTable as column formatters. In fact, it's probably more accurate to s/filter/formatter/g because "filter" suggests the removal of some part of the thing rather than reformatting it.

You're right that filters and formatters are similar, but I'm wary of diluting the intent of this feature by targeting too broad a use case or referring to it as a "formatter". Filtering -- specifically, filtering for purposes of security -- is the primary use case, and I think the name is accurate: "filter this value through this function". The function may remove something, it may escape something, or it may just make something pretty, but the point is it acts as a filter between the raw value and the value you get back. It's a bonus that it can be used as a general-purpose attribute formatting mechanism, but that's not the focus, and general formatting isn't something that I think warrants support in Attribute's core. Filtering is.

I understand your concerns with the string parsing, and I'm open to changing that. What I like about the get('foo:html') or get('foo|html') API is that it makes the filter conceptually explicit and seems (to me anyway) to stand out more than get('foo', 'html'). This is definitely subjective though, so if you say this API doesn't do it for you, I believe you. Given that the string parsing here would consist of a simple regex match I don't think it would have a measurable perf impact compared to all the other work that goes on when retrieving an attr value, but if string parsing is a blocker, I'm willing to concede.

I don't like getAs(), though. As you may have noticed, I'm pretty intent on keeping the conceptual overhead of this API as low as possible. I don't want people to have to use a new method, or even a method signature that's too far off from what they're used to. I think filters need to be baked into get() and they need to become so ubiquitous that a get() call without a filter will look weird to people. I'd be okay with get(name, filterName).

Using get() rather than a new method is also pretty important for the case where an attribute is defined with a default filter.

I think the idea is good, though it might complicate DT's use of formatters.

My goals are pretty different from DataTable's goals. I want to avoid stepping on DT's toes for sure, but as I said above, formatting is not the focus here.

This also seems related to the notion of attribute types. This has come up a number of times in the past, and I am certainly in favor of it. The premise being that attributes can be configured with a type (sibling of value, setter, getter, etc), and other config settings would be applied based on that type.

Not quite. Context is very important for filtering. A static type won't suffice here, because regardless of what that type is, its value will need different filtering depending on the context in which it's used. HTML escaping isn't sufficient for URLs, URL escaping isn't sufficient for HTML, etc. Types are certainly useful for formatters of the sort that DataTable uses, but not for context-sensitive user input filtering.

I've specified that it should be possible to define a default filter for an attribute, but that's just a convenience, and doesn't obviate the need to filter appropriately for the context in which a value is being used.

lsmith commented Mar 21, 2012

LGTM

jinsley commented Mar 22, 2012

In the get() logic section 5.1.2 I would have thought returning undefined when the default filter is non-existent, rather than rawValue, would have unexpected consequences for developers. Otherwise this all looks good.

Author

rgrove commented Mar 22, 2012

@jinsley: Returning the rawValue there when it's expected to be filtered could have even worse consequences, which is why we return undefined if the specified filter isn't found. It might also make sense to log an error in this case.

jinsley commented Mar 22, 2012

@rgrove: I think a non-silent failure would be good - logging an error whilst returning undefined would do that.

natecavanaugh commented May 16, 2012

Hey guys,
I like the avoidance of string parsing for the filter for a couple of reasons, one of which Luke mentioned, which is adding even more overhead into everything that uses Attribute, but the other is related to a request:

What about having the filter param be either a function or a string name. If it's registered as part of addFilter, you can pass it's name. If it's a function, just use that to filter the value.

The upside of this I see is that there are many cases where I may have an existing filtering method, but I don't want to register it as a filter, or I have one that's slightly different from a registered one.

Author

rgrove commented May 16, 2012

@natecavanaugh:

What about having the filter param be either a function or a string name. If it's registered as part of addFilter, you can pass it's name. If it's a function, just use that to filter the value.

Seems reasonable.

rgrove/gist:2025242

Attribute Value Filters

Proposal

Getting and Setting Values

Adding a Filter

Removing a Filter

Default Filters

Caching

get() Logic

lsmith commented Mar 21, 2012

Uh oh!

lsmith commented Mar 21, 2012

Uh oh!

rgrove commented Mar 21, 2012

Uh oh!

lsmith commented Mar 21, 2012

Uh oh!

jinsley commented Mar 22, 2012

Uh oh!

rgrove commented Mar 22, 2012

Uh oh!

jinsley commented Mar 22, 2012

Uh oh!

natecavanaugh commented May 16, 2012

Uh oh!

rgrove commented May 16, 2012

Uh oh!