Rough estimate:
POCKETSPHINX: 33%
GOOGLE SPEECH: 66%
speechtest:
POCKETSPHINX PRECISION: 81%
POCKETSPHINX RECALL: 68%
GOOGLE PRECISION: 82%
GOOGLE RECALL: 72%
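The speechtest figures below are consistent with the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The Python sketch that follows recomputes both ratios from the logged counts and rounds them to whole percentages, as the summary above does. The function name `precision_recall` and the `RUNS` table are illustrative only and are not part of speechtest.

```python
from __future__ import annotations


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) using precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Counts copied from the speechtest logs (illustrative table, not speechtest's own format):
# name -> (true positives, false positives, false negatives, reference words, hypothesis words)
RUNS = {
    "sphinx": (190, 44, 90, 234, 280),
    "google": (191, 43, 73, 234, 264),
}

if __name__ == "__main__":
    for name, (tp, fp, fn, ref_words, hyp_words) in RUNS.items():
        p, r = precision_recall(tp, fp, fn)
        # In these two logs, TP+FP equals the reference length and TP+FN the hypothesis length.
        assert tp + fp == ref_words and tp + fn == hyp_words
        # ":.0%" rounds to a whole percent, matching how the summary rounds (0.8119... -> 81%).
        print(f"({name}) Precision: {p} ({p:.0%})")
        print(f"({name}) Recall : {r} ({r:.0%})")
```

In both runs, TP + FP equals the reference word count (234), and TP + FN equals the hypothesis word count (280 for sphinx, 264 for google). This suggests speechtest counts unmatched reference words as false positives. Many ASR and information-retrieval write-ups use the opposite convention, treating hypothesis words as the predictions. Under that convention, the precision and recall figures here would trade places. That reading is inferred from these two runs only; the tool's output does not state it.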
$ speechtest sample2_actual.txt sample2_sphinx.txt
sample2_actual.txt = 234 words
sample2_sphinx.txt = 280 words
True Positives: 190
False Positives: 44
False Negatives: 90
(sphinx) Precision: 0.811965811965812
(sphinx) Recall : 0.6785714285714286
$ speechtest sample2_actual.txt sample2_google.txt
sample2_actual.txt = 234 words
sample2_google.txt = 264 words
True Positives: 191
False Positives: 43
False Negatives: 73
(google) Precision: 0.8162393162393162
(google) Recall : 0.7234848484848485

@ 3:00 of Content Mining Taxonomies, Ontologies, and Semantics, Metadata Madness Luncheon
The biggest thing that you would do to kind of do that across multiple data sets starts with normalization, right, so if you want to make sure that data is searchable and you're looking at it across multiple data aggregators, data providers, normalizing that content across those providers is sort of the first thing you can do to make sure that garage is clean. The cleaner that garage becomes, the easier it is to find that content and we can talk a little bit more I think throughout the course of this panel, how that how that goes and how that works.
We um, I think we spend a lot of time thinking about how to clean the garage but maybe not enough about how we got all the stuff in there in the first place. You know, semantics is really, obviously important, especially when you're talking about standardization, um, but, if you don't have the data in the first place, if you can't think about all of this stuff in the context of billions of files, because really, we're not talking about a few boxes in the garage, we're talking about billions of them. And you need to be able to um account for the fact that whatever semantics you do come up with, it, there's gonna be, it's gonna be reliant on humans entering metadata, it's going to be reliant on remembering what the organization is, and also, it not changing, and not to mention, it not failing to account for, to extend the metaphor, a box of new sprinklers or something that doesn't fit the mold, or doesn't fit the organization that you agreed upon.
you would do to kind of do that across multiple datasets start with normalization right so if you want to make sure that data is is searchable and youre looking across multiple
aggregator State provider normalizing that content across those providers to sort of the first thing you can do to make sure that that garage is clean and then to clean that garage becomes the easier it is to find
I think we spend a lot of time thinking about how to
in the garage but maybe not enough time about how we got all the stuff in there in the first place you know as soon as it is really you know obviously in Port
especially when youre talking about standardization but if you dont have the data in the first place if you cant if you cant think about all of this stuff in the
text of billions of files because really were not no were not talking about you know if you boxes in the garage were talking about billions of them and you need to be able to
account for the fact that whatever semantics you do come up with it theres going to be its going to be reliant on humans entering metadata its going to be reliant on the form
during what the organization is and also if not changing and not to mention it is failing to account for you know that I extend the metaphor a box of
sprinklers are something that doesnt fit the mold or doesnt fit the organization that you agreed upon
you would do the time to get across multiple dataset started normalization right so [NOISE] if you wanna make sure that day is this article in you're looking across multiple ah but the aggregated skipper wires [NOISE] normalizing that thompson across those providers us were the first thing you can do to make sure that that crisis clean [NOISE] on and then the clinic arise because the losers i find that constantly talk a little more thing for the course of this town how about how it goes and that's how it works
we young spreading we spend a lot time thinking about how to clean it raj but [NOISE] maybe not of time about how we got all the stuff in there in the first place [NOISE] tom you know this is the id is semantics is it is really a note on this important especially to talk about standardization [NOISE] off by youth if you don't have that in the first place if you can if he can't think about all this stuff in the context of billions of files because really or not you know we're not talk about you know few boxes in the garage are talking about billions of them [NOISE] and you need to be able to bomb account for the fact that they were never semantics you to come up with
it does get it's it's gonna be relied on humans entering meditate and it's gonna be relied on remembering what the organization is and also did not changing [NOISE] and not to mention it failing to account for you know that it ought to extend the metaphor of box of [NOISE] new sprinklers or something [NOISE] that that it doesn't fit the mold or does that organization that you agreed upon