Last active
August 30, 2016 00:15
http://www.grcdi.nl/gsb/summary_%20company%20legal%20forms.html
https://en.wikipedia.org/wiki/Types_of_business_entity
https://gate.ac.uk/sale/tao/splitap6.html
https://github.com/hpcc-systems/TextAnalytics/blob/master/hpcc/GATE_Annie/plugins/ANNIE/resources/NE/name.jape
## Java * References
https://www.oreilly.com/learning/java-8-functional-interfaces
http://refactoring.info/tools/LambdaFicator/
https://github.com/JnRouvignac/AutoRefactor/wiki/Useful-links
https://mailparser.io/pricing
https://parser.zapier.com/
http://unitedobjectives.com/products/invoice_processor/
InvoiceP2 is built in Python and C (for performance critical modules). It uses proprietary Natural Language and Text Processing technology created by United Objectives. Parsing algorithms are backed up by semantic, frequency, colocation, proper names and morphological dictionaries that enable the system to parse documents without any predefined structure.
CloudFactory is a scalable way to outsource tedious and repetitive data work. We help break your project down into small tasks that are processed by our global 24x7 workforce. Our software platform manages this workforce to ensure quick turnaround and accurate results for your business.
http://www.raremile.com/mobile-receipt-scanning-and-data-extraction.html
Papers
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.403.1594&rep=rep1&type=pdf
http://vision.gyte.edu.tr/publications/2016/VISIGRAPP_2016Camera.pdf
http://www.ict.griffith.edu.au/das2012/attachments/FullPaperProceedings/4661a409.pdf
http://www.dsi.unifi.it/~simone/Papers/ICDAR97b.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.143.137&rep=rep1&type=pdf
http://ocean.kisti.re.kr/downfile/volume/kmtd/MTMDCW/2010/v13n12/MTMDCW_2010_v13n12_1786.pdf
https://www.researchgate.net/profile/Enrico_Francesconi/publication/3710626_Rectangle_labelling_for_an_invoice_understanding_system/links/02e7e51818f35768f6000000.pdf
http://seco.cs.aalto.fi/u/jwtuomin/svn/secoweb/public_html/publications/2011/nyberg-masters-thesis.pdf
https://jamia.oxfordjournals.org/content/21/5/850
Paper
Converting and Annotating Quantitative Data Tables
https://files.ifi.uzh.ch/ddis/iswc_archive/iswc/ab/2010/iswc2010.semanticweb.org/pdf/167.pdf
Rule Based Systems for Big Data
http://www.springer.com/us/book/9783319236957
GSearch Queries:
annotation of quantitative research data stored in tables.
table header disambiguation
http://iswc2013.semanticweb.org/sites/default/files/iswc_poster_7.pdf
Towards Disambiguating Web Tables
Create a file with a simple table
GUI
Create an application
- annotate it with tokens
- create the JAPE file for it and test its output
- save the application
Library
- load the application from file
- execute the PR (processing resource)
Print out
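To make the idea behind these steps concrete before touching GATE, here is a rough plain-Python analogue, not GATE or JAPE code. It annotates each row of a small table with typed tokens and then applies one pattern rule over the token kinds, loosely in the spirit of a JAPE rule. The `Token` class, the `label_value_rule` function and the sample table are all hypothetical names made up for this sketch.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    kind: str  # "number", "word" or "punct"
    text: str
    start: int
    end: int


# Numbers are tried first so "3.5" becomes one number token.
TOKEN_RE = re.compile(
    r"(?P<number>\d+(?:\.\d+)?)|(?P<word>[A-Za-z]+)|(?P<punct>[^\sA-Za-z0-9])"
)


def tokenize(text):
    """Annotate the text with typed tokens (stand-in for a tokeniser step)."""
    return [Token(m.lastgroup, m.group(), m.start(), m.end())
            for m in TOKEN_RE.finditer(text)]


def label_value_rule(tokens):
    """Rule: one or more word tokens followed by a number -> (label, value)."""
    results = []
    i = 0
    while i < len(tokens):
        j = i
        while j < len(tokens) and tokens[j].kind == "word":
            j += 1
        if j > i and j < len(tokens) and tokens[j].kind == "number":
            label = " ".join(t.text for t in tokens[i:j])
            results.append((label, tokens[j].text))
            i = j + 1
        else:
            # No match starting here: skip past the words (or one token).
            i = max(j, i + 1)
    return results


# Hypothetical sample table; the header row has no number, so it is skipped.
TABLE = """Item Qty
Paper 12
Blue ink 3.5
"""


def run(table_text):
    """Tokenise each row and collect every rule match."""
    rows = []
    for line in table_text.splitlines():
        rows.extend(label_value_rule(tokenize(line)))
    return rows


if __name__ == "__main__":
    for label, value in run(TABLE):
        print(label, "->", value)
```

In GATE itself the tokenising and the rule would be separate processing resources inside the saved application; the sketch only mirrors that split with two functions.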
- Mail parsing is slow because it relies on rule-based parsing on labels: each text is matched against a list of patterns.
- Instead, we should use model-based parsing, which provides probabilistic output.
We can also use a combination:
- use the rule-based parser to prepare the learning set for the probabilistic parser
- use the rule-based parser in combination with the ML model
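A minimal sketch of the combination idea, assuming nothing about the real mail parser: a couple of regex rules label some lines, those labels become the learning set, and a tiny naive Bayes model (standard library only) then gives probabilistic output for a line the rules miss. The rules, the label names and the sample lines are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labelling rules: (label, pattern), tried in order.
RULES = [
    ("total", re.compile(r"\b(total|amount due)\b", re.I)),
    ("date", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
]


def rule_label(line):
    """Return the label of the first matching rule, or None."""
    for label, pattern in RULES:
        if pattern.search(line):
            return label
    return None


def features(line):
    """Lower-cased word tokens; digits and punctuation are ignored."""
    return re.findall(r"[a-z]+", line.lower())


def build_training_set(lines):
    """Use the rules to label lines; unmatched lines become 'other'."""
    return [(line, rule_label(line) or "other") for line in lines]


class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for line, label in examples:
            self.class_counts[label] += 1
            for word in features(line):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict_proba(self, line):
        """Return {label: probability}, using add-one smoothing."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for word in features(line):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            log_scores[label] = score
        # Normalise in a numerically safe way.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp.values())
        return {k: v / norm for k, v in exp.items()}


# Made-up sample lines standing in for parsed mail text.
RAW_LINES = [
    "Total: 120.00 EUR",
    "Amount due 75.50",
    "Invoice date 2016-08-30",
    "Date: 2016-07-01",
    "Thank you for your business",
    "Please contact support with questions",
]

if __name__ == "__main__":
    model = NaiveBayes()
    model.fit(build_training_set(RAW_LINES))
    unseen = "Amount payable: 40.00"  # no rule matches this line
    print(rule_label(unseen), model.predict_proba(unseen))
```

With so little data the probabilities stay soft, which is the point: the model returns a ranked guess where the rules return nothing. For the second bullet, one option is to trust the rule when it fires and fall back to the model otherwise.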
Currently I am working through the following:
https://gate.ac.uk/wiki/TrainingCourseJune2015/
JAPE training
https://gate.ac.uk/sale/talks/gate-course-jun13/track-1/module-3-jape/module-3-jape.pdf
JAPE wiki
https://gate.ac.uk/wiki/jape-repository/
GATE training docs
https://gate.ac.uk/wiki/TrainingCourseJune2015/
"","SportsInterest","MoviesInterest","Technology.Interest","Finance.Interest","Politics.Interest","Travel.Interest","BizInterest","Intnl Interest","Age","Gender.Female","Gender.Male","Relationship.Status.Divorced","Relationship.Status.Married","Relationship.Status.Single", "Family.Size","Job.Level.Director","Job.Level.Entry.Level","Job.Level.intern","Job.Level.Manager","Job.Level.Sr..Manager","Income.25.to. 35","Income.35.to.50","Income.50.to.75","Income.75.to.100","Income.100.plus","Income.25.minus","No.of.Vehicles.Owned","Sales"