import pandas as pd

# Flatten category -> sub_category -> company URL into one row per URL.
# `data` and `company_urls` are built earlier in the scraping step.
consolidated_data = []
for category in data:
    for sub_category in data[category]:
        for url in company_urls[sub_category]:
            consolidated_data.append((category, sub_category, url))

df_consolidated_data = pd.DataFrame(consolidated_data, columns=['category', 'sub_category', 'company_url'])
df_consolidated_data.to_csv('./exports/consolidate_company_urls.csv', index=False)

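The loop above flattens two nested mappings into one row per company URL. A minimal, self-contained sketch of the same idea follows, assuming `data` maps each category to a list of sub-categories and `company_urls` maps each sub-category to a list of URLs. The sample values and the `consolidate` helper are hypothetical, and the standard `csv` module stands in for pandas so the sketch has no dependencies. Unlike the original loop, it skips sub-categories that have no URLs instead of raising a `KeyError`.

```python
import csv
import io


def consolidate(data, company_urls):
    """Flatten category -> sub_category -> url into (category, sub_category, url) rows."""
    return [
        (category, sub_category, url)
        for category, sub_categories in data.items()
        for sub_category in sub_categories
        # Assumption: a sub-category with no scraped URLs is skipped, not an error.
        for url in company_urls.get(sub_category, [])
    ]


# Hypothetical sample inputs, for illustration only.
data = {"food": ["bakery", "coffee"], "travel": ["hotels"]}
company_urls = {
    "bakery": ["https://example.com/a"],
    "coffee": ["https://example.com/b", "https://example.com/c"],
}

rows = consolidate(data, company_urls)

# Write the rows as CSV to an in-memory buffer (a file path would work the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["category", "sub_category", "company_url"])
writer.writerows(rows)
csv_text = buffer.getvalue()
```
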
scrapy/
    scrapy.cfg            # deploy configuration file
    trustpilot/           # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items definition file
        middlewares.py    # project middlewares file

import re

import pandas as pd
import scrapy


class Pages(scrapy.Spider):
    name = "trustpilot"

    company_data = pd.read_csv('../selenium/exports/consolidate_company_urls.csv')
    start_urls = company_data['company_url'].unique().tolist()

# Ignore robots.txt rules (the generated settings.py template sets this to True)
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
CONCURRENT_REQUESTS = 32

# Export scraped items to CSV
FEED_FORMAT = "csv"
FEED_URI = "comments_trustpilot_en.csv"

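Since Scrapy 2.1, `FEED_FORMAT` and `FEED_URI` are deprecated in favour of the single `FEEDS` setting. A sketch of what the equivalent configuration could look like, assuming a recent Scrapy version and the same output file name (the `encoding` option is an addition, not part of the original settings):

```python
# settings.py (sketch): one entry per output file, keyed by its URI or path.
FEEDS = {
    "comments_trustpilot_en.csv": {
        "format": "csv",
        "encoding": "utf8",  # assumption: not set in the original settings
    },
}
```
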
cd src/training/
python train.py --data_path ./data/tp_amazon.csv \
                --validation_split 0.1 \
                --label_column rating \
                --text_column comment \
                --max_length 1014 \
                --dropout_input 0 \
                --group_labels 1 \
                --balance 1 \
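
The flags shown in the command above suggest a command-line interface along the following lines. This is a hypothetical sketch of how a script like `train.py` might declare them with `argparse`; the types, defaults, and the `build_parser` helper are assumptions inferred from the example values, not the actual script, and any flags cut off from the command are not covered.

```python
import argparse


def build_parser():
    """Declare only the flags visible in the example command (types and defaults assumed)."""
    parser = argparse.ArgumentParser(description="Train a character-level CNN (sketch)")
    parser.add_argument("--data_path", type=str, required=True)
    parser.add_argument("--validation_split", type=float, default=0.2)
    parser.add_argument("--label_column", type=str, default="rating")
    parser.add_argument("--text_column", type=str, default="comment")
    parser.add_argument("--max_length", type=int, default=1014)
    parser.add_argument("--dropout_input", type=float, default=0.0)
    # Assumption: 0/1 integers act as boolean switches, as in the example command.
    parser.add_argument("--group_labels", type=int, choices=[0, 1], default=1)
    parser.add_argument("--balance", type=int, choices=[0, 1], default=1)
    return parser


# Parse the same values used in the example command.
args = build_parser().parse_args([
    "--data_path", "./data/tp_amazon.csv",
    "--validation_split", "0.1",
    "--label_column", "rating",
    "--text_column", "comment",
    "--max_length", "1014",
    "--dropout_input", "0",
    "--group_labels", "1",
    "--balance", "1",
])
```
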
from ml.model import CharacterLevelCNN
from ml.utils import predict_sentiment

model_name = 'model_en.pth'
model_path = f'./ml/models/{model_name}'
model = CharacterLevelCNN()

# download the trained PyTorch model from GitHub
# and save it at src/api/ml/models/
# this is done at the first run of the API

import peewee as pw

import config

db = pw.PostgresqlDatabase(
    config.POSTGRES_DB,
    user=config.POSTGRES_USER,
    password=config.POSTGRES_PASSWORD,
    host=config.POSTGRES_HOST,
    port=config.POSTGRES_PORT
)

import db


@api.route('/review', methods=['POST'])
def post_review():
    '''
    Save review to database.
    '''
    if request.method == 'POST':
        expected_fields = [
            'review',
@api.route('/reviews', methods=['GET'])
def get_reviews():
    '''
    Get all reviews.
    '''
    if request.method == 'GET':
        query = db.Review.select()
        return jsonify([r.serialize() for r in query])

# -*- coding: utf-8 -*-
import dash
# Dash 2.0+ also exposes these two modules as `from dash import dcc, html`;
# the standalone packages below are deprecated there.
import dash_core_components as dcc
import dash_html_components as html

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

app.layout = html.Div(children=[