02 - CatwalkAPI - Engineering Journal

SDC Engineering Journal

This gist details the decisions I made, the tests I conducted, the observations recorded, and the directions I took while building out a server and database for Project Snickers.

Goals

Create an API using an Express server and either a SQL or noSQL DBMS to retrieve and record data based on the pre-made front-end HTTP requests.

Achievements

  • Imported SQL-formatted CSV data into MongoDB and successfully accessed it with the Mongoose library.
  • Used the aggregation pipeline to transform and load the data into new collections whose schemas matched the front-end requests.
  • Indexed the finished collections to reduce the response time of each request.
  • Successfully installed MongoDB on an EC2 instance and loaded the locally-built collections using scp and mongoexport/mongoimport.

Reflections

Resources/Documents

Daily Summary

Database Design and Creation

Database Benchmarking

Which DBMS did you test?

MongoDB + Mongoose

Performance notes.

Development Stress Tests

Stress Testing

Artillery.io

June 21, 2021

Overview

Document the flow of data from the client side to the server side based on the given codebase.

Log

Afternoon

Challenges Faced

I needed to know what API requests were made, what data was stored from those requests, how frequently they were called, and the format in which the data was retrieved.

Actions Taken

Aside from looking at the API documentation, I went through each relevant file and mapped out the requests that were made either during componentDidMount or in event handlers.

Results Observed

Although an axios request exists for getProductList, it didn't appear to be used anywhere in the codebase. I will have to check again, but for now that leaves three types of requests I would need to develop: getProduct, getRelated (products), and getStyles (of a product).

June 22, 2021

Overview

Design a SQL and noSQL Schema

New Technology Used (Including new features of Old Tech)

  • MySQL + SQL Designer
  • PostgreSQL
  • MongoDB + Mongoose
  • Cassandra

Log

Afternoon - Design MySQL and Mongoose schemas

Challenges Faced

Design a SQL and a noSQL schema without yet having looked at the CSV data that would need to be imported.

Actions Taken

For the SQL schema, I used a schema designer to get a clearer picture of which tables would need to be JOINed, which IDs they would connect on, and especially which data could be nested to mirror the data fetched by the client side. For the noSQL schema, I initially made nested schemas, then adjusted them based on the CSV data.

Results Observed

Initially, I was going to use Cassandra as the noSQL option, but I soon realized that Mongo was a better fit for denormalized schemas with nested data, whereas Cassandra more closely reflects the SQL format I wanted to avoid. Additionally, I thought I'd prefer MySQL, but given that Mongo + Mongoose allowed me to store the nested data the client side needs to receive, I figured retrieval would be faster once the CSV data was properly transformed.

mongoose schema
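
For reference, a minimal sketch of the kind of nested Mongoose schema I was aiming for; the field names here are illustrative assumptions based on the shape the client expects, not the exact ones from my design:

```js
const mongoose = require('mongoose');

// Nested sku and style documents live inside a single product document,
// so a request for a product can return everything in one read.
const skuSchema = new mongoose.Schema({
  size: String,
  quantity: Number
});

const styleSchema = new mongoose.Schema({
  styleId: Number,
  name: String,
  photos: [{ thumbnail_url: String, url: String }],
  skus: [skuSchema]
});

const productSchema = new mongoose.Schema({
  id: Number,
  name: String,
  category: String,
  default_price: String,
  features: [{ feature: String, value: String }],
  styles: [styleSchema]
});

module.exports = mongoose.model('Product', productSchema);
```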

June 23, 2021

Overview

Attempt to Import CSV data to MongoDB

New Technology Used (Including new features of Old Tech)

  • Mongo shell command
  • MongoDB for VS Code

Log

Afternoon - Use mongoimport to load the CSV data.

Challenges Faced

Import the given CSV data and access it with Mongoose.

Actions Taken

Successfully used mongoimport to import data from the CSV into MongoDB (viewable in the Mongo shell), but was unsuccessful in accessing the data with Mongoose.

Results Observed

I was able to see that my first CSV file was fully imported into MongoDB by using db.product.find().count() as well as db.product.findOne(), but I couldn't find the same data through Mongoose. I installed MongoDB for VS Code in order to quickly inspect my Mongo database, and I noticed that there were two collections in it: product and products. I knew that product was the name I gave the collection when I used mongoimport, but I wasn't sure about the latter, because my Mongoose model instantiation was meant to create a collection called Product.

`const Product = mongoose.model('Product', productSchema)`

June 24, 2021

Overview

Learned About the Outcome of mongoose.model and Attempted Aggregation Pipeline (Day 1)

New Technology Used (Including new features of Old Tech)

  • Mongo Compass

Log

Morning

Challenges Faced

Access data in the Mongo database from Mongoose and implement an aggregation pipeline to transform/load denormalized data.

Actions Taken

Changed the name of the collection in mongoimport to products so that my mongoose.model could access it:

mongoimport --db fetcher --collection products --type csv --headerline --ignoreBlanks --file /Users/Wesson/Documents/SDC/product.csv

Began investigation of aggregation pipeline and how to implement it.

Results Observed

With the help of a Help Desk request, I learned that the collection name defined in mongoose.model is actually lowercased and pluralized, so the line const Product = mongoose.model('Product', productSchema) creates a collection called products, or accesses an existing collection of that name in the database. When looking into the aggregation pipeline, it was difficult to determine which keywords to use, how each keyword was implemented (as in, the information it needed), and where the aggregation pipeline was supposed to run (the Mongo shell, a Mongo-based codebase, or a Mongoose-based codebase). Eventually, my classmates showed me Mongo Compass, which made it possible to preview sample output from an aggregation pipeline, but because my collections were so large, generating those samples timed out.
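
A small sketch of the naming behavior described above (the schema here is trivial and just for illustration):

```js
const mongoose = require('mongoose');
const productSchema = new mongoose.Schema({ id: Number, name: String });

// Mongoose lowercases and pluralizes the model name, so this model reads and
// writes the 'products' collection that mongoimport created.
const Product = mongoose.model('Product', productSchema);

// If the collection had kept the name 'product', the third argument could have
// pinned the collection name explicitly instead of renaming it in mongoimport:
// const Product = mongoose.model('Product', productSchema, 'product');
```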

June 25, 2021

Overview

Attempted Aggregation Pipeline (Day 2) and Considered CSV Parsing API

New Technology Used (Including new features of Old Tech)

  • Mongo Compass
  • CSV-Parse API

Log

Afternoon - Very Little Progress Made with AP, Suggested Using CSV-Parse API

Challenges Faced

Implement an aggregation pipeline to transform/load denormalized data.

Actions Taken

My classmate and I spent time blindly attempting to implement the aggregation pipeline in the Mongo shell as well as with Mongoose. We attended a Help Desk session in which it was suggested that we also look into programmatically creating our transformed data with a CSV parser.

Results Observed

Because Mongo Compass wasn't able to render sample data to tell us whether our aggregation pipeline code would produce what we wanted, we spent a lot of time waiting on our Mongo database to finish transforming the data. We first learned of $lookup as a way to create a nested collection by merging one collection inside another, but after waiting several hours for the transformation to finish, we learned that it didn't actually change the original collection, nor did it create a new one. During our Help Desk session, I became convinced that a single aggregation pipeline wasn't worth the wait if it realistically took several hours to complete each merge. I decided to investigate a CSV parser as a way to transform the data before importing it, while my classmate continued researching the aggregation pipeline. This was the wisest decision, as it meant we could look into two approaches rather than investing time in only one.
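
For the record, the general shape of the $lookup stage we were experimenting with in the Mongo shell; the collection and field names below are assumptions, and $out (which we hadn't used at this point) is what actually persists the merged result as a new collection:

```js
// Nest each product's related style documents under a 'styles' array,
// then write the merged documents out as a new collection.
db.products.aggregate([
  {
    $lookup: {
      from: "styles",            // collection to join (assumed name)
      localField: "id",          // product id field (assumed name)
      foreignField: "productId", // matching field in the styles collection
      as: "styles"               // nested array added to each product
    }
  },
  { $out: "productsWithStyles" } // persist the result instead of discarding it
], { allowDiskUse: true });
```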

June 26, 2021

Overview

Successfully Implemented Aggregation Pipeline (Day 3) and Designed API Routes

New Technology Used (Including new features of Old Tech)

  • Mongo Compass' Indexing

Log

Morning - Successfully Created New Collections with the Help of Indexing First

Challenges Faced

In order to speed up the aggregation pipeline, the collections needed to be indexed first.

Actions Taken

Indexed the collections on their shared keys (productId for three collections, and styleId for the other three). I also ran into an issue where the initial aggregation pipelines (after indexing) were producing empty results for the merged collection, so I had to drop the old collections and use mongoimport again to allow for proper indexing.

Results Observed

Once I learned that indexing the IDs brought the runtime for transforming the collections down to 5-10 minutes, as opposed to 12+ hours, the next step was to design the API routes to access this data.
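
The indexes themselves were one-liners in the Mongo shell; a representative sketch (collection names are assumptions based on the imported CSVs):

```js
// Index the join keys before running the aggregation pipelines.
db.features.createIndex({ productId: 1 });
db.styles.createIndex({ productId: 1 });
db.related.createIndex({ productId: 1 });
db.skus.createIndex({ styleId: 1 });
db.photos.createIndex({ styleId: 1 });
// ...and likewise for the remaining collection keyed on styleId
```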

Afternoon - API Routes

Challenge/Motivation

Create API routes that would access the newly transformed collections.

Actions Taken

Used the endpoints from the client code to design server-side routes and used Mongoose's Model.find() method to GET the desired data based on Product ID and Style ID.
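
A rough sketch of what these routes looked like; the endpoint paths mirror the client's requests, while the model names, module layout, and query fields are approximations rather than the exact code:

```js
const express = require('express');
const router = express.Router();
// Mongoose models for the transformed collections (assumed module layout)
const { Product, Style } = require('../database/models');

// GET /products/:product_id -> a single product document
router.get('/products/:product_id', (req, res) => {
  Product.find({ id: Number(req.params.product_id) })
    .then((product) => res.status(200).json(product))
    .catch((err) => res.status(500).json(err));
});

// GET /products/:product_id/styles -> styles (with nested skus/photos) for a product
router.get('/products/:product_id/styles', (req, res) => {
  Style.find({ productId: Number(req.params.product_id) })
    .then((styles) => res.status(200).json(styles))
    .catch((err) => res.status(500).json(err));
});

module.exports = router;
```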

Results Observed

I realized that I will need to keep working on the collection for getStyles, because the client expects skus to be an object whose keys are the individual SKU IDs, rather than an array of objects. This will require returning to the aggregation pipeline to transform the old CSV data.

June 28, 2021

Overview

Indexing Finished Collections and Implementing Artillery.io

New Technology Used (Including new features of Old Tech)

Log

Afternoon

Challenges Faced

Stress test the API routes to determine if there is any room for optimization.

Actions Taken

I created a baseline test of 1 arrival/sec for 60 sec, then a scalability test of 10/100/1000 arrivals/sec for 60 sec. Lastly, a load test ran a warm-up, ramp-to, and sustained-load phase for a total of 13 minutes, with a max arrival rate of 50 per second.

Results Observed

After indexing the final collections, there was a dramatic improvement in the time to complete my GET requests (from several seconds down to a few milliseconds). It was not possible to run the 1000 arrivals/sec test for longer than 5 seconds before an ETIMEDOUT warning showed up.

Artillery.io results for getProduct

June 29, 2021

Overview

Impact of Keeping and Removing Indices/IDs of Nested Documents

Log

Afternoon

Challenges Faced

Test different versions of each GET request.

Actions Taken

I created variations of a given collection to determine if removing nested IDs or indexes would improve the response times of my GET requests in Artillery.io.

Results Observed

I chose to run these tests against my development code instead of leaving optimization for deployment, the reason being that any optimizing that needed to be done could happen sooner and with immediate results, and with fewer variables to worry about. The one issue I came across was that I would need to run these tests with fewer applications running in the background (Discord, Slack, Spotify, etc.), as CPU usage goes up dramatically at 100-1000 arrivals per second. So I know now that my deployed stress testing with Loader.io will produce more accurate results. As it turns out, the collection that had 80 MB more data than the other versions of itself was also the fastest in its baseline and scalability tests. I'm assuming this has to do with the indexes that existed to connect the nested documents to the original products and styles collections.

Artillery.io data of getProduct

June 30, 2021

Overview

Used $arrayToObject to Convert Nested Schema

Log

Afternoon

Challenges Faced

In order to match the Atelier API used by the frontend code more closely, the nested document 'skus' needed to be changed from an array of objects to an object of objects.

Actions Taken

I used $arrayToObject and $map in my aggregation pipeline to change the array into an object. Everything else stayed the same in my nested document.
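
The core of that stage looked roughly like the following; the collection names are assumptions, but this is the pattern where $map builds the k/v pairs that $arrayToObject needs:

```js
// Convert the skus array into an object keyed by each sku's id, e.g.
// [{ id: 1394769, size: "XS", quantity: 8 }] -> { "1394769": { id: 1394769, size: "XS", quantity: 8 } }
db.styles.aggregate([
  {
    $addFields: {
      skus: {
        $arrayToObject: {
          $map: {
            input: "$skus",
            as: "sku",
            in: { k: { $toString: "$$sku.id" }, v: "$$sku" }
          }
        }
      }
    }
  },
  { $out: "stylesWithSkuObjects" } // write the reshaped documents to a new collection
]);
```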

Results Observed

A classmate wanted to match the actual Atelier API documentation more closely, which meant removing the nested _id, id, and styleId from each object, but this was difficult to accomplish with a simple $project or $unset. I was able to remove _id and styleId, but not id, even when attempting to exclude it in a separate $project. I was impressed by MongoDB's documentation on the different ways to manipulate collections, but the syntax is very unforgiving.

July 1, 2021

Overview

Attempted to deploy MongoDB and a server to two EC2 instances, with the MongoDB now including the collections made locally.

New Technology Used (Including new features of Old Tech)

  • mongoexport

Resources

Log

Afternoon

Challenges Faced

In order to use MongoDB in the EC2 instance, it needed to receive the collections from the local MongoDB.

Actions Taken

At first, I used scp to send the original CSV files to my Ubuntu instance, but then decided to use mongoexport to extract and send the JSON versions of the collections from my local MongoDB. Once these JSON files were in the EC2 instance, I used mongoimport and then db.collection.createIndexes() to index the product ID of each collection.
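
Re-creating the indexes on the instance was just a matter of re-running the index commands in the Mongo shell after mongoimport finished; a representative sketch (field and collection names as assumed earlier):

```js
// mongoexport carries documents only, not index definitions, so the indexes
// have to be rebuilt on the EC2 instance after importing the JSON files.
db.products.createIndex({ id: 1 });
db.styles.createIndex({ productId: 1 });
// ...and likewise for the remaining collections
```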

Results Observed

I was surprised to find out after using mongoexport that the exported files didn't include the indexes I had made in my local database. The alternative would have been to use mongodump and mongorestore, but I also read that the dump-and-restore method puts the database at risk of performing less efficiently.

July 2, 2021

Overview

Day 2 on deploying MongoDB, which gave me the opportunity to use one more way to export/import local DB files to EC2.

New Technology Used

  • mongodump and mongorestore

Resources

Log

Afternoon

Challenges Faced

With the server instance already running, the challenge was to connect the server to the mongoDB instance.

Actions Taken

With the resources listed above (especially point 4), I was able to make my MongoDB instance publicly accessible. I tested this by connecting the DB to my local Mongo Compass, but I still wasn't able to connect the DB to the server instance.

Results Observed

I recognize that simply making the database public also puts it at risk, as the actual data is exposed. For future projects in which there is sensitive information, I would make sure to at least create a user/password within the mongo shell and enable access control in mongod.
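
A sketch of the minimal lockdown I have in mind; the user name and password are placeholders, and this also assumes enabling security.authorization in mongod.conf and restarting mongod:

```js
// In the mongo shell on the instance: create an application user scoped to the
// fetcher database, then require clients to authenticate as that user.
use fetcher
db.createUser({
  user: "catwalk_api",          // placeholder name
  pwd: "<a strong password>",   // placeholder
  roles: [{ role: "readWrite", db: "fetcher" }]
});
```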

July 3, 2021

Overview

Final day to deploy MongoDB and connect to server instance!

Log

Afternoon

Challenges Faced

Connect the publicly accessible MongoDB instance to the server instance.

Actions Taken

I followed the same steps from the previous two days, and although the exact order in which they occurred is hard to recall, I was able to connect the database to the server, at least for the afternoon.

Results Observed

I wish I could say exactly what I did that worked, but I cannot. I tried to replicate the general steps I took to connect my MongoDB to my server, but I wasn't successful. I know that while the connection was up, I was able to make API requests through Postman and my browser, but my frontend code wasn't able to call the API routes. This very well might have been because my Styles route wasn't returning the data in the format the frontend was designed to handle. Overall, I want to return to this project knowing that I am really close to deployment, and that deployment is a valuable skill I need to have under my belt if I ever want to ship a full-stack app.
