# How Nike Run Club led me to write my first AI project

For a long time I've wanted to start learning how to use AI as part of my code flow. Where should I start? Well, I don't have a B.Sc. and definitely don't have the math fundamentals to understand all the equations and calculations behind it. But then I came across this amazing article by Jason Brownlee and I started believing again!

After reading this article many times and understanding it, I can finally say that I'm fluent in the KNN algorithm. Let's talk about KNN a bit.

# KNN Algorithm

Imagine you are at the World Cup. You see three pubs with different fans: England, Sweden and France. You are Welsh; whom would you celebrate with? England, of course, because you are both from Great Britain!

That is the meaning of KNN: K Nearest Neighbors.
Our data will be classified as the same type as its nearest neighbors in the same feature space:

(Illustration from Wikipedia)

We can see the green circle in the middle. It's closest to the triangles, so it will probably be classified as a triangle.
I said "probably" because no "K" was defined. The "K" means how many neighbors should be considered. If "K" is 3, the circle is classified as a triangle (two triangles vs. one square), but if "K" is 5, the circle is classified as a square (three squares vs. two triangles).
Most of the time the default value of "K" is 3 (a bigger "K" makes the classifier less sensitive to individual points).

One lovely day I was running with the Nike Run Club app in the background.
I felt tired that day; I needed someone to wake me up and give me some motivation. Guess what, it never came. Then an idea came to me: classify songs as rhythmic or non-rhythmic!

I'm not a musician, but a little research led me to the term "tempo" and its measurement unit, "BPM". The beats-per-minute value determines whether a song is rhythmic or not.

As written on Wikipedia, the BPM of a rhythmic song is above 120:

> Allegro — fast, quickly, and bright (120–156 bpm) (molto allegro is slightly faster than allegro, but always in its range)

> Vivace — lively and fast (156–176 bpm)

> Vivacissimo — very fast and lively (172–176 bpm)

> Allegrissimo or Allegro vivace — very fast (172–176 bpm)

> Presto — very, very fast (168–200 bpm)

> Prestissimo — even faster than presto (200 bpm and over)

So in theory we only need to check whether the BPM of a song is above 120.
The problem is that I did not find any accurate algorithm for calculating BPM, so I added more values to decide whether a song is rhythmic:
1. The BPM average of the song.
2. The percentage of high-BPM points out of all points.
3. How many sequences of high-BPM points the song has (more than 5 high-BPM points in a row).

I also subtracted 5 from each point because I did not fully trust the Aubio library's results.

Let's retrieve these values and add them to the DB. First we extract the BPM values of the song with aubio, then we compute the values we need from them:
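The code embeds from the original post are not included in this copy of the article, so here is a minimal sketch of those two steps. It assumes the `aubio` Python module (`aubio.source` and `aubio.tempo`); the function names, the 120 BPM threshold, and the sequence length of 5 are illustrative, taken from the description above rather than from the original code.

~~~python
# Minimal sketch (not the original code): extract per-beat BPM points with aubio,
# then compute the three features described above.
import aubio

HIGH_BPM = 120           # threshold for a "high" BPM point (see the tempo table above)
SEQUENCE_LENGTH = 5      # how many high points in a row count as one sequence

def extract_bpm_points(path, hop_size=512):
    """Return one BPM estimate per detected beat."""
    src = aubio.source(path, 0, hop_size)             # 0 = keep the file's samplerate
    tempo = aubio.tempo("default", hop_size * 2, hop_size, src.samplerate)
    points = []
    while True:
        samples, read = src()
        if tempo(samples):                            # a beat was detected
            points.append(tempo.get_bpm() - 5)        # subtract 5, as explained above
        if read < hop_size:
            break
    return points

def compute_features(points):
    """BPM average, high-BPM percentage, and number of high-BPM sequences."""
    average = sum(points) / len(points)
    high = [p > HIGH_BPM for p in points]
    percentage = 100.0 * sum(high) / len(points)
    sequences, run = 0, 0
    for is_high in high:
        run = run + 1 if is_high else 0
        if run == SEQUENCE_LENGTH:                    # count each long run only once
            sequences += 1
    return average, percentage, sequences
~~~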
Put all the values into a DB and let's start classifying!

## Creating Datasets

We need to feed the machine. The recipe is datasets: lists of data that we provide to the algorithm for the learning part (a lot of them).

Let's have a look. We create a list of lists with our data from the DB. Then we need to load it and split it into two groups:

* Training set: the data the algorithm actually learns from.
* Test set: a validation set for testing our algorithm.

Each list contains 4 attributes:

* BPM average.
* BPM counter percentage (the percentage of high-BPM points).
* High-BPM points sequences.
* Whether the song is rhythmic or not.

## Creating the KNN Model

Our KNN model deals with real numbers, so we're going to use the Euclidean distance to find the best K points for the upcoming prediction. Each training instance is compared with the test case: we calculate the distance between the training instance's values and the test instance's values and store it in a list. We then sort that list and retrieve the K best options (smallest distances).

Let's write some code!
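Since the embedded code is also missing here, below is a minimal KNN sketch following the structure described above (and the structure of the Jason Brownlee tutorial this post is based on): Euclidean distance, neighbor search, and the majority vote described in the next section. The function names and the row layout (`[bpm_average, bpm_counter_percentage, high_bpm_sequences, label]`) are assumptions for illustration, not the original code.

~~~python
# Minimal KNN sketch (not the original code): distance, neighbor search, vote.
import math
from collections import Counter

def euclidean_distance(a, b, length):
    """Distance over the first `length` (numeric) attributes of two instances."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(length)))

def get_neighbors(training_set, test_instance, k=3):
    """Return the k training instances closest to the test instance."""
    length = len(test_instance) - 1           # the last attribute is the label
    distances = [
        (instance, euclidean_distance(instance, test_instance, length))
        for instance in training_set
    ]
    distances.sort(key=lambda pair: pair[1])  # smallest distances first
    return [instance for instance, _ in distances[:k]]

def predict(neighbors):
    """Majority vote over the neighbors' labels (see the next section)."""
    votes = Counter(neighbor[-1] for neighbor in neighbors)
    return votes.most_common(1)[0][0]
~~~

With rows shaped like `[158, 64, 125, 'Rhythmic']`, calling `predict(get_neighbors(train_set, test_song, k=3))` returns the majority label of the three closest songs.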
## Predicting The Future

We created all the data and retrieved the best points from the training set, and now we are ready to predict the result!

The prediction is a vote: the majority group among the K neighbors is elected. Simple as that (this is the voting step included in the sketch above).

## Putting It All Together

Now that we have all the pieces, let's complete this puzzle!

- First, get the training data or load it from an existing Pickle object.
- Find the best neighbors around the song.
- Predict.

We will have two modes:
1. Training mode: finding the best dataset we can for the most accurate prediction.
2. Classification mode: classify whether a song is rhythmic or not.

Let's test the classifier and check the result:

~~~
Remix.mp3
Train set: 64
Test set: 1
[[158, 64, 125, <TEMPOS.RHYTHMIC: 'Rhythmic'>], [154, 63, 112, <TEMPOS.RHYTHMIC: 'Rhythmic'>], [167, 72, 145, <TEMPOS.RHYTHMIC: 'Rhythmic'>]]
Is Remix.mp3 rhythmic? True
Remix.mp3 is added successfully
~~~

The song I tested is rhythmic (it's a remix!).
We can see that the best neighbors it found are:

* [158, 64, 125, <TEMPOS.RHYTHMIC: 'Rhythmic'>]
* [154, 63, 112, <TEMPOS.RHYTHMIC: 'Rhythmic'>]
* [167, 72, 145, <TEMPOS.RHYTHMIC: 'Rhythmic'>]

The winner is… 'Rhythmic'! We have a success!

# Conclusion

We've learned about the KNN model, how to implement it, and how to predict data with it.
Using AI in your code is possible even if you don't have all the theoretical fundamentals. I'm sure that if we have more simple articles like the one mentioned above about these models and their implementations, we will see more developers using these concepts in their code. Don't be afraid to use them!

I hope this article taught you something new, and I'm looking forward to your feedback. Please do tell: was this useful for you?

The full project can be found on GitHub here.