Created August 3, 2016 19:40
Cue cards (close-ups; insert each one where the matching parenthetical appears in the script):
- get yur self driving truks rite her
- SLAM
- Planning ain't easy but somebody gotta do it
- The truth will set you free
- but they're WRONG
Intro skit:
Me: Forced human interaction sucks, I'm glad I can book a self-driving Uber.
Me: Hey, this car isn't self-driving!
Driver: And it never will be.
Hello World, it's Siraj, and in this episode we're going to talk about how self-driving cars work, then implement our own self-driving car in a simulated environment. Self-driving cars aren't in the realm of science fiction anymore; real companies like Toyota and Ford are pouring millions of dollars of R&D into this technology. Services like Uber and Lyft that currently pay human drivers will soon deploy entire fleets of self-driving cars, so prepare for SkyNet. In 2 or 3 years we're going to start seeing hundreds of thousands of self-driving cars being sold to regular consumers. (get yur self driving trucks right her) But how do they work? Well, when we humans are in the driver's seat, we observe our environment by receiving an input of our surroundings from our eyes and simultaneously processing it in order to decide which way to move the steering wheel. This translates into a machine problem known as SLAM (SLAM), or Simultaneous Localization and Mapping, and it's something all self-driving cars do.
A self-driving car is usually outfitted with a GPS unit, an inertial navigation system, and a range of sensors like radar, video cameras, and laser rangefinders. It uses the positional information from the GPS and navigation system to localize itself, and the sensor data to build an internal map of the environment. Once it has its position in its internal map of the world, it can use that map to find the optimal path to its destination that avoids any kind of obstacles, be that dead babies or Pokemon. Once the car has determined the optimal path to take, that decision is broken down into a series of motor commands which are fed into the car's actuators. The actuators control the car's steering, braking, and throttle. That's a high-level description of how they work, but roads are complex! It's not just about avoiding obstacles; there are weather conditions that require changes in the way you accelerate, different types of road signs, and situations that you probably couldn't ever predict beforehand! (Planning ain't easy but somebody gotta do it)
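To make the "find the optimal path that avoids obstacles" step concrete, here's a minimal sketch using A* search on a made-up 3x3 occupancy grid. This is an illustrative toy, not any real car's planner; the grid, start, and goal are all assumptions.

```python
import heapq

def astar(grid, start, goal):
    """Find a shortest obstacle-free path on a 2D occupancy grid.

    grid[r][c] == 1 marks an obstacle; movement is 4-directional.
    Returns the path as a list of (row, col) cells, or None if blocked.
    """
    rows, cols = len(grid), len(grid[0])
    heuristic = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    # Priority queue entries: (cost-so-far + heuristic, cost-so-far, node, path)
    frontier = [(heuristic(start), 0, start, [start])]
    seen = set()
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                step = (nr, nc)
                heapq.heappush(frontier,
                               (cost + 1 + heuristic(step), cost + 1,
                                step, path + [step]))
    return None

# A 3x3 map with an obstacle in the middle: the planner routes around it.
grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 2))
```

The Manhattan-distance heuristic never overestimates on a 4-connected grid, so the first path popped at the goal is guaranteed shortest.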
A recent paper came out just this year called "Long-term Planning by Short-term Prediction." Its authors proposed a planning algorithm for self-driving cars, specifically one that chooses immediate actions so as to optimize a long-term objective. An example they used was a roundabout: when a car tries to merge into a roundabout, it has to decide on an immediate acceleration or braking command, while the long-term effect of that command is the success or failure of the merge. Traditionally, planning for self-driving cars is done via reinforcement learning. The car continuously corrects its driving over time through trial and error during training. The car, or agent, observes a state s (the scene it currently sees) and takes an action a. Depending on whether or not the action was 'good' (however we define good), it can receive a reward r. Then it moves to the next state s' and the process repeats. The goal is to maximize the cumulative reward, and that depends on a policy, which maps states to actions and is learned over time. The state-action value function is called Q, and it helps find the optimal policy. But it can be very hard to learn Q in an environment as dynamic as real-life roads with more than one agent. It's not just a problem of predicting your own car's actions; you have to be able to predict other cars' actions as well. So to solve this problem of learning Q, they used a deep recurrent neural network to learn a policy. The input to the neural net was a vector that contained both a predictable part, the speed of the car, and an unpredictable part, the speed of other cars, so it could learn from both. While this was an effective technique in the paper, they applied it to just two tasks: adaptive cruise control and merging into roundabouts. We can think of these driving tasks as submodules, and this algorithm could be applied to a bunch of different submodules like lane-changing decisions, driving off a cliff, and yielding for crazies.
This way of algorithmic thinking, applying various probabilistic algorithms to modular bits, is how most companies approach the problem.
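The state-action-reward loop described above can be sketched in its simplest tabular form. The one-lane "road" below is a made-up stand-in (not the paper's setup): five positions, an obstacle the car must swerve past, and a goal. The update rule is the standard Q-learning one, Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

```python
import random

random.seed(0)

# Toy road: positions 0..4. Position 2 is an obstacle (reward -1, episode ends),
# position 4 is the destination (reward +1). Actions: advance 1 cell, or swerve
# ahead 2 cells to skip past an obstacle. All of this is a hypothetical stand-in.
N_STATES = 5
ACTIONS = (1, 2)
OBSTACLE, GOAL = 2, 4

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

for episode in range(500):
    s = 0
    done = False
    while not done:
        # Epsilon-greedy: mostly exploit the current Q table, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(s + a, N_STATES - 1)
        done = s2 in (OBSTACLE, GOAL)
        r = 1.0 if s2 == GOAL else (-1.0 if s2 == OBSTACLE else 0.0)
        best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned policy: the highest-valued action in each state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
```

After training, the policy advances one cell from the start (swerving two would land on the obstacle) and swerves two cells from position 1 to skip past it, which is exactly the "immediate action, long-term payoff" trade-off the paper is about.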
But is there a way to make one learning algorithm that can learn everything from the ground up? As in everything a car would need to do to get from point A to point B. The technical term for this is end-to-end, and an even fresher paper released about 3 months ago tried it. The paper was called "End to End Learning for Self-Driving Cars." A team from NVIDIA put 3 cameras on a car windshield to receive input data, fed this video data to a convolutional neural network, and the features were learned on their own. They never explicitly trained it to detect the outline of roads. They didn't explicitly decompose the problem into submodules for different scenarios; their CNN mapped what it saw from the input directly to steering commands. It was first trained in a simulation with pre-recorded video, then trained by a human driver, and it was able to learn road features from steering alone. The training data was a series of road images paired with steering commands from a human. For each frame, they manually calibrated the center of the lane, which they called the ground truth. (The truth will set you free) They used the Torch library to help train the machine, and the weights were adjusted via backpropagation. They got great results and the car did really well on the road in various conditions after training, but more work was needed to improve the algorithm. It was hard for the paper's authors to separate the feature-extractor part of the neural network from the controller part, so it was difficult to test each one. That's why most real-world car manufacturers have decided it's not yet possible to test and verify an end-to-end system; they end up making software where each module is separate and can be tested on its own. (but they're WRONG)
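To make "pixels in, steering command out" concrete, here's a toy forward pass: one hand-rolled convolution, a ReLU, and a single dense output squashed to a steering angle. This is nothing like NVIDIA's actual architecture; the image size, kernel, and weights are random placeholders for an untrained network.

```python
import math
import random

random.seed(0)

def conv2d(image, kernel):
    """Valid 2D cross-correlation of a grayscale image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def predict_steering(image, kernel, weights, bias):
    """Map an image directly to a steering angle: conv -> ReLU -> dense -> tanh.

    tanh keeps the output in (-1, 1), i.e. full left to full right.
    """
    features = [max(v, 0.0) for row in conv2d(image, kernel) for v in row]
    return math.tanh(sum(f * w for f, w in zip(features, weights)) + bias)

# Hypothetical 8x8 camera frame and randomly initialized parameters.
image = [[random.random() for _ in range(8)] for _ in range(8)]
kernel = [[random.gauss(0, 0.1) for _ in range(3)] for _ in range(3)]
weights = [random.gauss(0, 0.01) for _ in range(6 * 6)]
angle = predict_steering(image, kernel, weights, 0.0)
```

Training end-to-end means backpropagating the error between this predicted angle and the human's ground-truth angle all the way back through the convolution weights, so the network discovers road features on its own.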
Deep learning libraries like Torch democratize the technology behind self-driving cars. A hacker named George Hotz built a self-driving car in his garage with just a couple of cell phone cameras, and the total cost turned out to be just 1000 bucks. Let's train our own self-driving car, using Q-learning and the Keras library, to drive itself without running into obstacles. After we declare our imports, let's write the training function for our car first; it'll take in a neural net with a set of hyperparameters as its parameters. Then we'll define some variables for the number of frames we want to observe for both training and testing, and define our positional variables for localization. We'll create a new game instance and get its first state. We'll also set a timer for tracking purposes. Then, as we start building experience replay, we'll update our positional variables and choose an action based on the state: if a random variable falls inside our exploration threshold, we pick a random action; otherwise we'll get the Q values for each action to help us find the optimal policy. We'll take that action, and if it's valid, we'll get a reward.
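The observe-and-build-replay phase just described can be sketched without any game engine. Everything below is a hypothetical stand-in for the real gist's code: the fake Q function, the three-sensor state, the environment, and the crash penalty are all invented for illustration.

```python
import random
from collections import deque

random.seed(1)

REPLAY_SIZE, EPSILON, N_ACTIONS = 1000, 0.1, 3   # actions: left, straight, right

def q_values(state):
    """Stand-in for the neural net's Q predictions (random here, untrained)."""
    return [random.random() for _ in range(N_ACTIONS)]

def env_step(state, action):
    """Hypothetical environment: new sensor readings plus a crash penalty."""
    next_state = [random.random() for _ in range(len(state))]
    crashed = random.random() < 0.05
    reward = -500 if crashed else 1   # big penalty for hitting an obstacle
    return next_state, reward, crashed

replay = deque(maxlen=REPLAY_SIZE)   # experience replay memory
state = [random.random() for _ in range(3)]   # e.g. three sonar readings

for frame in range(200):   # observation phase: act and record transitions
    if random.random() < EPSILON:
        action = random.randrange(N_ACTIONS)      # explore: random action
    else:
        q = q_values(state)
        action = q.index(max(q))                  # exploit: argmax over Q
    next_state, reward, crashed = env_step(state, action)
    replay.append((state, action, reward, next_state))
    # On a crash, the game resets and the car starts from a fresh state.
    state = [random.random() for _ in range(3)] if crashed else next_state
```

The `deque` with `maxlen` silently drops the oldest transitions once the memory is full, which is the usual fixed-size replay buffer behavior.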
Once it's done observing the game and building experience replay, we'll start training: we sample the experience replay memory and compute the training values, then train the model, a neural network, on this batch. Then we update the starting state, and if the car dies, we log the distance and reset the car's life. Finally, we want to save the model every 25,000 frames to the weights file, and we'll log everything when we're done for our own records. So that was just the training function; let's look at the testing function. For that we use our trained model as the parameter, initialize our game state, and initialize the car's distance at 0. We then start moving the car and updating its distance: we choose an action via our trained model, take the action, and log our distance. This keeps repeating until the car dies, with the car constantly trying to avoid obstacles through a mix of reinforcement learning and a neural net. Let's see how it looks in a simulated environment. Once you've got it working in the simulator, you could port it to a real RC car and have it self-drive all over your room! Links down below for more info, def subscribe for more ML videos, I've gotta go descend some gradients, so thanks for watching!
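The "sample the replay memory and get the training values" step boils down to building Bellman targets for each sampled transition. This sketch uses a deterministic stand-in for the network and a tiny hand-made replay buffer, purely for illustration; in the real gist these targets would be fed to the Keras model's batch-training call.

```python
import random

random.seed(2)

GAMMA, BATCH_SIZE, N_ACTIONS = 0.9, 4, 3

def q_values(state):
    """Deterministic stand-in for the network, so targets are reproducible."""
    return [sum(state) * w for w in (0.1, 0.2, 0.3)]

# Hand-made replay memory: (state, action, reward, next_state) tuples.
# A reward of -500 marks a crash, i.e. a terminal transition.
replay = [
    ([1.0, 0.0, 0.0], 0, 1, [0.0, 1.0, 0.0]),
    ([0.0, 1.0, 0.0], 2, 1, [0.0, 0.0, 1.0]),
    ([0.0, 0.0, 1.0], 1, -500, [1.0, 1.0, 0.0]),
    ([1.0, 1.0, 0.0], 0, 1, [1.0, 0.0, 1.0]),
]

minibatch = random.sample(replay, BATCH_SIZE)
targets = []
for state, action, reward, next_state in minibatch:
    target = list(q_values(state))       # keep predictions for untaken actions
    if reward == -500:
        target[action] = reward           # terminal: no future reward
    else:
        # Bellman target: immediate reward + discounted best future value.
        target[action] = reward + GAMMA * max(q_values(next_state))
    targets.append(target)
```

Only the taken action's entry is overwritten, so when the network trains on `(state, target)` pairs, the loss gradient flows through just the Q value that was actually experienced.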