@ferd
Created June 10, 2019 13:46
AdTech in a non-linear timeline nutshell

Essentially, there's a publisher side (the website displaying the ads), a demand side (the advertisers wanting to display ads), and the client (you). The advertisers might be individual corporations, but they could also be agencies that represent various advertisers or corporations.

Traditional advertising was either the publisher or the advertiser contacting the other to set up a campaign. They could strike a deal like "display my ads on your site for a week for $x" and off you'd go. A contract was signed, you added a banner, and things were fine. You could compare that format to magazine and TV ads: the website reached a target demographic and you could just buy time as-is.

Eventually it evolved when sites got more general (say, news websites or dating websites) where a wide swath of the population could be on the same site. The idea became to target even tighter. The dating websites I used to work on had things like: age, gender, sexual orientation, interests, and so on. We could annotate pages to know which kind of ads to show based on these "tags" that defined the user or content. There was no strict need for the third party to track individual users, and they could get decent targeting.

You couldn't account for each ad super simply though, and people used metrics such as "cost per display" (traditionally, Cost Per Mille, or CPM, the cost per 1000 displays), "cost per click" or CPC (the cost each time someone clicks the ad), and "cost per action" or CPA (the cost each time someone signs up for a service). CPA is often used as an internal optimization metric, whereas CPM and CPC are used according to campaign types:

  • CPM is for awareness campaigns. You don't necessarily care that people take action, you want them to learn about a product or a brand
  • CPC is for tracking effectiveness of the ad itself, but also of its placement. How interested are people in it?
  • CPA is for tracking effectiveness of the entire pipeline.
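
To put numbers on these three metrics, here's a tiny Python sketch of the arithmetic. The function names and the campaign figures are made up for illustration; they don't come from any real campaign.

```python
def cpm(spend: float, impressions: int) -> float:
    """Cost per mille: what 1000 displays cost."""
    return spend * 1000 / impressions


def cpc(spend: float, clicks: int) -> float:
    """Cost per click."""
    return spend / clicks


def cpa(spend: float, actions: int) -> float:
    """Cost per action (a signup, a purchase, ...)."""
    return spend / actions


# Hypothetical campaign: $500 spent, 200,000 displays, 400 clicks, 20 signups.
spend, impressions, clicks, signups = 500.0, 200_000, 400, 20
print(f"CPM: ${cpm(spend, impressions):.2f}")  # CPM: $2.50
print(f"CPC: ${cpc(spend, clicks):.2f}")       # CPC: $1.25
print(f"CPA: ${cpa(spend, signups):.2f}")      # CPA: $25.00
```

Same spend, three very different numbers, which is why the metric you pick depends on what the campaign is for.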

Eventually, the big differentiation of online advertising compared with regular ads came with the idea that you could track a user with cookies, and then know how often you displayed an ad. After say, 5-7 times, you might decide "okay, this is a lost cause and we'll have diminishing returns" and then you ask to stop displaying the ad. THIS is the big win compared to any other advertising method. It is intrinsically attached to tracking users, and that's the whole thing that makes it worth it.
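
That cap is easy to picture in code. What follows is a rough sketch only, assuming an in-memory counter keyed by cookie ID and campaign. The `FrequencyCap` class, the cookie IDs, and the cap of 3 used in the example are all hypothetical; a real system would keep these counts in some shared store.

```python
class FrequencyCap:
    """Counts displays per (cookie ID, campaign) pair."""

    def __init__(self, limit=5):  # assumed default, in the 5-7 range mentioned above
        self.limit = limit
        self.seen = {}  # (cookie_id, campaign) -> number of displays so far

    def should_show(self, cookie_id, campaign):
        return self.seen.get((cookie_id, campaign), 0) < self.limit

    def record_display(self, cookie_id, campaign):
        key = (cookie_id, campaign)
        self.seen[key] = self.seen.get(key, 0) + 1


cap = FrequencyCap(limit=3)
shown = 0
for _ in range(10):  # the same user loads ten pages
    if cap.should_show("cookie-abc", "trucks"):
        cap.record_display("cookie-abc", "trucks")
        shown += 1
print(shown)  # 3
```

Without a stable ID for the user (the cookie), there's nothing to key the counter on, which is why this win is tied to tracking.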

In parallel with all of that, people saw that there would be value in becoming a middle-man. You could make a deal with all the advertisers you could find, and you could make a deal with all the websites you could find, and be the broker for all the ads for all their pages so they don't have to negotiate deals anymore. This middle-man could charge a small fee to everyone on each transaction, handle the logistics of tracking, and you'd be good to go.

These brokers (Yahoo, Google, AppNexus, etc.) could start tracking users on all websites, build up profiles, build their own intelligence on top of them, and try to provide better value. They also set up auctions for spots on pages. Since they could track all users and all websites, they'd be able to set up a marketplace where you could say "this is a page about the following topics, this is this kind of anonymous user from this place, age group, with these interests. How much to show your ad here?" This is Real-Time Bidding (RTB). The auction is set up to send you all the data, and within ~100ms, you have to answer back saying that you'll pass, or that you're interested. If you're interested, you send a bid response containing the ad you wanna show, and the price you're willing to pay. Most auctions are set up so that the highest bidder wins, but pays the second-highest bid price.
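
As a rough illustration of that second-price rule, here's a small sketch. The `Bid` type and the `run_auction` function are invented for this example, and having a lone bidder pay its own bid is an assumption of mine; real exchanges differ in the details (floor prices, tie-breaking, fees).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bid:
    bidder: str
    price: float   # what the bidder is willing to pay for this one display
    creative: str  # identifier of the ad to show


def run_auction(bids: list[Bid]) -> Optional[tuple[Bid, float]]:
    """Return (winning bid, price actually paid), or None if nobody bid.

    Bidders who passed are simply absent from `bids`.
    """
    if not bids:
        return None
    ranked = sorted(bids, key=lambda b: b.price, reverse=True)
    winner = ranked[0]
    # Second-price rule: the winner pays the runner-up's bid.
    # Assumption of this sketch: a lone bidder pays its own bid.
    paid = ranked[1].price if len(ranked) > 1 else winner.price
    return winner, paid


bids = [
    Bid("bidder-a", 2.50, "truck-300x250"),
    Bid("bidder-b", 1.75, "sedan-300x250"),
    Bid("bidder-c", 0.90, "tires-300x250"),
]
winner, paid = run_auction(bids)
print(winner.bidder, paid)  # bidder-a 1.75
```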

Websites loved this because they could get money without effort. Advertisers liked it because they could get metrics on everything, super-accurate profiles, etc. But most advertisers were not set up for RTB, so a new kind of intermediary popped up: bidders. The bidders could be specialized businesses that would work on behalf of agencies, or agencies that set things up themselves.

They had the ability to do data mining, which gave rise to a thing called re-targeting. For example, if I'm a car dealership, I could mark all the users who came to see my brand new trucks. Later, when one of those users is seen once again on a news article about cars or on a car-review website, and they're in the right geographical area, I could bid a much higher price to remind them about my trucks, because I know for a fact they are interested. This encouraged agencies to integrate more tightly with their own customers to provide the intelligence to do that kind of stuff.
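
The dealership example boils down to a lookup at bid time. Here's a minimal sketch of it; the prices, the multiplier, the cookie IDs, the region code, and the `decide_bid` function are all hypothetical and only there to show the shape of the decision.

```python
from typing import Optional

BASE_BID = 0.50            # hypothetical default bid, in dollars
RETARGET_MULTIPLIER = 4.0  # hypothetical boost for users known to be interested

# Cookie IDs marked when the user looked at the trucks on the dealer's own site.
interested_in_trucks = {"cookie-123", "cookie-456"}


def decide_bid(cookie_id: str, page_topics: set[str], region: str,
               dealer_region: str = "QC") -> Optional[float]:
    """Return a bid price, or None to pass on this bid request."""
    if "cars" not in page_topics or region != dealer_region:
        return None  # wrong content or wrong area: pass
    if cookie_id in interested_in_trucks:
        return BASE_BID * RETARGET_MULTIPLIER  # we know they care: bid high
    return BASE_BID


print(decide_bid("cookie-123", {"cars", "reviews"}, "QC"))  # 2.0
print(decide_bid("cookie-999", {"cars"}, "QC"))             # 0.5
print(decide_bid("cookie-123", {"cooking"}, "QC"))          # None
```

The set of interested cookies is the part that needs the tight integration with the customer's own site.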

Eventually agencies themselves had heavily annotated cookie data that could represent users, and there again, some became brokers of that data. Their pitch: on any bid request you receive from an ad network, would you be interested in knowing all about this user that you have not yet met, but that our network has? This created a practice of cookie exchanges.

A cookie exchange is done by displaying, along with your winning ad, a little script that will load your own ID, along with the ID of the ad network, and punt it to the cookie broker who would then couple that ID with their own, all through individual cookies. The ad brokers would add deals with other brokers, and they could all daisy-chain cookie loading from one broker to the other.
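
On the broker's end, the exchange amounts to maintaining a table that links everyone else's IDs to its own. Here's a simplified sketch of that table; the `CookieBroker` class, its method names, and the ID formats are assumptions for illustration, and the browser side (the script and the actual cookies) is left out entirely.

```python
import uuid


class CookieBroker:
    """Broker-side table linking partners' user IDs to the broker's own ID."""

    def __init__(self):
        self.links = {}  # (partner name, partner's user ID) -> broker's user ID

    def sync(self, broker_cookie, partner_ids):
        """Handle one sync call coming from the little script.

        broker_cookie: the broker's own cookie for this browser, or None
                       if the broker has never seen this browser before.
        partner_ids:   e.g. {"bidder": "b-42", "ad_network": "n-9001"}
        Returns the ID to set (or keep) as the broker's cookie.
        """
        broker_id = broker_cookie or uuid.uuid4().hex
        for partner, user_id in partner_ids.items():
            self.links[(partner, user_id)] = broker_id
        return broker_id

    def lookup(self, partner, user_id):
        return self.links.get((partner, user_id))


broker = CookieBroker()
# First sighting: the winning ad's script reports the bidder and network IDs.
broker_id = broker.sync(None, {"bidder": "b-42", "ad_network": "n-9001"})
# Later, same browser on another site served by a different network.
broker.sync(broker_id, {"other_network": "o-7"})
# The broker now knows these two network IDs are the same user.
print(broker.lookup("ad_network", "n-9001") == broker.lookup("other_network", "o-7"))  # True
```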

In parallel, ad exchanges started compounding their inventories with one another. So you would have for example Facebook, which mostly displays on its own website, and companies like OpenX, which were early ad players with their own network. But you also had other ones like Google's DoubleClick (who are everywhere), and the integration cost started being high. Companies like AdRoll or AdGear, where I was, would then aggregate all of these exchanges and bundle them into one. They often also have their own bidder on their end, and what they offer is to integrate with an ad provider: come with us, and we'll give you the intelligence for all your best bids, robot protection (because all these bidders fight each other as well), give you the tracking, and integrate you with all the major networks.

So in the end, advertising is still one website showing an ad to a user, but the ecosystem now looks like this:

  • the user browses a website and gets a cookie assigned
  • the website wants to display an ad, and does so by including the script of an ad network on its page
  • the ad network creates a bid request that it sends to all its subscribers, who are either bidders or other ad networks
  • all the bidders in the direct or indirect ad network make a bid (which is either a no-go, or a given creative (an ad) referred to by type, dimensions, id, etc. and a clickback URL)
  • the one with the highest price wins and gets its ad displayed on the page
  • the ad network calls the clickback URL to let the bidder know it won, and how much it paid
  • the ad displayed on the page calls some script or piece of content that the bidder has put in place. Before (or in parallel with) showing the ad, cookie exchange takes place where along with the successful display, all the partners daisy chain each other into the page to further annotate the user
  • the bidder ingests the clickback data, marks the number of impressions that have been made, creates data for further mining, etc.
  • the same process takes place for each spot you can buy on a page
  • the user has seen their page delayed by a shitload of time and has loaded content from hundreds of domains
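
The middle of that list (fan out the bid request, wait for answers, pick a winner, tell the winner) can be sketched with `asyncio`. This is a toy under stated assumptions: the `Bidder` class, `ask`, and `fill_spot` are made-up names, `on_win` stands in for the HTTP call to the clickback URL, the 100ms deadline and second-price payment are assumed, and cookie exchange is left out.

```python
import asyncio
from collections import Counter

DEADLINE = 0.1  # seconds; assumed answer window of roughly 100ms


class Bidder:
    def __init__(self, name, price, delay=0.0):
        self.name, self.price, self.delay = name, price, delay
        self.impressions = Counter()  # creative id -> number of wins recorded
        self.spent = 0.0

    async def bid(self, request):
        await asyncio.sleep(self.delay)  # stands in for network + decision time
        if self.price is None:
            return None  # no-go
        return {"bidder": self, "price": self.price,
                "creative": f"{self.name}-300x250"}

    def on_win(self, creative, paid):
        # Stands in for the ad network calling the bidder's clickback URL.
        self.impressions[creative] += 1
        self.spent += paid


async def ask(bidder, request):
    try:
        return await asyncio.wait_for(bidder.bid(request), timeout=DEADLINE)
    except asyncio.TimeoutError:
        return None  # answering too late counts as a pass


async def fill_spot(request, bidders):
    answers = await asyncio.gather(*(ask(b, request) for b in bidders))
    bids = sorted((a for a in answers if a),
                  key=lambda a: a["price"], reverse=True)
    if not bids:
        return None
    winner = bids[0]
    # Assumed second-price payment; a lone bidder pays its own bid.
    paid = bids[1]["price"] if len(bids) > 1 else winner["price"]
    winner["bidder"].on_win(winner["creative"], paid)
    return winner["creative"]


bidders = [
    Bidder("fast", 1.20),
    Bidder("rich-but-slow", 9.00, delay=0.5),  # misses the deadline
    Bidder("cheap", 0.40),
    Bidder("not-interested", None),
]
request = {"topics": ["cars"], "region": "QC", "cookie": "n-9001"}
creative = asyncio.run(fill_spot(request, bidders))
print(creative)  # fast-300x250
```

The bidder with the biggest budget loses here because it answered too slowly, which is the kind of constraint that makes the latency side of this business matter so much.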

Then you get adversarial uses: websites hiding ads so they are not visible but still count as displayed, and websites creating bots to click on ads to raise their value (CPM campaigns and retargeting pay more for content they think is valuable, and therefore websites profit more). Websites try to hide the 18+ and nazi content so that they get the broad-spectrum advertisers to go on their page (nobody wants to advertise next to adult content, because the porn industry is full of bottom feeders doing display advertising and it can hurt brands).

Bidders started fighting other bidders in an attempt to ruin them. Since everything is based on super large volume, a competing bidder that can get you to spend money on bad campaigns will create angry customers that flock elsewhere. People started writing bots that behave like users to click on campaigns and break the competition.

This happened on Facebook as well: bad advertisers who were below their objectives had to get the hit rates (CPM and CPA) they needed, and started hiring click farms to fraudulently make people believe their ads were more effective than they were.

Ad networks also started trying to differentiate themselves from each other: who has the best data and best quality? Google on the search pages, high trust. Facebook started peddling video as the highest engagement with the better-tagged users. This led to a lot of lies about the value of video ads vs. text, had a major impact on the ad industry's hiring and employment methods (almost all advertisers fired copy editors and hired video editors even if all the values were fabricated), and marketing departments tried all kinds of optimizations to get what they could.

So that's the adtech industry in a nutshell: an extremely high-volume, ultra-competitive industry, where everyone tries to make fractions of a penny on each ad and make up for it in volume, with razor-thin margins. Big ad networks like Google and Facebook make fat stacks because marketing departments and websites have nearly no other revenue mechanism and they can skim everything, whether the ads are effective or not (hence the drive for engagement).

On the tech side it's super fun because tracking and re-targeted bidding are high-volume, low-latency deals that require very creative flow control to break even with the bandwidth and CPU costs, and let you optimize on all the dimensions you might think are fun. Building profiles is where bigdata and machine learning get endless budgets to do whatever the fuck they want to optimize however they want.

You can go from bare metal servers with tweaked hardware and custom-built kernels, up to cloud systems in meshes for whatever the fuck you want. All the tech at the highest volume is there, at all layers and in all areas you might find interesting.