"someone did something someone else thinks is wrong"
Examples of headlines blaming human error
"Hal 9000 explanation" - not completely telling the truth
lots of examples of words used in failure situations
"limits of my language are limits of my world" - language acts as a constraint
human error - "post hoc social judgement"
difficult to pinpoint human error without exact catalog of things that must be done in each situation
magazine article and white paper taking point further
"be mindful of your mindset, reactions and langauge" in particular your feelings when things go wrong
"study the system in the context of normal work" - failure is caused by people doing their normal day to day work
need to understand how staff meet demand and handle pressure - human error often caused by pressure
how do people respond to failure? - should adjust and vary performance, make trade offs.
3/5
Adults can't understand the language children use, and vice versa.
Performance is the same thing - it's difficult for performance people to get the message across.
43% of managers feel IT hinders business
3/5
Match up business metrics (growth, conversions) to perf metrics (page load time)
Perf metrics don't tell the business anything by themselves. Need to correlate and compare to business context.
Keynote has a new RUM system that does that.
a script to add to the page to measure things - can add custom metrics and a budget for things on the page.
Designed for developers so they are mindful of performance metrics on page.
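As an aside, a minimal sketch of the "correlate perf metrics with business metrics" idea above - the daily load times and conversion rates below are invented for illustration, not data from the talk:

```python
# Hypothetical illustration: correlate daily page load time with conversion rate.
# The numbers are made up; the technique (simple correlation) is the point.
import numpy as np

daily_load_time_s = np.array([2.1, 2.4, 3.0, 2.2, 2.8, 3.5, 2.0])   # median page load per day
daily_conversion = np.array([3.2, 3.0, 2.6, 3.1, 2.7, 2.3, 3.3])    # conversion rate % per day

# Pearson correlation: a strong negative value suggests slower pages, fewer conversions.
r = np.corrcoef(daily_load_time_s, daily_conversion)[0, 1]
print(f"load time vs conversion correlation: {r:.2f}")
```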
4/5
10 days wait for information down to 10 minute expectation
everyone wants everything faster now, and are less patient
1 second faster - 2% conversion increase
advertising, real time bidding big things.
real time analysis + hadoop for processing older data.
2/5
diversity benefits company, bottom line and everything in the world.
Assume good intentions, culture of forgiveness - dont be afraid to speak up.
Acquired diversity as well as inherent diversity - breaking diversity down along different dimensions
2D diversity
more diverse companies have higher market caps.
meritocracy is broken
"governing group" reach out to own networks, by definition similar to group.
requiring sixteen hour days is anti diversity.
culture of comfort, not diversity, by just carrying on as normal
the Apache Foundation used as an un-diverse example
goal should be for application percentages to match the demographics
do this by reaching out to those demographics
need to help these people - mentoring / junior opportunities.
"get people to the table"
once there, how do you get them to stay?
code of conduct.
Eliminate implicit bias.
orchestras - a screen around auditions increased the proportion of women from 5% to 50%
interviewing groups should be as diverse as possible
no names / pictures on applications.
5/5
"Embracing Your Personal Apocalypse: A Light-Hearted Jaunt Through an Abject Failure Will Pressly (EdgeCast)"
a quick hack config-change fix triggered a kernel bug
iptables rules confused the issue; all DNS was failing
talk about recovering from failure rather than details of failure.
keep mind clear and keep perspective.
after recovery - ask "what did we learn?"
3/5
Time For a New Way to Measure User Experience?! - Klaus Enzenhofer (Dynatrace)
"the performance impact"
Apdex and Navigation Timing are outdated.
JS-heavy apps don't fit these metrics.
Mobile apps don't fit this either - mobile browsers don't have Navigation Timing.
"user action response time"
errors - look bad.
look at a users whole journey, not just part of it.
take into account where user is
Dynatrace have a product to do it...
3/5
Better Performance Through Better Design - Mark Zeman (SpeedCurve)
Designers build a beautiful website, then are asked to make it fast afterwards.
Need to take perf into account when designing things.
design is not just visual aesthetics or user experience - it's using knowledge and creativity to solve a problem
not the designer's fault, the process's fault.
iterative design process.
need a flexible process.
mindfulness.
don't be dogmatic and just follow a process - step back and analyse the process.
have some high-level principles, 5-10 of them; performance should be one of these.
example "speed is more important than design embellishment"
"engage quickly and then make it feel like you are there" - travel site, deliver initial content quickly then add rich content after main point of page is loaded.
"small interdisciplinary teams"
"share your knowledge" - experts within team need to work with non experts.
pick a metric and work towards it.
benchmark against competitors.
simplify data and present it in easy to comprehend way.
get knowledge out of head and facilitate performance discussions.
4/5
make devs responsible for security; run tests continuously
compares QA testing to security testing - similar, but different
Business context / architecture / app features affect the threats we look for - threat model
Likely enough / high impact enough that we care about it.
OK to accept certain threats if a conscious decision is made and documented
password reset account leakage used as an example - the risk is different for standard online shopping compared to a dating site for having affairs (Ashley Madison)
security requirements visible, actionable, up to date, testable, automated
JBehave + Selenium, OWASP ZAP
https://github.com/continuumsecurity/bdd-security
uses a port scanning example to verify only certain ports are open
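A rough sketch of the kind of check that note describes - verifying only an expected set of ports is open on a host. This uses plain Python sockets rather than the step definitions in bdd-security, and the hostname and port lists are placeholder assumptions:

```python
# Assumption-laden sketch: confirm only the expected ports accept connections.
import socket

HOST = "example.internal"          # hypothetical target host
EXPECTED_OPEN = {22, 80, 443}      # ports we have consciously allowed
PORTS_TO_CHECK = range(1, 1025)    # scan the well-known range

def is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

open_ports = {p for p in PORTS_TO_CHECK if is_open(HOST, p)}
unexpected = open_ports - EXPECTED_OPEN
assert not unexpected, f"unexpected open ports: {sorted(unexpected)}"
```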
Also a Nessus example that removes false positives - with documentation of why the false positives are removed.
OWASP ZAP - like Charles Proxy, but focused on security
use Selenium + the ZAP API to drive this
create a Java class with login / logout etc. methods
get ZAP to submit all the forms in the app, then spider the app so it is in ZAP's db, then ZAP will report on security vulnerabilities
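Roughly what driving ZAP from code looks like, sketched here with the ZAP Python API client rather than the Java/JBehave stack from the talk; the target URL and API key are placeholders:

```python
# Sketch using the ZAP Python API client (pip install python-owasp-zap-v2.4).
# Selenium traffic proxied through ZAP would already have populated its DB;
# here we just spider, actively scan, and pull back the alerts.
import time
from zapv2 import ZAPv2

TARGET = "http://app.example.test"                     # placeholder app under test
zap = ZAPv2(apikey="changeme",                         # placeholder API key
            proxies={"http": "http://127.0.0.1:8080",
                     "https": "http://127.0.0.1:8080"})

scan_id = zap.spider.scan(TARGET)                      # crawl so every URL is in ZAP's DB
while int(zap.spider.status(scan_id)) < 100:
    time.sleep(2)

scan_id = zap.ascan.scan(TARGET)                       # active scan for vulnerabilities
while int(zap.ascan.status(scan_id)) < 100:
    time.sleep(5)

for alert in zap.core.alerts(baseurl=TARGET):
    print(alert["risk"], alert["alert"], alert["url"])
```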
can check that different users don't see other users' sensitive data
showed the CI process - commit, automated deploy and automated BDD scan from Jenkins
similar tools - ZAP JUnit
Gauntlt (Ruby) - http://gauntlt.org/
F-Secure's Mittn - Python + Burp (a proprietary security scanner)
4/5
not a math-heavy talk
online / offline models
example of a disk alert at 85% full firing at 2am - bump the threshold to 85.1% and fix it in the morning
classification of messy data
example given of a counter which periodically gets reset leading to sawtooth graph
gauge vs rate - for a disk, the gauge is how full it is; the rate is how much was added in the last hour
categorisation difficult for machines to do, easy for humans to do
bayesian categorization?
signal vs noise
average, median, min/max - horrible things to do to your data
spikes on an api due to cronjobs
graph the standard deviation of the data as well
need more data more often to lower p value
or from more places - cpu usage on 1 server vs 500
measure residuals from a mean
cyclical data - fourier transform can remove the cycles
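A tiny sketch of that idea using numpy's FFT - synthesise a daily cycle plus noise, knock out the dominant frequency, and look at what is left. Entirely made-up data, just to show the shape of the technique:

```python
# Illustrative only: remove a dominant daily cycle from a metric with an FFT.
import numpy as np

hours = np.arange(24 * 14)                              # two weeks of hourly samples
signal = 100 + 20 * np.sin(2 * np.pi * hours / 24)      # daily cycle
signal += np.random.normal(0, 3, hours.size)            # noise (the part we care about)

spectrum = np.fft.rfft(signal - signal.mean())
dominant = np.argmax(np.abs(spectrum))                  # bin holding the daily cycle
spectrum[dominant] = 0                                  # remove the cycle
residual = np.fft.irfft(spectrum, n=signal.size)        # what's left to alert on
print(residual.std())
```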
exponential windowed / weighted mean
can't do this on historic data
sliding windowed mean
huge window must be kept in memory
lurching window - three day-sized buckets: two days ago, yesterday, today
each day is exponential weighted mean of the data.
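A minimal sketch of the exponentially weighted mean described above - alpha and the sample stream are arbitrary choices for illustration:

```python
# Exponentially weighted moving mean: recent samples count more, old ones decay away.
def ewma(samples, alpha=0.3):
    """Yield the running exponentially weighted mean of an iterable of numbers."""
    mean = None
    for x in samples:
        mean = x if mean is None else alpha * x + (1 - alpha) * mean
        yield mean

latencies_ms = [120, 118, 125, 119, 400, 122, 121]   # made-up data with one spike
print(list(ewma(latencies_ms)))
```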
CUSUM test / method
Tukey test is another one worth looking at
mean minus a standard deviation can lead to negative answers
your data is not a normal distribution
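A sketch of the Tukey-fence idea, which sidesteps the mean-minus-a-standard-deviation problem because it only uses quartiles and makes no normality assumption; thresholds and data are illustrative:

```python
# Tukey's fences: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
# Quartile-based, so it does not assume a normal distribution.
import numpy as np

samples = np.array([12, 14, 13, 15, 14, 13, 90, 12, 16, 14])   # made-up latencies
q1, q3 = np.percentile(samples, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = samples[(samples < low) | (samples > high)]
print(outliers)   # -> [90]
```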
if we could keep all the performance data, we could change the statistics we use
information compression - 1019 goes in the 1000 bin with 1056; 2018 goes in the 2000 bin with 2431
summarize as a histogram over 1 minute
once summarized, no longer a distribution
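A sketch of the bucketing described above - fixed-width bins per minute, with values and bin width invented for illustration:

```python
# Compress raw samples into a per-minute histogram of fixed-width bins.
from collections import Counter

BIN_WIDTH = 1000

def to_histogram(samples):
    """Map each value to the lower edge of its bin and count occurrences."""
    return Counter((s // BIN_WIDTH) * BIN_WIDTH for s in samples)

minute_of_samples = [1019, 1056, 2018, 2431, 980, 1764]   # made-up values for one minute
print(to_histogram(minute_of_samples))
# Counter({1000: 3, 2000: 2, 0: 1})  - 1019 and 1056 share the 1000 bin, 2018 and 2431 the 2000 bin
```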
http://www.brendangregg.com/FrequencyTrails/modes.html
5/5
not many people have heard of preloader
added to browsers in 2008
the browser parses the HTML into a DOM tree
it used to work like this:
once a resource URL was seen in the DOM, it was added to the list of resources to download
script elements are more important than other elements - they cause the parser to halt when discovered, as executing them could change the DOM; they also had to wait for any CSS resources to finish.
This is slow!
minimising the number of resources and putting scripts at the bottom were workarounds for this - still good practice, but the impact is not as high with the preloader.
different terms in different browsers - preloader is vendor neutral
"the greatest browser optimisation of all time" - steve souders
keeps looking at html even while parser is halted on script execution - speculatively downloads things
preprocessing
tokenization
parsing
preloader between tokenization and parsing
more about why script blocks the DOM - document.write, createElement, style queries (hence waiting for the CSS download)
what can the preloader speculatively download? - external CSS, images
@import - WebKit only
video poster - Firefox only
other things are not preloaded
input images
iframe
object
link rel=import
video, audio
CSS-based resources - web fonts, background images - not preloaded
JS-based resources - cannot be preloaded (by definition)
priorities -
context based priority
CSS > scripts > visible images > non-visible images
higher priority for resources in
the preloader has no spec and is not an API - don't rely on its behaviour
20% typical improvement on average
critical resources must be in markup - not scripts or css (for critical rendering path)
non critical resources - maybe not - consider taking out of markup
bottom scripts can bubble up the waterfall due to preloader
don't invalidate the DOM - e.g. rewriting the base element, adding comments
set the charset in HTTP headers (at least for IE)
assume nothing about loading order - don't rely on it!
don't rely on JS cookies for image requests - the image requests may have been made before the JS executed
resource hints - link rel=preload / preconnect
get it to download web fonts, prefetch DNS, make an SSL connection to another domain
future -
more resources - iframes, link rel=import, video poster, input image
CSS support (imports)
add support for fonts
resource priorities - smarter handling of unimportant scripts, images etc
HTTP/2 - send all resources along with their priorities
server can implement its own preloader
resource priorities proposal
Thinking, Fast and Slow - Daniel Kahneman
projection bias and switching roles
covers planning poker in detail
bandwagon effect - people believing something because other people do
sunk cost
cutting corners - hyperbolic discounting - prefer reward that arrives sooner
fundamental attribution error - emphasis on internal characteristics rather than external - blameless post mortems avoid this
everyone has good intentions, intuition is not perfect
bayesian reasoning
5/5
The Machine is Dead, Long Live the Machine! - Service Resilience and Deployment Automation at The BBC - Yavor Atanasov (BBC)
previously - ops / dev separation
longer release cycle
now - no hard limit on technologies, mix up devs / ops, cont delivery
teams own infra and deploys, choose tech used
60000 deploys in 18 months
2-day release process down to 10 minutes
fat vs thin containers
interested in containers but not using them
use the Mock tool or Docker containers to ensure a clean build environment
packages are the same across all environments; use config to vary them
full images vs base images
2 snapshots - one for the image with the software, one on top with the config as well.
use CloudFormation for deploys
templates versioned along with code
can apply same template to different environments
use troposphere to do this
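A rough troposphere sketch of the "one template, different environments via parameters" idea; the resource names, instance type and ImageId parameter are placeholder assumptions, not the BBC's actual templates:

```python
# Sketch: generate a CloudFormation template with troposphere (pip install troposphere).
# One template, parameterised per environment; deploys can then just change ImageId.
from troposphere import Parameter, Ref, Template
from troposphere.ec2 import Instance

t = Template()
t.set_description("Example stateless service stack (illustrative only)")

image_id = t.add_parameter(Parameter(
    "ImageId",
    Type="AWS::EC2::Image::Id",
    Description="AMI baked with the service installed",
))

t.add_resource(Instance(
    "ServiceInstance",
    ImageId=Ref(image_id),
    InstanceType="t3.micro",
))

print(t.to_json())   # version this output alongside the application code
```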
separate templates for stateless and stateful resources
isolate instances and networks
make sure api limits / resource limits not exceeded
security groups isolate instances
subnets and ACLs isolate groups
VPCs
you should use auto scaling groups - multi-AZ
use Chaos Monkey
deployment by updating the image ID of the auto scaling group
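A hedged guess at what that looks like with boto3 against a CloudFormation-managed stack - the stack name, parameter key and AMI ID are all placeholders:

```python
# Sketch: roll out a new build by pointing the existing stack at a freshly baked AMI.
import boto3

cloudformation = boto3.client("cloudformation")
cloudformation.update_stack(
    StackName="service-live",                      # placeholder stack name
    UsePreviousTemplate=True,                      # only the parameter changes
    Parameters=[
        {"ParameterKey": "ImageId", "ParameterValue": "ami-0123456789abcdef0"},
    ],
    Capabilities=["CAPABILITY_IAM"],
)
```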
VPN through the VPC for SSH access
RUM and synthetic give very different sets of numbers. Why? Should we report both?
normal distributions again - real users are definitely not normally distributed, extremes have effects on whole dataset
empty cache and repeat views - real users will be in between
what is the one number to communicate back to the organisation?
too many metrics
similar to web analytics
different people will have different important numbers
CEO - revenue vs perf; ops - page load; FE dev - start render; design - Speed Index
revenue per second of page load is a good metric for the CEO
competitive benchmarking - synthetic only
should show relative numbers, accuracy of synthetic may not matter
The Guardian's responsive site is a good example
key point - simplify data from many metrics to display
Huffington Post removed all third parties and added them back one by one, to count requests
set SLA using RUM, but be specific
median and 95th percentile for visitors in the USA
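A small sketch of computing that kind of specific SLA number from RUM samples - the data and the country field are invented for illustration:

```python
# Sketch: reduce RUM samples to the SLA numbers mentioned - median and 95th percentile
# of page load time for visitors in one country. Data and field names are made up.
import numpy as np

rum_samples = [
    {"country": "US", "load_time_ms": 1800},
    {"country": "US", "load_time_ms": 2400},
    {"country": "GB", "load_time_ms": 3100},
    {"country": "US", "load_time_ms": 5200},
    {"country": "US", "load_time_ms": 2100},
]

us_times = np.array([s["load_time_ms"] for s in rum_samples if s["country"] == "US"])
median, p95 = np.percentile(us_times, [50, 95])
print(f"US median: {median:.0f} ms, 95th percentile: {p95:.0f} ms")
```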
use synthetic to set budgets - JS size / CSS size
RUM is good for showing the reasons to use a CDN
work needed to relate site changes to rum / synthetic measures
single page apps make it hard - synthetic only measures the first load, not XHRs
RUM - is an XHR a page or a service? counting XHRs as pages dilutes the overall numbers
designers want to know when does my site become usable
usable somewhere in between first render and page load
easy-ish to see in the filmstrip view of WebPageTest, difficult to measure
User Timing with JS events in synthetic can help with this