- Need to make sure Ember references are cleared after the response is sent back.
- http://stackoverflow.com/questions/5326300/garbage-collection-with-node-js
- http://blog.caustik.com/2012/04/08/scaling-node-js-to-100k-concurrent-connections/
- http://dtrace.org/blogs/bmc/2012/05/05/debugging-node-js-memory-leaks/
- http://www.scirra.com/blog/76/how-to-write-low-garbage-real-time-javascript
- http://benoitvallee.net/blog/2012/06/node-js-garbage-collector-explicit-call/ (run with --expose_gc and call global.gc())
- http://blog.caustik.com/2012/04/11/escape-the-1-4gb-v8-heap-limit-in-node-js/
- http://www.ibm.com/developerworks/web/library/wa-memleak/
- http://stackoverflow.com/questions/5733665/how-to-prevent-memory-leaks-in-node-js
- http://blog.caustik.com/2012/04/10/node-js-w250k-concurrent-connections/
- http://stackoverflow.com/questions/5903675/node-js-garbage-collection-event-or-trace-gc-to-stderr
- http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/
- http://stackoverflow.com/questions/9941374/node-js-gc-mark-compact
- https://groups.google.com/forum/?fromgroups#!topic/datamapper/M7BMUe8AjlY
- http://s3.mrale.ph/nodecamp.eu
- https://github.com/TooTallNate/node-weak
- manually trigger gc:
node --expose_gc --nouse_idle_notification tmp/leaks.js
- https://groups.google.com/forum/?fromgroups#!topic/nodejs/BO6JdYi4n2k
node --expose_gc --nouse_idle_notification --trace-gc tmp/leaks.js
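A minimal sketch of what calling the exposed collector looks like; the interval and logging below are illustrative only, not part of any framework mentioned in these notes.

```javascript
// Run with: node --expose_gc --nouse_idle_notification tmp/leaks.js
// global.gc is only defined when --expose_gc is passed.
function forceGC() {
  if (typeof global.gc === 'function') {
    var before = process.memoryUsage().heapUsed;
    global.gc();
    var after = process.memoryUsage().heapUsed;
    console.log('gc freed ~' + Math.round((before - after) / 1024) + ' KB');
  }
}

// Illustrative: collect every 30s instead of relying on V8's idle notifications.
setInterval(forceGC, 30000);
```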
- https://github.com/raganwald/homoiconic/blob/master/2012/03/garbage_collection_in_coffeescript.md
- precise generational garbage collection
- https://developers.google.com/chrome-developer-tools/docs/heap-profiling
- https://developers.google.com/chrome-developer-tools/docs/heap-profiling-dominators
- http://lostechies.com/derickbailey/2012/03/19/backbone-js-and-javascript-garbage-collection/
- http://stackoverflow.com/questions/3788805/garbage-collection-and-javascript-delete-is-this-overkill-obfuscation-or-a-g
- https://developer.mozilla.org/en/JavaScript/Memory_Management
- http://stackoverflow.com/questions/6297007/javascript-anonymous-function-garbage-collection
- http://stackoverflow.com/questions/864516/what-is-javascript-garbage-collection/864544#864544
- http://stackoverflow.com/questions/7347203/circular-references-in-javascript-garbage-collector
- question: for $.ajax(error: fn, success: fn), if one of the closures doesn't execute does the reference still exist?
- http://stackoverflow.com/questions/4324133/javascript-garbage-collection
- http://nodeguide.com/convincing_the_boss.html
- http://www.acunote.com/blog/2008/01/garbage-collection-is-why-ruby-is-slow.html
- Further, simply allocating memory is relatively expensive, and that will also show up in profiler output. [which is why reusing objects as much as possible is helpful]
- http://stackoverflow.com/questions/6480148/is-there-a-better-solution-than-activerecord-for-batch-data-imports
- http://www.coffeepowered.net/2009/01/23/mass-inserting-data-in-rails-without-killing-your-performance/
- http://www.williambharding.com/blog/uncategorized/rails-3-performance-abysmal-to-good-to-great/
- identity map removed in rails: https://github.com/rails/rails/commit/302c912bf6bcd0fa200d964ec2dc4a44abe328a6
- http://mongoid.org/en/mongoid/docs/identity_map.html
The problem in Rails was this: if you call post.comments.first.update_attribute('post', null) and then Post.destroy(post.id), it will still destroy the comment, even though you just set its postId to null so it should no longer belong to the post. To fix this you need to remove the comment from the comments array after its postId changes. So you need a map from the comment to the associations (cursors) it belongs to, and just iterate through them and remove it. Basically, whenever a property that is part of a cursor's observableFields changes on the comment, iterate through all cursors for the comment, and if it no longer matches, remove it from the in-memory array. This way, when you find the post again with Post.destroy(), which returns the in-memory post, it will still have the post.comments association, but the comment won't be in there, so dependent-destroy won't have any effect. Also, dependent-destroy shouldn't even be affecting this... it should realize comment.postId is null now. A sketch of this cursor bookkeeping follows.
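A plain-JavaScript sketch of the above; cursorsByRecordId, observableFields, matches, and data are hypothetical names here, not Tower or Ember API. The point is just the shape: a per-record list of cursors, re-checked whenever an observed field changes.

```javascript
// Hypothetical registry: record id -> the in-memory cursors that currently contain the record.
var cursorsByRecordId = {};

function registerInCursor(record, cursor) {
  (cursorsByRecordId[record.id] = cursorsByRecordId[record.id] || []).push(cursor);
}

// Call whenever a field in a cursor's observableFields changes on the record.
function recheckCursors(record, changedField) {
  var cursors = cursorsByRecordId[record.id] || [];
  for (var i = cursors.length - 1; i >= 0; i--) {
    var cursor = cursors[i];
    if (cursor.observableFields.indexOf(changedField) === -1) continue;
    if (!cursor.matches(record)) {
      // e.g. comment.postId was set to null, so drop it from post.comments' in-memory array.
      var idx = cursor.data.indexOf(record);
      if (idx !== -1) cursor.data.splice(idx, 1);
      cursors.splice(i, 1);
    }
  }
}
```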
Answer. So, global identity map scoped to the current request. Attach the request to the controller and vice-versa. Do App.Post.with(@), which initializes an identity map on the request. Have the identity map keep track of all the cursors and models instantiated in the request. Then after the controller responds, after any after callbacks, everything in the identity map is cleared from memory with Ember.Object#destroy. If you have some async callback after the request has been written (say, doing a streaming operation or progress indicator), then it's up to you to fetch the records again. Instead of doing that, you should create a background job and pass the current user's socket id so you can send them messages through the already-instantiated web socket. This frees up the controller and everything in the identity map for garbage collection, making room for the next request.
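A rough sketch of such a request-scoped identity map. Ember.Object#destroy is real Ember API; the IdentityMap itself, its key scheme, and where it hangs off the request are assumptions for illustration.

```javascript
// Hypothetical request-scoped identity map; clear() runs after the controller responds.
function IdentityMap() {
  this.records = {};
}

IdentityMap.prototype.find = function (type, id, build) {
  var key = type + ':' + id;
  return this.records[key] || (this.records[key] = build());
};

IdentityMap.prototype.clear = function () {
  for (var key in this.records) {
    this.records[key].destroy(); // Ember.Object#destroy: tears down meta, observers, bindings
    delete this.records[key];
  }
  this.records = {};
};

// e.g. after the `after` callbacks run: request.identityMap.clear()
```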
Rails identity map is cleared when a request is closed. rails/rails#6524
You can keep a global hash pointing to all of the instantiated controllers, and if a controller or model has not been accessed within some interval of time, destroy it. This way, you could store all Ember guids in a global list (['__ember__guid_1232181', '__ember__guid_1232132', ...]) and refresh a timer whenever one of those objects is accessed within the specified interval; otherwise, iterate through the list, find each object by its guid, and delete it.
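A sketch of that timed sweep, assuming a plain object keyed by Ember guid; the registry, the touch helper, and the 60s interval are illustrative.

```javascript
// Hypothetical guid registry: guid -> { object, lastAccessedAt }.
var registry = {};
var TTL = 60 * 1000; // destroy anything untouched for 60s (arbitrary)

function touch(object) {
  registry[Ember.guidFor(object)] = { object: object, lastAccessedAt: Date.now() };
}

setInterval(function () {
  var now = Date.now();
  for (var guid in registry) {
    if (now - registry[guid].lastAccessedAt > TTL) {
      registry[guid].object.destroy(); // Ember.Object#destroy clears meta/observers
      delete registry[guid];
    }
  }
}, TTL);
```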
Perhaps we could also keep a global object pool of instantiated models of each type, just so the server only needs to swap the attributes out. It might be cheaper to just delete them and start over, but maybe it's better to have like a million objects in memory, and swap them out.
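A tiny sketch of that per-type pool; setProperties is real Ember API, everything else (the pool, checkout/checkin) is hypothetical.

```javascript
// Hypothetical per-type pool: reuse model instances by swapping their attributes.
var pool = { 'App.Post': [] };

function checkout(type, klass, attributes) {
  var record = (pool[type] || (pool[type] = [])).pop() || klass.create();
  record.setProperties(attributes); // reuse the instance instead of allocating a new one
  return record;
}

function checkin(type, record) {
  pool[type].push(record);
}
```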
The computed properties are what we need to worry about on the server. If they return another model instance, then there is potentially a memory leak if they are not destroyed, no? Hmm... If there are no objects pointing to either of those records (circular referencing each other), will there be a memory leak? That is, if they are "unreachable", shouldn't they be garbage collected? Need to test.
What about variables in the controller, do they need to be garbage collected?
How about the .instance() property for the current controller on the server? (don't think we're even using that)
Need to set up some sort of debugger/logger for the properties watched in Ember (or all the event listeners) on the server.
Need to clear out the cursor.data property.
You want your requests to return as quickly as possible so the JavaScript can be garbage collected. Then run processor-intensive functions in a separate process. How do you then do things like streaming back progressive file upload data? Maybe in this case you have a deallocate function that you can run when you start your long-running process. Or you can get access to the socket for the user from a background job! Tower.connections[job.data.socketId]. To make this work we'll have to message via the command line and hook.io to the socket.io server. Unless there's some way to run the worker alongside the job.
If this happens in the controller, will the controller be garbage collected (and all properties on it), even if the function it calls internally is long-running?
class App.AttachmentsController extends App.Controller
  create: ->
    App.Attachment.create @params, (error, attachment) =>
      # Say this is non-blocking but takes about a minute, will everything except the attachment be garbage collected?
      # Probably not, which is why you want to start up background processes.
      # So, this function should create a background job, passing the currentUser id, which we can use to search the sockets
      # for the socket, which we can use to send data back, all in a separate process so the
      # request/response cycle can be freed up and garbage collected.
      attachment.processAndUploadInBackground()
      @render json: attachment
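A hedged sketch of the background-job side implied by those comments: the worker looks up the already-open socket via Tower.connections (named elsewhere in these notes) and streams progress back. The queue API and uploadToS3 are hypothetical placeholders.

```javascript
// Illustrative worker process; Tower.connections[socketId] is assumed to return a live socket.io socket.
queue.process('attachment.upload', function (job, done) {
  var socket = Tower.connections[job.data.socketId];

  uploadToS3(job.data.attachmentId, function onProgress(percent) {
    if (socket) socket.emit('upload:progress', { percent: percent });
  }, function onComplete(error) {
    if (socket) socket.emit('upload:complete', { error: !!error });
    done(error);
  });
});
```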
- There is no explicit garbage collection code for the current HTTP request, so it must be getting cleaned up.
You can set the Ember guid to the model guid!
record[Ember.GUID_KEY] = databaseRecord._id.toString()
Then maybe whenever you call Ember.guidFor and it matches the object id, you can pass that into the identity map.
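A small sketch of that idea: give the record the database id as its Ember guid so Ember.guidFor doubles as the identity-map key. App.Post and the identityMap shape are assumptions carried over from the sketch above.

```javascript
function materialize(identityMap, databaseRecord) {
  var id = databaseRecord._id.toString();
  var key = 'App.Post:' + id;
  if (identityMap.records[key]) return identityMap.records[key];

  var record = App.Post.create(databaseRecord);
  record[Ember.GUID_KEY] = id; // Ember.guidFor(record) should now return the database id
  identityMap.records[key] = record;
  return record;
}
```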
- Ember.destroy: Tears down the meta on an object so that it can be garbage collected.
- Ember.Object.create().destroy(): Destroys an object by setting the isDestroyed flag and removing its metadata, which effectively destroys observers and bindings.
- Ember.Object#willDestroy: called the frame before it will actually be destroyed.
- Ember.Object#didDestroy: called the next frame, just after all metadata for it has been destroyed.
- https://github.com/chrisa/node-dtrace-provider
node-inspector --web-port=8989
- http://dtrace.org/blogs/dap/2012/04/25/profiling-node-js/
npm install -g stackvis
sudo dtrace -o stacks.out -n 'profile-97/execname == "node" && arg1/{ @[jstack(100, 8000)] = count(); } tick-60s { exit(0); }'
npm install memwatch (https://github.com/lloyd/node-memwatch); a usage sketch follows these links
- http://stackoverflow.com/questions/5718391/memory-leak-in-node-js-scraper
- http://www.unix.com/man-page/OpenSolaris/1/mdb/ (referenced a lot in dtrace's blog)
- http://dtrace.org/blogs/dap/2012/01/13/playing-with-nodev8-postmortem-debugging/
brew install mdbtools
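A minimal node-memwatch sketch using its documented 'leak' and 'stats' events and HeapDiff; runSuspectCode is a placeholder for whatever path you think is leaking.

```javascript
var memwatch = require('memwatch');

// Fired when the heap keeps growing across several consecutive GCs.
memwatch.on('leak', function (info) {
  console.error('possible leak:', info);
});

// Fired after each full GC with heap usage numbers.
memwatch.on('stats', function (stats) {
  console.log('heap after gc:', stats.current_base);
});

// Diff the heap around a suspect code path to see which object types grew.
var hd = new memwatch.HeapDiff();
runSuspectCode(); // placeholder
console.log(JSON.stringify(hd.end(), null, 2));
```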
Better way to remove an item from an array (avoids feeding the garbage collector by allocating a new array):
// Shift everything after `index` down one slot, then truncate in place.
for (var i = index, len = arr.length - 1; i < len; i++) {
  arr[i] = arr[i + 1];
}
arr.length = len;
- http://nosql.mypopescu.com/post/13493023635/rails-caching-benchmarked-mongodb-redis-memcached
- https://github.com/SFEley/mongo_store
- Cache queries in mongodb with cursor.toParams: http://stackoverflow.com/questions/5709773/how-to-cache-a-query-in-ruby-on-rails-3
- http://www.mongodb.org/display/DOCS/Caching
- https://github.com/jnunemaker/bin/blob/master/lib/active_support/cache/bin.rb
- http://www.quora.com/Is-MongoDB-a-good-replacement-for-Memcached
- "Are you caching data that would benefit more than just a key-value store? Now we're talking. This plays directly into the strengths of MongoDB, and takes memcache where it wasn't really intended to go."
- that means we can store the results from multiple computations (getting groups/tweets/memberships) and potentially cache them.
- http://stackoverflow.com/questions/5465737/memcache-vs-java-memory
- https://github.com/mape/node-caching/
- http://www.mongodb.org/display/DOCS/Caching
- http://www.quora.com/Which-is-a-better-choice-for-a-web-analytics-service-Redis-or-MongoDB
- http://openmymind.net/2011/5/8/Practical-NoSQL-Solving-a-Real-Problem-w-Mongo-Red/
- http://redis.io/commands/expire
- http://stackoverflow.com/questions/4188620/redis-and-memcache-or-just-redis
- https://github.com/jodosha/redis-store/blob/master/redis-store/lib/redis/store/marshalling.rb
- http://stackoverflow.com/questions/10558465/memcache-vs-redis
- http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster/
- "Send message to invalidate friend's cache in the background instead of doing all individually, synchronously."
- http://www.codypowell.com/taods/2012/01/the-beautiful-marriage-of-mongodb-and-redis.html
- store tweets in both redis and mongodb: "Was it faster to pull the app ids from Redis, use that to pull the documents from MongoDB, then use Python to reorder everything? Actually, yes. Thus far, getting from the cache takes 1/3 of the time that it did before. Meanwhile, adding to the cache is essentially free."
- http://stackoverflow.com/questions/10317732/why-use-redis-instead-of-mongodb-for-caching
- http://broadcastingadam.com/2011/05/advanced_caching_in_rails/
- http://highscalability.com/blog/2011/7/6/11-common-web-use-cases-solved-in-redis.html
- http://stackoverflow.com/questions/7888880/what-is-redis-and-what-do-i-use-it-for
- http://antirez.com/post/take-advantage-of-redis-adding-it-to-your-stack.html/
- Precomputed queries! All cursors should store ids of sorted/matching records in redis.
- Use redis to store sorted sets of up to 10,000 records each (first 50 pages); see the sorted-set sketch after these links.
- http://stackoverflow.com/questions/10205635/redis-filter-by-range-sort-and-return-10-first
- http://playnice.ly/blog/2010/05/05/a-fast-fuzzy-full-text-index-using-redis/
- http://openmymind.net/Paging-And-Ranking-With-Large-Offsets-MongoDB-vs-Redis-vs-Postgresql/
- http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/
- http://patshaughnessy.net/2011/11/29/two-ways-of-using-redis-to-build-a-nosql-autocomplete-search-index
- https://github.com/seatgeek/soulmate
- https://github.com/seatgeek/soulmate/blob/master/lib/soulmate/matcher.rb
- http://redis.io/topics/twitter-clone
- http://www.quora.com/Redis/How-efficient-would-Redis-sorted-sets-be-for-a-news-feed-architecture
- http://santosh-log.heroku.com/2011/05/21/relationlike-redis/
- https://github.com/smrchy/redis-tagging
- http://dr-josiah.blogspot.com/2011/02/some-redis-use-cases.html
- redis for rate limiting
- nested sets (trees) in redis: https://groups.google.com/forum/#!topic/redis-db/IsLJ4PlBo9E/discussion
- https://github.com/rediscookbook/rediscookbook
- http://openmymind.net/Data-Modeling-In-Redis/
- http://knokio.com/data/analytics-with-redis/
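A node_redis sketch of the precomputed-query bullets above: each cursor keeps a sorted set of matching ids (score = sort key), capped at 10,000, and a page is just a range query. Key names are made up for illustration.

```javascript
var redis = require('redis');
var client = redis.createClient();

// When a record starts matching a cursor, add its id with the sort field as the score.
function addToCursor(cursorKey, record, callback) {
  client.zadd(cursorKey, record.createdAt.getTime(), record.id.toString(), function (err) {
    if (err) return callback(err);
    // Keep only the newest 10,000 ids (the first 50 pages).
    client.zremrangebyrank(cursorKey, 0, -10001, callback);
  });
}

// Page n (0-based) is a range on the sorted set; hydrate the ids from MongoDB afterwards.
function fetchPage(cursorKey, page, perPage, callback) {
  var start = page * perPage;
  client.zrevrange(cursorKey, start, start + perPage - 1, callback);
}
```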
Use MongoDB to store the details (membership.createdAt, membership.role, etc.) but use redis just to map the ids (user.membership_ids, user.group_ids).
You want to store all of these ids in redis so you can do fast writes as well! So every time a user posts a tweet, it can instantly grab all users following that group (pure redis query) and push that tweet id into their feeds (even with 1 million followers redis can do that in about 10 seconds). And twitter probably only pushes it into the feeds of users that have been recently active (so if you come back after a month away, you have to wait for your timeline to load). In that case, it has to fetch all the users you follow and compute the ids (grab the ids of everyone the user follows from redis, then grab the latest tweets from each of those users' timelines, and add them to this user's home timeline).
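A node_redis sketch of that fan-out-on-write; the followers:<id> and timeline:<id> key names and the 800-tweet cap are illustrative.

```javascript
var redis = require('redis');
var client = redis.createClient();

// Fan-out on write: push the new tweet id onto every follower's precomputed timeline.
function fanOutTweet(authorId, tweetId, createdAt, callback) {
  client.smembers('followers:' + authorId, function (err, followerIds) {
    if (err) return callback(err);
    followerIds.forEach(function (followerId) {
      client.zadd('timeline:' + followerId, createdAt, tweetId);
      client.zremrangebyrank('timeline:' + followerId, 0, -801); // cap each timeline at 800 entries
    });
    callback(null, followerIds.length);
  });
}
```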
- twitter stream algorithm
- http://www.vijaykandy.com/2012/03/autocomplete-using-a-trie-in-redis/
- http://stackoverflow.com/questions/11095331/best-solution-for-finding-1-x-1-million-set-intersection-redis-mongo-other
- http://stackoverflow.com/questions/11441293/incrementing-hundreds-of-counters-at-once-redis-or-mongodb
- https://gist.github.com/896321
- https://github.com/twitter/snowflake
- twitter's recommendation engine
- http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
- http://blog.waxman.me/how-to-build-a-fast-news-feed-in-redis
- redis can do 100,000 writes per second: http://redis.io/topics/benchmarks
- http://serverfault.com/questions/237505/what-hardware-makes-a-good-mongodb-server-where-to-get-it
- http://www.quora.com/Twitter-Trends/What-is-the-basis-of-Twitters-current-Trending-Topics-algorithm?q=trending+algorithm
- http://engineering.wattpad.com/post/20902824042/using-redis-pipeline-to-write-news-feed
- http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed?q=news+feeds
Determining what data to store depends on your front-end (including what activities your users participate in) and your back-end. I'll describe some general information you can store. Italics are special, optional information you might want or need depending on your schema.
Activity(id, user_id, source_id, activity_type, edge_rank, parent_id, parent_type, data, time)
- user_id: user who generated the activity
- source_id: record the activity is related to
- activity_type: type of activity (photo album, comment, etc.)
- edge_rank: the rank for this particular activity
- parent_type: the parent activity type (particular interest, group, etc.)
- parent_id: primary key id for the parent type
- data: serialized object with meta-data
- http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed
- http://www.brianfrankcooper.net/pubs/follows.pdf
- http://www.slideshare.net/nkallen/q-con-3770885
- https://github.com/paulasmuth/recommendify
- very good insights: http://engineering.twitter.com/2011/05/engineering-behind-twitters-new-search.html
To support relevance filtering and personalization, we needed three types of signals:
- Static signals, added at indexing time
- Resonance signals, dynamically updated over time
- Information about the searcher, provided at search time
- https://github.com/ryanking/earlybird/blob/master/earlybird.rb
- spiderduck: http://engineering.twitter.com/2011/11/spiderduck-twitters-real-time-url.html
- kestrel (message queueing system for twitter): https://github.com/robey/kestrel
- http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
These servers use a specialized ranking function that combines relevance signals and the social graph to compute a personalized relevance score for each Tweet.
Twitter is a complex yet elegant distributed network of queues, daemons, caches, and databases.