Some Notes and Insights from a JavaScript Reverse-Engineering Email Exchange

Insights from a JavaScript Reverse-Engineering Email Exchange

The following are some notes / insights that I captured from a recent JavaScript Reverse-Engineering email exchange related to forking a GPL licensed project.

This is shared here with the intent of making the more generic aspects of the knowledge shared more readily accessible to others; as such, some of the more specific project details have been REDACTED or ..snip..'d, as they are irrelevant to this goal. I have also only included messages from the email thread up to the point where the discussion remained broadly useful as general knowledge worth sharing / referring back to in future.

Email Thread

1 - X -> Me - Initial Reach Out

Hi Glenn,

I saw you are active on JavaScript reverse-engineering projects on Github, and I'm wondering if you're available for hire. I have a project to pitch to you if you're interested.

Regards

X

2 - Me -> X - Open to hearing more

Hey X,

I’m definitely open to hearing what you have in mind and whether it would be a good fit.

Are you able to give some more details on the project?

Cheers

Glenn 'devalias' Grant

3 - X -> Me - Project Overview

Thanks for getting back to me, Glenn!

I and a decent amount of other website builders and developers find ourselves in a difficult position. We were using REDACTED, to build quite a few, sometimes large scale websites. Last year, the developer folded the project [..snip..] and shut down all forms of communication and support. This may be the end of the road, but I'm exploring our options for creating an open source fork.

REDACTED explicitly claims a GPLv3 license in the distributed plugin file, with no exceptions or delineations. As I understand it, that obligates the developer to make the non-obfuscated source code available, but he has refused to respond to even discuss this. That's why I'm looking into the feasibility of reverse engineering the React code base. I've written a project description, here, but haven't posted it yet as I decided to reach out to domain experts first. This seems like an incredibly niche specialty!

Please let me know your thoughts and if you have any recommendations and/or would be open to tackling this. I don't think it would be difficult to raise funds if we know a domain expert is on the line.

Regards

X

4 - Me -> X - Looking into licenses / EFF / FSF / etc + Overview of *.js files + Upstream webcrack / wakaru issues for limitations

No worries :) I'll admit, I was a bit curious / suspicious based on the original email, but it definitely sounds more legit now that I've seen more details about it.

Having a bit of a skim through the posts about it, it sounds like an unfortunate situation on both sides:

  • REDACTED

I'm definitely no lawyer / legal expert, but from my very casual knowledge of it, being GPLv3 is definitely helpful for the plan you're exploring. I see that the website also explicitly states this too:

  • REDACTED
    • In particular, REDACTED and/or the software thereto related are provided under a GNU GPLv3 license, allowing Users to access and use the software’s source code.

I didn't look too deeply, but I downloaded their free plugin and had a quick skim, and it looks like some of the REDACTED source files have explicit 'License' headers too, a few I saw being GPL2+. And some other random files being LGPL-2.1-or-later, etc. I would definitely recommend getting some kind of official legal advice on that to be sure of where you stand/any potential boundaries/pitfalls to be aware of. I wonder if the EFF might be helpful in that regard?

From a quick google, it also seems like the Free Software Foundation (FSF) has a way to report GPL violations; which might be another potential avenue to explore if you aren't already? A few resources:

I downloaded the free REDACTED package from:

  • REDACTED

From a quick skim of the free package, it looks like most if not all of the REDACTED is already unminimized/commented/etc, so that's definitely a good start (as that's not an area I have explored as deeply as the JS ecosystem side of things). Looking for JS files, there appear to be a bunch:

  • fd --glob '*.js'
    • assets/js/
      • REDACTED: a list of related *.min.js/*-[hash].js/*.js/etc files
    • build/
      • REDACTED: a list of related *.js files
    • core/assets/js/
      • REDACTED: a list of related *.js files
    • core/includes/js/
      • REDACTED: a list of related *.js files
    • core/lib/
      • REDACTED: a list of related *.js files

Of those:

I didn't look at specifics of how these JS files get used by/play into the larger application.

Curious if you have reached out to anyone else about the JS reverse engineering aspects of this project yet, and if so, what their advice/interest/etc was like?

I'm definitely intrigued from an 'interesting challenge' and 'open source' angle of things; though at this stage I'm not really sure how to estimate just how big a task it would be in its entirety; and would be hesitant to take that on as a sole responsibility because of that. And similarly, while at a high level it looks like it should be fine from a legal/licensing perspective, having some 'actually a lawyer / legal expert' type clarity around that would definitely be useful as well.

I was going to ask if/what sort of budget you had in mind for this project, but I guess you sort of addressed that indirectly in your last line speaking about raising funds/etc. Curious, what did the plugin cost/how widely used/etc is it? And are there other similar plugins out there that would be able to 'fill the gap' if this one were to die?

Either way, I would definitely be interested to be kept in the loop about progress/potentials; particularly around any other 'domain experts' that may be interested, and/or any suggested tooling/approaches/etc to doing so.

Cheers

Glenn 'devalias' Grant

5 - Me -> X - Additional upstream webcrack issues

A couple of other upstream issues/comments tangentially related to this; unsure if they will end up being solved within core, or if they will require more specific plugins to be developed for them; but figured I'd provide the references regardless.

I was doing the original deeper exploration in this issue comment (and related previous ones), to try and understand why the JSX didn't seem to be decompiling properly within webcrack:

And then when I realised that there may not be adequate solutions to that within webcrack core, I created these additional issues for potentially solving it within more specific plugins:

  • j4k0xb/webcrack#151
    • [plugin] plugin to support WordPress Gutenberg specific blocks features (including how it injects window.React, window.wp.element, etc) within JSX decompilation

  • j4k0xb/webcrack#152
    • [plugin] plugin to support unminifying goober CSS-in-JS library patterns + related JSX decompilation

Glenn 'devalias' Grant

6 - X -> Me - Legal & Strategic Considerations for Forking + Next Steps

Hey Glenn!

Woah! You've gone above and beyond! I'm enthralled by your insightful analysis and initiative here!

You're absolutely right about the unfortunate situation. I wish no more trouble of any sort on REDACTED, which is the main reason I'm not interested in putting legal pressure on him to honor the GPL. He's clearly facing immense ..snip.. challenges, and I have no desire to compound his suffering. I would be quite open to obtaining more legal counsel in this situation, but a straightforward reading of the facts, which you've outlined well, points to us being well within our rights to fork the plugin. The GPL, in particular, is painfully explicit about these rights, and this is compounded by the statement in the terms, which condenses the GPL into the phrase you captured. If a non-standard license had been chosen, or the (somewhat common in REDACTED) split licensing between REDACTED and JS code had been used, I'd have no interest in pushing this forward. Obviously, REDACTED have trademark rights here, and it's understood that we'd need to rename the fork, which is more than reasonable.

I've taken the initiative to create a fresh Internet Archive snapshot of the Terms and Conditions page, in order to strengthen the legal standing of this project. As I see it, we would have reasonable footing to pressure REDACTED to release the source code, but that's not appealing to me. REDACTED would have very little legal standing to litigate against a fork. That's my reading, and I'd be more than happy to see if the folks at FSF or EFF would be willing to weigh in, especially if we proceed with the project.

As for alternatives to REDACTED: There certainly is no lack of REDACTED out there! The biggest problem is that each of them, to this point, operate within REDACTED as information silos. A site built with one needs almost completely rebuilt to migrate to another (especially in the case of REDACTED, where the tool was not only used for REDACTED but also REDACTED and vast swaths of user-facing functionality as well.) While that's regrettable, it's the state of the ecosystem and leaves existing REDACTED site investments in a precarious position. I fully expect the next major version of REDACTED to be incompatible with the current build of REDACTED, leaving site builders to either maintain unsupported REDACTED sites, or of course, rebuild from scratch. Besides that, REDACTED occupies a unique position in being so tightly bound to the REDACTED editor combined with a "blank canvas" approach where site builders start with a clean slate rather than a bundle of pre-packaged design opinions. In my opinion, it's everything the REDACTED editor should be but isn't (yet), and perhaps never will be. As long as REDACTED was willing to drive this forward, I and the rest of the community were quite willing to support him, but since he's shutting things down, I believe it's time for others to take this forward.

I have not contacted other domain experts yet. It seems from your response that you're placing the most stock in webcrack for a project like this. Are there others in that community that you'd recommend I loop into this? Also, I understand that this is a somewhat large proposition, and perhaps I've packed too much into the initial project proposal. Refactoring the codebase with strong typing (Typescript or JSDoc) would be excellent to have eventually, but perhaps it's too much to include in the decompiling kickoff. If all we're asking is for a reversal of the bundling and decompiling of the React code back to JSX sources, how much does that change the scope, in your opinion?

The webcrack issues are great, and represent your profound ambition and initiative. Is sponsoring the development of those features the bulk of the work that would need to be performed? If so, who would be best positioned to develop these features and do you have any indication of how sponsoring the work could be facilitated?

As for budget, I really don't know. We need to scope out the project and then pitch it to stakeholders who are willing to pitch in. I've looped a few other stakeholders into this thread, to facilitate the discussion.

Thanks again and kind regards.

X

7 - Me -> X - Deep Dive Mind Dump (AKA: Legal, Technical, Strategic, and Collaborative Considerations + Tooling / Methodology + Scope + Funding + etc)

Sorry for the verbosity of the incoming mind dump below, but hopefully it's helpful.


You're absolutely right about the unfortunate situation. I wish no more trouble of any sort on REDACTED, which is the main reason I'm not interested in putting legal pressure on him to honor the GPL.

_nods_, I totally understand and agree with that.


I've taken the initiative to create a fresh Internet Archive snapshot of the Terms and Conditions page, in order to strengthen the legal standing of this project

That seems like a good approach, though I notice that the snapshot initially seems to load as just a heading:

  • REDACTED: screenshot of the snapshot described above

Looking a little deeper, that seems to be because the terms and conditions are an 'iubenda' embed loaded from another site; thankfully that also seems to have been snapshotted:

  • REDACTED: screenshot of the snapshot described above

And then eventually it does seem to load the content, as an embed of sorts loaded from that external site:


The biggest problem is that each of them, to this point, operate within REDACTED as information silos. A site built with one needs almost completely rebuilt to migrate to another

I fully expect the next major version of REDACTED to be incompatible with the current build of REDACTED, leaving site builders to either maintain unsupported REDACTED sites, or of course, rebuild from scratch.

Ah, ok; yeah, that definitely makes a lot of sense then; and understandable why there would be a strong desire for a solution that gives a path forward without needing to completely rewrite those sites from scratch with a different site builder/set of blocks/etc.


It seems from your response that you're placing the most stock in webcrack for a project like this. Are there others in that community that you'd recommend I loop into this?

As far as main tools, my go to's in this space are:

Then from the outputs of those tools, there's usually further manual analysis required to understand what code relates to libraries, which libraries they are, etc. I'll often leverage GitHub Code Search to help with that process, and sometimes also ChatGPT/Claude/similar AI/LLM tools to help understand or rewrite sections. Those can be far less robust than decompilers that work directly with AST manipulations, as LLMs can hallucinate/make things up, so anything they generate needs to be verified. It's still a pretty manual process all in all. You can see some examples of my process for that in some of the past 'module-detection' issues I created on wakaru's GitHub:

That's an area of things where I feel like the current tooling isn't ideal, and I have a bunch of vaguely half formed ideas around how a new tool could be created to make that process better/more automated; but for now that only lives in my head/my scattered notes; eg.

And then there is a bunch more knowledge/ideas/snippets/etc scattered across some of my related deepdive gists, such as:

wakaru used to be my main go-to modern JS decompilation tool, but the developer has been pretty busy elsewhere so hasn't had much capacity for the project in the last 6 months or so. webcrack's developer is pretty active with the project, and where it makes sense to solve something in the core tool (rather than something more niche), they are usually pretty responsive with fixes/improvements. humanify is more of a niche tool; it still relies on webcrack for the main unpacking/etc, but then leverages ChatGPT / other similar LLMs to come up with useful unminified variable names. Its main developer has also been similarly inactive on the project in recent times, presumably due to other commitments/etc.

As for other domain experts who might be interested, I'm not sure if webcrack's developer would be, but they would be my first thought as a potential. Off the top of my head there's only a couple of others I can immediately think of that might have interest/capacity; but I mostly only know them through online channels. And I'm sure there would be others out there that I don't know currently. Also maybe one or two people I know IRL, who while I wouldn't call them domain experts at all, would potentially be interested in learning from my process if I was to work on it at all; and could potentially delegate some aspects of the tasks to based on those learnings.


Also, I understand that this is a somewhat large proposition, and perhaps I've packed too much into the initial project proposal. Refactoring the codebase with strong typing (Typescript or JSDoc) would be excellent to have eventually, but perhaps it's too much to include in the decompiling kickoff. If all we're asking is for a reversal of the bundling and decompiling of the React code back to JSX sources, how much does that change the scope, in your opinion?

I hadn't thought too deeply about it at the start, but I agree that the main 'decompile back to useable source code' aspect of the project feels like it should probably be distinct from the 'clean up codebase and make maintainable for future' aspect. Obviously an acceptance criterion for the decompiled part would be that it's brought back to a state that resembles what the original code would likely have looked like (not just a prettier formatted version of the minified code, etc). I also feel like separating the decompile from the maintenance aspect would make it easier to find people for the job. The decompile aspect is more niche, and the people suited to it may not be interested in the larger maintenance side of things (or even if they were, their skills may come at a higher cost); whereas once decompiled to a relatively 'normal' sort of codebase, theoretically any JS type web dev should have the skills to work on it, so you'd likely have a larger pool of potential contributors and maybe lower costs because of it.

Sticking to just the decompiling/etc aspects.. it certainly removes some of the 'additional burden'; and at least for me personally, aligns closer to where the 'interesting challenge' part of the project would be for me. Even with that downscope.. I still feel like it's not going to be the simplest of things; as across those JS files, some of them are still pretty big; and it's hard to know how much of that code is just compiled in dependencies from external libraries, vs actual code relevant to the project. And even if a bulk of it is from dependencies, there's still a bunch of manual effort in determining what those are, and then stripping them out to just imports/etc.
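As a rough illustration of the 'figure out which dependencies are compiled in' step, here is a minimal Python sketch (not part of the original exchange). It assumes the bundle still contains preserved license banner comments (`/*! ... */`, `@license`, `@preserve`), which minifiers commonly keep, and it checks for a couple of marker strings. The `FINGERPRINTS` table and both function names are hypothetical and purely illustrative, and a regex over JS source can mis-match inside string or regex literals, so treat the output as hints to verify manually rather than as a definitive answer.

```python
import re

# Any /* ... */ block comment (heuristic: may mis-match inside JS string/regex literals).
BLOCK_COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)

# Hypothetical, illustrative marker strings; extend with whatever you learn per project.
FINGERPRINTS = {
    "react": "react.element",    # string used by React's element symbol in many versions
    "wp.element": "wp.element",  # WordPress global mentioned in the webcrack issues
}


def find_banners(source):
    """Return the text of block comments that look like preserved license banners."""
    banners = []
    for match in BLOCK_COMMENT_RE.finditer(source):
        text = match.group(0)
        if text.startswith("/*!") or "@license" in text or "@preserve" in text:
            body = text[2:-2].lstrip("!")
            # Drop the leading " * " decoration on each line, then collapse whitespace.
            cleaned = " ".join(line.strip(" *\t") for line in body.splitlines())
            banners.append(" ".join(cleaned.split()))
    return banners


def guess_libraries(source, fingerprints=FINGERPRINTS):
    """Return the sorted names of fingerprints whose marker string occurs in the source."""
    return sorted(name for name, marker in fingerprints.items() if marker in source)


# Example usage (path taken from the file listing in this thread):
# source = open("build/index.js", encoding="utf-8", errors="replace").read()
# print(find_banners(source))
# print(guess_libraries(source))
```

Bundlers are sometimes configured to move these banners into a separate license text file next to the bundle, so an empty result here does not mean there are no bundled dependencies.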

You can get a vague idea from the file sizes of the different JS files; while many of them are quite small (4K), there are some that are much larger (eg. build/index.js is 7.2M, core/includes/js/REDACTED/build/index.js is 1.7M, core/includes/js/REDACTED/build/index.js is 804K, core/includes/js/REDACTED/build/index.js is 268K, assets/js/REDACTED/dist/REDACTED-[hash].js is 152K, assets/js/REDACTED.js is 140K, etc). Some of those files (eg. in assets/) might be purely external dependencies, in which case we wouldn't need to decompile them so much as just figure out what they are/where they come from/etc. And even in those larger bundled files, I suspect a good chunk of that code is likely also pulled in from dependencies.

⇒ find . -name '*.js' -exec du -h {} \;

1.7M ./core/includes/js/REDACTED/build/index.js
268K ./core/includes/js/REDACTED/build/index.js
..snip..
804K ./core/includes/js/REDACTED/build/index.js
..snip..
7.2M ./build/index.js
..snip..
140K ./assets/js/REDACTED.js
..snip..
152K ./assets/js/REDACTED/dist/REDACTED-[hash].js
..snip..
4.0K ./assets/js/REDACTED.min.js
4.0K ./assets/js/REDACTED.min.js
..snip..
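For anyone wanting a sorted view of the same information, here is a small Python sketch (an addition, not something from the original emails) that lists every `*.js` file largest-first and totals the sizes per top-level directory. The function names are hypothetical. Note that it reports the file's byte length, whereas `du -h` reports disk usage, so the numbers can differ slightly from the listing above.

```python
from collections import defaultdict
from pathlib import Path


def js_inventory(root):
    """Return (relative_path, size_in_bytes) for every *.js file under root, largest first."""
    root = Path(root)
    files = [
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*.js")
        if path.is_file()
    ]
    # Sort by size descending, then by path so the order is stable.
    return sorted(files, key=lambda item: (-item[1], item[0]))


def size_by_top_level_dir(inventory):
    """Sum file sizes per top-level directory (eg. 'build', 'core', 'assets')."""
    totals = defaultdict(int)
    for rel_path, size in inventory:
        top = rel_path.split("/", 1)[0] if "/" in rel_path else "."
        totals[top] += size
    return dict(totals)


def human(size):
    """Format a byte count with B/K/M/G suffixes, roughly in the style of `du -h`."""
    for unit in ("B", "K", "M", "G"):
        if size < 1024 or unit == "G":
            return f"{size}B" if unit == "B" else f"{size:.1f}{unit}"
        size /= 1024


# Example usage, run from the unpacked plugin directory:
# inventory = js_inventory(".")
# for rel_path, size in inventory[:20]:
#     print(f"{human(size):>8}  {rel_path}")
# print(size_by_top_level_dir(inventory))
```

Sorting largest-first makes it easier to see which few bundles dominate the reversing effort and which files are small enough to inspect by hand.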

I'm not sure if just by looking at those files, someone like yourself who is more familiar with the usage of REDACTED would know which bits relate to what functionality/etc; as it may make sense to break down the decompilation task even further. For example, identifying which are the most important / core files that would make/break the project; or identifying where there is a logical boundary between the code that could provide benefit without needing to work through every part of it (eg. if a block operated in a standalone way, then perhaps those blocks could be decompiled/converted as individual deliverables/etc)

It's probably also worth noting that all of the above analysis has been done on the free download of REDACTED_vREDACTED-free.zip; so if there are more files/features in the non-free version of the plugin, then that could impact the scope of the reversing work further again. Out of curiosity, would it be possible for you to share any versions of the full plugin you have access to with me? While I would probably suggest focussing the bulk of the reversing effort on the latest version (for obvious reasons), sometimes older versions can have useful hints/clues; for example, where more files may have been included than intended, or they weren't minified properly or similarly; that might leak useful info.


Another tangential thought here, but it would probably make sense to ensure that there are appropriate wayback snapshots of the REDACTED documentation as well; and eventually it would probably make sense to extract that from its current 'compiled GitBook' format, back into some semblance of the original markdown 'source' it would have been generated from:

Obviously that wouldn't need to be the same task as the main plugin reverse engineering effort, but it would be tangentially beneficial to it in the long run. I don't know if there are more specific tools for the task, or how hard it might be to create one (given I believe GitBook follows certain conventions in file/folder structure/etc that maps through to how the generated content is laid out); but at least for the bulk of the HTML -> Markdown aspect of it, you can likely leverage pandoc:

And if need be, pandoc supports its own AST based filtering process that allows for customising how the data is transformed as well; here's one random example from my dotfiles:
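To make the HTML -> Markdown idea above a little more concrete, here is a minimal Python sketch (added for illustration, not from the original emails) that mirrors a directory of saved HTML pages into Markdown files by shelling out to pandoc. It assumes pandoc is installed and on the PATH, that the documentation has already been saved locally as `*.html` files, and that GitHub-Flavored Markdown (`gfm`) is an acceptable output format. The function names and directory names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def markdown_target(html_path, src_root, out_root):
    """Mirror the HTML file's relative path under out_root, swapping the suffix for .md."""
    relative = Path(html_path).relative_to(src_root)
    return Path(out_root) / relative.with_suffix(".md")


def pandoc_command(html_path, md_path):
    """Build the pandoc invocation: HTML in, GitHub-Flavored Markdown out."""
    return ["pandoc", "--from", "html", "--to", "gfm", str(html_path), "--output", str(md_path)]


def convert_tree(src_root, out_root):
    """Convert every *.html under src_root and return the list of written Markdown paths."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    written = []
    for html_path in sorted(Path(src_root).rglob("*.html")):
        md_path = markdown_target(html_path, src_root, out_root)
        md_path.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(pandoc_command(html_path, md_path), check=True)
        written.append(md_path)
    return written


# Example usage (directory names are placeholders):
# convert_tree("docs-html", "docs-md")
```

The raw conversion will likely still carry navigation and layout markup from the generated pages; a pandoc Lua filter (passed via `--lua-filter`) is one way to strip or reshape that during conversion.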


The webcrack issues are great, and represent your profound ambition and initiative. Is sponsoring the development of those features the bulk of the work that would need to be performed? If so, who would be best positioned to develop these features and do you have any indication of how sponsoring the work could be facilitated?

Unfortunately, as sort of alluded to above, those webcrack plugin issues (1, 2, 3) definitely wouldn't be the full extent of the work. They're just a few aspects I noticed while looking at one of the files; mostly with a mind to how to improve webcrack core for edge cases it doesn't seem to handle well currently that were preventing some of it from being reversed as fully as it otherwise should have been.

I'll also note that while those issues are phrased in terms of creating plugins to do the reversing work for some of those libraries/dependencies, for this project, it doesn't necessarily need to be automated through a plugin. There are likely going to be some cases where a plugin would provide outsized benefits; and other cases where manual reversing effort will likely be 'good enough' to figure out/convert it back to what it needs to be. webcrack's playground / online IDE has a few features related to assisting manual reversing efforts beyond what's possible with the standard fully automated approach too.

As far as sponsoring open source work, usually I would be looking to see if the project/developer is setup for GitHub Sponsors/similar, or has some kind of 'funding' type details in the project repo/similar, or even something like OpenCollective:

There was also a concept of 'issue bounties' for open source projects at one stage, I see that one of them (Bounty Source) has since shut down, but from a quick search, here are some others in that space. I haven't used them/don't know how good/useful they are; and likely they would take their own cut of fees too (like starting at 19% for Algora seemingly), so would probably still be better off doing something like OpenCollective and then paying out $$ from that more manually:

And then, if none of those exist / are relevant / useful, just generally directly engaging with the project developer to see if they are interested in being involved in a more traditional contracting/etc related fashion.

A bunch of those methods might even be beneficial for how you approach this reverse engineering/open source project in general. For example, perhaps relevant stakeholders would be interested in contributing to the efforts via GitHub sponsors to a GitHub organisation set up for the open sourced version; or for an Open Collective for it or similar.


As for budget, I really don't know. We need to scope out the project and then pitch it to stakeholders who are willing to pitch in. I've looped a few other stakeholders into this thread, to facilitate the discussion.

_nods_ that makes sense. I think I saw somewhere when I was reading up about the background of REDACTED shutting down that the original pricing was something like ?US?$100 for a lifetime licence. So from that angle alone, I would suspect the costs to reverse engineer the project wouldn't likely be worthwhile. But then the real cost probably comes in, as you mentioned earlier, with how much it would cost in time/effort for everyone who currently has a site built on top of REDACTED to rebuild/transition away from it; either to allow for continued development of the site; or when the current version of REDACTED breaks due to a REDACTED upgrade or similar. The specifics of all that side are sort of beyond my area of expertise; but I'd be interested to hear how it progresses as you do chat with other stakeholders and garner interest levels/etc.


Anyways, hope the above is helpful in getting a better idea of scope/needs/etc for the project.

Cheers

Glenn 'devalias' Grant

8 - X -> Me - Appreciation + No Further Action Needed At This Stage

Glenn, this is great! Again, I deeply appreciate all your thoughts and the ways you've gone above and beyond here. We held an initial team huddle with the other stakeholders following my opening of this conversation with you, and I discovered that REDACTED (also in this thread) had already followed many of the automated steps you've outlined, and is now deep into the "pretty manual process" that lies beyond. Unfortunately, re-incorporating semantic meaning back into the code would be extremely difficult to automate 🫤 It appears that with the commitment of REDACTED to this process, we have what we need to drive this forward for now.

Again, you've been an incredibly helpful resource and your generosity and enthusiasm is contagious. I'd be glad to compensate you for your consulting time here, as you deem appropriate.

Kind regards!

X

9 - Me -> X - Acknowledgment + Contribution Details + Interest in Advancing JavaScript Reverse Engineering Tooling / Techniques

Hey X,

Thanks for the kind words — I'm really glad my input was helpful to you and the team! It's great to hear that REDACTED have already made headway into the more manual aspects of the reverse-engineering process!

To give you a rough idea, my usual contracting rate is REDACTED, and I spent around REDACTED directly exploring/analyzing/responding to this, plus roughly another REDACTED on exploration around related limitations/potential improvements in webcrack and similar tooling. While I didn't provide any of this information expecting payment, if contributing something feels appropriate, it would certainly be appreciated — but honestly, no pressure if not.

More broadly, I'm really interested in seeing the JavaScript reverse-engineering space grow, especially around tooling/techniques that help improve the clarity/reduce the complexity of understanding bundled/minified code. I'm also always just keen to hear about ways the JS reverse-engineering tooling might be improved in general, or any interesting methods your team discovers during the process — so definitely feel free to loop me in via GitHub issues or share your experiences directly.

Also, if you run into any specific challenges or would like a second set of eyes down the road, don't hesitate to reach out. I'd love to hear how the project develops going forward!

Best of luck with everything.

Cheers,

Glenn 'devalias' Grant

10 - X -> Me - Inquiry About Donation Methods

Hey Glenn,

I won't be able to fully compensate you, but I'd like to send you a donation. What's the best way?

Thanks

X

11 - Me -> X - Donation Options + Related Fee Considerations

Hey X,

Totally understandable, and honestly I was never expecting full rates would have applied at such an early stage of a project even if it did go ahead / before discussing them / etc; but I really appreciate your desire to donate something all the same :)

Umm, that's a good question. From your REDACTED I think you're REDACTED based?

I created this discussion to get more clarity on the GitHub Sponsors + Stripe Connect fees side of things (and also reached out to support about it (you won't be able to load this link, but for my future reference)):

I think if the fees/etc side of things works out roughly equivalent for GitHub Sponsors, that would be my preferred method; REDACTED, but also because you then get a sponsor badge and other fun little perks like that; though if the fees work out to be particularly better through PayPal (not sure they do with the rather high currency conversion cost), that would also be fine. Or open to other suggestions from your side of things if there is something better you might suggest (noting that we don't really have REDACTED / some of those other payment platforms that REDACTED seems to have over here)

Thanks again!

Glenn 'devalias' Grant

See Also

My Other Related Deepdive Gists and Projects
