Improving the CPAN experience (a GSoC summer tale)
- Instant availability (new uploads are indexed within a minute)
- Personalisation - "follow your favourites"
- Searchable metadata
- Mashup of other CPAN-related services
- Unified (REST) API
- Back-end for Android/iPhone apps, command line tools etc.
- MetaCPAN.local for companies
- Includes BackPAN as well
- Open-source and free
Apply for GSoC to get this thing up and running
MetaCPAN is being developed by a group of Perl coders who have jobs and all kinds of other things on their minds, which makes it hard to build momentum. I was captivated by the idea of an API to CPAN that everyone could use, and of a front-end that could eventually replace search.cpan.org. So I joined the MetaCPAN group and started coding. Since I'm still a student, GSoC is a great opportunity to delve even deeper into the guts of MetaCPAN and do some serious work.
To finish my GSoC application, I want to collect as much input as possible from the community. I have compiled a list of features that I feel would be nice to have and would improve the CPAN experience, though not all of them may be feasible or even desirable.
My application will consist of two subprojects: improving the back-end and writing a state-of-the-art front-end. While search.metacpan.org is nice, it offers no functionality beyond what search.cpan.org already provides. I'd like to change that and leverage the power of MetaCPAN.
- Follow your favorite Modules / Authors
- Get instant notifications on updates
- with a diff of the Changes file
- Add discussions to modules
- Tag modules as installed, broken, author unresponsive etc.
- Add metadata to your own distribution (e.g. "Looking for maintainer", deprecated etc.)
- "CPAN of trust"
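To illustrate the "notification with a diff of the Changes file" idea, here is a minimal sketch using Python's standard `difflib`. The function name `added_changes` is hypothetical, and it assumes the two versions of the Changes file are already available as strings; how MetaCPAN would fetch them is left open.

```python
import difflib
from itertools import islice


def added_changes(old_text: str, new_text: str) -> list[str]:
    """Return the lines that a new release added to its Changes file."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="Changes (previous release)",
        tofile="Changes (new release)",
        lineterm="",
    )
    # The first two lines of a unified diff are the "---" / "+++" file
    # headers; skip them so only hunk lines remain.
    hunk_lines = islice(diff, 2, None)
    # Added lines start with "+"; strip that marker for display.
    return [line[1:] for line in hunk_lines if line.startswith("+")]


if __name__ == "__main__":
    previous = "1.00\n - first release\n"
    current = "1.01\n - fix bug\n\n1.00\n - first release\n"
    for line in added_changes(previous, current):
        print(line)
```

A notification e-mail or feed entry could then carry just these added lines instead of the whole file.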
Currently, search.cpan.org does a decent job at searching, but it can be improved. For example, it doesn't show previews of the search results, and the relevance of the returned results is sometimes questionable.
The following resources can be used to adjust the scoring of search results:
- cpanvote
- Kwalitee
- CPAN Testers
- CPAN Ratings
Using the dependency chain, one can build a graph of modules and calculate a PageRank for each module. This should improve search results, since modules that many other modules depend on (those with a high degree of centrality) would be ranked higher.
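The PageRank idea can be sketched with a plain power iteration. This is only an illustration under assumptions: the `depends_on` mapping, the damping factor of 0.85 and the fixed iteration count are choices made for this example, not part of any existing MetaCPAN code. Rank flows from a module to the modules it depends on.

```python
def pagerank(
    depends_on: dict[str, list[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Power-iteration PageRank over a module dependency graph."""
    # Collect every module, including ones that only appear as a dependency.
    modules = set(depends_on)
    for deps in depends_on.values():
        modules.update(deps)
    n = len(modules)
    rank = {m: 1.0 / n for m in modules}

    for _ in range(iterations):
        # Modules without dependencies are "dangling": their rank is
        # redistributed evenly so the total always stays at 1.
        dangling = sum(rank[m] for m in modules if not depends_on.get(m))
        new_rank = {
            m: (1.0 - damping) / n + damping * dangling / n for m in modules
        }
        for module, deps in depends_on.items():
            if not deps:
                continue
            # Each module splits its rank equally among its dependencies.
            share = damping * rank[module] / len(deps)
            for dep in deps:
                new_rank[dep] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {
        "App::Foo": ["Moose", "Try::Tiny"],
        "App::Bar": ["Moose"],
        "Moose": ["Try::Tiny"],
    }
    for name, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{name:10s} {score:.3f}")
```

In this toy graph the widely used modules end up above the applications that depend on them, which is the effect a search ranking would want to exploit.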
- A full-text search that previews the relevant segments of the document
- Optionally limit search to a release / distribution
- Search for exact matches in the module name (autocompletion)
- Search for authors based on email, name and pauseid
- Exclude results with certain dependencies (e.g. modules using Moose or XS code)
- Keyboard navigation and shortcuts for super fast and mouse-less browsing
- Integrate grep.cpan.me
- Rate distributions from inside the new front-end; no need to leave the page and log in again
- and many many more features
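As one example from the list above, excluding results by dependency could look roughly like the following sketch. The shape of a search result (a dict with `name` and `dependencies` keys) is an assumption made for this example and not a documented MetaCPAN format.

```python
def exclude_by_dependency(results: list[dict], banned: set[str]) -> list[dict]:
    """Drop search results that depend on any module in `banned`."""
    return [
        result
        for result in results
        # A result without a "dependencies" key is treated as having none.
        if banned.isdisjoint(result.get("dependencies", []))
    ]


if __name__ == "__main__":
    hits = [
        {"name": "Some::MooseClass", "dependencies": ["Moose"]},
        {"name": "Some::TinyClass", "dependencies": ["Try::Tiny"]},
        {"name": "Some::PurePerl"},
    ]
    for hit in exclude_by_dependency(hits, {"Moose"}):
        print(hit["name"])
```

A real implementation would more likely push this filter into the search back-end query rather than post-process the results.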
minicpan has made it easy for companies to take control of their local CPAN requirements, but they can search neither their local minicpan nor their own internal code.
MetaCPAN.local:
- Will be a distribution that can be installed in your company network
- With all the features of MetaCPAN
- Add internal company modules to the index
- Either index the company's minicpan or fall back to the live CPAN
- Every front-end developed for MetaCPAN will just work for MetaCPAN.local too
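The "index the company's minicpan or fall back to the live CPAN" behaviour amounts to trying a chain of sources in order. A minimal sketch, assuming each source is simply a callable that returns a metadata dict or `None`; the names `resolve`, `internal`, `minicpan` and `live_cpan` are hypothetical:

```python
from typing import Callable, Optional

# A lookup takes a module name and returns its metadata, or None if unknown.
Lookup = Callable[[str], Optional[dict]]


def resolve(module: str, sources: list[Lookup]) -> Optional[dict]:
    """Return the first hit from an ordered list of sources."""
    for lookup in sources:
        hit = lookup(module)
        if hit is not None:
            return hit
    return None


if __name__ == "__main__":
    # Plain dicts stand in for the three indexes in this sketch.
    internal = {"Acme::Payroll": {"source": "internal"}}
    minicpan = {"Moose": {"source": "minicpan"}}
    live_cpan = {"Moose": {"source": "cpan"}, "Try::Tiny": {"source": "cpan"}}

    chain = [internal.get, minicpan.get, live_cpan.get]
    for name in ("Acme::Payroll", "Moose", "Try::Tiny", "No::Such::Module"):
        print(name, resolve(name, chain))
```

Because a front-end only sees the resolved result, the same front-end code would work against either deployment.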
Nobody is going to use the MetaCPAN back-end if there is no documentation that guides them through the basic steps of querying the metabase or setting up their own front-end.
I'm very excited to hear your ideas. Please don't think too much about implementation details. Let the developer in you rest for a moment and ask yourself:
- What do I need to access CPAN more easily?
- What information do I want to access through MetaCPAN?
- What data is required to further improve tools like cpanm?
- What am I missing from search.cpan.org?
- Basically, what can MetaCPAN and its front-end do for you?
Great work! Let me add a few things to the wish list, not in any particular order:
As far as the search site goes:
The API can power web apps, command line applications and mobile apps. It would be nice to see concrete code samples for these use cases. Of course, a lot of this depends on other work being done, but I think we should keep in mind the scope of projects which MetaCPAN could support, and that scope is quite large.
As far as Personalization goes, I think tagging is huge. Once we can tag modules, searches based on tags add all sorts of interesting possibilities. Authors should also be able to apply special tags to their modules:
"deprecated", "looking for co-maintainer", "looking for a new maintainer", "unmaintained" etc. This would let us know where the author currently stands and would also help in finding new owners for unloved modules.
There is also the possibility that you could tag authors in some way. For example, in the unfortunate case where an author is deceased, that's crucial information to have, and also a tag which the author is obviously unable to apply. You may also want to tag an author as unresponsive. There might be some disagreement on that or on how it might work, but personally I'd like to have some idea of whether a patch might be applied before I spend time creating it.
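To make the tagging idea a little more concrete, here is a small in-memory sketch of tag-based search. The `TagIndex` class and its methods are invented for illustration; a real implementation would presumably live in the MetaCPAN back-end and would also need to record who applied each tag.

```python
from collections import defaultdict


class TagIndex:
    """Maps tags to the modules (or authors) that carry them."""

    def __init__(self) -> None:
        self._names_by_tag: defaultdict[str, set[str]] = defaultdict(set)

    def tag(self, name: str, tag: str) -> None:
        # Tags are stored lower-cased so "Deprecated" and "deprecated" match.
        self._names_by_tag[tag.lower()].add(name)

    def search(self, *tags: str) -> list[str]:
        """Return the sorted names that carry every one of the given tags."""
        if not tags:
            return []
        matches = [self._names_by_tag.get(t.lower(), set()) for t in tags]
        return sorted(set.intersection(*matches))


if __name__ == "__main__":
    index = TagIndex()
    index.tag("Old::Module", "deprecated")
    index.tag("Old::Module", "looking for a new maintainer")
    index.tag("Lonely::Module", "looking for a new maintainer")
    print(index.search("looking for a new maintainer"))
    print(index.search("deprecated", "looking for a new maintainer"))
```

Author tags such as "unresponsive" could be kept in a second index of the same shape.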