@monken
Created March 14, 2011 19:32
Improving the CPAN experience (a GSoC summer tale)

What will MetaCPAN offer that other services don't?

  • Instant availability (new uploads are indexed within a minute)
  • Personalisation - "follow your favourites"
  • Searchable metadata
  • Mashup of other CPAN related services
  • Unified (REST) API
  • Back-end for Android/iPhone apps, command line tools etc.
  • MetaCPAN.local for companies
  • Includes BackPAN as well
  • Open-source and free

Now what?

Apply for GSoC to get this thing up and running

MetaCPAN is being developed by a group of Perl coders who have jobs and all kinds of stuff on their minds. This means it is hard to build momentum. I got hooked on the idea of having an API to CPAN that everyone could use and a front-end that could eventually replace search.cpan.org. So I joined the MetaCPAN group and started coding. And since I'm still a student, GSoC is a great opportunity to delve even deeper into the guts of MetaCPAN and do some serious work.

Community feedback to complete proposal

In order to finish my GSoC application I want to collect as much input as possible from the community. I compiled a list of features that I feel are nice to have and will improve the experience with CPAN, though not all of them may be feasible or even desirable.

My application will consist of two subprojects: improving the back-end and writing a state-of-the-art front-end. While search.metacpan.org is nice, it doesn't offer any functionality beyond what search.cpan.org already provides. I'd like to change that and leverage the power of MetaCPAN.

Proposed Features

Personalization

  • Follow your favorite Modules / Authors
  • Get instant notifications on updates, with a diff of the Changes file
  • Add discussions to modules
  • Tag modules as installed, broken, author unresponsive etc.
  • Add metadata to your own distribution (e.g. "Looking for maintainer", deprecated etc.)
  • "CPAN of trust"
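
As a rough illustration of the "notification with a diff of the Changes file" idea, here is a minimal sketch using Python's standard `difflib` module. The function name `changes_diff` and the sample Changes entries are made up for this example; nothing here reflects how MetaCPAN itself would implement notifications.

```python
import difflib


def changes_diff(old_changes: str, new_changes: str) -> str:
    """Return a unified diff between two versions of a Changes file.

    Hypothetical helper: the two strings are assumed to be the Changes
    files of the previous and the newly uploaded release.
    """
    diff = difflib.unified_diff(
        old_changes.splitlines(),
        new_changes.splitlines(),
        fromfile="Changes (previous release)",
        tofile="Changes (new release)",
        lineterm="",
    )
    return "\n".join(diff)


# Made-up sample data for illustration only.
old = "0.01  2011-03-01\n  - Initial release\n"
new = (
    "0.02  2011-03-14\n  - Fix search bug\n\n"
    "0.01  2011-03-01\n  - Initial release\n"
)
print(changes_diff(old, new))
```

A notification could then carry only the added lines, which is usually what a follower of a module wants to read.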

Improved search results

Currently search.cpan.org does a decent job of searching. However, it can be improved. For example, it doesn't show previews of the search results, and the relevance of the returned results is sometimes questionable.

Evaluate third-party data

The following resources can be used to adjust the scoring of search results:

  • cpanvote
  • Kwalitee
  • CPAN Testers
  • CPAN Ratings
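
One possible way to fold such signals into a search score is a simple weighted boost. The sketch below is only an assumption about how this could look: the signal names, the weights, and the requirement that every signal is already normalised to the range 0 to 1 are all invented for illustration.

```python
# Hypothetical weights; real values would have to be tuned against search results.
DEFAULT_WEIGHTS = {
    "cpanvote": 0.2,
    "kwalitee": 0.1,
    "testers_pass_rate": 0.3,
    "rating": 0.4,
}


def adjusted_score(text_score, signals, weights=None):
    """Boost a full-text relevance score with third-party signals.

    `signals` maps a signal name to a value in [0, 1]; missing signals
    count as 0, so a distribution without any data keeps its text score.
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    boost = sum(w * signals.get(name, 0.0) for name, w in weights.items())
    return text_score * (1.0 + boost)


# Example with made-up numbers: a well-tested, well-rated distribution.
print(adjusted_score(2.0, {"testers_pass_rate": 0.9, "rating": 0.8}))
```

A multiplicative boost keeps the text match as the primary criterion, so a popular but irrelevant module cannot outrank a relevant one with a much higher text score.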

PageRank-like scoring

Using the dependency chain, one can create a graph of modules and calculate a PageRank for each module. This will greatly enhance search results since modules with a high degree of centrality will be ranked higher.
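
To make the idea concrete, here is a minimal sketch of PageRank computed by power iteration over a dependency graph. It assumes that an edge points from a module to each module it depends on, so rank flows towards widely used dependencies. The damping factor of 0.85 is the conventional default, and the graph data (including the `App::Foo` and `App::Bar` names) is made up.

```python
def pagerank(depends_on, damping=0.85, iterations=50):
    """Compute a PageRank-like score for each module.

    `depends_on` maps a module name to the list of modules it depends on.
    Returns a dict of scores that sum to 1.
    """
    nodes = set(depends_on)
    for deps in depends_on.values():
        nodes.update(deps)
    if not nodes:
        return {}

    n = len(nodes)
    rank = {module: 1.0 / n for module in nodes}

    for _ in range(iterations):
        # Modules without dependencies have no outgoing edges; their
        # rank is redistributed evenly so that the total stays at 1.
        dangling = sum(rank[m] for m in nodes if not depends_on.get(m))
        new_rank = {
            m: (1.0 - damping) / n + damping * dangling / n for m in nodes
        }
        for module, deps in depends_on.items():
            for dep in deps:
                new_rank[dep] += damping * rank[module] / len(deps)
        rank = new_rank

    return rank


# Hypothetical dependency graph for illustration.
graph = {
    "App::Foo": ["Moose", "Try::Tiny"],
    "App::Bar": ["Moose"],
    "Moose": ["Try::Tiny"],
}
for module, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(f"{module}: {score:.3f}")
```

In this toy graph the modules that others depend on end up with the highest scores, which is the behaviour the search ranking would rely on.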

Front-end

  • A full-text search that previews the relevant segments of the document
  • Optionally limit search to a release / distribution
  • Search for exact matches in the module name (autocompletion)
  • Search for authors based on email, name and pauseid
  • Exclude results with certain dependencies (e.g. modules using Moose or XS code)
  • Keyboard navigation and shortcuts for super fast and mouse-less browsing
  • Integrate grep.cpan.me
  • Rate distributions from inside the new front-end, with no need to leave the page and log in again
  • and many many more features
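
The "exclude results with certain dependencies" item could be sketched as a simple filter over search hits. The record layout below (a `dependencies` list per hit) and the module names other than Moose are assumptions made for this example, not the actual MetaCPAN data model.

```python
def exclude_by_dependency(results, excluded):
    """Drop search hits that depend on any module in `excluded`.

    Each hit is assumed to be a dict with an optional "dependencies" list.
    """
    excluded = set(excluded)
    return [
        hit for hit in results
        if excluded.isdisjoint(hit.get("dependencies", []))
    ]


# Made-up search hits for illustration.
hits = [
    {"name": "Foo::Heavy", "dependencies": ["Moose", "Try::Tiny"]},
    {"name": "Foo::Light", "dependencies": ["Try::Tiny"]},
    {"name": "Foo::Bare"},
]
print([hit["name"] for hit in exclude_by_dependency(hits, ["Moose"])])
```

In a real deployment this kind of filter would more likely be pushed down into the search back-end query rather than applied after the fact.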

MetaCPAN for companies

minicpan has made it easy for companies to take control over their local CPAN requirements, but they can search neither their local minicpan nor their own internal code.

MetaCPAN.local:

  • Will be a distribution that can be installed in your company network
  • With all the features of MetaCPAN
  • Add internal company modules to the index
  • Either index the company's minicpan or fall back to the live CPAN
  • Every front-end developed for MetaCPAN will just work for MetaCPAN.local too

Documentation

Nobody is going to use the MetaCPAN backend if there is no documentation which guides you through the basic steps of querying the metabase or setting up your own front-end.

Your Turn

I'm very excited to hear your ideas. Please don't think too much about implementation details. Let the developer in you rest for a moment and ask yourself:

  • What do I need to access CPAN more easily?
  • What information do I want to access through MetaCPAN?
  • What data is required to further improve tools like cpanm?
  • What am I missing from search.cpan.org?
  • Basically, what can MetaCPAN and its front-end do for you?
@oalders

oalders commented Mar 14, 2011

Great work! Let me add a few things to the wish list, not in any particular order:

As far as the search site goes:

  • I'd like to be able to exclude deprecated modules from searches by default
  • Advanced searches could allow me to refine searches by what is in the META.yml file. For example, I'd like to list modules which currently reside on GitHub, etc.
  • We can also make the author searches more interesting. For example, we should be able to list authors by country, which would be much more maintainable than the Acme::CPANAuthors namespace
  • search.cpan.org does not show GitHub issues when taking bugs into account (only RT). This can easily be done with the GitHub API
  • If these GitHub issue counts are cached in ES, we can even allow you to refine queries based on the acceptable number of open bug tickets
  • Being able to sort modules based on GitHub watchers or forks would be great. That's perhaps better suited to a 3rd party app, but if we cache that info in ES, it would make for some very powerful searches
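
As a rough sketch of what refining by open issue count and sorting by watchers might look like once that data is cached, the following filters and sorts plain in-memory records. The field names `open_issues` and `watchers` and the sample records are hypothetical; an actual implementation would express this as a query against the search index.

```python
def refine(modules, max_open_issues=None, sort_by=None):
    """Filter cached module records by open issue count, then sort.

    Records without a cached "open_issues" value are treated as unknown
    and are excluded whenever a limit is given (an assumption of this sketch).
    """
    hits = [
        m for m in modules
        if max_open_issues is None
        or (m.get("open_issues") is not None
            and m["open_issues"] <= max_open_issues)
    ]
    if sort_by is not None:
        # Highest value first; records missing the field sort last.
        hits.sort(key=lambda m: m.get(sort_by, 0), reverse=True)
    return hits


# Made-up cached records for illustration.
cache = [
    {"name": "Foo::A", "open_issues": 12, "watchers": 300},
    {"name": "Foo::B", "open_issues": 2, "watchers": 40},
    {"name": "Foo::C", "open_issues": 0, "watchers": 90},
    {"name": "Foo::D", "watchers": 500},
]
print([m["name"] for m in refine(cache, max_open_issues=5, sort_by="watchers")])
```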

The API can power web apps, command line applications and mobile apps. It would be nice to see concrete code samples for these use cases. Of course, a lot of this depends on other work being done, but I think we should keep in mind that the scope of projects which MetaCPAN could support is quite large.

As far as Personalization goes, I think tagging is huge. Once we can tag modules, searches based on tags add all sorts of interesting possibilities. Authors should also be able to apply special tags to their modules:

"deprecated", "looking for co-maintainer", "looking for a new maintainer", "unmaintained" etc. This would let us know where the author currently stands and would also help in finding new owners for unloved modules.

There is also the possibility that you could tag authors in some way. For example, in the unfortunate case where an author is deceased, that's crucial information to have, and it is also a tag which the author is obviously unable to apply. You may also want to tag an author as unresponsive. There might be some disagreement on that or how it might work, but personally I'd like to have some idea of whether a patch might be applied before I spend time creating it.
