Undef'd statement

Notes of a coder

How to search for code now that Google Code Search is dead?

On 2011-10-14, in a blog post titled “A fall sweep” on the official Google blog, Bradley Horowitz announced that Google Code Search would be shut down on 2012-01-15. Apologies, some explanation and links to alternatives were provided in the Code Search Shutdown thread on Google Groups.

I decided to review the alternatives in a real-world situation. Faced with the task of creating a fairly complex custom Django template tag, I first wanted to look at some examples of tags that handle dates and times.

To find such source code, I came up with these keywords, all of which must appear in the modules I'm interested in:

django import datetime template Context Library Node TemplateSyntaxError render parser token
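The query above is effectively an AND query: a module qualifies only if every keyword appears in it. To make those semantics concrete, here's a minimal local stand-in written in Python. It walks a directory of already checked-out source trees and returns the `.py` files that contain all the keywords as plain substrings. The function name `find_matching_modules` and the idea of a local checkout directory are my own hypothetical illustration, not a feature of any engine reviewed below.

```python
from pathlib import Path

# The keywords from the query above; every one must appear in a module.
KEYWORDS = ("django import datetime template Context Library Node "
            "TemplateSyntaxError render parser token").split()


def find_matching_modules(root, keywords=KEYWORDS):
    """Return paths of .py files under `root` that contain all `keywords`.

    Matching is plain substring search, so e.g. "token" also matches
    "tokens"; a real code search engine would tokenize more carefully.
    """
    matches = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        if all(keyword in text for keyword in keywords):
            matches.append(path)
    return matches
```

For example, `find_matching_modules("checkouts")` would list every Python module under a hypothetical `checkouts` directory that mentions all of the keywords. Dropping a single keyword from `keywords` can only widen the result set, which is worth keeping in mind when an engine returns nothing.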

Here’s a run-down of all the code search engines and related tools I could find.

Antepedia

Antepedia is an interesting “search engine”: you can upload a source file and discover its license, version and original open source project. It isn't useful for me right now, but it deserves a mention.

Codase

Codase only searches for C, C++ and Java code. The service was also “temporarily unavailable” at the time I tried to test it.

Codefetch

Codefetch searches for code examples in programming books and makes it easy to buy the books in the search results from Amazon. No books were found when searching for just “django”, but you can search for Python code, and there are multiple Python books in the index.

GrepCode

GrepCode is only for Java, so it's not of interest to me right now. I still wanted to mention it here so that this review isn't too language-specific.

Black Duck Koders

Koders searches for code in a database with 3.3 billion lines of open source code. In addition to the search terms, you can filter by programming language and/or license. Koders can syntax-highlight over 30 languages.

Screenshot: Search results in Koders

Koders found 54 results from a couple dozen projects for my search with a filter for Python code only, and the search took about 5 seconds. For each result, you can click your way into a project info page, narrow the search for that project only, and browse the file with the match.

A major WTF is that the search form isn’t populated with the current search, which makes narrowing down the search kind of difficult. And pasting into the search field doesn’t clear the field first – you need to go and manually delete the initial “Search” label.

It’s a bit clumsy to get to the homepage of the matching project, since you need to go through its info page on Koders. Similarly, navigating the source tree above or below the currently viewed file has to be done via the info page. I would also have expected a direct link to the matching source file on the project’s own site.

All the search results looked like good matches for what I searched for.

Here’s a list of projects with matches:

FFsomething, Google App Engine SDK, Group Appointment Calendar, Talk.org, aguzzim, cherokee, cherokee-admin, cherokee-pyscgi, conspire, django, django-apps, djangokit, flother, gnome-web-properties, googleappengine, handler_webdav, jellyroll, myblogongae, pycherokeeconf, sneeu, cherokee_tests

The search engines used were BerliOS, Cherokee, Django, Google, Quamquam and Spider_NNN (NNN being three digits – I assume this is their own crawler).

Krugle

Krugle searches code, source control comments and documentation in open source projects. With the simple search, you first enter your search terms, and after getting some initial results, it's easy to narrow them down by language, date, project and author. There's also an advanced search form in which you can enter these criteria right away.

My search claimed 785083 hits without the Python language filter, and 54702 with it. Further filtering to only the last year left just 90 search results.

The 90 search results were all from just three projects: jython, resteasy and eclemma. None of them were from actual Django projects. I also tried removing the date filter and skimming through the first few pages of the 54702 search results, but found no relevant hits there either.

So in my case Krugle was an utter failure. Sorry.

search[code]

search[code] is a dead simple code search engine which indexes API documentation, open source code repositories and the StackOverflow database as a last resort.

As a Django/Python programmer, my curiosity was sparked by the “import django” example on the front page. Also, the author promises to release parts of his project as Free Software and donate part of his profits to Free Software.

search[code] accepts regular expressions as search queries, which makes very precise and rich searches possible. Both exact and close matches are presented in the results.
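As an illustration of what a regular expression query buys over plain keywords, here's a pattern of my own that matches the conventional `def name(parser, token):` compilation function of a custom Django template tag. I haven't verified that search[code] accepts exactly this syntax, so treat it as an assumption; below it's simply checked locally with Python's `re` module, and `tag_functions` is a hypothetical helper name.

```python
import re

# Hypothetical query: a custom Django template tag's compilation function
# conventionally takes (parser, token). \s* tolerates varied spacing.
TAG_FUNCTION = re.compile(
    r"^\s*def\s+(\w+)\s*\(\s*parser\s*,\s*token\s*\)\s*:", re.MULTILINE
)

SAMPLE = '''
from django import template

register = template.Library()

@register.tag
def current_time(parser, token):
    ...

def render_helper(context):
    ...
'''


def tag_functions(source):
    """Return the names of functions in `source` that look like tag compilers."""
    return TAG_FUNCTION.findall(source)


print(tag_functions(SAMPLE))  # → ['current_time']
```

A keyword query for “parser token” would also match modules that merely mention those words in comments; the regular expression only matches the function signature itself.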

Screenshot: Search results in search[code]

For my example query, search[code] found 94 results, but all of them were from documentation. After narrowing the results to only Python source code using the ext:py search term, no search results at all were found.

This search engine was interesting enough that I made the effort to get at least some search results, just to see how it works. Removing the term TemplateSyntaxError from my query finally produced results: 53 hits, the majority of which were in different versions of Django. Some hits were also found in django-nonrel, mango-py and soc.

Screenshot: Source view in search[code]

You can click through to cached copies of source files from the search results, and there is a link to the root of the originating repository. A shell command for cloning the repository is also given.

Projects from at least GitHub, BitBucket and Google Code have been indexed in search[code].

SymbolHound

SymbolHound looks like a generic search engine, with the nice difference that it doesn't ignore special characters.

For my search, SymbolHound only found two unique hits from the documentation for different versions of Django. It hadn’t indexed non-ASCII characters correctly, so the search results were a bit ugly.

Conclusions

Of the search engines tested, Koders was a clear winner in terms of the number of services searched and the number of good search results. Unfortunately the user interface wasn’t satisfying.

SymbolHound Code Search, on the other hand, offers a great user experience, but the search index is too narrow. GitHub is admittedly the number one place to go for Django-related code nowadays, but I would have expected to see results from Google Code, BitBucket and SourceForge as well.

search[code] looks like an attractive and ambitious project which already works fairly well. It seems to have either a less complete index or a stricter search algorithm than Koders and SymbolHound, since it found results from only four projects for my example query.

In this particular use case, sites like djangosnippets.org and gist.github.com would also have been useful to have in the search index.

For now, none of the engines came close to what Google Code Search used to offer. I’ll be keeping an eye on these three search engines to see how their search indexes and features develop in the future.

Update: Added a review of search[code].

Update: Reformatted blockquotes.