How to search for code now that Google Code Search is dead?
On 2011-10-14, in a blog post titled “A fall sweep” on the official Google blog, Bradley Horowitz announced that Google Code Search would be shut down on 2012-01-15. An apology, some explanation and links to alternatives were provided in the thread “Code Search Shutdown” on Google Groups.
I decided to review the alternatives in a real-world situation. Faced with the task of creating a fairly complex custom Django template tag, I first wanted to see some examples of tags that handle dates and times.
To find such source code, I came up with these keywords, all of which must appear in the modules I’m interested in:
django import datetime template Context Library Node TemplateSyntaxError render parser token
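In effect this is an AND query: a module only counts as a hit if it contains every keyword. As a rough local equivalent (for grepping a checkout you already have, not a description of how any of the engines below work), here’s a minimal Python sketch that tests a source string for all the keywords as whole words. The sample module is a hypothetical tag written only to exercise the check; `now_iso` and `NowNode` are made-up names.

```python
import re

# The keywords from the query above; every one of them must occur in a module.
KEYWORDS = (
    "django import datetime template Context Library Node "
    "TemplateSyntaxError render parser token"
).split()


def matches_all_keywords(source: str, keywords=KEYWORDS) -> bool:
    """Return True if every keyword occurs in `source` as a whole word.

    Word boundaries keep e.g. "Node" from matching "NodeList".
    Matching is case-sensitive, like Python identifiers.
    """
    return all(re.search(rf"\b{re.escape(kw)}\b", source) for kw in keywords)


# A hypothetical date/time template tag module of the kind I was looking for.
SAMPLE_MODULE = '''\
import datetime
from django import template
from django.template import Context, Library, Node, TemplateSyntaxError

register = Library()


class NowNode(Node):
    def render(self, context):
        return datetime.datetime.now().isoformat()


@register.tag
def now_iso(parser, token):
    if len(token.split_contents()) != 1:
        raise TemplateSyntaxError("now_iso takes no arguments")
    return NowNode()
'''

print(matches_all_keywords(SAMPLE_MODULE))  # True
print(matches_all_keywords("import datetime"))  # False
```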
Here’s a run-down of all the code search engines and related tools I could find.
Antepedia is an interesting “search engine”: you can upload a source file and discover its license, version and original open source project. It’s not useful for me right now, but it deserves a mention.
Codase only searches for C, C++ and Java code. The service was also “temporarily unavailable” at the time I tried to test it.
Codefetch searches for code examples in programming books and makes it easy to buy the books in the search results from Amazon. No books were found when searching for just “django”, but you can search for Python code, and there are multiple Python books in the index.
GrepCode is only for Java, so it’s not of interest to me right now. I still wanted to mention it here to avoid being too language-specific.
Black Duck Koders
Koders searches for code in a database with 3.3 billion lines of open source code. In addition to the search terms, you can filter by programming language and/or license. Koders can syntax-highlight over 30 languages.
Koders found 54 results from a couple dozen projects for my search with a filter for Python code only, and the search took about 5 seconds. For each result, you can click your way into a project info page, narrow the search for that project only, and browse the file with the match.
A major WTF is that the search form isn’t populated with the current search, which makes narrowing down the search kind of difficult. Also, pasting into the search field doesn’t clear the field first: you need to manually delete the initial “Search” label.
It’s a bit clumsy to get to the homepage of the matching project, since you need to go through its info page on Koders. Similarly, navigating the source tree above or below the currently viewed file has to be done via the info page. I would also have expected a direct link to the matching source file on the project’s own site.
All the search results looked like good matches for what I searched for.
Here’s a list of projects with matches:
FFsomething Google App Engine SDK Group Appointment Calendar Talk.org aguzzim cherokee cherokee-admin cherokee-pyscgi conspire django django-apps djangokit flother gnome-web-properties googleappengine handler_webdav jellyroll myblogongae pycherokeeconf sneeu cherokee_tests
The search engines used were BerliOS, Cherokee, Django, Google, Quamquam and Spider_NNN (NNN being three digits; I assume this is their own crawler).
Krugle searches against code, source control comments and documentation in open source projects. With the simple search you can first enter your search terms, and after getting some initial results, it’s easy to narrow them down by language, date, projects and authors. There’s also an advanced search form in which you can enter these criteria right away.
My search claimed 785083 hits without the Python language filter, and 54702 results with it. Further filtering to only the last year left just 90 search results.
The 90 search results were all from just three different projects: jython, resteasy and eclemma. None of the results were from actual Django projects. I also tried removing the date filter and skimming through the first few pages of the 54702 search results, but found no relevant hits there either.
So in my case Krugle was an utter failure. Sorry.
search[code] is a dead simple code search engine which indexes API documentation, open source code repositories and the StackOverflow database as a last resort.
As a Django/Python programmer, my curiosity was sparked by the “import django” example on the front page. Also, the author promises to release parts of his project as Free Software and donate part of his profits to Free Software.
search[code] accepts regular expressions as search queries, which makes very precise and rich searches possible. Both exact and close matches are presented in the results.
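To illustrate what a regex query can express that a bag of keywords can’t, here’s a small sketch using Python’s `re` module. I don’t know which regex dialect search[code] actually uses, so treat the pattern as an assumption: it looks for module-level functions with the `(parser, token)` signature that Django tag compilation functions conventionally have. A keyword search for “parser token” would also hit the `helper(parser, tokens)` function in the hypothetical sample, whereas the regex skips it.

```python
import re

# Hypothetical regex query: a top-level function whose parameters are exactly
# (parser, token), the usual signature of a Django template tag function.
TAG_FUNCTION = re.compile(
    r"^def\s+(\w+)\(\s*parser\s*,\s*token\s*\)\s*:", re.MULTILINE
)

# Made-up source text used only to exercise the pattern.
SOURCE = """
from django import template

register = template.Library()

@register.tag
def current_time(parser, token):
    pass

def helper(parser, tokens):
    pass
"""

# findall returns the captured function names.
print(TAG_FUNCTION.findall(SOURCE))  # ['current_time']
```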
For my example query, search[code] found 94 results, but all of them were from documentation. After narrowing the results to only Python source code using the ext:py search term, no search results at all were found.
This search engine was interesting enough that I made the effort to get at least some search results, to see how it works. By removing the term TemplateSyntaxError from my query I started to get results. There were 53 hits, the majority of which were in different versions of Django. Some hits were also found in django-nonrel, mango-py and soc.
You can click through to cached copies of source files from the search results, and there is a link to the root of the originating repository. A shell command for cloning the repository is also given.
Projects from at least GitHub, BitBucket and Google Code have been indexed in search[code].
SymbolHound looks like a generic search engine, with the nice feature that it doesn’t ignore special characters.
For my search, SymbolHound only found two unique hits from the documentation for different versions of Django. It hadn’t indexed non-ASCII characters correctly, so the search results were a bit ugly.
SymbolHound Code Search
SymbolHound Code Search is a beta search service for open source code repositories. They specifically mention GitHub and SourceForge, and I didn’t see results from other services.
The code snippets displayed for each result were superior to those of any other code search I’ve seen. Multiple excerpts are picked from the file in order to cover most of the search terms. A direct link to the project page on GitHub is shown.
For my search, SymbolHound Code Search found 29 results from GitHub, from 21 different projects:
django-agenda, django, geraldo, jellyroll, django-pressbox, lettuce, hyde, applause, django-activity-stream, packaginator, django-articles, micolog, django-hitcount, openblock, everyblock_code, osqa, django-tagcon, hue, django-grappelli, django-schedule, django-blog-zinnia
This service seems to do substring matches, which can lead to irrelevant results, especially if the tokens in your search query are short. However, every search result was definitely relevant in this case. The number of results seems strangely small: I’m pretty sure more than 29 Django projects on GitHub contain files with my search terms.
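Why substring matching hurts short query tokens can be shown with a tiny sketch. This is only an illustration of substring versus whole-word matching in Python, not a description of how SymbolHound Code Search actually tokenizes; the sample line is made up.

```python
import re


def substring_hits(term: str, text: str) -> int:
    # Plain substring matching: "Node" also matches "NodeList" and "TextNode".
    return text.count(term)


def word_hits(term: str, text: str) -> int:
    # Whole-word matching: the term must be delimited by non-word characters.
    return len(re.findall(rf"\b{re.escape(term)}\b", text))


line = "nodes = NodeList([TextNode(token), Node()])  # tokenize later"

for term in ("Node", "token"):
    print(term, substring_hits(term, line), word_hits(term, line))
# Node 3 1
# token 2 1
```

The shorter and more common a query token is, the more unrelated identifiers contain it as a substring.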
Navigating the source tree is very convenient using the “View Full Source Tree” feature. As with Koders, I was missing a direct link to the matching source files on the projects’ own sites.
Of the search engines tested, Koders was a clear winner in terms of the number of services searched and the number of good search results. Unfortunately the user interface wasn’t satisfying.
SymbolHound Code Search, on the other hand, offers a great user experience, but the search index is too narrow. GitHub is admittedly the number one place to go for Django-related code nowadays, but I would have expected to see results from Google Code, BitBucket and SourceForge as well.
search[code] looks like an attractive and ambitious project which already works fairly well. Either its index is less complete or its search algorithm is stricter than those of Koders and SymbolHound, since results from only four projects turned up even after I dropped TemplateSyntaxError from my example query.
In this particular use case, sites like djangosnippets.org and gist.github.com would also have been useful to have in the search index.
For now, none of the engines came close to what Google Code Search used to offer. I’ll be keeping an eye on these three search engines to see how their search indexes and features develop in the future.
Update: Added a review of search[code].
Update: Reformatted blockquotes.