Search Sprint Conclusion
on
May 11, 2008
The Minnesota Search Sprint is unofficially over. Earnest, Djun, and Blake have already left. Chad is local, but I've said goodbye to him too. Robert and I leave tomorrow. And David is vacationing a few extra days in sunny Minnesota (because it's warmer than Montreal???).
For the visual reader, here's our flickr feed.
We'll make our test database available for anyone to download and test with. As I write, it still has 32k nodes left to index; I'll post the link as soon as it's ready.
Download the Drupal 6.x MySQL database with 100k nodes fully search indexed (231MB). Creating this yourself would take about a day, so the download saves real time. It's meant for testing search, but because it's simply a big database with a lot of nodes, you can also use it for other performance tests, such as with views.
Progress Report
Issues
- #145252 refactor node rank. I introduced this a while ago, but we rerolled it this weekend and wrote a test case, giving it enough attention to reach RTBC. This patch has already spawned another patch, #257216, which will expose link relevancy as an additional scoring factor.
- #257279 search performance improvement, remove extra join. This was a hidden gem that Robert found today that has been around for a long time. This patch can and should be backported to 5.x and 6.x. This is a big win with little cost, and is also RTBC.
- #256678 - Display search help based on type. This is a pretty simple patch that displays different help text when searching for nodes than it displays for users. It could use a few positive comments to move it along.
- #22627 - Show result count and ranges. This patch adds a themeable position count and total number of results to the search results page. It pretty much works, but needs to be retested, and needs a few more positive comments about the concept.
- #256792 - refactor advanced search form and keywords. I blogged about this patch before and all it's missing is the test case, which Chad has made progress on, but not uploaded yet.
- #257196 - ignore JavaScript during indexing. This came to my attention via Arthur this morning. I've got a pretty simple solution to this that I think is a good first step, but it definitely needs a few other HTML experts to review and comment on it.
- #177722 - devel batch patch for creating lots of nodes. I wrote this during the Barcelona DrupalCon for the original 6.x search patch, so that we could test search on big datasets. I think that this is pretty much working now, and I used it last night to create 100,000 nodes.
- #257033 - test coverage for search simplify. This patch adds a needed test case.
- #70722 - search results expose private information. This patch fixes a problem and only needs a test case that I think Djun is working on.
- #257244 - improper normalization of comment and statistics node ranks. It appears that the reason the comment and statistics node ranks don't work quite as expected is that the normalization of their scores may be off. This patch definitely needs review. When you do so, please read Robert's article on search results.
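To illustrate why normalization matters in that last issue, here is a minimal sketch in Python. It is not the Drupal code or the patch itself: the function names, weights, and numbers are all hypothetical, and dividing by the site-wide maximum is just one common way to normalize. The point is that each ranking factor should land in the same 0 to 1 range before the factors are weighted and combined; otherwise a raw comment count or view count can swamp keyword relevance.

```python
def normalize(raw, max_raw):
    """Map a non-negative raw count onto [0, 1] relative to the largest count seen."""
    if max_raw <= 0:
        return 0.0
    return min(raw / max_raw, 1.0)


def combined_score(factors, weights):
    """Weighted average of already-normalized factors; the result stays in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    return sum(weights[name] * factors[name] for name in weights) / total_weight


# Hypothetical numbers for one node, on a site whose busiest node has
# 200 comments and 5000 views.
factors = {
    "keyword": 0.8,                     # assumed to be normalized already
    "comments": normalize(50, 200),     # 0.25
    "views": normalize(1000, 5000),     # 0.2
}
weights = {"keyword": 5, "comments": 2, "views": 1}
score = combined_score(factors, weights)
```

If the comment or view factor were left as a raw count (50 or 1000 instead of 0.25 or 0.2), it would dominate the weighted sum no matter how the weights were set, which is the kind of symptom a normalization bug produces.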
The Future of Search
In addition to this concrete progress, we brainstormed about the future of search, which is already noted in my Day One blog. We assigned each of the ideas to one of the Sprinters for upcoming articles. So stay tuned!
Performance
We knew that the Sprint would be considered less than a success if we didn't find some performance improvements. (Although, I'm a little defensive about this since I have already spent so much time on search performance and since most people will only recognize these gains once d.o is on 6.x.) So on the final day, we spent some time on performance. Robert found one immediate gain that is noted above. We also worked on creating a platform for future testing and benchmarking. As mentioned above, the devel batch patch and the reindex module have made it possible to create a moderately sized (100k node) database for testing.
Credits
A big shoutout to the entire sprinting team.
- Doug Green (me) of CivicActions
- Djun Kim of RainCity Studios
- David Lesieur of Whisky Echo Bravo
- Blake Luchessi of Google SoC fame (still in school)
- Robert Douglas of Acquia
- Earnest Berry of Work Habit
- Chad Fennel from the University of Minnesota
- And shoutouts to "Charlie", Derek and Dmitri who contributed virtually through g.d.o and the chat room
- The empty chair belongs to Steven, the original author of search, who was missed, but was also there with us by virtue of his code

A "delete from search_index" mishap from yesterday led to three more performance issues, found on the plane ride home, relating to indexing performance: