So much was learned at the search sprint. First, I'd like to thank Chad for hosting us, and everyone who came and participated. Sometimes it was slow going, but we definitely reached some large and important conclusions. It was a great experience: not only did it focus on Drupal's search implementation/framework, but it also covered wider topics such as unit testing and what a framework should cover, just to name a couple. First, let's go over the reason we were there in the first place.
High-level Overview of Findings
Let us start with a campfire talk of what we (the search sprinters) found and discussed during the sprint. The current search implementation in Drupal is good. However, we also know that "search" is not what Drupal is made for; it's one aspect of the picture. Relevance, search, matching, and all their sub-genres make up a whole field of study; PhD theses are built on this subject matter. That said, we all agreed that search can probably be handled better by 3rd-party implementations (a dedicated search appliance, a knowledge network, Lucene, etc.). The current framework allows for 3rd-party search integration, but only as a one-sided search. For example, say I want to search a 3rd-party knowledge base on Company A's intranet when I go to the search page. This is entirely possible in the current framework; however, that search is totally isolated. It has its own tab and is basically a single information silo.
Following that paradigm, one can think of each search implementation as its own information silo. In order to get the whole picture of what one is searching for, one has to go to each tab, and search for the same item.
As we mentioned before in another post, one cannot override the default content (node) and user searches. These are tightly bound to the "node" and "user" modules, both intrinsic to Drupal. In order to override these searches, one has to create a new hook_search in a module.
Now, the immediate question that comes to mind is "why would I want to separate the node and user searches from the node and user modules?" The answer is indexing. As I mentioned earlier in this post, 3rd-party appliances/engines may have a better or simply different implementation of search, and the current framework does not allow us to leverage those implementations for node and user content.
For example, what if your site's nodes were tons of code snippets, and there were a special search engine/indexer that indexed programming code in a nice, fast, efficient way and could return complex relevance scores, etc.? If we were able to use our own "codeindexer" to index nodes, and then have this search show up under a "Content" tab or the like, this would be fantastic.
We came up with a few recommendations during the sprint. In this post, I am going to cover the indexing recommendation, as that is what I worked on the most. For details on the other recommendations, read the posts/blogs of the other participants.
This is the piece of the "larger pie" that I worked on. The search sprint arrived at a grand vision (which will likely be transformed as it's refined); however, to reach that place we need to fix some pieces to complete the puzzle. Most of the patches we made are a small part of the larger picture.
Regarding my example above of splitting up search indexing, we started on a path to do just that. We first split the search into 2 modules: nodesearch and usersearch. The main problem with this implementation right off the bat is that Drupal uses the module name as the callback. Thus, once the search is split, the search path looks like search/nodesearch or search/usersearch. As a result, I needed to deal with the "tabs" and "path" issue.
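To make the path problem concrete, here is a minimal sketch in Python (not Drupal code). It assumes only what is described above, that the search path is built from the module name; the `PATH_ALIASES` table and both function names are hypothetical and just illustrate one way the old paths could be kept stable.

```python
def search_path(module_name: str) -> str:
    """Build the search page path from the module name: search/<module>."""
    return f"search/{module_name}"


# Before the split, the searches live in the "node" and "user" modules.
paths_before = [search_path(m) for m in ("node", "user")]

# After the split, the new module names leak into the URLs.
paths_after = [search_path(m) for m in ("nodesearch", "usersearch")]

# Hypothetical alias table mapping the new modules back to the old paths.
PATH_ALIASES = {"nodesearch": "node", "usersearch": "user"}


def stable_search_path(module_name: str) -> str:
    """Resolve the module through the alias table before building the path."""
    return search_path(PATH_ALIASES.get(module_name, module_name))


print(paths_before)
print(paths_after)
print([stable_search_path(m) for m in ("nodesearch", "usersearch")])
```

The point is only that the URL and the tab are coupled to the module name, so renaming or splitting a module changes what visitors see unless something sits in between.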
The item that I tasked myself with was the merging of the different search tabs. I knew I needed to do 2 things:
This led to a few problems:
Next, I merged the tabs (252211). This provides the following:
As a result, there is now a back-end interface to have a search implementation's results show up under another tab, and that tab will now show unified results.
Say you wanted to have the user search implementation show up under the 'Content' tab (the node search implementation). Then, when I search for 'admin', all nodes relating to admin will be searched, AND the users found by the user search relating to admin will also be shown, all under the same tab.
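The 'admin' example can be sketched roughly as follows. This is Python rather than Drupal code, and every name in it (`TAB_MEMBERS`, `search_tab`, the toy `NODES` and `USERS` data) is made up for illustration; it only shows the idea of several search implementations reporting under one tab, not how the patch does it.

```python
from typing import Callable, Dict, List

# Toy data standing in for the site's content and accounts.
NODES = ["Admin guide", "Release notes", "How to reset the admin password"]
USERS = ["admin", "alice", "bob"]

SearchFn = Callable[[str], List[dict]]


def node_search(keys: str) -> List[dict]:
    """Toy node search: case-insensitive substring match on titles."""
    return [{"type": "node", "title": t} for t in NODES if keys.lower() in t.lower()]


def user_search(keys: str) -> List[dict]:
    """Toy user search: case-insensitive substring match on user names."""
    return [{"type": "user", "title": u} for u in USERS if keys.lower() in u.lower()]


# Hypothetical registry: which search implementations report under which tab.
TAB_MEMBERS: Dict[str, List[SearchFn]] = {"Content": [node_search, user_search]}


def search_tab(tab: str, keys: str) -> List[dict]:
    """Run every implementation registered for the tab and combine the results."""
    results: List[dict] = []
    for search in TAB_MEMBERS[tab]:
        results.extend(search(keys))
    return results


for hit in search_tab("Content", "admin"):
    print(hit["type"], "-", hit["title"])
```

Note that the results are simply concatenated here, which is exactly where the normalization problem comes from: scores from different implementations are not directly comparable.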
However, then comes the problem of normalizing the results. The current implementation is:
A simple, rudimentary analysis of this workflow is something like:
Let L = number of rows actually processed (number of times we loop)
Let s = number of searches that should be under that tab
Let r = number of results that will/should be displayed on a page
L = (s * r) * 2 * r = 2 * s * r^2
On the premise that:
As you can see, this is not very efficient, because lemme tell ya, the graph of that function is pretty nasty :). One nice thing to implement would be a way to get ALL search results in 1 query. However, that is for another discussion, in which we talked about passing around a SearchQueryObject/Structure and then creating the "search implementation" from that (e.g. perhaps it would create SQL to search Drupal's search index, or perhaps XML to search a different type of index).
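To get a feel for how quickly that grows, here is a small Python sketch that just evaluates the formula above, L = (s * r) * 2 * r, for a few values. The function name and the sample values of s and r are my own choices for illustration.

```python
def rows_processed(s: int, r: int) -> int:
    """Rows looped over for s searches under a tab and r results per page."""
    return (s * r) * 2 * r


# Tabulate L for a few tab sizes and page sizes.
for s in (1, 2, 4):
    for r in (10, 25, 50):
        print(f"s={s} r={r} L={rows_processed(s, r)}")
```

Because r is squared, doubling the number of results per page quadruples the work, while adding another search under the tab only scales it linearly.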
My current path on the patch is to:
That said, any comments/thoughts would definitely be appreciated.
| Issue number | Who | Summary | Status |
|---|---|---|---|
| 256792 | Doug | refactor search form | needs review |
| 22627 | David/Doug | pager count | needs review |
| 256678 | Doug/Earnest | search type help | almost RTBC |
| 145242 | Doug | refactor node rank | RTBC |
| 252211 | Earnest | merged tabs | needs work |
| 70722 | David/Djun | search exposes private data in search query | working on simpletest |
| *** | David/Robert/all | Search Parsed Query | BLUE SKY |
| 54622 | David | db_rewrite patch | Open |
| 257033 | Blake/Chad | test coverage for search simplify | needs work |
| 257007 | Robert | inputs for search simplify | Open |
Patch details and notes below:
earnest's blog