Recently, on the Summation Users List, we got into a discussion about the pros and cons of searching OCR from the Case Explorer tree vs the OCRBase tab. A user said that the most effective way to search a case is from the Case Explorer tree; I disagreed. Another user asked me to explain, because when he searches from the OCRBase tab, he is only searching the currently displayed document. His process isn't correct, as explained below. I'm an advocate for searching from the OCRBase tab, and I wouldn't search OCR from the Case Explorer unless you understand the trade-offs.
I thought reprinting my post to the group here might be helpful. I've edited it to remove some of the personal references to the person asking the initial question.
This is how I explain it in my training classes. My training manual has an 8-page appendix entitled “Everything you Thought You Knew about OCR but Didn’t” or something along those lines. (No, it’s not for sale.)
(The person asking the initial question complained that when he searches in the OCRBase tab, he's only searching the currently displayed document. That's not correct, and here's why.) When the OCRBase tab is active, there are options: Search, Fuzzy Search, All Fuzzy and Search All. When you hit the Enter key after you enter your search term, you are only searching the current document, which I bet is what most users do who've not been trained on this particular view. Searching just the current document is handy when you want to find exactly where in that document your search term appears, but it doesn’t retrieve the entire set of documents that respond to your search.
Instead, you want to hit the Search All button to run a search for your exact search term. If the Search All button’s not appearing on your toolbar (which can happen in versions after 2.8), try right-clicking on a gray area of the toolbar (to the right of the Help button works) and select Reset Toolbars. If that doesn’t force the Search All button to appear, you might try monkeying with the resolution on your screen (not my favorite solution since I often present with projectors that force a lower resolution and hide the darn button on this toolbar). Alternatively, go to the Search menu and select “Quick Search All OCRBase.”
But one of the reasons why I prefer searching from the OCRBase tab is the ready availability of the Fuzzy Search All (the All Fuzzy button; or you can Fuzzy Search just the current document). (By the way, I’m not at all a fan of how Summation renamed these toolbar buttons. It seemed much clearer in the 2.7.x version.) OCR, as you know, is inherently imperfect, no matter what claims a vendor makes to you (in my opinion). Although you can use asterisks to surround your search term when you conduct a Quick Search, you can’t use asterisks in the middle of a term to account for misinterpretation of characters (e.g., you’re searching for CAT but the OCR sometimes interpreted that A to be an O or a U or….). That’s where Fuzzy Search All comes in. You’re presented with a list of the possible terms in the OCRBase that are spelled close to the way your search term is spelled.
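To make the CAT example concrete, here is a rough Python sketch of why asterisks around a term don't rescue you from a misread character in the middle of it. It uses Python's standard `fnmatch` wildcards as a stand-in; that is an assumption for illustration only and is not how Summation's search engine works internally. The word list is invented.

```python
from fnmatch import fnmatch

# Hypothetical words as they might appear in OCR text.
# "c0t" and "cot" stand in for OCR misreads of "cat".
ocr_words = ["cat", "cats", "concatenate", "c0t", "cot"]


def wildcard_hits(pattern, words):
    """Return the words that match a wildcard pattern, ignoring case."""
    return [w for w in words if fnmatch(w.lower(), pattern.lower())]


# Asterisks on both ends catch prefixes and suffixes around the term...
hits = wildcard_hits("*cat*", ocr_words)
# ...but anything with a misread letter inside the term is left behind.
missed = [w for w in ocr_words if w not in hits]

print("matched:", hits)
print("missed:", missed)
```

The surrounding wildcards widen the search outward from the term, while the OCR error sits inside it, so the misread words never match.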
Here’s an example in the Franc v Morris sample case. With the OCRBase tab active, enter the term INTERROGATORY (doesn’t matter if it’s capped or not). For purposes of this illustration, don’t use any asterisks. Then hit Search All (or select Quick Search All OCRBase from the Search menu). You get back no hits, right? Now, hit the All Fuzzy tool. You get a list of possible hits. You can change the “fuzziness” percentage, going as low as 65% or as high as 99%. The lower the fuzziness percentage, the more terms you’ll see in that list. Select all of the terms, unless you've happened to lower the fuzziness to 65% and got “interceptor” as one of your choices. If so, select all the terms and then de-select “interceptor.” One document is located in that search vs the “0” documents you found without using fuzzy.
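If it helps to see the idea behind the fuzziness percentage, here is a minimal Python sketch. It scores each word against the search term with `difflib.SequenceMatcher` from the standard library and keeps the words at or above a threshold. Summation's actual matching algorithm isn't documented here, so treat the scoring method, the `fuzzy_candidates` helper and the word list as assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_candidates(term, vocabulary, threshold):
    """Return (word, score) pairs whose similarity to `term` is >= threshold.

    `threshold` is a fraction between 0 and 1 (0.65 means 65%).
    """
    term = term.lower()
    scored = []
    for word in vocabulary:
        score = SequenceMatcher(None, term, word.lower()).ratio()
        if score >= threshold:
            scored.append((word, round(score, 3)))
    return scored


# Hypothetical OCR vocabulary: two misread spellings of "interrogatory",
# one near-miss word and one unrelated word.
vocabulary = ["lnterrogatory", "interrogatorv", "interceptor", "deposition"]

for threshold in (0.65, 0.90, 0.99):
    words = [w for w, _ in fuzzy_candidates("INTERROGATORY", vocabulary, threshold)]
    print(f"{threshold:.0%}: {words}")
```

With this particular scoring, the two misread spellings pass at 65% and 90%, "interceptor" only sneaks in at 65%, and nothing passes at 99% because the exact spelling isn't in the list. That is the same trade-off as in the sample case: lower the percentage and you see more candidates, including some you'll want to de-select.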
Now here’s where just being able to search the document currently displayed on your screen comes in. If you were to hit “Search”, you wouldn’t find any hits in the document because it’s searching for the original search term (“interrogatory”). But if you hit the “Fuzzy Search” tool, you’ll be able to jump from hit to hit within that one document. Why? Because nowhere in that document is the term “interrogatory” spelled correctly so you have to rely on the list of Fuzzy Search terms that was used to conduct the correct search.
You’ll also notice that when you run an OCRBase search from the OCRBase tab and then click on the Column tab, the retrieved records are displayed in the Column view. (In fairness, any time you run a search, even from the Case Explorer, the Column view will reflect the results of your search.) Now, if you’ve done any coding in the database, you can use Subset searching to narrow the results. Once you’ve done that, flip back to the OCRBase tab and you will see only those documents that (1) came back as a result of the original fuzzy search and then (2) were narrowed down when you used subset searching in the Column view. Cool, yes? While you can run a search in the Case Explorer and flip to the Column view to do subset searching, you can’t flip back to the Case Explorer and narrow the search results report as you can by flipping to the OCRBase tab.
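Conceptually, subset searching is just narrowing one result set by a second condition. Here is a small Python sketch of that idea; the document IDs, the coded values and the `subset_search` helper are all hypothetical and are only meant to show the logic, not how Summation stores or filters records.

```python
# Hypothetical document IDs returned by a fuzzy OCR search.
ocr_hits = {"DOC0001", "DOC0007", "DOC0012", "DOC0030"}

# Hypothetical coding for some records in the database.
coded_records = {
    "DOC0001": {"DOCTYPE": "Letter"},
    "DOC0007": {"DOCTYPE": "Interrogatory"},
    "DOC0012": {"DOCTYPE": "Letter"},
    "DOC0044": {"DOCTYPE": "Letter"},  # coded as a letter, but not an OCR hit
}


def subset_search(current_results, records, field, value):
    """Keep only the current results whose coded `field` equals `value`."""
    return {
        doc_id
        for doc_id in current_results
        if records.get(doc_id, {}).get(field) == value
    }


narrowed = subset_search(ocr_hits, coded_records, "DOCTYPE", "Letter")
print(sorted(narrowed))
```

Only documents that satisfy both conditions survive: they had to come back from the OCR search first, and then match the coding. DOC0044 is coded as a letter but was never an OCR hit, so it stays out.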
So here’s why I believe the Case Explorer view searching for OCRBase can lead to problems. The person asking me to explain my position said he can search the “entire database” from the Case Explorer. I’d bet that what he's doing is selecting in the Case Explorer “Core Database”, “OCRBase” and possibly other items like transcripts, transcript notes, etc. So humor me here. Go to the Case Explorer and de-select everything but OCRBase. You have both a “Search” tool and a “Fuzzy Search” tool on the toolbar, right? Now, if you’re using versions earlier than 2.9.x, if you select Core Database as well as OCRBase, the “fuzzy search” tool grays out. That’s because the Core Database can’t be fuzzy-searched. But that also means that you’re not using fuzzy search on the OCRBase either, which I just demonstrated would be a problem. If you’re using 2.9.x, the darn Fuzzy Search tool doesn’t gray out but if you click it, nothing happens. The developers just messed up the graying out of that button. If it were working, you’d be presented with a list of alternate terms.
Now I know what’s going to happen is that you (or someone) will run a Quick Search with both the Core Database and OCRBase selected and get a single hit. What’s important to understand is that you didn’t get the hit because you were searching INTERROGATORY in the OCR. You got it because the term INTERROGATORY appears in the coding of the record. (See the DOCTYPE field for the record you retrieved.) That won’t always happen, especially if you’re relying primarily on OCR to get your search results.
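Here is a short Python sketch of that distinction: an exact-term search across both the coding and the OCR text that reports where the hit actually came from. The record layout, the field values other than the DOCTYPE field name, the OCR text and the `hit_sources` helper are all invented for illustration and don't reflect how Summation stores data.

```python
def hit_sources(term, record):
    """Return which parts of a record contain `term` exactly, ignoring case."""
    term = term.lower()
    sources = []
    if any(term in str(value).lower() for value in record["coding"].values()):
        sources.append("coding")
    if term in record["ocr"].lower():
        sources.append("ocr")
    return sources


# Hypothetical record: the coder typed the document type correctly,
# but the OCR misread the capital I as a lowercase L throughout.
record = {
    "coding": {"DOCTYPE": "Interrogatory", "AUTHOR": "Unknown"},
    "ocr": "PLAINTIFF'S FIRST SET OF lNTERROGATORIES. lNTERROGATORY NO. 1",
}

print(hit_sources("INTERROGATORY", record))
```

In this made-up record the exact search succeeds only because of the coding. Take away the DOCTYPE value and the same search comes back empty, even though the document is plainly about interrogatories.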
The inclination from the Case Explorer view is to “search all of the database” meaning not just the OCR (the contents of the documents) but the Core Database (the coding) as well. When you do that, you lose the fuzzy search option, which I think can be dangerous when relying on searching the OCR.
Now, if you have perfect OCR with no misspelled words (either by the OCR software or by the author of the document), then you don’t need fuzzy searching. But if you have that, I’d sure like to get the name of the OCR vendor you use because I’ve never, ever seen a perfect set of OCR’d documents in all the years I’ve been doing this (more than I care to count).
Happy searching!