CONTENTS
Preface = xi
1. Introduction to Information Retrieval Systems = 1
1.1 Definition of Information Retrieval System = 2
1.2 Objectives of Information Retrieval Systems = 4
1.3 Functional Overview = 10
1.3.1 Item Normalization = 10
1.3.2 Selective Dissemination of Information = 16
1.3.3 Document Database Search = 18
1.3.4 Index Database Search = 18
1.3.5 Multimedia Database Search = 20
1.4 Relationship to Database Management Systems = 20
1.5 Digital Libraries and Data Warehouses = 21
1.6 Summary = 24
2. Information Retrieval System Capabilities = 27
2.1 Search Capabilities = 28
2.1.1 Boolean Logic = 29
2.1.2 Proximity = 30
2.1.3 Contiguous Word Phrases = 31
2.1.4 Fuzzy Searches = 32
2.1.5 Term Masking = 32
2.1.6 Numeric and Date Ranges = 33
2.1.7 Concept and Thesaurus Expansions = 34
2.1.8 Natural Language Queries = 36
2.1.9 Multimedia Queries = 37
2.2 Browse Capabilities = 38
2.2.1 Ranking = 38
2.2.2 Zoning = 40
2.2.3 Highlighting = 40
2.3 Miscellaneous Capabilities = 41
2.3.1 Vocabulary Browse = 41
2.3.2 Iterative Search and Search History Log = 42
2.3.3 Canned Query = 43
2.3.4 Multimedia = 43
2.4 Z39.50 and WAIS Standards = 44
2.5 Summary = 47
3. Cataloging and Indexing = 51
3.1 History and Objectives of Indexing = 52
3.1.1 History = 52
3.1.2 Objectives = 54
3.2 Indexing Process = 56
3.2.1 Scope of Indexing = 57
3.2.2 Precoordination and Linkages = 58
3.3 Automatic Indexing = 58
3.3.1 Indexing by Term = 61
3.3.2 Indexing by Concept = 63
3.3.3 Multimedia Indexing = 64
3.4 Information Extraction = 65
3.5 Summary = 68
4. Data Structure = 71
4.1 Introduction to Data Structure = 72
4.2 Stemming Algorithms = 73
4.2.1 Introduction to the Stemming Process = 74
4.2.2 Porter Stemming Algorithm = 75
4.2.3 Dictionary Look-up Stemmers = 77
4.2.4 Successor Stemmers = 78
4.2.5 Conclusions = 80
4.3 Inverted File Structure = 82
4.4 N-Gram Data Structures = 85
4.4.1 History = 86
4.4.2 N-Gram Data Structure = 87
4.5 PAT Data Structure = 88
4.6 Signature File Structure = 93
4.7 Hypertext and XML Data Structures = 94
4.7.1 Definition of Hypertext Structure = 95
4.7.2 Hypertext History = 97
4.7.3 XML = 98
4.8 Hidden Markov Models = 99
4.9 Summary = 102
5. Automatic Indexing = 105
5.1 Classes of Automatic Indexing = 105
5.2 Statistical Indexing = 108
5.2.1 Probabilistic Weighting = 108
5.2.2 Vector Weighting = 111
5.2.2.1 Simple Term Frequency Algorithm = 113
5.2.2.2 Inverse Document Frequency = 116
5.2.2.3 Signal Weighting = 117
5.2.2.4 Discrimination Value = 119
5.2.2.5 Problems With Weighting Schemes = 120
5.2.2.6 Problems With the Vector Model = 121
5.2.3 Bayesian Model = 122
5.3 Natural Language = 123
5.3.1 Index Phrase Generation = 125
5.3.2 Natural Language Processing = 128
5.4 Concept Indexing = 130
5.5 Hypertext Linkages = 132
5.6 Summary = 135
6. Document and Term Clustering = 139
6.1 Introduction to Clustering = 140
6.2 Thesaurus Generation = 143
6.2.1 Manual Clustering = 144
6.2.2 Automatic Term Clustering = 145
6.2.2.1 Complete Term Method = 146
6.2.2.2 Clustering Using Existing Clusters = 151
6.2.2.3 One Pass Assignments = 153
6.3 Item Clustering = 154
6.4 Hierarchy of Clusters = 156
6.5 Summary = 160
7. User Search Techniques = 165
7.1 Search Statements and Binding = 166
7.2 Similarity Measures and Ranking = 167
7.2.1 Similarity Measures = 168
7.2.2 Hidden Markov Model Techniques = 173
7.2.3 Ranking Algorithms = 174
7.3 Relevance Feedback = 175
7.4 Selective Dissemination of Information Search = 179
7.5 Weighted Searches of Boolean Systems = 186
7.6 Searching the INTERNET and Hypertext = 191
7.7 Summary = 194
8. Information Visualization = 199
8.1 Introduction to Information Visualization = 200
8.2 Cognition and Perception = 203
8.2.1 Background = 203
8.2.2 Aspects of Visualization Process = 204
8.3 Information Visualization Technologies = 208
8.4 Summary = 218
9. Text Search Algorithms = 221
9.1 Introduction to Text Search Techniques = 221
9.2 Software Text Search Algorithms = 225
9.3 Hardware Text Search Systems = 233
9.4 Summary = 238
10. Multimedia Information Retrieval = 241
10.1 Spoken Language Audio Retrieval = 242
10.2 Non-Speech Audio Retrieval = 244
10.3 Graph Retrieval = 245
10.4 Imagery Retrieval = 246
10.5 Video Retrieval = 249
10.6 Summary = 255
11. Information System Evaluation = 257
11.1 Introduction to Information System Evaluation = 257
11.2 Measures Used in System Evaluations = 260
11.3 Measurement Example - TREC Results = 267
11.4 Summary = 278
References = 281
Subject Index = 313