CS60092: Information Retrieval

From Metakgp Wiki
CS60092
Course name: Information Retrieval
Offered by: Computer Science & Engineering
Credits: 3
L-T-P: 3-0-0
Semester: Spring

Previous Year Grade Distribution
Grade  EX  A  B  C   D  P  F
Count  9   7  9  14  5  3  3


Syllabus

Syllabus mentioned in ERP

Introduction to Information Retrieval: The nature of unstructured and semistructured text. Inverted index and Boolean queries.

Text Indexing, Storage and Compression: Text encoding: tokenization, stemming, stop words, phrases, index optimization. Index compression: lexicon compression and postings lists compression. Gap encoding, gamma codes, Zipf's Law. Index construction. Postings size estimation, merge sort, dynamic indexing, positional indexes, n-gram indexes, real-world issues.

Retrieval Models: Boolean, vector space, TF-IDF, Okapi, probabilistic, language modeling, latent semantic indexing. Vector space scoring. The cosine measure. Efficiency considerations. Document length normalization. Relevance feedback and query expansion. Rocchio.

Performance Evaluation: Evaluating search engines. User happiness, precision, recall, F-measure. Creating test collections: kappa measure, interjudge agreement.

Text Categorization and Filtering: Introduction to text classification. Naive Bayes models. Spam filtering. Vector space classification using hyperplanes; centroids; k-nearest neighbors. Support vector machine classifiers. Kernel functions. Boosting.

Text Clustering: Clustering versus classification. Partitioning methods. k-means clustering. Mixture of Gaussians model. Hierarchical agglomerative clustering. Clustering terms using documents.

Advanced Topics: Summarization, topic detection and tracking, personalization, question answering, cross-language information retrieval.

Web Information Retrieval: Hypertext, web crawling, search engines, ranking, link analysis, PageRank, HITS, XML and Semantic Web.

References
1. Manning, Raghavan and Schütze, Introduction to Information Retrieval, Cambridge University Press.
2. Baeza-Yates and Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley.
3. Soumen Chakrabarti, Mining the Web, Morgan Kaufmann.
4. Survey by Ed Greengrass, available on the Internet.
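
Illustration (not from the ERP syllabus): two of the index compression topics listed above, gap encoding and gamma codes, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, bits are kept as a text string for readability rather than packed into bytes, and the unary part is written as ones terminated by a zero, the convention used in Manning, Raghavan and Schütze.

```python
def to_gaps(postings):
    """Turn a strictly increasing list of doc IDs into the first ID plus gaps."""
    if not postings:
        return []
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def from_gaps(gaps):
    """Inverse of to_gaps: a running sum restores the original doc IDs."""
    postings, total = [], 0
    for g in gaps:
        total += g
        postings.append(total)
    return postings


def gamma_encode(n):
    """Gamma code of n >= 1: unary length of the offset, then the offset."""
    if n < 1:
        raise ValueError("gamma codes are defined for positive integers only")
    offset = bin(n)[3:]  # binary form of n without the '0b' prefix and leading 1
    return "1" * len(offset) + "0" + offset


def gamma_decode(bits):
    """Decode a concatenation of gamma codes back into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        length = 0
        while bits[i] == "1":  # unary part: count the ones
            length += 1
            i += 1
        i += 1  # skip the terminating zero
        values.append(int("1" + bits[i:i + length], 2))
        i += length
    return values


# Hypothetical postings list used only for illustration.
postings = [824, 829, 215406]
encoded = "".join(gamma_encode(g) for g in to_gaps(postings))
decoded = from_gaps(gamma_decode(encoded))
```

Gaps between nearby doc IDs are small, and gamma codes spend fewer bits on small numbers, which is why the two techniques are usually combined.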


Concepts taught in class

Student Opinion

How to Crack the Paper

Classroom resources

Additional Resources


Time Table

Day 8:00-8:55 am 9:00-9:55 am 10:00-10:55 am 11:00-11:55 am 12:00-12:55 pm 2:00-2:55 pm 3:00-3:55 pm 4:00-4:55 pm 5:00-5:55 pm
Monday
Tuesday
Wednesday
Thursday CSE-120 CSE-120
Friday CSE-120