# Information retrieval

Take-home exam

Course: Information retrieval 1, 7.5 hp (C3LIR1)

Publication date: 2022-10-18

Please note that the exam consists of 13 questions (some with subquestions). The maximum number of points is 45. To obtain one of the grades A–E, you need at least the following number of points: grade E ≥ 31 points, grade D ≥ 34 points, grade C ≥ 37 points, grade B ≥ 40 points, and grade A ≥ 43 points. This exam must be completed individually and submitted in Canvas no later than 2022-10-30.

1. Many queries used on the web consist of a few search terms (maybe only one term) and lack operators (e.g., Boolean operators). This makes it in many ways problematic for a search engine to retrieve relevant information for the user. Please characterize, using different concepts from IR theory, two of the problems that the search engine faces when dealing with such short, operator-less queries, and indicate for each problem a method or approach that can be used to handle that problem in the design and construction of the search engine. [4 p]
2. Assume that we, in an IR system based on the Boolean model, formulate a query q with the following structure:

q = renewable AND (wind OR (NOT oil))

• With regard to the presence / absence of the three query terms – which documents will be retrieved as (system) relevant? [2 p]
• Which one of the three terms should the system first examine (with regard to the possible presence of the term in the document) to ascertain as quickly as possible whether a given document should be retrieved or not? Why this term in particular? [2 p]
3. Term weighting. Let k1 and k2 be two terms (if you want to, you can replace these variables with two concrete terms in your answer), D a document collection, and d a document in D. Assume that the term k1 occurs 4 times in the document d and that the term k2 also occurs 4 times in d. Finally, assume that the term k1 occurs in 10 documents in D and that the term k2 occurs in 15 documents in D. Use reasoning* to determine whether the term weight of k1 in d will be greater than, less than, or equal to the term weight of k2 in d, given that the following IR models and/or term weighting schemes are used:
• the vector space model with tf-idf weighting. [1 p]
• the classical probabilistic model (binary independence model). [1 p]

*You should not have to use a calculator to obtain the answer.

4. A commonly used similarity measure in the vector space model (VSM) is the cosine measure. Please explain the general reasoning behind the design and use of this measure. You do not need to use any mathematical formulas in your answer; it is sufficient to explain in words only. [2 p]
5. One model of how users go about searching for information is the berrypicking model (formulated by Marcia Bates). Briefly explain this model and how it differs from the traditional understanding of the search process. [2 p]
6. One of the basic components of every IR model is a ranking function sim that, in the context of a query q, assigns to each document a score that indicates the "similarity" between the document and the query, according to their representations in the specific IR model. We say that two documents are separable in a given IR model if it is possible to formulate at least one query that results in different values of the ranking function sim for the two documents. Consider the (short) documents:

d1 = to go or not to go

and

d2 = to go or not

Are these documents separable in

• the Boolean model, [2 p]
• the vector space model with tf-idf weighting and the cosine similarity measure? [2 p]

Please substantiate your answers.
7. Lexical analysis and term weighting.
• What is meant by lexical analysis? [1 p]
• Present a disadvantage of using tf weighting instead of tf-idf weighting for document representation. [1 p]
• What is meant by document length normalization, and why may it be important to perform when generating document representations? [2 p]
8. Text processing.
• Briefly present a potential advantage of using stemming in an IR system. [1 p]
• Also indicate a potential disadvantage of using stemming in an IR system. [1 p]
• What is the Levenshtein distance between the terms "string" and "song"? [1 p]
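As background for the Levenshtein distance mentioned above, the following is a minimal sketch of the standard dynamic-programming computation. It assumes unit cost for each insertion, deletion, and substitution (the exam does not state a cost model), and the function name `levenshtein` and the example words are illustrative choices, not part of the exam.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b (unit costs assumed)."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            substitution_cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                      # delete ca
                curr[j - 1] + 1,                  # insert cb
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


# Illustrative pair unrelated to the exam terms.
print(levenshtein("kitten", "sitting"))  # → 3
```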
9. Relevance feedback.
• What is meant by relevance feedback and how does this work in principle? [2 p]
• Explain how relevance feedback works, by exemplifying with the Rocchio method or the classical probabilistic model. Please note that it is sufficient that you explain only one of these methods, and that you do not need to use mathematical formalism in your answer (though it may facilitate your presentation). [2 p]
• Compare relevance feedback with query expansion. What similarities and differences regarding purpose and approach can you find? [2 p]
• What is the difference between local and global analysis in the context of implicit feedback? [2 p]
10. We perform a search in a reference collection on a topic with 12 known relevant documents. The IR system is based on the vector space model. The returned documents are assessed for relevance up to DCV = 20 (where DCV, the document cutoff value, is the rank position up to which the evaluation measures are calculated), which yields the following list. R represents a relevant document and 0 a non-relevant document.

R R 0 R 0 0 R R 0 0 R 0 0 0 0 R 0 0 0 0

For the search result above and DCV = 20, please calculate

• recall [1 p]
• precision [1 p]
• R-precision [1 p]
• the F-measure (F1) [1 p]
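The four measures listed above can be sketched in code as follows. This is a minimal illustration under stated assumptions: the function name `evaluate`, the result keys, and the ranked list in the usage example are hypothetical and deliberately differ from the exam data. R-precision is taken as precision at rank R, where R is the number of known relevant documents, and F1 as the harmonic mean of precision and recall.

```python
def evaluate(ranking: str, total_relevant: int, dcv: int) -> dict:
    """Compute recall, precision, R-precision and F1 for a ranked list
    written as space-separated 'R' (relevant) / '0' (non-relevant) marks."""
    marks = ranking.split()
    judged = marks[:dcv]                      # only positions up to the cutoff count
    hits = judged.count("R")
    recall = hits / total_relevant
    precision = hits / len(judged)
    # R-precision: precision at the rank equal to the number of known relevant documents.
    r_precision = marks[:total_relevant].count("R") / total_relevant
    # F1: harmonic mean of precision and recall (0 when nothing relevant was retrieved).
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"recall": recall, "precision": precision,
            "r_precision": r_precision, "f1": f1}


# Hypothetical result list: 4 known relevant documents, cutoff at rank 10.
scores = evaluate("R 0 R 0 0 R 0 0 0 0", total_relevant=4, dcv=10)
print(scores)
```

For this hypothetical list, three of the four relevant documents appear within the cutoff, so recall is 0.75 and precision is 0.3.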
11. Evaluation.
• Why is recall an inappropriate measure when evaluating web searches? [1 p]
• A common observation when evaluating search results is that recall tends to be higher for larger DCV values (positions further down the ranked list), while precision tends to decrease for larger DCV values. Present an explanation of this phenomenon based on the definitions of recall and precision. [2 p]
12. Compare the algorithms HITS and PageRank with respect to similarities and differences in how they rank web pages. [2 p]
13. For this exam question we use the simple definition of PageRank, which is formulated as follows. Let w and v be web pages. By the notation ∀v : v ↣ w we denote "for all pages v such that v links to page w". We also let r(w) denote the PageRank value for page w and let g(v) denote the number of links from page v. PageRank is then defined as:

$$ r(w) = \sum_{v \,:\, v \rightarrowtail w} \frac{r(v)}{g(v)} $$

Consider the link structure in the figure below. The nodes (circles) represent web pages and the arrows represent hypertext links. Assume that the value of r(A), that is, the PageRank value for page A, is known and happens to be independent of the structure in the figure. Furthermore, assume that the PageRank values can be fully calculated based on the structure in the figure. Which of the following statements is correct and how did you come up with the answer?

• r(B) > r(C) [r(B) is greater than r(C)].
• r(B) < r(C) [r(B) is less than r(C)].
• r(B) = r(C) [r(B) equals r(C)].

The answer is not dependent on the exact value of r(A), but you can assume, for example, that r(A) = 18.0. [3 p]
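To illustrate how the simple PageRank definition above is applied, here is a minimal sketch on a hypothetical link structure. The pages P, Q, S, T, their links, and the fixed value r(P) = 6.0 are invented for illustration and are not the structure from the exam figure. The sketch also assumes the graph has no cycles, so each value can be computed directly from the values of the pages linking to it.

```python
def simple_pagerank(links: dict, known: dict) -> dict:
    """Apply r(w) = sum over v linking to w of r(v) / g(v),
    where g(v) is the number of links from v.

    links: page -> list of pages it links to (assumed acyclic)
    known: pages whose value is given rather than computed
    """
    # Invert the link structure: for each page, which pages link to it?
    in_links = {w: [v for v, outs in links.items() if w in outs] for w in links}
    scores = dict(known)

    def r(w):
        if w not in scores:
            scores[w] = sum(r(v) / len(links[v]) for v in in_links[w])
        return scores[w]

    for page in links:
        r(page)
    return scores


# Hypothetical structure: P links to Q, S, T; Q links to T; S links to Q and T.
links = {"P": ["Q", "S", "T"], "Q": ["T"], "S": ["Q", "T"], "T": []}
print(simple_pagerank(links, known={"P": 6.0}))
# → {'P': 6.0, 'Q': 3.0, 'S': 2.0, 'T': 6.0}
```

Here P passes 6.0 / 3 = 2.0 along each of its three links, S splits its 2.0 over two links, and Q passes its full 3.0 to T.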


"Information retrieval." Universal Assignment - June 14, 2024, https://universalassignment.com/information-retrieval/

