Contents
Part 1 – Exploratory data analysis
Part 2 – Training and testing set (sample)
Part 3 – Simple linear regression
Part 4 – Multiple linear regression
Notes on plagiarism and cheating (and how to avoid it)
Two very important notes:
- This is a statistics course and the goal of this assignment is to demonstrate your understanding of the whole course. When you are reviewing your work, ask yourselves “are we demonstrating our understanding of relevant topics?”
- Related to the first note: although the rubric is in the middle of the document, it is the most important part of the assignment because it tells you specifically what you are being graded on. As you complete each step, check your work against the rubric to make sure you are maximizing your grade. The rubric also indicates where to put most of your effort (i.e. the portion of the assignment that is worth the most should be where you put most of your work).
General Information
- This assignment has four parts, which involve utilizing multiple analysis techniques to explore a human resources problem.
- The overall goal of this assignment is to demonstrate your understanding of the key topics in this course and that you can apply them in a real-world situation.
- Worth: 15% of total mark for the course
- Due: Friday, December 17 by 11:59pm
- Late submissions will not be accepted.
Outside sources
NOTE: If you directly use an outside source (e.g., paraphrase or quote), you still need to do a proper APA citation. What is described below is only for outside sources that you looked at for help and not for directly writing your work.
As I am absolutely convinced that most of you are using outside sources and failing to cite them, let us make it easy. If you look at a website outside of the class (i.e. not our textbook or from BB), insert the URL in the table below, state which part of the assignment you used it for (1, 2, 3, 4), and very briefly describe how you used it. In the first row, I’ve provided an example of what I’m expecting. Please delete it before submitting.
URL | Part | How used |
2 | Read about r and how to interpret it. | |
If you claim you used no outside sources OR you directly used all outside sources, instead of submitting the above table, include the following sentence in your work (see submission guidelines for where):
“I, [insert full name], solemnly swear that I did not use any outside source (except those properly cited using APA referencing) to complete this assignment. I understand that failure to indicate outside sources is an act of academic misconduct and could result in getting 0 on this assignment.”
Scenario
The Craybill Instrumentation Company produces highly technical industrial instrumentation devices. The company has 45 sales regions, each headed by a sales manager.
The human resources (HR) director has the business objective of improving recruiting decisions concerning sales managers. The HR director determined that the primary method of evaluating the effectiveness of recruitment is the hire’s resulting “sales index” score, which is the ratio of the region’s actual sales to its target sales. The target values are constructed each year by upper management, in consultation with the sales managers, and are based on past performance and market potential within each region.
At the time of their application, candidates are asked to take the Strong-Campbell Interest Inventory Test and the Wonderlic Personnel Test. The former test measures the applicant’s perceived interest in sales, while the latter measures their perceived ability to manage. For both tests, the higher the score the better. Due to the time and money involved with the testing, some discussion has taken place about dropping one or both of the tests.
The HR director decided to use regression modelling to predict the sales index (Sales) of the sales managers. To start, the HR director gathered information on each of the 45 current sales managers, including years of selling experience (Experience), and the scores from both the Strong-Campbell Interest Inventory Test (SCIIT) and the Wonderlic Personnel Test (Wonder). The attached Excel file contains the information on the 45 current sales managers.
Your goal is to perform an analysis to determine: Can the sales index be predicted by the variables chosen by the HR director? If so, which variable or combination of variables is the most effective predictor?
What you need to do
To answer the above question, follow the instructions below. You will submit all of the tables in each of the parts and your Excel file with your completed work. See Submission Guidelines for more details.
Part 1 – Exploratory data analysis
Prior to doing the regression analysis, the HR director wants to get a sense of the quality of sales managers Craybill currently has. To do this, run an exploratory data analysis of the sales index and determine what story you want to tell about the sales managers. Then choose the visualization or numerical summaries (or both) that best support the story you want to tell. Insert them below. Then explain the story.
Table 1
[Evidence: Visualization, numerical summaries or both] | [Explanation] |
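As an illustration of the numerical summaries that could support a story about the sales index, here is a minimal Python sketch. It is not part of the required Excel work. The values in `sales_index` are made up for the example (they are not the Craybill data), and the function name `summarize` is hypothetical. Because the sales index is actual sales divided by target sales, the share of managers with an index above 1 is the share who exceeded their target.

```python
import statistics

# Hypothetical sales index values (NOT the Craybill data), for illustration only.
sales_index = [0.82, 0.95, 1.01, 1.08, 1.10, 1.15, 1.22, 1.30]

def summarize(values):
    """Return common numerical summaries for a list of numbers."""
    # Quartiles computed with the "inclusive" method, which treats the
    # data as containing its own minimum and maximum.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
        # Fraction of managers whose actual sales exceeded target sales.
        "share_above_target": sum(v > 1 for v in values) / len(values),
    }

summary = summarize(sales_index)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Which of these summaries to report depends on the story being told; the sketch only shows how they are computed.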
Part 2 – Training and testing set (sample)
A common practice when performing regression analysis in analytics is to divide the data into a training data set and a testing data set. The training data set is used to build the model (in this case the regression model), while the testing data set is used to test the ability of the model to make predictions. The normal rule for dividing the data is 80/20. That is, the training set is made up of 80% of the data while the testing set is made up of 20% of the data.
In this first step, divide the data set to make the training and testing set.
- Testing set: Randomly select 20% of the sales managers and their associated data. Copy and paste those into the appropriate part of the table below.
- Training set: Then take the remaining 80% of the sales managers and their associated data, and copy and paste those into the appropriate part of the table below.
- At the top of the table, briefly explain how you collected your sample in the row provided in the table.
Table 2
Brief explanation of how the sample was collected. | |||||
Sales manager ID | Sales | Wonder | SCIIT | Experience (yrs) | |
Testing set | |||||
Training set | |||||
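The 80/20 split described above can be sketched in Python as follows. This is only one possible way to draw a simple random sample, not the required method. It assumes the sales managers are identified by the numbers 1 to 45; the function name `split_train_test` and the seed value are hypothetical. With 45 managers, 20% gives 9 managers in the testing set and 36 in the training set.

```python
import random

def split_train_test(ids, test_fraction=0.2, seed=2021):
    """Randomly split IDs into a training set and a testing set."""
    rng = random.Random(seed)  # fixed seed so the same split can be reproduced
    n_test = round(len(ids) * test_fraction)
    # Simple random sample without replacement for the testing set.
    test_ids = sorted(rng.sample(ids, n_test))
    chosen = set(test_ids)
    # Everything that was not sampled goes into the training set.
    train_ids = [i for i in ids if i not in chosen]
    return train_ids, test_ids

# Assumption: managers are numbered 1..45.
manager_ids = list(range(1, 46))
train_ids, test_ids = split_train_test(manager_ids)
print(len(train_ids), len(test_ids))  # 36 9
```

Recording the seed (or, in Excel, the random numbers used) is one way to explain how the sample was collected.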
Part 3 – Simple linear regression
Build and evaluate a simple regression model to predict the sales index using the training set. To start, pick one of the three variables (Experience, SCIIT, or Wonder) that you believe will be useful for predicting the sales index.
Based on the variable you’ve chosen, you will:
- Run the residual analysis.
- Determine if linear regression is appropriate.
- Find and interpret the regression equation.
- Evaluate the effectiveness of the model (regression equation).
To do the above, fill in the following table. Leave the titles and labels alone, but remove or replace anything in square brackets [].
Table 3
Question | ANSWER | ||||||||
State the variable you chose and explain why you chose it. | [Provide one or two sentences that explain why you think your variable is a good predictor of sales index. Make sure to clearly state which variable you chose.] | ||||||||
Run the residual analysis to determine if the assumptions of regression are valid for this model. | Linearity | [Insert the relevant visualization for this portion of the residual analysis] | [Explain whether the visualization indicates that the assumption of linearity is satisfied.] | ||||||
Independence | [Insert the relevant visualization for this portion of the residual analysis] | [Explain whether the visualization indicates that the assumption of independence is satisfied.] | |||||||
Normality | [Insert the relevant visualization for this portion of the residual analysis] | [Explain whether the visualization indicates that the assumption of normality is satisfied.] | |||||||
Equal variance | [Insert the relevant visualization for this portion of the residual analysis] | [Explain whether the visualization indicates that the assumption of equal variance is satisfied.] | |||||||
Build the regression model. | Is linear regression appropriate? | [Insert the scatterplot] | [Find and interpret the r-value.] | ||||||
Build the model. | [State the regression equation. Include for what values of X the equation is valid.] | ||||||||
Hypothesis test for the slope. (Steps 2 and 3 are done for you) | Step 1 | [Beta symbol if you need it: β] | |||||||
Step 2 | Use a level of significance of 5%. | ||||||||
Step 3 | Use the Student-t distribution. | ||||||||
Step 4 | |||||||||
Step 5 | |||||||||
Step 6 | |||||||||
95% confidence interval for the slope | [Provide a complete and thorough interpretation of the 95% confidence interval for the slope. Make sure to include the confidence interval in your answer.] | ||||||||
Evaluate the regression model. | Find and interpret the coefficient of determination. | ||||||||
For each of the sales managers in the Testing data set, provide the following information. To find the predicted value, use the regression equation from above. | X-value | Actual y-value | Predicted y-value | Residual | 95% confidence interval | 95% prediction interval | |||
Choose one of the X-values. Above, highlight the X-value you used. | [Interpret what the confidence interval means for the X-value.] | ||||||||
[Interpret what the prediction interval means for the X-value.] | |||||||||
Based on the predicted values and their associated residual analysis, comment on the ability of your model to predict the sales index. | |||||||||
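The quantities requested in Table 3 (regression equation, r, coefficient of determination, t statistic and confidence interval for the slope, and the confidence and prediction intervals at a chosen X-value) can be sketched in Python as below. This is an illustration under stated assumptions, not the required Excel workflow. The data are made up (five points, not the Craybill data), the function names are hypothetical, and the critical value `t_crit` must be looked up separately for n − 2 degrees of freedom (for example from a t-table, or with Excel's `T.INV.2T`). The value 3.182 used here is the two-tailed 5% critical value for 3 degrees of freedom.

```python
import math

def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x, plus quantities used for inference."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))     # standard error of the estimate
    s_b1 = s_yx / math.sqrt(ssx)        # standard error of the slope
    return {"n": n, "x_bar": x_bar, "ssx": ssx, "b0": b0, "b1": b1,
            "r": sxy / math.sqrt(ssx * ssy), "r2": 1 - sse / ssy,
            "s_yx": s_yx, "s_b1": s_b1, "t_stat": b1 / s_b1}

def slope_interval(fit, t_crit):
    """Confidence interval for the slope: b1 +/- t_crit * s_b1."""
    half = t_crit * fit["s_b1"]
    return fit["b1"] - half, fit["b1"] + half

def intervals_at(fit, x0, t_crit):
    """Predicted y at x0, with confidence and prediction intervals."""
    y_hat = fit["b0"] + fit["b1"] * x0
    h = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["ssx"]
    ci_half = t_crit * fit["s_yx"] * math.sqrt(h)      # for the mean response
    pi_half = t_crit * fit["s_yx"] * math.sqrt(1 + h)  # for an individual response
    return (y_hat,
            (y_hat - ci_half, y_hat + ci_half),
            (y_hat - pi_half, y_hat + pi_half))

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # assumption: two-tailed 5% critical value for n - 2 = 3 df

fit = fit_simple_regression(x, y)
slope_ci = slope_interval(fit, T_CRIT)
y_hat, ci, pi = intervals_at(fit, 4, T_CRIT)

print(f"y-hat = {fit['b0']:.3f} + {fit['b1']:.3f} x")
print(f"r = {fit['r']:.3f}, r^2 = {fit['r2']:.3f}, t = {fit['t_stat']:.3f}")
print("Reject H0 (slope = 0)?", abs(fit["t_stat"]) > T_CRIT)
print("95% CI for slope:", slope_ci)
print("At x = 4:", y_hat, ci, pi)
```

The prediction interval is always wider than the confidence interval at the same X-value, because it covers an individual manager's sales index rather than the mean sales index of all managers with that X-value.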
Part 4 – Multiple linear regression
Build two multiple regression models using the training data set and determine which one is better. Each model needs to have at least two independent variables. Once you build your models, fill in the following table.
Table 4
Model 1 | Model 2 | |
Chosen variables | ||
Relevant scatterplots | ||
r | ||
Adjusted r^2 | ||
p-value for group of slopes | ||
Individual p-values for slopes | ||
Compare the two models. Which model would you recommend that the HR director use? Justify your answer. |
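To illustrate why Table 4 asks for the adjusted r^2 rather than the plain r^2, here is a short Python sketch using the standard formula adjusted r^2 = 1 − (1 − r^2)(n − 1)/(n − k − 1), where n is the number of observations and k is the number of independent variables. The r^2 values below are invented for the example, and the function name `adjusted_r_squared` is hypothetical; n = 36 corresponds to a training set of 80% of the 45 managers.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted r^2 for a model with k independent variables and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n_train = 36  # 80% of the 45 sales managers

# Invented r^2 values, for illustration only.
adj_two_vars = adjusted_r_squared(0.60, n_train, k=2)
adj_three_vars = adjusted_r_squared(0.61, n_train, k=3)

print(f"2 variables: adjusted r^2 = {adj_two_vars:.4f}")
print(f"3 variables: adjusted r^2 = {adj_three_vars:.4f}")
```

In this invented case the three-variable model has the higher r^2 but the lower adjusted r^2, because the small gain in fit does not offset the penalty for the extra variable.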
Submission Guidelines
Your submission needs to follow these guidelines.
Submit one Word doc or PDF file and one Excel file.
The Word doc needs to follow this format:
- Section 1: Filled in Outside Sources table OR the statement that you did not use any outside sources. If you have used any sources directly in your work, insert the reference list here (in addition to the table).
- Section 2: Table 1
- Section 3: Table 3
- Section 4: Table 4
- Section 5: Table 2
- Section 6: Include any additional information you want. Note: There is no obligation to do this.
See the Word document “ assignment submission (Word).docx” for the required layout.
The Excel doc should have a minimum of five sheets that demonstrate your computations to complete the assignment. Do not provide explanations or any additional information. You are providing the Excel spreadsheet simply so we can see your work. Your spreadsheet needs to include the following:
- Part 1: Your exploratory data analysis
- Part 2: Your training set and testing set
- Part 3: Your regression analysis
- Part 4 – Model 1: The regression analysis for model 1
- Part 4 – Model 2: The regression analysis for model 2
You can have more sheets than this but ensure that you have labelled the sheets appropriately to help your instructor find the information.
See the Excel document “ assignment submission (Excel).xlsx” for the required layout.
Breakdown of marks
Here is how you’ll be marked on the assignment:
- Superior performance – A+: The answer is correct, complete, and demonstrates a very strong understanding of the relevant course content.
- Excellent – A: The answer is correct, complete, and demonstrates a strong understanding of the relevant course content.
- Good – B: The answer is mostly correct and complete, with no errors or only small ones. Understanding of relevant course content is generally demonstrated. Above average performance.
- Satisfactory – C: The answer is mostly correct and complete, with either multiple small errors or a significant error. Basic understanding of course content is demonstrated.
- Marginal Performance – D: There is more than one significant error. The response suggests a lack of understanding of course content.
- Fail – F: There are multiple errors and overall, the answer does not demonstrate understanding of the course content.
- Not done: The component is missing.
Description | Mark | ||
Part 1 – Exploratory data analysis | The response presented in the table demonstrates that the student correctly knows how to do data analysis, can identify a story within the data, can present evidence to support the story, and can communicate the story in a meaningful way to their employer. | A+ | 10 |
A | 8.5 | ||
B | 7.5 | ||
C | 6 | ||
D | 5 | ||
F | 2 | ||
Not done | 0 | ||
Part 3 – Residual analysis | The response presented in the table demonstrates that the student correctly understands how to perform residual analysis, understands what the results of the analysis indicate, and can effectively communicate the results of the analysis. | A+ | 10 |
A | 8.5 | ||
B | 7.5 | ||
C | 6 | ||
D | 5 | ||
F | 2 | ||
Not done | 0 | ||
Part 3 – Build the regression model | The response presented in the table demonstrates that the student can correctly make an argument for why two variables are related, explain whether the two variables are related, perform inferential statistics on the slope, and communicate the meaning of the results of the analysis. | A+ | 15 |
A | 13 | ||
B | 11 | ||
C | 9 | ||
D | 7.5 | ||
F | 3 | ||
Not done | 0 | ||
Part 3 – Evaluate regression model | The response presented in the table demonstrates that the student can correctly utilize multiple methods to evaluate the usefulness of the regression model at making predictions and can communicate those results accurately. | A+ | 5 |
A | 4 | ||
B | 3.5 | ||
C | 3 | ||
D | 2.5 | ||
F | 1 | ||
Not done | 0 | ||
Part 4 – Multiple linear regression | The response presented in the table demonstrates that the student can correctly find relevant features of regression models to allow for their comparison. Additionally, the student can appropriately compare the features and make an argument for why one model is better than the other model. | A+ | 10 |
A | 8.5 | ||
B | 7.5 | ||
C | 6 | ||
D | 5 | ||
F | 2 | ||
Not done | 0 | ||
Part 2 and Excel | The table includes both the training set and the testing set. The explanation of how the random sample was selected is correct and sufficiently detailed. The Excel file with all required computations is included. | A+ | 5 |
A | 4 | ||
B | 3.5 | ||
C | 3 | ||
D | 2.5 | ||
F | 1 | ||
Not done | 0 | ||
Outside sources | An appropriate list of outside sources is provided. If direct sources are used, correct APA referencing is used. | Complete | |
Incomplete | Depends on severity of omission. Anywhere from -3 to -55 | ||
Total | 55 |
Notes on plagiarism and cheating (and how to avoid it)
Plagiarism is any act where you present work as your own when it is not. When plagiarism is found, a letter is sent to the Office of Student Conduct.
Cheating is when you do something which gives you an unfair advantage over other students.
When you submit anything with your name on it, you are stating that you are comfortable with all the work presented and you agree that it is your work. Make sure you review your work to ensure it is your own prior to submitting.
Here are two common scenarios that I have seen in the past.
Scenario 1: Suppose you do not quite understand standard deviation. So you google “standard deviation” and click on “Standard Deviation Definition” on Investopedia. One sentence makes sense to you: “Standard deviation measures the dispersion of a dataset relative to its mean.” What is the right way to deal with it, so you are not engaging in plagiarism?
Options | Result |
We found the standard deviation of income to be $4000. Standard deviation measures the dispersion of a dataset relative to its mean. | Plagiarism! This is a direct copy and paste without any indication of the source. This is work presented as your own when it is not. |
We found the standard deviation of income to be $4000. Standard deviation measures the scatter of a dataset relative to its mean. | Plagiarism! Though it is not a direct copy, it is still close to the website’s wording, and it is still presented as your work when it is not, as there is no citation. |
We found the standard deviation of income to be $4000. Standard deviation measures the scatter of a dataset relative to its mean (Hargrave & Westfall, 2020). | Not obviously plagiarism but still borderline. A correct in-text citation was used, but the quote was insufficiently paraphrased. Changing one word is not paraphrasing. |
We found the standard deviation of income to be $4000. This measure indicates how much the incomes vary from the mean (Hargrave & Westfall, 2020). | Not plagiarism : ) There is a correct APA in-text citation and the sentence was paraphrased. |
We found the standard deviation of income to be $4000. “Standard deviation measures the dispersion of a dataset relative to its mean” (Hargrave & Westfall, 2020, para. 2). | Not plagiarism : ) A direct quote is used (and indicated by quotation marks) and a correct APA in-text citation was used. BUT in this assignment, you should avoid using direct quotes and instead focus on what these definitions mean in context. |
Note: A proper APA reference needs to be included at the end of the document if outside sources are used. For this example, the APA reference would look like:
Here are some good habits:
- Never copy and paste a sentence straight into your assignment document. Instead, immediately paraphrase it and include the reference. A lot of students copy and paste and then forget to change it –it is still plagiarism.
- If you spend any time on a website as you are doing this assignment, write down the website’s name and URL in a document (use the Outside Sources table for this assignment).