Word count – 2000 words
Total Marks – 65
We return to Bunnyland and Otherland one year later! When we last saw them, things had come to a tentative conclusion but substantial challenges remained. Could people from both lands manage to work together to solve their food problem? Would tensions flare? Would the fragile peace get disrupted? Would they be able to grow enough food that nobody went hungry?
You’ll find answers to these questions in the vignettes below. We’ll visit each of our friends and see what they’re doing, and you’ll be able to help them with their statistics one last time. Enjoy!
Word count penalty: The penalty is as detailed in the student manual: “10% of the total marks available for a given assessment task will be deducted for every 10% that the word count exceeds the word limit specified for the task”. So a word count of up to 2,199 results in no penalty. Between 2,200 and 2,399 is a 10% deduction, between 2,400 and 2,599 is 20%, and so forth.
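The penalty rule can be sketched as a small R function. This is only an illustration of the arithmetic described above: the function name word_penalty_pct is hypothetical, and the sketch assumes the deduction grows in whole 10% steps with no upper cap, which the manual excerpt does not specify.

```r
# Percentage of total marks deducted for a given word count.
# Assumptions (not stated in the manual excerpt): whole 10% steps, no cap.
word_penalty_pct <- function(word_count, limit = 2000) {
  excess <- pmax(0, word_count - limit)  # words over the limit
  step <- limit / 10                     # 10% of the limit (200 words here)
  10 * floor(excess / step)              # 10% deducted per full step
}

word_penalty_pct(c(2000, 2199, 2200, 2399, 2400, 2599))
# returns 0 0 10 10 20 20
```

The boundaries match the worked figures in the text: 2,199 words gives no deduction, 2,200 gives 10%, and 2,400 gives 20%.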
The opening up of relations between Bunnyland and Otherland has been great for Gladly. He hadn’t realised how hard he found it as one of the few bears in a land full of bunnies until things changed. He ended up making fast friends with Super Size, who had also struggled with a similar alienation, being so large and different. Super Size was still large and different, but with Gladly that didn’t seem to matter, and the two bonded over one of their favourite topics: food.
Now that they were growing so much more food, both Gladly and Super Size’s persistent hunger was assuaged, and the two friends turned their thoughts to how they could share their love of food with others. One day Gladly had an idea: they could start a restaurant! It was ideal for both of them. They could be around friends all day, and make use of all of the fantastic new foods being grown in their land.
Before long, their restaurant was the social hub linking Bunnyland and Otherland. Located on the border, Meatball Cafe was a vibrant place of food, conversation, and laughter. No longer did Gladly wonder what he wanted to do with his life: he knew! And best of all, there were meatballs every day.
Being responsible business owners, Gladly and Super Size kept track of their sales and before long started wondering on which day they sell the most meatballs: Friday, Saturday, or Sunday. This data can be found in the dm tibble, which has been loaded for you. Each row corresponds to the data from one week. There are three columns:
week: What week it was
day: Can take three possible values: Friday, Saturday, or Sunday for that week
meatballs: The number of meatballs sold on that day
Their research question is whether on average the same number of meatballs are sold on each of the three days. Your job in the next few problems is to use your R and statistics knowledge to find the answer to this question.
Q1 [3.5% of total marks]
First, as always, let’s visualise the data. In the code chunk for this question make a bar plot showing how the number of meatballs varies by day of the week. Your figure should have bars with non-transparent black outlines, error bars corresponding to standard error, a semi-transparent colour scheme for the bar colours with a different colour for each day, the individual data points in the same colour as the bar colours (and not transparent), colours other than the default, and of course a title, informative axis labels, and a nice theme.
Q2 [3.5% of total marks]
There are two assumptions you should check given this research question. Check them in the code chunk for this question. In 70 words or less, indicate what assumptions you are checking and what the results of the code indicate. Was either assumption violated? How can you tell?
Q3 [4% of total marks]
Run the appropriate statistical test to evaluate the research hypothesis, given the results of Q2. Report on the results in 70 words or less. In your report, don’t worry about including descriptive statistics or the results of the assumption check from Q2 but do include an explanation of which statistical test you used and why, what the predictor and outcome variables were, the appropriate stats reference, and the interpretation of this data in terms of the research question. Don’t worry about reporting effect size or pairwise tests.
END OF SECTION
Not much has changed for Cuddly Paws, although all the drama made her think even more about what she wanted out of life. She hadn’t enjoyed all of the tension and worry but had enjoyed how everything had forced her out of her shell a bit and deepened her interactions with people. Though she still preferred the comforts of art over people, she did expand out of her comfort zone enough to begin giving her art away.
It turned out that many people wanted her art, and before long, Cuddly Paws found herself spending most of the weekdays on her art (she has always been very disciplined about keeping weekends free). She eventually grew curious about whether she tended to produce around one painting per day (i.e., five paintings per week). Luckily she has kept track of the number of paintings she produced each week.
This data can be found in the da tibble, which has been loaded for you. Each row corresponds to the data from one week. There are two columns:
week: What week it was
art: The number of paintings Cuddly Paws made that week
The data is shown in the histogram below. Cuddly Paws would like to know if she averages around five paintings per week.
Q4 [1.5% of total marks]
There is one assumption you should check given this research question. Check it in the code chunk for this question. In 35 words or less, indicate what assumption you are checking and what the results of the code indicate. Was the assumption violated? How can you tell?
Q5 [5% of total marks]
Run the appropriate statistical test to evaluate Cuddly Paws’ question and report on the results in 80 words or less. In your report, include descriptive statistics for the mean as well as an explanation of which statistical test you used and why, the appropriate stats reference, and the interpretation of this data in terms of the research question. Be sure to also calculate effect size and report your answer along with its interpretation.
END OF SECTION
The path toward integrating the people of Bunnyland and Otherland was not easy, but was helped a lot by the burgeoning friendship between LFB and Rainbow the unicorn. LFB will always remember Rainbow as her first friend in Otherland. For her part, Rainbow has found LFB to be a delightful companion and loves her fresh views and honest, straightforward manner. The trust between the two of them carried the day several times when tensions were high.
As a result, when it came time for the next elections, there was a groundswell of support for the idea of electing LFB and Rainbow together to be leaders of their respective lands. LFB and Rainbow were more surprised than anyone, but found that they truly enjoyed being able to guide their people and work together to make things better.
It has now been five months since they were elected and Rainbow and LFB would like to know what factors are associated with support for them. They therefore commission a study which investigates demographic information associated with each voting district. This data can be found in the dl tibble, which has been loaded for you. Each row consists of one voting district. There are four columns:
district: Random code corresponding to each voting district
support: Percentage of voters in that district who support Rainbow and LFB
economy: Per capita GDP of that district in the last six months, in thousands of dollars (e.g. 20 means people made an average of $20,000 each)
population: Number of people in that district
LFB and Rainbow want to know whether the population and/or economy of a district predict the level of support they have in that district (don’t worry about any possible interactions).
The html file called plot3d contains an interactive plot of the data. Feel free to open it up and play with it to get a sense of things.
Q6 [1.5% of total marks]
There are multiple assumptions you could test for this dataset, but for the sake of simplicity we’ll stick to only a few. First, check whether the assumption of linearity holds in the code chunk for this question. In 50 words or less, indicate what the results of the code indicate. Was the assumption violated? How can you tell?
Q7 [1.5% of total marks]
Next, in the code chunk for this question check whether the assumption of collinearity holds. In 50 words or less, indicate what your results reveal. Was the assumption violated? How can you tell?
Q8 [7.5% of total marks]
Regardless of your answers in Q6 and Q7, assume that all statistical assumptions hold and run the appropriate statistical test to evaluate this research question. Report on the results in 180 words or less. In your report, don’t worry about descriptive statistics but do include a description of which statistical test you used, the relevant stats references, and the interpretation of this data in terms of the research question. Include effect size, an interpretation of all of the unstandardised coefficients except the intercept, and a discussion of the relative contribution of each variable based on the standardised coefficients.
END OF SECTION
The last year has been a good one for Flopsy. They have embraced their non-binary identity and decided to change their name and pronouns to something that felt more like “them”. They are now known as Flaye and have formed a goth punk band along with Sissily the snake. They’ve spent much of the last year touring around Bunnyland and Otherland with their band, which is called Mental Limit Theorem. They play in three main venues: Bunnystadia (in Bunnyland), StarVenue (in Otherland), and ZedZone (on the border).
Of course, like any band, both Flaye and Sissily have acquired a certain number of groupies (very enthusiastic fans). They are curious about whether the two of them have different distributions of groupies across the three venues: in other words, are Flaye’s groupies distributed across venues in the same way that Sissily’s are?
Luckily they have the data on the number of groupies each of them had the last time they played in each venue. It is found in the table punkTable shown below and included in the Markdown file.
Q9 [5% of total marks]
Run the appropriate statistical test to evaluate Flaye’s research question and report on the results in 70 words or less. In your report, don’t worry about the descriptive statistics but do include an explanation of which statistical test you used, the appropriate stats reference, and the interpretation of this data in terms of the research question. Be sure to also calculate effect size and report your answer along with its interpretation.
END OF SECTION
Following the events of a year ago, Bunny achieved her dream and became a psychologist. She also grew to be good friends with Foxy, having been impressed by the empathy and courage that became visible during her adventures in Otherland. That adventure and the bond with Bunny gradually led Foxy to come out of her shell more, sharing more about her upbringing in far-away Foxland. Upon graduating with her degree, Bunny and Foxy decided to open up a psychological practice, with Bunny as the psychologist and Foxy as the manager. Soon their practice — called Hearts and Minds — became central to helping the people of Bunnyland and Otherland begin to process the years of slow trauma caused by increasing hunger, fear, and uncertainty. And both Bunny and Foxy grew greatly in confidence, having faced so many of their biggest fears (including statistics and R).
On that note, Bunny and Foxy have taken to using their skills to analyse their own behaviour. Most recently, they have been wondering about jellybeans. Both of them tend to give patients jellybeans when they visit. There are three kinds of jellybean: blue, red, and green. They wonder: do Bunny and Foxy give out the same amount of jellybeans? Are each of the colours distributed similarly? Are there any interactions?
Luckily they have kept copious records tracking candy usage; this data is in the tibble called dj, which has been loaded for you already. Each row consists of one patient, and there are four columns:
patient: unique identifier for each patient
person: the person providing the jellybeans (Bunny or Foxy)
colour: the colour of the jellybeans
jellybeans: the number of jellybeans provided
The results are shown in the figure below, along with a table showing the mean jellybeans in each cell.
Q10 [6.5% of total marks]
For simplicity let’s assume that all of the assumptions of the relevant statistical test have been met, so you don’t have to evaluate them. Instead, run the appropriate statistical test to evaluate Bunny and Foxy’s research question and report on the results in 160 words or less. In your report, don’t worry about descriptive statistics but do include an explanation of which statistical test you used, the appropriate stats reference, and the interpretation of this data in terms of the research question. Report and interpret a measure of effect size but don’t worry about any posthoc tests.
END OF SECTION
Quackers has decided that he is tired of people calling him annoying and not seeing his many fine qualities, so in the past year he has embarked on a program of self-improvement. The first part of this program was to gather data about what factors precisely make him annoying. To do so, he decided to go out to lunch one-on-one with lots of different people. He recorded himself and also asked the other person at the end of the lunch to rate how annoying they found him. This data is found in the tibble dq. Each row corresponds to a single lunch and it has four columns:
lunch: a specific lunch
annoyingness: rating of annoyingness on a scale of 0 to 100, where 0 is not annoying at all and 100 is the most annoying person ever
quacks: the number of times Quackers just shouted out “QUACK!” randomly at lunch
listening: the percentage of time Quackers spent listening rather than talking on a scale of 0 to 100, where 0 means he talked always and 100 means he listened always
Quackers wants to know which of the factors (quacks or listening or both) best predicts annoyingness. For this he turns to you.
Q11 [2% of total marks]
Let’s begin our inquiries by visualising the relationship between quacks and listening. Using whatever geom seems appropriate to you, make a figure that shows the relationship between these two variables. One should be on one axis and one on the other. Use a colour other than the default, a nice theme, and appropriate labels for the axes. Display a best-fit line as well using geom_smooth().
Q12 [3% of total marks]
Calculate the correlation between quacks and listening and report on the results in 50 words or less. In your report, describe which statistical test you used and why, use the appropriate stats reference, and interpret what this result can reveal about the relationship between quacks and listening at the population level.
Q13 [5% of total marks]
Quackers’ main question is about how these variables are related to annoyingness and specifically which of them matter, if any. For this we are going to do model selection. For simplicity let’s assume that all assumptions are met. In the code chunk below, create the following three models (you don’t have to show the output, just create them).
1. modelAQ: annoying is the outcome variable, quacks is the only predictor
2. modelAL: annoying is the outcome variable, listening is the only predictor
3. modelAQL: annoying is the outcome variable, listening and quacks are both predictors, but there is no interaction
Q13a. Perform model selection between these three using AIC or BIC (either is fine) as the complexity penalty. In no more than 15 words, state which model is preferred and why.
Q13b. In no more than 85 words, explain why we use something like AIC or BIC to do model selection, rather than just picking the model that has the highest R².
Q14 [7.5% of total marks]
Report the best-fitting model from Q13, including a description of which statistical test you used, the appropriate stats references, and the interpretation of this data in terms of the research question. Include a discussion of effect size but for space reasons don’t worry about discussing either the unstandardised or standardised coefficients. Where appropriate, make reference to the AIC/BIC result and/or the non-best-fitting model(s) to answer Quackers’ question about which factors predict annoyingness and how you know that. Use your judgment about what information to include for this. You have no more than 130 words.
His missions to Otherland sparked in Doggie a taste for adventure: while all the other people started settling in and working together, he grew increasingly restless. Moreover, he realised that some early life trauma meant he would probably not be able to trust the Others for a while, so he decided it would be best to heal on his own. He spent more and more time wandering around in the hills nearby, wondering what he wanted to do with his life.
On one of these trips he came across Kevin and Kevin Clark (the string), who were feeling similarly at sea. They got to talking and realised that what they all wanted was to strike out somewhere new — to see what there was to see in the mountains and valleys beyond both Otherland and Bunnyland. They thus decided to give it a try together. Doggie was unruffled by Kevin’s bouts of grumpiness and surliness, and Kevin was impervious to Doggie’s occasional impulsive bursts of excitement. Kevin calmed Doggie and Doggie entertained Kevin, and Kevin Clark smilingly observed them both.
Before long the three of them gained a reputation far and wide as almost mythical adventuring figures. They never stayed anywhere for very long, but were always up for a laugh and some stories and of course some amazing songs (played on Kevin with Kevin Clark, and sung by Doggie).
One of the things that they joked about all of the time was their snoring. They each slept in a separate tent, and both complained often (jokingly) that the other person’s snores kept them awake. (Yes, apparently sentient guitars can snore!) They decided to solve the problem with data and recorded themselves sleeping every night for a while. Their question was whether one of them snored more than the other. This led to the tibble dd, which has been loaded for you. Each row corresponds to one night and it has three columns:
night: the night in question
person: Doggie or Kevin
snores: the number of snores that person gave that night
Q15 [1.5% of total marks]
Convert dd to a wider version of the dataset called dd_wide which has three columns: night, kevin (how many snores Kevin did that night), and doggie (how many snores Doggie did that night).
Q16 [3.5% of total marks]
There are two assumptions you should check given this research question. Check them in the code chunk below. In 70 words or less, indicate what assumptions you are checking and what the results of the code indicate. Was either assumption violated? How can you tell?
Q17 [4% of total marks]
Run the appropriate statistical test to evaluate the research question and report on the results in 70 words or less. In your report, don’t worry about descriptive statistics or effect size but do include an explanation of which statistical test you used and why, the appropriate stats reference, and the interpretation of this data in terms of the research question.
END OF SECTION
Finally, we get to Shadow and Little Blue. This semester showed both of them how much they love statistics and R, and it showed everyone in Bunnyland and Otherland how valuable statistics and R are when making decisions. As a result, they decided to open a statistical consulting business. Their job is both to collect data and to answer anybody and everybody’s statistical questions. They are very happy.
What follows are a few of the questions they’ve gotten recently. Shadow and Little Blue have gone on leave, so see if you can answer them as well!
Q18 [2.5% of total marks]
One of Shadow and Little Blue’s clients is very proud of their research, in which they collected a huge amount of data and analysed it every way they could, conducting as many different statistical tests on it as they could think of. “I didn’t know what to look for, so I thought I would let the data speak to me,” they said. “And it did! I got many p-values less than 0.05.” Should Shadow and Little Blue endorse their approach to data analysis? If so, describe two reasons for this. If not, describe one reason why not and offer a suggestion for something they could do instead. Use 145 words or less.
Q19 [4.5% of total marks]
Q19a. Consider each of the following pairs of statistical references. Which of each (A or B) is impossible? (Note: use your knowledge about what each test statistic reflects and how it is related to degrees of freedom and p-values to answer this question. You do not need to do any coding).
- (i) A. χ²(4) = 63.3, p = .843 or B. χ²(2) = 2.36, p = .301
- (ii) A. t(15) = 0.02, p = .003 or B. t(29) = 2.41, p = .023
- (iii) A. F(2,105) = 0.66, p = .518 or B. F(3,98) = -2.12, p = .482
Q19b. For each answer in Q19a, explain it by describing intuitively what the test statistic captures and thus why you chose that answer, making reference to the degrees of freedom and/or p-value as appropriate. Use no more than 170 words in total across all three parts.
Q20 [3% of total marks]
Cook’s Distance incorporates two components. In no more than 70 words, explain what each of these components is and why both matter for identifying data points of concern. (You don’t need to include or talk about the mathematical equation, just describe intuitively what each component captures).
Q21 [4% of total marks]
Imagine we were suddenly transported to a parallel universe in which the Central Limit Theorem stated that the sampling distribution of the mean always became more uniform (rather than becoming more normal) with increasing sample size. What would this mean for the nature of the 95% confidence interval, and why? In this universe, what should scientists do to minimise the error in their experiments? Use 130 words or less.
END OF SECTION
Some of these questions require numerical answers only. Don’t delete anything from the Rmd file; instead you should type answers into the spaces provided.
Before attempting these questions please ensure that you have studied:
– The Week 11 and 12 lectures
– The Week 12 tutorial
– The Practice Quiz entitled “Revision Quiz for Section 12.3 of Week 12 (Day 2)”
– The questions and solutions contained within the document “pa-exam-prep-prompts-solutions.pdf”, which is linked from the Exam page of the Canvas website for this subject.
Q22 [2% of total marks]
Imagine you have a forest plot that is based on only two studies, and therefore has only two confidence intervals in it. The confidence intervals relate to the mean difference between the Control and Experimental groups on some outcome variable, and the confidence intervals happen to be identical.
Study 1 has a 95% Confidence Interval Lower Bound of -0.5, and a 95% Confidence Interval Upper Bound of +6.5.
Study 2 also has a 95% Confidence Interval Lower Bound of -0.5, and a 95% Confidence Interval Upper Bound of +6.5.
Study 1 and Study 2 also have the same p-value, which was .10.
Produce some R code in the chunk provided that produces the lower and upper bound for a meta-analysis combination 95% confidence interval based on those two studies. Make sure that your code outputs (prints) two numerical values: the lower and upper bounds of the confidence interval. Report each value to 3 decimal places, e.g. 99.123, and specify which value is the lower bound, and which value is the upper bound, in the answer space provided. Your output will necessarily need to be a somewhat rough estimate, as you will not have any further information about the two studies beyond what has been provided above.
Q23 [2% of total marks]
One criticism of meta-analysis is that it may ignore important differences across studies, and hence be ‘mixing apples and oranges’. Make two responses to this criticism, based on the lecture. Answer in 80 words or fewer.
Q24 [2% of total marks]
There is a test of hovercraft driving ability which has a reliability of .91, a mean of 50, and a standard deviation of 6. Aleksandra scored 57 on this test. You would like to help reassure Aleksandra’s passengers that we have good grounds for believing her hovercraft driving abilities are above average. Report a 95% Confidence Interval for Aleksandra’s predicted true score. Give two numerical values: the lower and upper bounds of the Confidence Interval. Show your workings in the code chunk provided. Report each value to 3 decimal places, e.g. 99.123, and specify which value is the lower bound, and which value is the upper bound, in the answer space provided.
Q25 [3% of total marks]
Your workplace makes important decisions about individuals on the basis of the results of a certain test. You were the designer of the test, and all the items in it were originally devised by you. The test has a reliability of 0.97.
For many years your workplace has used the test, and as far as you can tell the test has worked well. However, now your boss wants to reduce the number of items, as this will reduce the cost of administering the test. Your boss asks you to estimate what reliability the test will have if only 25% of the items are retained. Your boss also seeks your advice on what percentage of items you think should be retained.
In 100 words or fewer, in the answer space write a message to your boss in relation to this issue. You may present supporting code in the code chunk provided.
Q26 [4% of total marks]
There is a disorder named Glossop’s Disorder. It is a rare disorder, for which neither a cause nor a cure is presently known. Glossop’s Disorder is a member of a family of rare disorders known as the Baldrickian disorders. People with Glossop’s Disorder are a subset of people with Baldrickian disorders – everyone with Glossop’s Disorder has a Baldrickian disorder, but not everyone with a Baldrickian disorder has Glossop’s Disorder.
A test for Glossop’s Disorder has been developed, and a trial was done in which the test was applied to 5,000 people known to definitely be suffering from a Baldrickian disorder. The trial results showed:
280 true positives (people who were criterion positive, test positive)
120 false positives (people who were criterion negative, test positive)
4550 true negatives (people who were criterion negative, test negative)
50 false negatives (people who were criterion positive, test negative)
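As a sanity check on the trial results, the four counts can be arranged as a 2 x 2 table of criterion status against test result. The sketch below only lays out the counts given above and confirms that they add up to the 5,000 people tested. The object name trial is hypothetical, and the sketch does not compute any of the quantities asked for in Q26.

```r
# Counts copied from the trial description; "trial" is a hypothetical name.
trial <- matrix(
  c(280,   50,    # criterion positive: test positive, test negative
    120, 4550),   # criterion negative: test positive, test negative
  nrow = 2, byrow = TRUE,
  dimnames = list(criterion = c("positive", "negative"),
                  test      = c("positive", "negative"))
)

trial            # the 2 x 2 layout
sum(trial)       # total number of people in the trial
rowSums(trial)   # people per criterion status
colSums(trial)   # people per test result
```

Rows are indexed by true (criterion) status and columns by test result, so each of the four outcome types sits in exactly one cell.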
Give a numerical answer to each of questions Q26a-Q26d. Express all these numerical values as percentages, with each percentage reported to three decimal places (e.g. 99.123%), and write these in the answer spaces provided. Use the code chunks provided to show the calculations for each.
Q26a. What is the test sensitivity?
Q26b. What is the test specificity?
Q26c. What is the test Positive Predictive Power?
Q26d. What is the test Negative Predictive Power?
Q27 [2% of total marks]
This question relates to the situation and data described in Q26.
You are a University student and clinician-in-training. As part of your training, the University has decided that you and all your classmates should experience having the test for Glossop’s Disorder administered to you by a professional trained in administering such tests. You have the test administered to you, and to your great surprise you test positive for Glossop’s Disorder.
In 30 words or fewer, do you think that the probability you have Glossop’s Disorder is “less than 70%”, “exactly 70%”, or “greater than 70%”? In your answer you must re-state one of these three options and also briefly explain the reasoning behind your answer.
Q28 [1% of total marks]
Imagine that I included on the Research Methods exam several trivia questions related to the Carlton Football Club’s history and achievements, e.g. “In what year did Carlton last win the Premiership?”. What is the term for the threat to validity that would be posed by the inclusion of such content? Answer in 10 words or fewer.
Q29 [3% of total marks]
Within the context of a Multitrait-Multimethod Matrix, consider the following two correlations:
Correlation A. Correlation between self-report measure of conscientiousness and self-report measure of extraversion.
Correlation B. Correlation between self-report measure of conscientiousness and acquaintance-report measure of conscientiousness.
Q29a. Would you prefer that Correlation A is higher than Correlation B, or that Correlation B is higher than Correlation A, or that the two correlations are about the same? The only permissible answers here are “I would prefer Correlation A be higher than Correlation B”, “I would prefer Correlation B be higher than Correlation A”, or “I would prefer that both correlations are about the same”. Therefore you must answer in exactly 10 words.
Q29b. What sources of variance does each correlation (i.e. Correlation A and Correlation B) relate to? Answer in 70 words or fewer.
Q30 [1% of total marks]
This is a freebie – as long as you say anything, you will get full credit! If you lived in Bunnyland or Otherland, what do you think you would do or be?
END OF SECTION