FIN330: Advanced Methods Final Project – Fall 2020
********************************************
One of the goals of the movie industry is to produce successful movies, and success is often measured by the revenue a movie generates. Can information available soon after (or before) a movie's release be used to predict its world revenue? To investigate this, let's consider a simple random sample (SRS) of 28 movies, all released far enough in the past that they are no longer in theaters. The response variable is a movie's world revenue (WorldRevenue). The other variables include: movie title (Title), release time in the US (USRelease), Genre, variables related to the rating of the movie (Rating, Rating1), Production Method, whether the movie is a sequel (Sequel), the movie's budget (Budget), opening-weekend revenue (Opening), how many theaters the movie was in for the opening weekend (Theaters), and others. All dollar amounts are measured in millions of U.S. dollars. Throughout your analysis, you should ignore the variables Profit, “US Revenue”, and “Int Revenue”, where the latter two add up to world revenue.
You need to build a multiple linear regression model to explain/predict WorldRevenue, and write a report documenting why and how certain variables play a significant role in explaining WorldRevenue.
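As a starting point, the model-building step could be sketched in R roughly as follows. This is a minimal sketch, not a recommended final model: the data frame `movies` is simulated here purely as a placeholder for the real 28-movie dataset, the numbers in it are not real, and the choice of predictors (Budget, Opening, Sequel) is only an illustration.

```r
# Placeholder data: simulated values standing in for the real 28-movie dataset.
# Column names follow the assignment; the numbers are NOT real.
set.seed(330)
n <- 28
movies <- data.frame(
  Budget   = runif(n, 10, 200),
  Opening  = runif(n, 5, 150),
  Theaters = round(runif(n, 1500, 4200)),
  Sequel   = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
movies$WorldRevenue <- 20 + 1.5 * movies$Budget + 2.5 * movies$Opening +
  rnorm(n, sd = 40)

# Fit a multiple linear regression of WorldRevenue on a few candidate predictors.
fit <- lm(WorldRevenue ~ Budget + Opening + Sequel, data = movies)

# Coefficient estimates, t-tests, R-squared, and adjusted R-squared.
summary(fit)

# Standard residual diagnostics (residuals vs fitted, Q-Q plot, etc.).
par(mfrow = c(2, 2))
plot(fit)
```

With the real data, you would replace the simulated `movies` data frame with the one you import, and revisit which predictors belong in the formula.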
General instructions:
Your report should be organized in the following order: introduction (describe the problem under investigation), methodology (describe your data and all your models), discussion (discuss your findings based on the statistical analysis, and comment on your final model and its interpretation), and finally a one-paragraph summary. Please put all supporting tables and/or graphs in the main body of the report, and all R/RStudio code in the appendix. Please note that your main report should be typed and no more than 5 pages (excluding R code, tables, and plots).
Some points to consider:
You do not have to use all the variables in the dataset. Your goal is to search for a model that fits the data as well as possible. A final model with 2-4 variables could be sufficient.
Variable “LOpening” is the logarithm of variable “Opening”. Should you use both in your regression model? If so, why? If not, which is better for your data analysis? Either way, please justify your answer.
Should you use both “Rating” and “Rating1” in your model? Why or why not?
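One way to approach the Opening versus LOpening question is to fit each version separately and compare the fits, and to check how strongly the two versions are related. The sketch below uses simulated placeholder data (not the real dataset), and it assumes LOpening is the natural logarithm of Opening, which the assignment does not state explicitly. The variable names `fit_raw`, `fit_log`, and `adj_r2` are made up for this illustration.

```r
# Placeholder data: simulated, right-skewed opening revenues. NOT real values.
set.seed(330)
n <- 28
Opening <- rlnorm(n, meanlog = 3, sdlog = 0.8)
movies <- data.frame(
  Opening      = Opening,
  LOpening     = log(Opening),  # assumption: natural log
  WorldRevenue = 3 * Opening + rnorm(n, sd = 30)
)

# Fit one model per version of the predictor.
fit_raw <- lm(WorldRevenue ~ Opening,  data = movies)
fit_log <- lm(WorldRevenue ~ LOpening, data = movies)

# Compare adjusted R-squared between the two candidate models.
adj_r2 <- c(raw = summary(fit_raw)$adj.r.squared,
            log = summary(fit_log)$adj.r.squared)
adj_r2

# A variable and its own transformation carry largely the same information;
# their correlation shows how much they overlap.
opening_cor <- cor(movies$Opening, movies$LOpening)
opening_cor
```

Which version fits better depends on the real data, so the simulated comparison here says nothing about the answer. A similar overlap check may help with Rating and Rating1, depending on how those two variables are coded in the dataset.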