Dataset stock_performance.xlsx contains data on NYSE-listed companies' stock performance, market capitalization, and sector information. Variables are as follows:
- ticker: Stock ticker symbol (e.g., AAPL, MSFT)
- company_name: Full company name
- sector: Business sector (Technology, Healthcare, Finance, Energy, Consumer, Industrial)
- market_cap: Market capitalization in billions of USD
- price_change_1y: One-year percentage price change
- dividend_yield: Annual dividend yield as percentage
- pe_ratio: Price-to-earnings ratio
- volume_avg_30d: Average daily trading volume over 30 days (in millions)
The following questions concern stock_performance.xlsx.
-
Report the mean and median of market_cap, in billions of USD (2 points).
-
Within each sector, compute the mean and median one-year price change percentage. (12 points)
Sector Technology Healthcare Finance Energy Consumer Industrial Mean Price Change ____% ____% ____% ____% ____% ____% Median Price Change ____% ____% ____% ____% ____% ____% -
Identify the top 5 stocks with the highest dividend yield. Report their ticker symbols in descending order of dividend yield. (5 points)
-
Create a scatterplot that shows the relationship between market capitalization and average trading volume. The plot will be graded on the below components. (Total 6 points)
- The plot is a scatterplot. (1 point)
- The scatterplot has volume_avg_30d on the y-axis. (1 point)
- The scatterplot has market_cap on the x-axis. (1 point)
- The y-axis should run from 0 up to and including 200 (millions). (1 point)
- The x-axis should run from 0 up to and including 3000 (billions). (1 point)
- The plot contains a line of best fit with confidence band. (1 point)
Dataset trader_survey.xlsx contains data on a survey of retail investors' trading behavior and demographics. Variables are as follows:
- investor_id: Unique identifier for each investor
- age: Investor's age
- experience_years: Years of trading experience
- portfolio_value: Total portfolio value in thousands of USD
- trades_per_month: Number of trades executed per month
- risk_tolerance: Self-reported risk tolerance (1-5 scale, 5 = highest risk tolerance)
- annual_return: Self-reported annual return percentage
- investment_style: Primary investment approach (Growth, Value, Index, Day_Trading)
The following questions concern trader_survey.xlsx.
- This dataset contains errors. Identify the percentage of observations containing apparent errors in each of the below variables. (3 points)
a. risk_tolerance _____%
b. annual_return _____%
c. trades_per_month _____%
For questions 2-4, replace all apparent errors in the dataset with NAs.
-
Define 'active traders' as those who execute 20 or more trades per month, and 'high-return investors' as those with annual returns > 15%. Report the percentage of high-return investors for each combination of trading activity and investment style, ignoring participants with errors. (8 points)
Investment Style Growth Value Index Day_Trading Active Trader ____% ____% ____% ____% Not Active Trader ____% ____% ____% ____% -
Report the mean and median portfolio values by investment style, ignoring participants with errors in portfolio_value. (8 points)
Investment Style Growth Value Index Day_Trading Mean Portfolio Value $____k $____k $____k $____k Median Portfolio Value $____k $____k $____k $____k -
Create a bar plot displaying the average number of trades per month by investment style. The plot will be graded on the below components. (6 points) a. The plot is a bar plot. (1 point) b. Investment styles are correctly displayed on the x-axis. (1 point) c. Average trades_per_month values are on the y-axis. (1 point) d. The y-axis ranges from 0 up to and including 50. (1 point) e. The y-axis varies in intervals of 10. (1 point) f. Bars represent mean monthly trades by investment style. (1 point)
Dataset corporate_financials.xlsx contains financial metrics for publicly traded companies. Variables are as follows:
- company_id: Unique identifier for companies
- sector: Business sector
- revenue: Annual revenue in millions of USD
- net_income: Annual net income in millions of USD
- total_debt: Total debt in millions of USD
- cash_reserves: Cash and cash equivalents in millions of USD
- employee_count: Total number of employees
Dataset esg_ratings.xlsx contains Environmental, Social, and Governance ratings for the same companies. Variables are as follows:
- company_id: Unique identifier for companies
- environmental_score: Environmental performance score (0-100)
- social_score: Social responsibility score (0-100)
- governance_score: Corporate governance score (0-100)
- overall_esg_score: Combined ESG score (0-100)
The following questions concern both corporate_financials.xlsx and esg_ratings.xlsx.
-
Report the mean profit margin (net_income/revenue * 100) for each sector. (6 points)
Sector Technology Healthcare Finance Energy Consumer Industrial Mean Profit Margin ____% ____% ____% ____% ____% ____% -
Define 'financially healthy' companies as those with a debt-to-cash ratio < 2.0 (total_debt/cash_reserves). Report the mean revenue and employee count by financial health status. (4 points)
Mean Revenue Mean Employee Count Financially Healthy $____M _____ Not Financially Healthy $____M _____ -
Divide companies into terciles by overall ESG score (lowest 33%, middle 34%, highest 33%). Within each tercile, report the mean environmental, social, and governance scores. (9 points)
Mean Environmental Score Mean Social Score Mean Governance Score Lowest ESG Tercile _____ _____ _____ Middle ESG Tercile _____ _____ _____ Highest ESG Tercile _____ _____ _____ -
Create a box plot showing the distribution of revenue-per-employee ratios (revenue/employee_count * 1000) by sector. The plot will be graded on the following components. (6 points) a. The plot is a box and whisker plot. (1 point) b. Sectors are clearly labeled on the x-axis. (1 point) c. Revenue per employee (in thousands) is on the y-axis. (1 point) d. The y-axis spans from 0 to 800 (thousands USD). (1 point) e. The y-axis varies in intervals of 100. (1 point) f. All six sectors are represented. (1 point)
Dataset market_indicators.xlsx contains monthly economic indicators and market performance data. Variables are as follows:
- month_year: Month and year (e.g., "2023-01")
- sp500_return: Monthly S&P 500 return percentage
- unemployment_rate: Monthly unemployment rate percentage
- inflation_rate: Monthly inflation rate percentage
- interest_rate: Federal funds rate percentage
- consumer_confidence: Consumer confidence index (0-200)
- vix_index: Volatility index (fear gauge)
- gdp_growth: Quarterly GDP growth rate (annualized percentage)
The following questions concern market_indicators.xlsx.
-
Report the percentage of missing values in each of the below variables. (3 points)
Variable gdp_growth consumer_confidence vix_index Percent Missing ____% ____% ____% -
Create a histogram of S&P 500 monthly returns. Your plot will be graded on the following attributes. (8 points) a. The plot is a histogram. (1 point) b. Monthly returns are on the x-axis. (1 point) c. The x-axis ranges from -15 to and including 15 (percent). (1 point) d. The x-axis bins vary in intervals of 2.5. (2 points) e. Frequency counts are on the y-axis. (1 point) f. The y-axis ranges from 0 up to and including 25. (1 point) g. The y-axis varies in intervals of 5. (1 point)
-
Calculate the correlation between S&P 500 returns and each economic indicator. Report correlation coefficients for the relationships, ignoring missing values. (8 points)
Economic Indicator Correlation with S&P 500 Returns Unemployment Rate _____ Inflation Rate _____ Interest Rate _____ Consumer Confidence _____ VIX Index _____ -
Classify months as 'High Volatility' if VIX > 25 and 'Low Volatility' otherwise. Report the mean values of S&P 500 returns, unemployment rate, and consumer confidence for each volatility regime, ignoring missing values. (6 points)
Mean S&P 500 Return Mean Unemployment Rate Mean Consumer Confidence High Volatility ____% ____% _____ Low Volatility ____% ____% _____