@v3ss0n
Created October 11, 2024 09:43
Data Science Test.
**Machine Learning Task for Interview**
**Task: Predicting Stock Prices**
**Background:**
A financial institution wants to predict a company's future stock prices from its historical prices and related features. It has collected a dataset with the following columns:
* `date`: the date of the stock price
* `open`: the opening price for the day
* `high`: the highest price for the day
* `low`: the lowest price for the day
* `close`: the closing price for the day
* `volume`: the trading volume for the day
* `moving_average_50`: the 50-day moving average of the stock price
* `moving_average_200`: the 200-day moving average of the stock price
**Task:**
Your task is to build a machine learning model that predicts the company's future stock prices from the given features.
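The task does not say which price to predict or how far ahead. One common framing, assumed here purely for illustration, is to predict the next day's `close` from the current day's features. The `target` column name and the one-day horizon are assumptions, not part of the task.

```python
import pandas as pd

# Tiny frame mirroring the first sample rows from the task description.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "close": [102.0, 105.0, 110.0],
})

# Assumption: the "future price" is the next day's close.
# shift(-1) moves each following row's close up by one row.
df["target"] = df["close"].shift(-1)

# The last row has no next day, so its target is NaN and is dropped.
supervised = df.dropna(subset=["target"]).reset_index(drop=True)
print(supervised)
```

A longer horizon would use a larger negative shift, at the cost of dropping more rows at the end.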
**Evaluation Metrics:**
* Mean Absolute Error (MAE)
* Mean Squared Error (MSE)
* Root Mean Squared Error (RMSE)
* Coefficient of Determination (R2)
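As a reference for what each metric measures, the four can be computed directly with NumPy. This is a minimal sketch; the function name `regression_metrics` and the example numbers are made up for illustration. In practice `sklearn.metrics` offers `mean_absolute_error`, `mean_squared_error`, and `r2_score` for the same quantities.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))   # average absolute error
    mse = np.mean(errors ** 2)      # average squared error
    rmse = np.sqrt(mse)             # same units as the price
    # R2 = 1 - SS_res / SS_tot; undefined when y_true is constant.
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Illustrative values only, not model output.
metrics = regression_metrics([102.0, 105.0, 110.0], [101.0, 107.0, 110.0])
print(metrics)
```

MAE and RMSE are in price units, which makes them easier to explain in the report than MSE; R2 can be negative when the model does worse than predicting the mean.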
**Dataset:**
The dataset consists of 1000 days of historical stock prices and other relevant features. The dataset is split into training and testing sets, with 80% of the data used for training and 20% used for testing.
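The description does not say whether the 80/20 split is random or chronological. For time-ordered data a chronological split is a common choice, because a random shuffle lets the model train on days that come after the ones it is tested on. The sketch below assumes a chronological split; the stand-in frame and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same length as the task's dataset (1000 days).
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=1000),
    "close": np.linspace(100.0, 120.0, 1000),
})

# Sort by date, then cut at 80% so every test day is after every training day.
df = df.sort_values("date").reset_index(drop=True)
split_index = int(len(df) * 0.8)
train_df = df.iloc[:split_index]
test_df = df.iloc[split_index:]

print(len(train_df), len(test_df))
```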
**Dataset Sample:**
| date | open | high | low | close | volume | moving_average_50 | moving_average_200 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2020-01-01 | 100.0 | 105.0 | 95.0 | 102.0 | 1000 | 100.0 | 95.0 |
| 2020-01-02 | 102.0 | 110.0 | 98.0 | 105.0 | 1200 | 100.5 | 95.5 |
| 2020-01-03 | 105.0 | 115.0 | 100.0 | 110.0 | 1500 | 101.0 | 96.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
**Deliverables:**
1. A Python script that loads the dataset, preprocesses the data, and trains a machine learning model to predict the future stock prices.
2. A brief report that explains the chosen machine learning algorithm, hyperparameters, and evaluation metrics.
3. A visualization of the predicted stock prices vs actual stock prices.
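For the third deliverable, a line plot with matplotlib is one straightforward option. The sketch below uses placeholder arrays in place of real test-set values and model predictions; the output file name `predicted_vs_actual.png` is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder values: in practice these come from the test set and the model.
dates = pd.date_range("2022-01-01", periods=50)
rng = np.random.default_rng(0)
actual = 100 + np.cumsum(rng.normal(0, 1, size=50))
predicted = actual + rng.normal(0, 0.5, size=50)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(dates, actual, label="Actual")
ax.plot(dates, predicted, label="Predicted", linestyle="--")
ax.set_xlabel("Date")
ax.set_ylabel("Price")
ax.set_title("Predicted vs actual stock prices")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
fig.savefig("predicted_vs_actual.png", dpi=150)
```

Plotting both series against the date axis makes lag and drift in the predictions easy to spot.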
**Dataset Generation:**
You can generate a sample dataset using the following Python code:
```python
import pandas as pd
import numpy as np

np.random.seed(42)

# Generate random data (each column is drawn independently of the others)
date = pd.date_range('2020-01-01', periods=1000)
open_price = np.random.uniform(90, 110, 1000)
high_price = np.random.uniform(95, 115, 1000)
low_price = np.random.uniform(85, 105, 1000)
close_price = np.random.uniform(90, 110, 1000)
volume = np.random.randint(1000, 2000, 1000)
moving_average_50 = np.random.uniform(90, 110, 1000)
moving_average_200 = np.random.uniform(85, 105, 1000)

# Create a pandas dataframe
df = pd.DataFrame({
    'date': date,
    'open': open_price,
    'high': high_price,
    'low': low_price,
    'close': close_price,
    'volume': volume,
    'moving_average_50': moving_average_50,
    'moving_average_200': moving_average_200
})

# Save the dataframe to a CSV file
df.to_csv('stock_prices.csv', index=False)
```
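Because the generator above draws every column independently, `high` can fall below `low`, and the moving-average columns are unrelated to `close`. If internally consistent data is preferred, a variant could look like the sketch below. The random-walk model, the spread sizes, `min_periods=1`, and the file name `stock_prices_consistent.csv` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_days = 1000

# Assumption: close follows a simple random walk starting near 100.
close = 100 + np.cumsum(rng.normal(0, 0.5, n_days))
# Open is the previous day's close; the first day opens at its own close.
open_price = np.concatenate(([close[0]], close[:-1]))
# High and low bracket both open and close by a small non-negative spread.
high = np.maximum(open_price, close) + rng.uniform(0, 2, n_days)
low = np.minimum(open_price, close) - rng.uniform(0, 2, n_days)

df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=n_days),
    "open": open_price,
    "high": high,
    "low": low,
    "close": close,
    "volume": rng.integers(1000, 2000, n_days),
})
# Moving averages derived from close; min_periods=1 avoids NaN in early rows.
df["moving_average_50"] = df["close"].rolling(50, min_periods=1).mean()
df["moving_average_200"] = df["close"].rolling(200, min_periods=1).mean()

df.to_csv("stock_prices_consistent.csv", index=False)
```

With `min_periods=1`, the first 49 (or 199) rows average over fewer days than the window size; dropping those rows instead is an equally reasonable choice.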
**Getting Started:**
1. Load the dataset into a pandas dataframe.
2. Split the data into training and testing sets using an 80-20 split.
3. Preprocess the data by scaling the features using StandardScaler or MinMaxScaler, fitting the scaler on the training set only so that no test-set statistics leak into training.
4. Train a machine learning model using the training data.
5. Evaluate the model using the testing data and calculate the evaluation metrics.
6. Visualize the predicted stock prices vs actual stock prices using a line plot.
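The steps above can be sketched end to end as follows. This is one possible baseline, not a reference solution: the next-day `close` target, the chronological split, `LinearRegression`, and the function name `run_pipeline` are all assumptions. The demo builds a small random frame inline so the block runs on its own; with the generated file you would instead load it with `pd.read_csv("stock_prices.csv", parse_dates=["date"])`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler

FEATURE_COLS = ["open", "high", "low", "close", "volume",
                "moving_average_50", "moving_average_200"]

def run_pipeline(df):
    """Chronological 80-20 split, scaling, linear regression, metrics."""
    df = df.sort_values("date").reset_index(drop=True)
    # Assumption: the target is the next day's close.
    df["target"] = df["close"].shift(-1)
    df = df.dropna(subset=["target"])

    split_index = int(len(df) * 0.8)
    train, test = df.iloc[:split_index], df.iloc[split_index:]

    # Fit the scaler on training rows only, then apply it to both sets.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(train[FEATURE_COLS])
    X_test = scaler.transform(test[FEATURE_COLS])

    model = LinearRegression().fit(X_train, train["target"])
    predictions = model.predict(X_test)

    mse = mean_squared_error(test["target"], predictions)
    metrics = {
        "mae": mean_absolute_error(test["target"], predictions),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "r2": r2_score(test["target"], predictions),
    }
    return predictions, metrics

# Demo frame with the same columns and ranges as the generation script.
rng = np.random.default_rng(42)
n = 1000
data = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=n),
    "open": rng.uniform(90, 110, n),
    "high": rng.uniform(95, 115, n),
    "low": rng.uniform(85, 105, n),
    "close": rng.uniform(90, 110, n),
    "volume": rng.integers(1000, 2000, n),
    "moving_average_50": rng.uniform(90, 110, n),
    "moving_average_200": rng.uniform(85, 105, n),
})
predictions, metrics = run_pipeline(data)
print(metrics)
```

On purely random data like this, expect R2 near zero and possibly negative, since the columns carry no signal about the next day's close. That is a property of the sample data rather than of the pipeline.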
Note that this is just an example task to get you started. You may need to modify the task to suit your specific needs and experiment with different machine learning algorithms and hyperparameters to achieve the best results.