Stock Price Prediction: Machine Learning Course Assignment (Part 1), Introduction

Published on: Xueqiu

You will apply your skills in data analytics and machine learning to predict the future prices of three publicly traded securities.

For those familiar with the U.S. stock market, the targets are Apple (AAPL), Tesla (TSLA), and NVIDIA (NVDA).

For those familiar with the A-share market, the targets are Kweichow Moutai (贵州茅台, 600519), Zhangzidao (獐子岛, 002069), and the CSI 300 index (沪深300, 000300).
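
The A-share list mixes two stocks with one index. Because the 6-digit code alone does not identify the instrument, it can help to record the kind of each target explicitly. For example, 000001 is Ping An Bank on the Shenzhen exchange, but the same digits also denote the SSE Composite index. The sketch below makes an assumption: the data source uses Tushare-style `ts_code` suffixes (`.SH` for Shanghai, `.SZ` for Shenzhen). The names `A_SHARE_TARGETS` and `to_ts_code` are hypothetical.

```python
# Hypothetical target configuration. The tickers come from the assignment;
# the Tushare-style ".SH"/".SZ" suffixes are an assumption about the data source.
US_TICKERS = ["AAPL", "TSLA", "NVDA"]

A_SHARE_TARGETS = {
    "600519": "stock",  # 贵州茅台 (Kweichow Moutai)
    "002069": "stock",  # 獐子岛 (Zhangzidao)
    "000300": "index",  # 沪深300 (CSI 300)
}


def to_ts_code(code: str, kind: str) -> str:
    """Convert a 6-digit A-share code plus its kind into a Tushare-style ts_code."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit code, got {code!r}")
    if kind == "index":
        # Shenzhen indices use the 399xxx range; this sketch treats every
        # other index code (including 000300) as Shanghai-listed.
        return f"{code}.SZ" if code.startswith("399") else f"{code}.SH"
    if kind == "stock":
        if code.startswith("6"):
            return f"{code}.SH"  # Shanghai stocks
        if code.startswith(("0", "3")):
            return f"{code}.SZ"  # Shenzhen main board / ChiNext stocks
        raise ValueError(f"exchange not handled in this sketch: {code}")
    raise ValueError(f"unknown kind: {kind!r}")


TS_CODES = {code: to_ts_code(code, kind) for code, kind in A_SHARE_TARGETS.items()}
```

Keeping the kind next to the code also matters when downloading. In Tushare Pro, for example, stocks and indices are served by different endpoints (`daily` versus `index_daily`).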

You will collect historical stock data, preprocess and clean the data, explore the data using data visualization tools, build a predictive model using machine learning algorithms, and evaluate the model's performance.
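
Here is a minimal sketch of the modelling step. It assumes the simplest algorithm named in the task list (linear regression) and uses plain daily closing prices. Each example's features are the previous `n_lags` closes, and ordinary least squares is fitted through the normal equations. The code uses only the Python standard library so that it runs anywhere. In a real notebook you would more likely use scikit-learn's `LinearRegression` or `RandomForestRegressor`. All function names here are hypothetical.

```python
from typing import List, Tuple


def make_lag_features(closes: List[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Each row of X is [1.0, close[t-n_lags], ..., close[t-1]]; the target is close[t].
    The leading 1.0 lets the model learn an intercept."""
    X, y = [], []
    for t in range(n_lags, len(closes)):
        X.append([1.0] + closes[t - n_lags:t])
        y.append(closes[t])
    return X, y


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: try fewer lags or more data")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):  # zero out entries below the pivot
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(w: List[float], x: List[float]) -> float:
    """Linear prediction w . x for one feature row (including the leading 1.0)."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

Consider a series generated by c_t = 1 + 0.9 c_(t-1). With one lag, the fit should recover an intercept close to 1 and a slope close to 0.9. On real prices, a model of this form often ends up close to "tomorrow equals today", so modelling returns instead of raw prices is a common alternative worth trying.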

You will then use your model to predict the future prices of the three stocks and test it with the latest/real-time stock dataset.

Example tasks:
1. Collect historical stock data for the given companies from a public source such as Tushare, Wind, Alpha Vantage, Yahoo Finance, or Google Finance. This might include daily opening and closing prices, volume, and any other relevant metrics; any other data that might help prediction (e.g., social media text) can also be used. If necessary, preprocess and clean the data to remove missing values or outliers.
2. Explore the data using data visualization tools to identify patterns or trends.
3. Build a predictive model using machine learning algorithms such as linear regression, random forests, or neural networks.
4. Evaluate the model's performance using metrics such as mean squared error (MSE).
5. Test the model on a separate dataset to see how well it generalizes to new data.
6. Use your model to predict the future prices of the three stocks and test it with the latest/real-time stock dataset.
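
Task 1 calls for removing missing values and outliers. The sketch below shows one way to do this, assuming the data has already been reduced to (date, close) pairs with ISO-formatted date strings. It drops missing or non-positive closes, de-duplicates dates, and sorts the series. It then flags suspicious points using a robust z-score of one-step log returns based on the median absolute deviation (MAD). The threshold of 5 and the function names are hypothetical choices. The code flags points rather than deleting them, because genuine large moves (for example limit-up or limit-down days on A-shares) can look like outliers.

```python
import math
from statistics import median
from typing import List, Optional, Tuple


def drop_missing(rows: List[Tuple[str, Optional[float]]]) -> List[Tuple[str, float]]:
    """Drop rows whose close is None, NaN, or non-positive; keep the last
    value for duplicated dates; return the rows sorted by date."""
    by_date = {}
    for date, close in rows:
        if close is None or math.isnan(close) or close <= 0:
            continue
        by_date[date] = close
    return sorted(by_date.items())


def flag_outliers(rows: List[Tuple[str, float]], threshold: float = 5.0) -> List[str]:
    """Return dates whose log return from the previous row has a robust
    z-score above `threshold`. A single bad price usually flags two dates:
    the jump into it and the jump back out of it."""
    if len(rows) < 3:
        return []
    rets = [math.log(rows[i][1] / rows[i - 1][1]) for i in range(1, len(rows))]
    med = median(rets)
    mad = median(abs(r - med) for r in rets)
    if mad == 0:
        return []
    scale = 1.4826 * mad  # approximates the standard deviation for normal data
    return [rows[i][0] for i, r in enumerate(rets, start=1) if abs(r - med) / scale > threshold]
```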

Criteria: Your project will be evaluated based on the following criteria.

Analysis:
1. Data collection and preprocessing: Did you collect and preprocess the data effectively?
2. Data exploration: Did you explore the data using data visualization tools to identify patterns or trends?
3. Predictive model: Did you build an effective predictive model using machine learning algorithms?
4. Model evaluation: Did you evaluate the performance of the model using appropriate metrics?
5. Generalization: Did you test the model on a separate dataset to see how well it generalizes to new data?
6. Stock price prediction: Did you use your model to predict the future prices of the three stocks and test it with the latest/real-time stock dataset?
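
For the evaluation and generalization criteria, a time series should be split chronologically rather than shuffled. Shuffling would let the model train on days that come after the test days. The sketch below makes a chronological split and computes MSE. It also scores a persistence baseline ("the next price equals the last known price"), which a useful model should beat. The function names are hypothetical, and the 20% test fraction is an arbitrary default.

```python
from typing import List, Sequence, Tuple


def chronological_split(values: Sequence[float], test_fraction: float = 0.2) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series without shuffling: the test part comes strictly after the training part."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be in (0, 1)")
    cut = int(len(values) * (1 - test_fraction))
    return list(values[:cut]), list(values[cut:])


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def persistence_mse(train: Sequence[float], test: Sequence[float]) -> float:
    """MSE of predicting each test value with the value observed just before it."""
    previous = [train[-1]] + list(test[:-1])
    return mse(test, previous)
```

A model's test MSE only becomes meaningful next to this baseline. MSE on raw prices also scales with the price level, so the values for Kweichow Moutai and Zhangzidao are not directly comparable.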

Performance: The MSE evaluated every hour (i.e., at 9am, 10am, 11am, etc.) on 15 Nov will be used as one of the indicators.

After completing the project, submit a Jupyter notebook containing your code, analysis, and predictions. Your code should be detailed and well documented enough that someone else can follow it and run it on their own computer. Include methods in your code that fetch the latest/real-time stock dataset, so that your model can be tested with up-to-date data. Good luck!
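
The notebook must be able to fetch the latest data, and scoring uses hourly MSE. Below is a minimal sketch under several assumptions. Alpha Vantage (one of the sources listed in the task examples) is used as the data source. `intraday_url` builds a `TIME_SERIES_INTRADAY` request with `interval=60min` and `datatype=csv`. `parse_intraday_csv` assumes the returned CSV has `timestamp` and `close` columns. `hourly_mse` assumes you store one prediction per full-hour timestamp, keyed exactly like the source's timestamps. The course's exact scoring procedure is not specified, so this is only one plausible reading of it. All function names are hypothetical, you need your own API key, and the A-share targets would likely need a different source such as Tushare.

```python
import csv
import io
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

ALPHA_VANTAGE_URL = "https://www.alphavantage.co/query"


def intraday_url(symbol: str, api_key: str, interval: str = "60min") -> str:
    """Build an Alpha Vantage intraday request that returns CSV."""
    params = {
        "function": "TIME_SERIES_INTRADAY",
        "symbol": symbol,
        "interval": interval,
        "datatype": "csv",
        "apikey": api_key,
    }
    return f"{ALPHA_VANTAGE_URL}?{urllib.parse.urlencode(params)}"


def parse_intraday_csv(text: str) -> List[Tuple[str, float]]:
    """Parse (timestamp, close) pairs, oldest first.
    Assumes the header contains 'timestamp' and 'close' columns."""
    reader = csv.DictReader(io.StringIO(text))
    return sorted((row["timestamp"], float(row["close"])) for row in reader)


def fetch_latest(symbol: str, api_key: str) -> List[Tuple[str, float]]:
    """Download the latest hourly bars (requires network access and a valid key)."""
    with urllib.request.urlopen(intraday_url(symbol, api_key), timeout=30) as resp:
        return parse_intraday_csv(resp.read().decode("utf-8"))


def hourly_mse(actual: List[Tuple[str, float]], predicted: Dict[str, float], day: str) -> float:
    """MSE over the full-hour bars of `day` (a 'YYYY-MM-DD' prefix) that have a prediction."""
    errors = [
        (close - predicted[ts]) ** 2
        for ts, close in actual
        if ts.startswith(day) and ts.endswith(":00:00") and ts in predicted
    ]
    if not errors:
        raise ValueError(f"no hourly timestamps with predictions on {day}")
    return sum(errors) / len(errors)
```

On the evaluation day, you would call `fetch_latest` for each ticker after the market closes. You would then pass the result, your stored predictions, and that date to `hourly_mse`.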