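Today I noticed a company on the STAR Market (科创板) whose share price rose sharply, so I downloaded the full list of STAR Market stocks to study it. The first step is getting the list itself, from the Shanghai Stock Exchange STAR Market page: https://star.sse.com.cn/market/stocklist/. I couldn't find a good scraping tool, so I saved each page by hand, 24 HTML pages in total (named 1.html through 24.html), and then used the program below to combine them into a single Excel file. The program was generated by ChatGPT.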
import os
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

html_folder = "/Users/workmac/documents/work-Stock-20241220/S-China"
output_excel = "11.xlsx"
all_data = []

# The saved pages are named 1.html ... 24.html
for i in range(1, 25):
    filename = f"{i}.html"  # Construct filename
    file_path = os.path.join(html_folder, filename)
    if os.path.exists(file_path):  # Check if file exists
        with open(file_path, "r", encoding="utf-8") as f:
            soup = BeautifulSoup(f, "html.parser")
        table = soup.find("table")  # First <table> on the page
        if table:
            # Convert HTML table to DataFrame; StringIO avoids the pandas
            # deprecation warning for passing literal HTML strings
            df = pd.read_html(StringIO(str(table)))[0]
            df["Source_File"] = filename  # Record which page each row came from
            all_data.append(df)

if all_data:
    final_df = pd.concat(all_data, ignore_index=True)
    final_df.to_excel(output_excel, sheet_name="All Data", index=False)
    print(f"Combined Excel file saved as: {output_excel}")
else:
    print("No tables found in the HTML files!")
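Since the pages were saved by hand, it is easy to save the same page twice or to miss one. The following is a minimal sketch of a sanity check on the combined table, not part of the original program: it counts rows per source file and drops rows that repeat across pages. The helper names `summarize_pages` and `drop_repeated_rows` are hypothetical, and the `code` and `name` columns in the demo data are made up for illustration; the real column names depend on what the SSE page's table contains. Only the `Source_File` column comes from the program above.

```python
import pandas as pd


def summarize_pages(df: pd.DataFrame, source_col: str = "Source_File") -> pd.DataFrame:
    """Return one row per source file with the number of table rows it contributed."""
    return df.groupby(source_col).size().rename("rows").reset_index()


def drop_repeated_rows(df: pd.DataFrame, source_col: str = "Source_File") -> pd.DataFrame:
    """Drop rows that are identical in every column except the source file."""
    data_cols = [c for c in df.columns if c != source_col]
    return df.drop_duplicates(subset=data_cols, keep="first").reset_index(drop=True)


# Made-up demo data standing in for the combined table (final_df above).
demo = pd.DataFrame(
    {
        "code": ["688001", "688002", "688002", "688003"],  # hypothetical column
        "name": ["A", "B", "B", "C"],                      # hypothetical column
        "Source_File": ["1.html", "1.html", "2.html", "2.html"],
    }
)

page_counts = summarize_pages(demo)
deduped = drop_repeated_rows(demo)

print(page_counts)
print(f"{len(demo) - len(deduped)} repeated row(s) removed")
```

In the real workflow, one would pass `final_df` instead of `demo` before writing the Excel file. A page with an unusually low row count, or a file name missing from the summary, suggests that page should be saved again.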
I'm keeping the program here as a backup in case I need it again later.
This is the processed Excel sheet, ready to use as is.