Getting the STAR Market Stock List

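Today I noticed one STAR Market stock post a sharp gain, so I decided to download data for every STAR Market stock to study. The first step is getting the list of STAR Market stocks. The list is published on the Shanghai Stock Exchange STAR Market page: https://star.sse.com.cn/market/stocklist/. I couldn't find a good scraping tool, so I saved each page by hand, 24 HTML pages in total, and then used the program below to combine them into a single Excel file. The program came from ChatGPT.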

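import os
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Folder holding the manually saved pages, named 1.html through 24.html
html_folder = "/Users/workmac/documents/work-Stock-20241220/S-China"
output_excel = "11.xlsx"

all_data = []

for i in range(1, 25):
    filename = f"{i}.html"
    file_path = os.path.join(html_folder, filename)

    # Report missing pages instead of skipping them silently
    if not os.path.exists(file_path):
        print(f"Skipping missing file: {file_path}")
        continue

    with open(file_path, "r", encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")

    # Only the first <table> on each page is used
    table = soup.find("table")
    if table is None:
        print(f"No table found in: {filename}")
        continue

    # Wrap the markup in StringIO: pandas 2.1+ deprecates passing
    # a literal HTML string to read_html
    df = pd.read_html(StringIO(str(table)))[0]
    df["Source_File"] = filename  # Record which page each row came from
    all_data.append(df)

if all_data:
    final_df = pd.concat(all_data, ignore_index=True)
    # Writing .xlsx requires an Excel engine such as openpyxl
    final_df.to_excel(output_excel, sheet_name="All Data", index=False)
    print(f"Combined Excel file saved as: {output_excel}")
else:
    print("No tables found in the HTML files!")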
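
Because the pages were saved by hand, it is easy to miss one, and the script above carries on when a page file is absent. The following is a minimal sketch of a check to run first. The helper name missing_pages is my own, and it assumes the same 1.html to 24.html naming used above.

```python
from pathlib import Path


def missing_pages(folder: str, n_pages: int = 24) -> list[str]:
    """Return the expected page file names that are not present in folder."""
    # Assumes pages are saved as 1.html, 2.html, ... up to n_pages.html
    expected = [f"{i}.html" for i in range(1, n_pages + 1)]
    return [name for name in expected if not (Path(folder) / name).is_file()]


if __name__ == "__main__":
    # "." is a placeholder; point it at the folder holding the saved pages
    missing = missing_pages(".")
    if missing:
        print("Missing pages:", ", ".join(missing))
    else:
        print("All pages are present.")
```

If the list comes back non-empty, those pages need to be saved again before combining.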

I'm keeping a copy here as a backup in case I need it again later.

Here is the processed Excel sheet, ready to use as is.
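
Before relying on the combined sheet, a quick sanity check may be worthwhile, since saving the same page twice by hand would duplicate rows. The sketch below is only an illustration: the function name check_combined and the sample columns Code and Name are made up, and only the Source_File column comes from the script above.

```python
import pandas as pd


def check_combined(df: pd.DataFrame, source_col: str = "Source_File") -> dict:
    """Summarize a combined table: rows per source file and duplicated rows."""
    # Rows contributed by each saved page; an unusual count stands out
    rows_per_file = df[source_col].value_counts().sort_index().to_dict()
    # Compare rows on every column except the source file name, so the same
    # page saved under two file names is still detected as a duplicate
    data_cols = [c for c in df.columns if c != source_col]
    duplicated = df.duplicated(subset=data_cols, keep="first")
    return {
        "total_rows": len(df),
        "rows_per_file": rows_per_file,
        "duplicate_rows": int(duplicated.sum()),
    }


if __name__ == "__main__":
    # Made-up sample rows, not real listings
    sample = pd.DataFrame(
        {
            "Code": ["A1", "A2", "A1"],
            "Name": ["Alpha", "Beta", "Alpha"],
            "Source_File": ["1.html", "1.html", "2.html"],
        }
    )
    print(check_combined(sample))
```

On the real data, the same function could be called on final_df, or on the result of reading 11.xlsx back with pd.read_excel.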