
@recalde
Last active January 25, 2024 14:52
ChatGPT - Upload CSV to SQL Server

CSV to SQL Python

User Write me a Python program that will open a very large CSV file and insert the data into a Microsoft SQL Server database. The program should create a temporary table using the columns in the first row of the file, then insert the remaining rows from the file in batches of 1000.

ChatGPT To write a Python program that handles the task you've described, we will use the pandas library to process the CSV file and SQLAlchemy (with the pyodbc driver) to interact with Microsoft SQL Server. The program will:

  1. Read the CSV file in chunks.
  2. Connect to the SQL Server database.
  3. Create a temporary table with the same structure as the CSV file.
  4. Insert data in batches of 1000 rows.
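As a quick, self-contained illustration of these four steps, the same pattern can be sketched against SQLite using only the standard library (SQLite stands in for SQL Server here, and the tiny inline CSV and batch size of 2 are hypothetical, chosen so the sketch runs anywhere):

```python
import csv
import io
import sqlite3

# A tiny in-memory CSV standing in for the large file (hypothetical data).
csv_text = "id,name,amount\n1,alice,10\n2,bob,20\n3,carol,30\n"
reader = csv.reader(io.StringIO(csv_text))

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server
header = next(reader)               # table columns come from the first row
cols = ", ".join(f'"{c}" TEXT' for c in header)
conn.execute(f"CREATE TEMP TABLE staging ({cols})")

# Insert the remaining rows in batches (batch size 2 for the demo).
placeholders = ", ".join("?" for _ in header)
batch = []
for row in reader:
    batch.append(row)
    if len(batch) == 2:
        conn.executemany(f"INSERT INTO staging VALUES ({placeholders})", batch)
        batch = []
if batch:  # flush the final partial batch
    conn.executemany(f"INSERT INTO staging VALUES ({placeholders})", batch)

print(conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0])  # 3
```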

First, you need to install the required packages if you haven't already:

pip install pandas pyodbc sqlalchemy

Here's the Python script:

import pandas as pd
from sqlalchemy import create_engine  # pyodbc must be installed; SQLAlchemy loads it as the driver

# Constants
CSV_FILE_PATH = 'path_to_your_large_csv_file.csv'
SQL_SERVER = 'your_sql_server'
DATABASE = 'your_database'
USERNAME = 'your_username'
PASSWORD = 'your_password'
TABLE_NAME = 'temporary_table_name'
CHUNK_SIZE = 1000

# Create a connection string
conn_str = f'mssql+pyodbc://{USERNAME}:{PASSWORD}@{SQL_SERVER}/{DATABASE}?driver=ODBC+Driver+17+for+SQL+Server'

# Create a SQLAlchemy engine
engine = create_engine(conn_str)

# Read CSV and create temporary table
with pd.read_csv(CSV_FILE_PATH, chunksize=CHUNK_SIZE) as reader:
    for i, chunk in enumerate(reader):
        if i == 0:
            # Use the first chunk to create a table
            chunk.head(0).to_sql(TABLE_NAME, engine, if_exists='replace', index=False)
        
        chunk.to_sql(TABLE_NAME, engine, if_exists='append', index=False)
        print(f"Inserted batch {i+1}")

print("Data insertion complete.")

Before running this script, ensure that:

  1. You have replaced CSV_FILE_PATH, SQL_SERVER, DATABASE, USERNAME, PASSWORD, and TABLE_NAME with your actual file path and database credentials.
  2. The SQL Server driver is installed on your machine (ODBC Driver 17 for SQL Server in this example).
  3. The user specified in USERNAME has the necessary permissions to create and insert data into tables in the specified database.

This script reads the CSV file in chunks, creates a staging table whose columns come from the CSV header row, and inserts the data in batches of 1000 rows. Note that pandas' to_sql creates an ordinary table (dropped and recreated by if_exists='replace'), not a true SQL Server #temp table, and that performance will vary with the size of the data and the SQL Server's capacity.
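If the inserts are slow, pyodbc's fast_executemany mode, which the mssql+pyodbc SQLAlchemy dialect exposes as a create_engine argument, often speeds up batched to_sql appends considerably. A sketch (the URL is a placeholder; reuse the conn_str built earlier in the script):

```python
from sqlalchemy import create_engine

# Hypothetical URL; in the script above, reuse the conn_str already built.
conn_str = "mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+17+for+SQL+Server"

# fast_executemany batches parameterized INSERTs on the pyodbc side,
# typically reducing round trips for large appends.
engine = create_engine(conn_str, fast_executemany=True)
```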

User change the script to use windows authentication against the sql server database.

ChatGPT To modify the script to use Windows Authentication for connecting to a SQL Server database, you can adjust the connection string accordingly. With Windows Authentication, you won't need to specify the username and password in the connection string. The modified script will look like this:

import pandas as pd
import urllib.parse  # import the submodule explicitly; "import urllib" alone may not expose it
from sqlalchemy import create_engine

# Constants
CSV_FILE_PATH = 'path_to_your_large_csv_file.csv'
SQL_SERVER = 'your_sql_server'
DATABASE = 'your_database'
TABLE_NAME = 'temporary_table_name'
CHUNK_SIZE = 1000

# Create a connection string for Windows Authentication
params = urllib.parse.quote_plus(
    f"DRIVER={{ODBC Driver 17 for SQL Server}};"
    f"SERVER={SQL_SERVER};"
    f"DATABASE={DATABASE};"
    f"Trusted_Connection=yes;"
)
conn_str = f"mssql+pyodbc:///?odbc_connect={params}"

# Create a SQLAlchemy engine
engine = create_engine(conn_str)

# Read CSV and create temporary table
with pd.read_csv(CSV_FILE_PATH, chunksize=CHUNK_SIZE) as reader:
    for i, chunk in enumerate(reader):
        if i == 0:
            # Use the first chunk to create a table
            chunk.head(0).to_sql(TABLE_NAME, engine, if_exists='replace', index=False)
        
        chunk.to_sql(TABLE_NAME, engine, if_exists='append', index=False)
        print(f"Inserted batch {i+1}")

print("Data insertion complete.")

In this version of the script, the connection string is built with urllib.parse.quote_plus so that the ODBC keywords survive URL encoding, and Trusted_Connection=yes selects Windows Authentication. Ensure that the SQL Server is configured to allow Windows Authentication and that the account running the script has the necessary permissions on the database.
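To see why quote_plus is needed, note that the raw ODBC string contains characters (;, =, {, }, spaces) that would be misread inside a SQLAlchemy URL. A small self-contained check (server and database names are placeholders):

```python
from urllib.parse import quote_plus

# Hypothetical ODBC connection string; ; = { } and spaces must be escaped
# before being embedded in the SQLAlchemy URL's odbc_connect parameter.
odbc = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
params = quote_plus(odbc)
url = f"mssql+pyodbc:///?odbc_connect={params}"

# Every ; has become %3B, { is %7B, and spaces are +, so the URL parser
# treats the whole ODBC string as a single opaque query value.
print(params[:30])
```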

As before, replace CSV_FILE_PATH, SQL_SERVER, DATABASE, and TABLE_NAME with your actual file path and database details. Make sure that the ODBC Driver for SQL Server is installed on your machine.
