Friday, 2 July 2021

MySQLdb._exceptions.OperationalError: (2006, '') when closing a SQL query. Is it because of the connection?

When executing some SQL queries on the user table in my ETL I get an error:

(venv) C:\Users\antoi\Documents\Programming\Work\data-tools>python -m etl.main
2021-06-29 15:34:53.515286 - Connecting to database hozana_data...
2021-06-29 15:34:53.523283 - Connecting to archive database hozana_archive...
2021-06-29 15:34:53.755949 - Start ETL main process
2021-06-29 15:34:53.755949 - `users` table:
2021-06-29 15:34:53.755949 - Hashing column `users`.`email:`table_name:  users
c.execute("SELECT 15+10 FROM users"):  0
 done.
2021-06-29 15:34:53.763899 - Hashing column `users`.`email_notification:`table_name:  users
Traceback (most recent call last):
  File "C:\Users\antoi\Documents\Programming\Work\data-tools\etl\task\anonymization.py", line 18, in hash_column
    print("c.execute(\"""SELECT 15+10 FROM users\"""): ", c.execute("""SELECT 15+10 FROM users"""))
  File "C:\Users\antoi\Documents\Programming\Work\data-tools\venv\lib\site-packages\MySQLdb\cursors.py", line 183, in execute
    while self.nextset():
  File "C:\Users\antoi\Documents\Programming\Work\data-tools\venv\lib\site-packages\MySQLdb\cursors.py", line 137, in nextset
    nr = db.next_result()
MySQLdb._exceptions.OperationalError: (2006, '')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\antoi\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\antoi\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\antoi\Documents\Programming\Work\data-tools\etl\main.py", line 52, in <module>
    main()
  File "C:\Users\antoi\Documents\Programming\Work\data-tools\etl\main.py", line 24, in main
    anonymization.main()
  File "C:\Users\antoi\Documents\Programming\Work\data-tools\etl\task\anonymization.py", line 61, in main
    hash_column('users', 'email_notification', 'user_id', True)
  File "C:\Users\antoi\Documents\Programming\Work\data-tools\etl\task\anonymization.py", line 52, in hash_column
    print('.', end='', flush=True)
  File "C:\Users\antoi\Documents\Programming\Work\data-tools\venv\lib\site-packages\MySQLdb\connections.py", line 239, in __exit__
    self.close()
MySQLdb._exceptions.OperationalError: (2006, '')

I don't get it. This error seems to be because of a closed connection or too much data ... But I only have the schema, the database is empty. And even when I do a simple math operation I get this error:

Here is part of the ETL:

from etl.mysql.operations import drop_column_if_exists
from etl.utils.logging import info
from etl.mysql.connect import db, db_name
from etl.utils.array import chunks


def hash_column(table_name, column_name, pk_name, email_mode=False):
    # Executed a query that will overwrite a column with an hashed version of its content.
    # With email mode, the domain name will be kept:
    # aba@stoacj.com -> 379ac32fe8f576c4c63b17cd576e6c40c7dcd[...]b04c7a4695a7baa54ad5ce44528a0b30ab@stoacj.com
    info('Hashing column `{table_name}`.`{column_name}:`'
         .format(table_name=table_name, column_name=column_name), end='', flush=True)

    with db as c:
        # Execute the update in batches, to avoid "Lock wait timeout exceeded"
        c = c.cursor()
        print("table_name: ", table_name)
        print("c.execute(\"""SELECT 15+10 FROM users\"""): ", c.execute("""SELECT 15+10 FROM users"""))
        c.execute("""
            SELECT {pk_name} as row_id
            FROM {table_name} 
            WHERE {column_name} IS NOT NULL AND {column_name} NOT LIKE 'hash_%'
            ORDER BY row_id ASC
        """.format(table_name=table_name, column_name=column_name, pk_name=pk_name))
        ids = []
...

Here is the code in from etl.mysql.connect import db:

import os
import MySQLdb

from etl.utils.logging import info

db_host = os.environ['DATA_DB_HOST']
db_port = int(os.environ['DATA_DB_PORT'])
db_user = os.environ['DATA_DB_USER']
db_password = os.environ['DATA_DB_PASSWORD']
db_name = os.environ['DATA_DB_NAME']
db_name_archive = os.environ['DATA_DB_ARCHIVE_NAME']

info("Connecting to database {}...".format(db_name))
db = MySQLdb.connect(host=db_host,
                     port=db_port,
                     db=db_name,
                     user=db_user,
                     passwd=db_password)

The problem isn't because of the connection, I have it and it's okay.

introducir la descripción de la imagen aquí

I tried to increment max_allowed_packet:

mysql> select @@max_allowed_packet;
+----------------------+
| @@max_allowed_packet |
+----------------------+
|              4194304 |
+----------------------+
1 row in set (0.00 sec)

mysql> set global max_allowed_packet=10485760;
Query OK, 0 rows affected (0.01 sec)

But it is still the same space:

mysql> select @@max_allowed_packet;
+----------------------+
| @@max_allowed_packet |
+----------------------+
|              4194304 |
+----------------------+
1 row in set (0.00 sec)


from MySQLdb._exceptions.OperationalError: (2006, '') when closing a SQL query. Is it because of the connection?

No comments:

Post a Comment