I am using celery/redis and python/flask to process some very long background tasks. I am using heroku to host the project.
For the background tasks (I have 12 in total) I connect to an external MySQL DB (This means there should actually be no difference localhost vs heroku). For this I use sqlalchemy. Here is the connection:
DB_URI = 'mysql+pymysql://USER:PW@SERVER/DB'
engine = create_engine(stats_config.DB_URI, convert_unicode=True, echo_pool=True)
db_session = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
Base = declarative_base()
Base.query = db_session.query_property()
In the task I am connecting to a mysql DB. I am using pymysql here and sqlalchemy.
The tasks are executed one by one.
The issue is that I get slightly different results for some of my tasks on localhost and on heroku. The result difference is really small, f.e. one of my tasks returns on localhost always 30111 and on heroku 29881.
The tasks, which return wrong values are always the same and the wrong values are always the same.
The only difference between localhost and heroku is, that I am using windows on localhost and linux on heroku. (I have red already some threads).
I am already using the same timezone by executing:
db_session.execute("SET SESSION time_zone = 'Europe/Berlin'")
So it cant be a time_zone issue.
Here is an example of a task with different results:
@shared_task(bind=True, name="get_gross_revenue_task")
def get_gross_revenue_task(self, g_start_date, g_end_date, START_TIME_FORM):
db_session.close()
start_date = datetime.strptime(g_start_date, '%d-%m-%Y')
end_date = datetime.strptime(g_end_date, '%d-%m-%Y')
gross_rev_trans_VK = db_session.query(func.sum(UsersTransactionsVK.amount)).filter(UsersTransactionsVK.date_added >= start_date, UsersTransactionsVK.date_added <= end_date, UsersTransactionsVK.payed == 'Yes').scalar()
gross_rev_trans_Stripe = db_session.query(func.sum(UsersTransactionsStripe.amount)).filter(UsersTransactionsStripe.date_added >= start_date, UsersTransactionsStripe.date_added <= end_date, UsersTransactionsStripe.payed == 'Yes').scalar()
gross_rev_trans = db_session.query(func.sum(UsersTransactions.amount)).filter(UsersTransactions.date_added >= start_date, UsersTransactions.date_added <= end_date, UsersTransactions.on_hold == 'No').scalar()
if gross_rev_trans_VK is None:
gross_rev_trans_VK = 0
if gross_rev_trans_Stripe is None:
gross_rev_trans_Stripe = 0
if gross_rev_trans is None:
gross_rev_trans = 0
print ('gross', gross_rev_trans_VK, gross_rev_trans_Stripe, gross_rev_trans)
total_gross_rev = gross_rev_trans_VK + gross_rev_trans_Stripe + gross_rev_trans
return {'total_rev' : str(total_gross_rev / 100), 'current': 100, 'total': 100, 'statistic': 'get_gross_revenue', 'time_benchmark': (datetime.today() - START_TIME_FORM).total_seconds()}
# Selects gross revenue between selected dates
@app.route('/get-gross-revenue', methods=["POST"])
@basic_auth.required
@check_verified
def get_gross_revenue():
if request.method == "POST":
task = get_gross_revenue_task.apply_async([session['g_start_date'], session['g_end_date'], session['START_TIME_FORM']])
return json.dumps({}), 202, {'Location': url_for('taskstatus_get_gross_revenue', task_id=task.id)}
Its actually very simple and fast, it finishes in a few seconds.
So what could be the issue here and how to address it?
P.S.: I forgot to mention, that sometimes the values are not wrong. I dont know yet why, I'd say if I execute the task 10 times on heroku, it will return 5 times the correct value 30111 and 5 times the wrong one 29881
EDIT
In the worker logs I noticed some SQL errors:
2013 Lost connection to MySQL
sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically
2014 commands out of sync
It seems like everytime these errors appear, the returned values are slightly wrong. The errors sound very serious, but my tasks still return successfully.
EDIT
No thats not it. I had now 2 executions without any SQL errors, but the results were wrong.
EDIT
Also it looks like that ALL tasks either return the correct values or the wrong values. It can never happen that TASK A returns a wrong value and TASK B a correct. If the values are wrong, than all tasks return a wrong value.
A very dirty fix would be to hardcode a variable of one of the tasks, execute it at first and simply check wether the value is correct. If not re-submit the form.
EDIT
I think I can now say where the problem is. It has something to do with the connection to the external MySQL DB. Its either the cache or the isolation_level or the way I use the session (db_session
) or something similar.
Before every task I have tried now different approaches, which all influenced the behaviour on heroku:
#db_session.close()
#db_session.commit()
#db_session.execute('SET TRANSACTION READ ONLY')
For example db_session.execute('SET TRANSACTION READ ONLY')
worked on localhost but shows an error on heroku.
I also tried to alter the connection itself with 'isolation_level="READ UNCOMMITTED'
. I saw some changes on heroku, but it didn't solved it.
Basically I need the correct configuration. For the connection to the MySQL DB I only need to READ/GET the values, I will never INSERT or DELETE or UPDATE something.
I would still like to know why the behaviour is different (localhost VS. heroku).
After some more testing I am certain that the workers should not reuse the same db_session
, also it seems that only tasks, which use db_session
in the query can return wrong results:
For example the query above uses db_session.query(...)
. It has a chance to return a wrong value.
An other task I have uses: Users.query...
instead of db_session.query(Users)
. It seems that it always returns a correct value.
Is there really a difference here? I always thought its simply a different syntax.
from Python mysql sqlalchemy different results on localhost and server (heroku) | Look at last EDIT
No comments:
Post a Comment