NOTE: I'm trying to rewrite this Python function using Pythran, a library that compiles .py code into .pyd modules that can be imported into Python, which can bypass things like the GIL (global interpreter lock) and leverage OpenMP and SIMD, as it rewrites Python code into C++ code, and makes all the Python connections for you automatically. Pretty great library honestly. So the bounty is for rewriting a bit of Python code to take advantage of XSIMD + OpenMP... The OpenMP part is easy, the XSIMD is not (for me) - now I haven't worked with SIMD before, so I'm blind to the limits of SIMD and how code must be structured to take advantage of it (SIMD = single instruction/multiple data) The Pythran maintainer noted that transposed matrices can't work with the XSIMD library, but non-transposed ones "might." So I attempted to rewrite the code without transposes but it's around 33% slower, so I can be pretty sure SIMD is not being used. And due to my lack of knowledge of SIMD coding, well, it could very well be my fault.
First I'll say what my function does; it simulates potential future paths that assets may take on the way to expiration. Nothing too complicated. When I try to leverage the XSIMD library that Pythran can link to with a pythran myfile.py argument with my original code and one of my config files, I get many errors. They did patch it to not use XSIMD on any transposed matrices while this question was sitting here, so installing Pythran from source with a python install setup.py from the most recent code won't throw an error: https://github.com/serge-sans-paille/pythran But it won't use XSIMD either. Here's the original code:
#pythran export generate_paths(int, int, int, int, float64, float64, float64[:,:] order(C), float64[:,:,:] order(C))
import numpy as np
def generate_paths(N_assets, sims, Simulated_time_steps, timesteps, timestep, drift, CurrentVol, randnums3Dcorr):
Simulated_paths = np.zeros((N_assets, sims, Simulated_time_steps))
#omp parallel for
for i in range(Simulated_time_steps):
randnumscorr = randnums3Dcorr[:,:, i]
dt = np.array((timestep, timestep*2, timestep*3))
drift_component = np.multiply( drift - 0.5*CurrentVol**2, dt.reshape(dt.shape[0],1))
random_component = np.multiply(randnumscorr.T , np.multiply(CurrentVol.reshape(CurrentVol.shape[0],) , np.sqrt(dt) ))
Simulated_paths[:,:,i] = np.exp( drift_component.reshape(drift_component.shape[0],1) + random_component.T)
return Simulated_paths
# [NOTE] uncomment the below if you want to run it in Python...
#
# if __name__ == "__main__":
# N_assets = 3
# sims = 8192
# Simulated_time_steps = 20
# timesteps = 20
# timestep = 1/365
# drift = 0.0
# CurrentVol = np.array([0.5,0.4,0.3])
# CurrentVol = CurrentVol.reshape(CurrentVol.shape[0],1)
# randnums3Dcorr = np.random.rand(N_assets,sims,timesteps)
# Simulated_paths = generate_paths(N_assets, sims, Simulated_time_steps, timesteps, timestep, drift, CurrentVol, randnums3Dcorr)
Here's my horrible rewrite to avoid transposes that mathematically ties out to the original, but compiles fine with the latest release of Pythran. It obviously doesn't use XSIMD since it's 33% slower than the original code with the patch to not use XSIMD:
#pythran export generate_paths(int, int, int, int, float64, float64, float64[:,:] order(C), float64[:,:,:] order(C))
import numpy as np
def generate_paths(N_assets, sims, Simulated_time_steps, timesteps, timestep, drift, CurrentVol, randnums3Dcorr):
Simulated_paths = np.zeros((sims, N_assets, Simulated_time_steps))
randnums3Dcorr = np.swapaxes(randnums3Dcorr,0,2)
#omp parallel for
for i in range(Simulated_time_steps):
randnumscorr = randnums3Dcorr[i,:,:]
dt = np.array((timestep, timestep*2, timestep*3))
drift_component = np.multiply( drift - 0.5*CurrentVol**2, dt.reshape(dt.shape[0] , 1) )
drift_component = drift_component.reshape(drift_component.shape[1],drift_component.shape[0])
random_component = np.multiply(randnumscorr, np.multiply(CurrentVol.reshape(CurrentVol.shape[0],) , np.sqrt(dt) ))
Simulated_paths[:,:,i] = np.exp( drift_component + random_component)
Simulated_paths = np.swapaxes(Simulated_paths,1,0)
return Simulated_paths
If anyone can find a way to rewrite the same algo where XSIMD gets applied that would be tremendously helpful. As I'm trying to use OpenMP (parallel for) + SIMD to accelerate the algo... But so far, not successfully.
The pythran-win32.cfg using MKL (Anaconda3 + oneAPI Intel MKL):
[compiler]
defines=
undefs=
include_dirs='C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\x64\lib', 'C:\Users\[username]\Anaconda3\Library\include'
libs=
library_dirs='C:\Program Files (x86)\Intel\oneAPI\compiler\latest\windows\compiler\lib\intel64_win'
cflags=/std:c++14 /w -DUSE_XSIMD -Xclang -fopenmp -march=haswell
ldflags=\libiomp5md.lib
blas=mkl
CC=clang-cl.exe
CXX=clang-cl.exe
ignoreflags=
The pythran-win32.cfg using the pythran-openblas package (pip installed NumPy):
[compiler]
defines=
undefs=
include_dirs=
libs=
library_dirs='C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\x64\lib'
cflags=/std:c++14 /w -DUSE_XSIMD -Xclang -fopenmp -march=haswell
ldflags=\libiomp5md.lib
blas=pythran-openblas
CC=clang-cl.exe
CXX=clang-cl.exe
ignoreflags=
AND the pythran-linux.cfg using pythran-openblas:
[compiler]
defines=
undefs=
include_dirs=
libs=
library_dirs=
cflags=-std=c++11 -DUSE_XSIMD -fopenmp -fno-math-errno -fvisibility=hidden -fno-wrapv -Wno-unused-function -Wno-int-in-bool-context -Wno-unknown-warning-option
ldflags=-fvisibility=hidden -Wl,-strip-all -lgomp
blas=pythran-openblas
CC=gcc
CXX=g++
ignoreflags=-Wstrict-prototypes
from Pythran with XSIMD (Python compiled module from .py code) - uses OpenMP but not XSIMD - looking for rewritten code that uses both, example given
No comments:
Post a Comment