Monday, 14 March 2022

Pythran with XSIMD (Python compiled C++ module PYD) - fails to compile (transposed items fail) - trying to rewrite equivalent code w/o transposes

NOTE: I'm trying to rewrite this function to use XSIMD as I've found speaking with the maintainer, XSIMD is not supported if a function transposes anything inside the loop. So the bounty is for that... Now I've rewritten the function which matches and avoids any transpose operations, but it's far slower than just using OpenMP and turning XSIMD off. Maybe it's because my code is just hacked together.

If any setup files are needed based on your platform (Windows - clang-cl and Intel MKL, Linux - gcc/g++ and OpenBLAS) there are some basic ones at the bottom of this post.

First I'll say what my function does; it simulates potential future paths that assets may take on the way to expiration. Nothing too complicated. When I try to leverage the XSIMD library that Pythran can link to with a pythran -DUSE_XSIMD myfile.py argument, I get a load of errors (they put a patch in that avoids the errors but it turns SIMD operations off, which isn't what I'm looking for):

#pythran export generate_paths(int, int, int, int, float64, float64, float64[:,:] order(C), float64[:,:,:] order(C))
import numpy as np
def generate_paths(N_assets, sims, Simulated_time_steps, timesteps, timestep, drift, CurrentVol, randnums3Dcorr):
    
    Simulated_paths = np.zeros((N_assets, sims, Simulated_time_steps))
    #omp parallel for    
    for i in range(Simulated_time_steps):
        
        randnumscorr = randnums3Dcorr[:,:, i]
        
        dt = np.array((timestep, timestep*2, timestep*3))          
        drift_component = np.multiply( drift - 0.5*CurrentVol**2, dt.reshape(dt.shape[0],1))
        random_component = np.multiply(randnumscorr.T , np.multiply(CurrentVol.reshape(CurrentVol.shape[0],) ,  np.sqrt(dt) ))
        Simulated_paths[:,:,i] = np.exp( drift_component.reshape(drift_component.shape[0],1) + random_component.T)
    
    return Simulated_paths

# [NOTE] uncomment the below if you want to run it in Python...
#

# if __name__ == "__main__":
    
#     N_assets = 3
#     sims = 8192
#     Simulated_time_steps = 20
#     timesteps = 20
#     timestep = 1/365
#     drift = 0.0
#     CurrentVol = np.array([0.5,0.4,0.3])
#     CurrentVol = CurrentVol.reshape(CurrentVol.shape[0],1)
#     randnums3Dcorr = np.random.rand(N_assets,sims,timesteps)
    
#     Simulated_paths = generate_paths(N_assets, sims, Simulated_time_steps, timesteps, timestep, drift, CurrentVol, randnums3Dcorr)

The main error is:

pythran\pythonic/include/types/numpy_expr.hpp(736,53): error: no type named 'simd_iterator' in '(anonymous namespace)::pythonic::types::numpy_texpr<(anonymous namespace)::pythonic::types::broadcasted<(anonymous namespace)::pythonic::types::numpy_expr<(anonymous namespace)::pythonic::numpy::functor::multiply, (anonymous namespace)::pythonic::types::ndarray<double, (anonymous namespace)::pythonic::types::pshape>, (anonymous namespace)::pythonic::types::numpy_expr<(anonymous namespace)::pythonic::numpy::functor::sqrt, (anonymous namespace)::pythonic::types::ndarray<double, (anonymous namespace)::pythonic::types::pshape<std::integral_constant<long, 3>>>>>>>' typename std::remove_reference<Args>::type::simd_iterator...>;

Just wondering if someone knows how to reformat this so it will compile with XSIMD (and actually speed up the routine). I already filed a bug report with more details on the Pythran project site here: https://github.com/serge-sans-paille/pythran/issues/1973 where if you need to, you can download the .py file, the error log, and the .cpp generated files there. Although I already know the issue - you can't do a transpose of a variable inside the function. Here's my horrible rewrite to avoid transposes that mathematically ties out (note it's terribly slow - I imagine the swapaxes routine is the issue here, but it compiles without any errors)...

#pythran export generate_paths(int, int, int, int, float64, float64, float64[:,:] order(C), float64[:,:,:] order(C))
import numpy as np
def generate_paths(N_assets, sims, Simulated_time_steps, timesteps, timestep, drift, CurrentVol, randnums3Dcorr):
    Simulated_paths = np.zeros((sims, N_assets, Simulated_time_steps))
    randnums3Dcorr = np.swapaxes(randnums3Dcorr,0,2)
    #omp parallel for
    for i in range(Simulated_time_steps):
        
        randnumscorr = randnums3Dcorr[i,:,:]
        
        dt = np.array((timestep, timestep*2, timestep*3))            
        drift_component = np.multiply( drift - 0.5*CurrentVol**2, dt.reshape(dt.shape[0] , 1) )
        drift_component = drift_component.reshape(drift_component.shape[1],drift_component.shape[0])
        random_component = np.multiply(randnumscorr, np.multiply(CurrentVol.reshape(CurrentVol.shape[0],) ,  np.sqrt(dt) ))
        Simulated_paths[:,:,i] = np.exp( drift_component + random_component)
        
    Simulated_paths = np.swapaxes(Simulated_paths,1,0)    
    return Simulated_paths    

Any help is greatly appreciated! Below are setup files for Windows and Linux... If anyone can find a way to rewrite the same algo where SIMD gets applied that would be tremendously helpful. As I'm trying to use OpenMP (parallel for) + SIMD to accelerate the algo... But so far, not successfully.

The pythran-win32.cfg using MKL:

[compiler]
defines=
undefs=
include_dirs='C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\x64\lib',     'C:\Users\Matt.Slezak\Anaconda3\Library\include'
libs=
library_dirs='C:\Program Files (x86)\Intel\oneAPI\compiler\latest\windows\compiler\lib\intel64_win'
cflags=/std:c++14 /w -Xclang -fopenmp -march=haswell
ldflags=\libiomp5md.lib
blas=mkl
CC=clang-cl.exe
CXX=clang-cl.exe
ignoreflags=

AND the pythran-linux.cfg using pythran-openblas:

[compiler]
defines=
undefs=
include_dirs=
libs=
library_dirs=
cflags=-std=c++11 -fopenmp -fno-math-errno -fvisibility=hidden -fno-wrapv -Wno-unused-function -Wno-int-in-bool-context -Wno-unknown-warning-option
ldflags=-fvisibility=hidden -Wl,-strip-all -lgomp
blas=pythran-openblas
CC=gcc
CXX=g++
ignoreflags=-Wstrict-prototypes


from Pythran with XSIMD (Python compiled C++ module PYD) - fails to compile (transposed items fail) - trying to rewrite equivalent code w/o transposes

No comments:

Post a Comment