Saturday, 29 June 2019

iOS app -- no cellular access to our domain on some devices

With a particular React Native app, some iPhone users are experiencing an issue where the app can almost never make web requests to our API over cellular data. The affected domain points to an Amazon Elastic Load Balancer (ELB), which forwards to an Nginx reverse proxy. Other APIs called by the app (e.g. Mapbox) work fine over cellular data, including one of ours hosted on a dedicated server; only requests to our ELB domain fail. When the user switches to WiFi, the app can reach that domain. This has been observed on the iPhone 7, iPhone 8, and iPhone X, all running iOS 12.3.1. One affected device is on Verizon and the other four reported are on AT&T. Every API call uses HTTPS. Deleting and reinstalling the app and restarting the device do not resolve the issue. In every case we confirmed that cellular data was enabled for the app in both Settings > Cellular > [App name] and Settings > [App name] > Use Cellular Data.

The app is built with React Native and web requests are performed with the cross-fetch library.
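
To compare the ELB domain against the hosts that do work over cellular (Mapbox, the dedicated server), a small probe can request each host in turn and record whether the failure happened at the HTTP level or the network level. This is a minimal sketch under stated assumptions, not the app's actual code: the fetch function is injected, so cross-fetch's default export or React Native's global fetch can be passed in, and `probe`, `probeAll`, and any URLs you feed it are hypothetical.

```typescript
// Hypothetical per-host probe. The fetch implementation is injected so the same
// code works with cross-fetch, React Native's global fetch, or a test stub.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

interface ProbeResult {
  url: string;
  outcome: "ok" | "http-error" | "network-error";
  status?: number;
  message?: string;
  elapsedMs: number;
}

async function probe(fetchFn: FetchLike, url: string): Promise<ProbeResult> {
  const start = Date.now();
  try {
    const res = await fetchFn(url);
    // Any HTTP response (even 4xx/5xx) means the TCP+TLS connection succeeded.
    return {
      url,
      outcome: res.ok ? "ok" : "http-error",
      status: res.status,
      elapsedMs: Date.now() - start,
    };
  } catch (err) {
    // fetch rejects on network-level failures, not on HTTP error statuses.
    // On React Native the message is typically the generic "Network request failed",
    // so the underlying NSURLError code (-1005 here) is not visible from JS.
    return {
      url,
      outcome: "network-error",
      message: err instanceof Error ? err.message : String(err),
      elapsedMs: Date.now() - start,
    };
  }
}

async function probeAll(fetchFn: FetchLike, urls: string[]): Promise<ProbeResult[]> {
  // Requests run one after another so results line up with the order given.
  const results: ProbeResult[] = [];
  for (const url of urls) {
    results.push(await probe(fetchFn, url));
  }
  return results;
}
```

Running `probeAll` over cellular with one URL on the ELB domain and one on the dedicated server would show, per host, whether any HTTP response came back at all.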

We were able to get a device that has the issue and attach it to Xcode. Here is a subset of the error log captured in Xcode:

nw_connection_copy_connected_local_endpoint [C12] Connection has no local endpoint
2019-06-27 11:26:16.841347-0400 myapp[23700:1527268] [BoringSSL] nw_protocol_boringssl_get_output_frames(1301) [C10.1:2][0x117d5a050] get output frames failed, state 8196

2019-06-27 11:26:22.465855-0400 myapp[23700:1527305] [BoringSSL] nw_protocol_boringssl_error(1584) [C20.1:2][0x119b0e420] Lower protocol stack error: 54
2019-06-27 11:26:22.466665-0400 myapp[23700:1527305] TIC TCP Conn Failed [20:0x280022400]: 1:54 Err(54)

2019-06-27 11:26:23.040101-0400 myapp[23700:1527399] Task <DD5FDD4A-1BE0-41ED-AAC4-9EB07F61F109>.<7> HTTP load failed (error code: -1005 [1:54])
2019-06-27 11:26:23.040408-0400 myapp[23700:1527305] Task <DD5FDD4A-1BE0-41ED-AAC4-9EB07F61F109>.<7> finished with error - code: -1005
load failed with error Error Domain=NSURLErrorDomain Code=-1005 "The network connection was lost." UserInfo={_kCFStreamErrorCodeKey=54, NSUnderlyingError=0x283a521f0 {Error Domain=kCFErrorDomainCFNetwork Code=-1005 "(null)" UserInfo={NSErrorPeerAddressKey=<CFData 0x28161ab70 [0x1e9e5d420]>{length = 16, capacity = 16, bytes = 0x100201bb3416ca8a0000000000000000}, _kCFStreamErrorCodeKey=54, _kCFStreamErrorDomainKey=1}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <DD5FDD4A-1BE0-41ED-AAC4-9EB07F61F109>.<7>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <DD5FDD4A-1BE0-41ED-AAC4-9EB07F61F109>.<7>"
), NSLocalizedDescription=The network connection was lost.
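
The peer address and error codes in that `NSError` can be decoded by hand. The sketch below assumes `NSErrorPeerAddressKey` holds a raw Darwin `sockaddr_in` (length byte, family byte, big-endian port, four address bytes, eight bytes of padding), which matches its 16-byte length here. Decoded this way, the failing connection was to 52.22.202.138 on port 443, and error 54 in the POSIX domain (`_kCFStreamErrorDomainKey=1`) is `ECONNRESET`: the TCP connection was reset, either by the peer or by something on the path. Comparing that address with the ELB's current IPs would confirm the request was actually headed to the ELB. The helper names are illustrative.

```typescript
// Decode the hex dump from NSErrorPeerAddressKey, assuming it is a Darwin/BSD
// sockaddr_in: sin_len (1 byte), sin_family (1 byte), sin_port (2 bytes,
// network byte order), sin_addr (4 bytes), then 8 bytes of zero padding.
interface PeerAddress {
  length: number;
  family: number;
  port: number;
  ip: string;
}

const AF_INET = 2; // Darwin/BSD address family value for IPv4

function hexToBytes(hex: string): number[] {
  const clean = hex.replace(/^0x/, "");
  if (clean.length % 2 !== 0) throw new Error("odd-length hex string");
  const bytes: number[] = [];
  for (let i = 0; i < clean.length; i += 2) {
    bytes.push(parseInt(clean.slice(i, i + 2), 16));
  }
  return bytes;
}

function decodeSockaddrIn(hex: string): PeerAddress {
  const b = hexToBytes(hex);
  if (b.length < 8 || b[1] !== AF_INET) throw new Error("not an IPv4 sockaddr_in");
  return {
    length: b[0],
    family: b[1],
    port: (b[2] << 8) | b[3], // big-endian (network byte order)
    ip: b.slice(4, 8).join("."),
  };
}

// A few Darwin errno values; _kCFStreamErrorDomainKey=1 means the POSIX domain.
const DARWIN_ERRNO: Record<number, string> = {
  54: "ECONNRESET (connection reset by peer)",
  60: "ETIMEDOUT (operation timed out)",
  61: "ECONNREFUSED (connection refused)",
};

const peer = decodeSockaddrIn("0x100201bb3416ca8a0000000000000000");
console.log(`${peer.ip}:${peer.port}`, DARWIN_ERRNO[54]);
```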

Requests to this ELB -> Nginx -> Kubernetes setup will occasionally work but then stop. It looks almost like a keep-alive problem, similar to this issue. The ELB idle timeout was at its default (60s); we increased it to 300s with no apparent effect. We also tried setting the Nginx keep-alive timeout to 360s, and disabling keep-alive in Nginx completely.
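
A common rule for this kind of chain (AWS's load balancer documentation gives similar guidance) is that the backend's keep-alive timeout should be longer than the load balancer's idle timeout, so the backend never closes a connection the load balancer may still reuse. The hypothetical helper below encodes only that ordering check; the inputs are the values mentioned above. With the ELB at 300s, Nginx at 360s satisfies the rule and disabling keep-alive sidesteps it, which is consistent with neither change having a visible effect.

```typescript
// Hypothetical check of keep-alive ordering between a load balancer and its backend.
// null or 0 means keep-alive is disabled on the backend (e.g. Nginx keepalive_timeout 0).
type Verdict = "ok" | "backend-closes-first" | "no-backend-keepalive";

function checkKeepAliveOrdering(lbIdleSec: number, backendKeepAliveSec: number | null): Verdict {
  if (backendKeepAliveSec === null || backendKeepAliveSec === 0) {
    return "no-backend-keepalive";
  }
  // The backend should hold idle connections strictly longer than the load balancer.
  return backendKeepAliveSec > lbIdleSec ? "ok" : "backend-closes-first";
}

// Illustrative inputs based on the settings described above: [ELB idle, Nginx keep-alive].
const tried: Array<[number, number | null]> = [
  [60, 360],
  [300, 360],
  [300, null],
];
for (const [lb, nginx] of tried) {
  console.log(lb, nginx, checkKeepAliveOrdering(lb, nginx));
}
```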

For the domain in question, we have a mix of services hosted in the Kubernetes cluster, such as Java and Node.js services. The issue affects all of them equally.

None of the Android app users have reported this issue.

The devices that experience this issue all do so consistently; it is not intermittent.

Because of the nature of the error, the failing requests never show up in our Nginx logs.
