What is the difference between S3n, S3a, and S3?

Question

I have been through this page related to Amazon S3.

Also, I have read the documentation that states as follows:

S3 Native FileSystem (URI scheme: s3n) A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The disadvantage is the 5GB limit on file size imposed by S3.

S3A (URI scheme: s3a) A successor to the S3 Native (s3n) filesystem, S3a uses Amazon's libraries to interact with S3. This allows S3a to support larger files (no more 5GB limit), higher performance operations and more. The filesystem is intended to be a replacement for/successor to S3 Native: all objects accessible from s3n:// URLs should also be accessible from s3a simply by replacing the URL scheme.

S3 Block FileSystem (URI scheme: s3) A block-based filesystem backed by S3. Files are stored as blocks, just like they are in HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate a bucket for the filesystem - you should not use an existing bucket containing files, or write other files to the same bucket. The files stored by this filesystem can be larger than 5GB, but they are not interoperable with other S3 tools.

Yet my question is: how can a single word in the URL make such a big difference?

To make this concrete, consider changing this line:

val data = sc.textFile("s3n://bucket-name/key")

to

val data = sc.textFile("s3a://bucket-name/key")

Can anyone help me understand the technical difference underlying this change? Can you suggest any good articles on this?

ravikiran · Answer 1 · Jul 30, 2019

Your question is an interesting one.

Yes, a single word can make a huge difference. To see why, consider HTTP and HTTPS: one extra letter in the URL scheme selects a different protocol.

Similarly, the s3a, s3n, and s3 schemes each select a different filesystem implementation in Hadoop, and each implementation talks to Amazon S3 through a different interface. Hence, one single word in the URL changes which code reads and writes your data.
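
To illustrate how a scheme can select an implementation, here is a minimal sketch. It is a simplified model, not Hadoop's actual lookup code: the object name SchemeLookup and the method implementationFor are hypothetical, and the class names in the map are the implementations I understand Hadoop to have associated with each scheme, so please verify them against your Hadoop version.

```scala
import java.net.URI

object SchemeLookup {
  // Simplified stand-in for Hadoop's scheme-to-implementation mapping.
  // The class names are assumptions to verify against your Hadoop version.
  val implementations: Map[String, String] = Map(
    "s3"  -> "org.apache.hadoop.fs.s3.S3FileSystem",
    "s3n" -> "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "s3a" -> "org.apache.hadoop.fs.s3a.S3AFileSystem"
  )

  // Extract the URI scheme (the part before "://") and look up its implementation.
  def implementationFor(path: String): Option[String] =
    Option(new URI(path).getScheme)
      .map(_.toLowerCase)
      .flatMap(implementations.get)
}
```

With this sketch, SchemeLookup.implementationFor("s3n://bucket-name/key") and SchemeLookup.implementationFor("s3a://bucket-name/key") resolve to different classes even though the bucket and key are identical, which is the whole effect of changing that one word.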

S3a and S3n are object-based overlays on top of Amazon S3: each file is stored as one ordinary S3 object. S3, on the other hand, is a block-based overlay on top of Amazon S3: each file is split into blocks, as in HDFS.
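
The object-based versus block-based distinction can be sketched as follows. This is only a conceptual model: the names LayoutSketch, objectLayout, and blockLayout and the ".block-N" key naming are hypothetical and do not reflect the real on-bucket format of the block filesystem.

```scala
object LayoutSketch {
  // Object-based (s3n, s3a): one file maps to exactly one S3 object,
  // so other S3 tools can read it under the same key.
  def objectLayout(fileKey: String): Seq[String] = Seq(fileKey)

  // Block-based (s3): the file is split into fixed-size blocks and each
  // block is stored as its own object. The key naming here is made up.
  def blockLayout(fileKey: String, fileSize: Long, blockSize: Long): Seq[String] = {
    require(fileSize >= 0, "fileSize must not be negative")
    require(blockSize > 0, "blockSize must be positive")
    // Ceiling division: number of blocks needed to hold fileSize bytes.
    val blockCount = ((fileSize + blockSize - 1) / blockSize).toInt
    (0 until blockCount).map(i => s"$fileKey.block-$i")
  }
}
```

In this model a 250-byte file with a 100-byte block size becomes three separate objects, none of which is the original file on its own. That is why files written through the block filesystem are not readable by other S3 tools, while files written through s3n or s3a are.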