
This blog on Trino S3 initially appeared on Medium. It was republished with the author’s credit and consent. 

In this blog, I’ll go over how to use S3 storage on a Pure Storage® FlashBlade® with Trino, the fast distributed SQL query engine for big data.

I deploy Trino using the Helm chart and provide a values.yaml file with the following configuration:
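
A minimal sketch of what such a values.yaml could look like, written here with a shell heredoc. It assumes a chart version that accepts the additionalCatalogs key (newer chart releases may use a different key, so check the chart's default values) and Trino's legacy hive.s3.* properties. The metastore address, the FlashBlade data VIP, and the credentials are hypothetical placeholders, not the values from my lab:

```shell
# Write a hypothetical values.yaml for the Trino Helm chart.
# Every hostname, IP address, and key below is a placeholder (assumption).
cat > values.yaml <<'EOF'
additionalCatalogs:
  hive: |-
    connector.name=hive
    hive.metastore.uri=thrift://hive-metastore.example.svc:9083
    hive.s3.endpoint=http://192.0.2.10
    hive.s3.aws-access-key=REPLACE_WITH_ACCESS_KEY
    hive.s3.aws-secret-key=REPLACE_WITH_SECRET_KEY
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false
EOF
```

Path-style access is typically what an on-premises S3 endpoint addressed by IP needs, since there is no per-bucket DNS name to resolve.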

This configuration points to my hive-metastore server. See this blog post for more information on setting that up. I then edit the Trino service, switching its type from ClusterIP to NodePort, to allow external access.
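
Once the release is installed, one way to make that switch without opening an editor is kubectl patch. A sketch, assuming the service is called my-trino (a hypothetical name that depends on the Helm release) and lives in the current namespace:

```shell
# Hypothetical service name (assumption); `kubectl get services` shows the real one.
TRINO_SERVICE="my-trino"

# Change the service type from ClusterIP to NodePort, then display the
# service so the assigned node port is visible.
expose_trino() {
  kubectl patch service "$TRINO_SERVICE" -p '{"spec": {"type": "NodePort"}}'
  kubectl get service "$TRINO_SERVICE"
}
```

Calling expose_trino applies the patch; the PORT(S) column of the second command's output then shows the port to use from outside the cluster.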

As usual, I use the helm install command:
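
As a sketch, assuming the community chart repository and a hypothetical release name of my-trino, the install could look like this:

```shell
# Hypothetical release name (assumption); pick your own.
RELEASE="my-trino"

install_trino() {
  # Register the Trino community chart repository and refresh the index.
  helm repo add trino https://trinodb.github.io/charts
  helm repo update
  # -f passes the values.yaml file with the hive catalog configuration.
  helm install "$RELEASE" trino/trino -f values.yaml
}
```

Running install_trino from the directory that holds values.yaml creates the release in the current namespace.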

On a Linux client with the trino-cli installed, I use the following command to connect to the Trino instance running in Kubernetes and list the available catalogs:
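
A sketch of a non-interactive variant, assuming the CLI is on the PATH as trino and that a worker node is reachable at the hypothetical address and node port below:

```shell
# Hypothetical node address and NodePort (assumptions); substitute your own.
TRINO_SERVER="http://192.0.2.20:30080"

# Run a single SQL statement against the cluster and print the result.
trino_query() {
  trino --server "$TRINO_SERVER" --execute "$1"
}
```

For example, trino_query "SHOW CATALOGS" lists the catalogs. Leaving out --execute and running trino --server with the same address opens the interactive trino> prompt instead.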

I can then select my hive catalog and check the available tables:
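
From the shell, the same thing can be sketched with the CLI's --catalog and --schema options, which set the session defaults. The server address is again a hypothetical placeholder, and default is the schema that appears in the hive catalog here:

```shell
# Hypothetical node address and NodePort (assumptions); substitute your own.
TRINO_SERVER="http://192.0.2.20:30080"

# With hive/default as the session catalog and schema, SHOW TABLES
# lists the tables in hive.default.
list_hive_tables() {
  trino --server "$TRINO_SERVER" --catalog hive --schema default \
        --execute "SHOW TABLES"
}
```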

Note that to see the available schemas, you can use:

trino> show schemas from hive;
      Schema      
--------------------
default           
information_schema
(2 rows)
Query 20230810_152954_00003_a45jd, FINISHED, 3 nodes
Splits: 68 total, 68 done (100.00%)
0.61 [2 rows, 35B] [3 rows/s, 57B/s]

To see the table schema, you can use:
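
A sketch using DESCRIBE, which prints one row per column with its type. The server address is a hypothetical placeholder, and the table name is passed in as an argument because it depends on your data:

```shell
# Hypothetical node address and NodePort (assumptions); substitute your own.
TRINO_SERVER="http://192.0.2.20:30080"

# Print the column names and types of a table in hive.default.
# $1 is the table name, for example a hypothetical "my_table".
describe_table() {
  trino --server "$TRINO_SERVER" --execute "DESCRIBE hive.default.$1"
}
```

For example, describe_table my_table shows the columns of hive.default.my_table.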

I can now run various queries on the data set. Note that this is a small lab Kubernetes cluster with limited resources and network connectivity, so performance was not the goal: