AWS SDK2 java s3 select example - how to get result bytes

I am trying to use the AWS SDK for Java v2 for S3 Select operations, but I am not able to extract the final data. I am looking for an example, if someone has implemented it. I got some idea from [this post][1], but I cannot figure out how to get and read the full data.

Fetching specific fields from an S3 document

Basically, I want the equivalent of this v1 SDK code:
```java
InputStream resultInputStream = result.getPayload().getRecordsInputStream(
        new SelectObjectContentEventVisitor() {
            @Override
            public void visit(SelectObjectContentEvent.StatsEvent event) {
                System.out.println(
                        "Received Stats, Bytes Scanned: " + event.getDetails().getBytesScanned()
                                + " Bytes Processed: " + event.getDetails().getBytesProcessed());
            }

            /*
             * An End Event informs that the request has finished successfully.
             */
            @Override
            public void visit(SelectObjectContentEvent.EndEvent event) {
                isResultComplete.set(true);
                System.out.println("Received End Event. Result is complete.");
            }
        }
);
```


In AWS SDK v2, how do I get the equivalent result stream?

```java
public byte[] getQueryResults() {
    logger.info("V2 query");

    S3AsyncClient s3Client = S3AsyncClient.builder()
            .region(Region.US_WEST_2)
            .build();

    String fileObjKeyName = "upload/" + filePath;
    logger.info("Filepath: " + fileObjKeyName);

    ListObjectsV2Request listObjects = ListObjectsV2Request
            .builder()
            .bucket(Constants.bucketName)
            .build();
    // ......

    // Input is JSON Lines; output is JSON.
    InputSerialization inputSerialization = InputSerialization.builder()
            .json(JSONInput.builder().type(JSONType.LINES).build())
            .build();
    OutputSerialization outputSerialization = OutputSerialization.builder()
            .json(JSONOutput.builder().build())
            .build();

    SelectObjectContentRequest selectObjectContentRequest = SelectObjectContentRequest.builder()
            .bucket(Constants.bucketName)
            .key(partFilename)
            .expression(query)
            .expressionType(ExpressionType.SQL)
            .inputSerialization(inputSerialization)
            .outputSerialization(outputSerialization)
            .scanRange(ScanRange.builder().start(0L).end(Constants.limitBytes).build())
            .build();

    final DataHandler handler = new DataHandler();

    CompletableFuture<Void> future = s3Client.selectObjectContent(selectObjectContentRequest, handler);

    // hold it till we get an end event
    EndEvent endEvent = (EndEvent) handler.receivedEvents.stream()
            .filter(e -> e.sdkEventType() == SelectObjectContentEventStream.EventType.END)
            .findFirst()
            .orElse(null);

    // ISSUE: from here, how do I get the result stream bytes?
    return <bytes>;
}
```

```java
// handler
private static class DataHandler implements SelectObjectContentResponseHandler {
    private SelectObjectContentResponse response;
    private List<SelectObjectContentEventStream> receivedEvents = new ArrayList<>();
    private Throwable exception;

    @Override
    public void responseReceived(SelectObjectContentResponse response) {
        this.response = response;
    }

    @Override
    public void onEventStream(SdkPublisher<SelectObjectContentEventStream> publisher) {
        // Collect every event (records, stats, end, ...) in arrival order.
        publisher.subscribe(receivedEvents::add);
    }

    @Override
    public void exceptionOccurred(Throwable throwable) {
        exception = throwable;
    }

    @Override
    public void complete() {
    }
}
```


  [1]: https://stackoverflow.com/questions/67315601/fetching-specific-fields-from-an-s3-document
user2617187 On

I came to your post because I was working on the same issue and wanted to avoid v1.

After hours of searching, I found the answer at https://github.com/aws/aws-sdk-java-v2/pull/2943/files

The answer is in the file SelectObjectContentIntegrationTest.java:

services/s3/src/it/java/software/amazon/awssdk/services/SelectObjectContentIntegrationTest.java

The way to get the bytes is by using the RecordsEvent class. Please note that my use case was CSV; I am not sure whether this would be different for another file type.

In the complete method you have access to receivedEvents. Take the first element, which holds the filtered results, and cast it to the RecordsEvent class. That class then provides the payload as bytes.

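    @Override
    public void complete() {
        // Assumes the first received event is the RecordsEvent with the query output.
        RecordsEvent records = (RecordsEvent) this.receivedEvents.get(0);
        // Use records.payload().asByteArray() instead if you need the raw bytes.
        String result = records.payload().asUtf8String();
    }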
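One caveat with `get(0)`: S3 Select may deliver the output as several records events, mixed with stats or progress events, so the first event alone might not hold the full result. Below is a minimal sketch of a more defensive variant: wait for completion, then concatenate every records payload in arrival order. To keep it runnable without the SDK on the classpath, `SelectResultSketch`, `Event`, `Records`, `Stats`, and `End` are hypothetical stand-ins, not SDK classes. In the real handler you would check for `RecordsEvent` and read `payload().asByteArray()`, and the `CompletableFuture` returned by `selectObjectContent` would play the role of `done`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a response handler; none of these types come from the AWS SDK.
final class SelectResultSketch {
    interface Event {}

    // Stand-in for a records event: carries one chunk of the query output.
    static final class Records implements Event {
        final byte[] payload;
        Records(String text) { this.payload = text.getBytes(StandardCharsets.UTF_8); }
    }

    // Stand-ins for events that carry no result data.
    static final class Stats implements Event {}
    static final class End implements Event {}

    private final List<Event> receivedEvents = new ArrayList<>();
    private final CompletableFuture<Void> done = new CompletableFuture<>();

    // Called once per event, in arrival order.
    void onEvent(Event event) { receivedEvents.add(event); }

    // Called when the event stream has finished.
    void complete() { done.complete(null); }

    // Blocks until complete() was called, then joins all record chunks into one array.
    byte[] awaitResultBytes() {
        done.join();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Event event : receivedEvents) {
            if (event instanceof Records) {
                byte[] chunk = ((Records) event).payload;
                out.write(chunk, 0, chunk.length);
            }
        }
        return out.toByteArray();
    }
}
```

Feeding it two `Records` chunks (`"a,1\n"` and `"b,2\n"`) around a `Stats` event and then calling `complete()` makes `awaitResultBytes()` return the bytes of `"a,1\nb,2\n"`. Waiting on the future before reading the list also avoids inspecting `receivedEvents` while events are still arriving.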