I'm building an AI service using Spring Boot and Google's Vertex AI.

I have this piece of code:

public PredictedResult prediction(MultipartFile file) {
        String inferSpeciesPrompt = "my prompt";
        try (VertexAI vertexAi = new VertexAI("my-project", "asia-northeast3")) {
            GenerationConfig generationConfig =
                    GenerationConfig.newBuilder()
                            .setMaxOutputTokens(2048)
                            .setTemperature(0.1F)
                            .setTopK(15)
                            .setTopP(0.7F)
                            .build();
            GenerativeModel model = new GenerativeModel("gemini-pro-vision", generationConfig, vertexAi);
            List<SafetySetting> safetySettings = Arrays.asList(
                    SafetySetting.newBuilder()
                            .setCategory(HarmCategory.HARM_CATEGORY_HATE_SPEECH)
                            .setThreshold(SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH)
                            .build(),
                    SafetySetting.newBuilder()
                            .setCategory(HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT)
                            .setThreshold(SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH)
                            .build(),
                    SafetySetting.newBuilder()
                            .setCategory(HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT)
                            .setThreshold(SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH)
                            .build(),
                    SafetySetting.newBuilder()
                            .setCategory(HarmCategory.HARM_CATEGORY_HARASSMENT)
                            .setThreshold(SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH)
                            .build()
            );
            List<Content> contents = new ArrayList<>();
            contents.add(Content.newBuilder()
                    .setRole("user")
                    .addParts(PartMaker.fromMimeTypeAndData(file.getContentType(), file.getBytes())) 
                    .addParts(Part.newBuilder().setText(inferSpeciesPrompt))
                    .build());
            GenerateContentResponse generateContentResponse = model.generateContent(contents, safetySettings);
            String text = generateContentResponse.getCandidates(0).getContent().getParts(0).getText()
                    .replace("```json", "")
                    .replace("```", "")
                    .trim();
            ObjectMapper objectMapper = new ObjectMapper(); 
            PredictedResult predictedResult = objectMapper.readValue(text, PredictedResult.class);
            if ("false".equals(predictedResult.getLivingThings())) { // equals(), not ==, for String comparison
                throw new NoCreatureException();
            }
            return predictedResult;
        } catch (Exception e) {
            GeminiException exception = new GeminiException("validation", "try another image");
            exception.addValidation("image", e.getMessage());
            throw exception;
        }
    }
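Note that the `.replace("```json", "")` call in the snippet above only removes the opening fence; Gemini typically wraps JSON answers in both an opening ```` ```json ```` and a closing ```` ``` ````. A small standalone helper (a sketch, not part of the SDK) that strips both before handing the text to Jackson:

```java
public class JsonFenceStripper {
    // Strip the leading/trailing markdown code fences that Gemini often wraps
    // around JSON answers, leaving bare JSON for ObjectMapper.readValue(...).
    public static String strip(String modelText) {
        String s = modelText.trim();
        if (s.startsWith("```json")) {
            s = s.substring("```json".length());
        } else if (s.startsWith("```")) {
            s = s.substring(3);
        }
        if (s.endsWith("```")) {
            s = s.substring(0, s.length() - 3);
        }
        return s.trim();
    }

    public static void main(String[] args) {
        String raw = "```json\n{\"livingThings\": \"true\"}\n```";
        System.out.println(strip(raw)); // prints {"livingThings": "true"}
    }
}
```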

For each request to the endpoint that uses Gemini, I create a new Vertex AI instance to produce the prediction result. But creating a new instance on every request seems inefficient. Is there a way to reuse the AI model efficiently?

I use the Gemini model of Vertex AI as a multimodal model: I send a specific image and get the prediction back in JSON format. Once I've sent an image and received the JSON result, the AI's role ends there; there is no back-and-forth conversation.
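One common way to avoid rebuilding the client per request (a sketch, assuming Spring's singleton bean lifecycle; import paths can differ between Vertex AI SDK versions, and it's worth verifying that VertexAI is safe for concurrent generateContent calls in your version) is to register the client and model as beans and inject them into the service:

```java
import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerationConfig;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class VertexAiConfig {

    // One client for the whole application; Spring calls close() on shutdown,
    // replacing the per-request try-with-resources block.
    @Bean(destroyMethod = "close")
    public VertexAI vertexAi() {
        return new VertexAI("my-project", "asia-northeast3");
    }

    // The model wrapper only holds configuration; reuse it across requests too.
    @Bean
    public GenerativeModel geminiVisionModel(VertexAI vertexAi) {
        GenerationConfig generationConfig = GenerationConfig.newBuilder()
                .setMaxOutputTokens(2048)
                .setTemperature(0.1F)
                .setTopK(15)
                .setTopP(0.7F)
                .build();
        return new GenerativeModel("gemini-pro-vision", generationConfig, vertexAi);
    }
}
```

With this in place, the prediction method can take the injected GenerativeModel and drop the instance creation entirely.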

1 Answer

Linda Lawton - DaImTo:

I'm not sure I understand your issue.

gemini-pro-vision is not conversational; it is single-request, multimodal (image and text).

If you want to hold a conversation, you need to use gemini-pro and drop the images. Even with images, you would still be sending the full conversation history with each new user input.
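That last point, that the model keeps no server-side state and the client must resend everything, can be sketched with a toy history holder (illustration only; role/text strings stand in for the SDK's Content objects):

```java
import java.util.ArrayList;
import java.util.List;

public class ChatHistory {
    private final List<String> turns = new ArrayList<>();

    // Record one turn; "user" or "model" mirrors Content.setRole(...).
    public void add(String role, String text) {
        turns.add(role + ": " + text);
    }

    // Everything accumulated so far is what each new request must carry.
    public List<String> payloadForNextRequest() {
        return List.copyOf(turns);
    }

    public static void main(String[] args) {
        ChatHistory history = new ChatHistory();
        history.add("user", "What species is in this image?");
        history.add("model", "{\"species\": \"red fox\"}");
        history.add("user", "Is it dangerous?");
        // The third user message travels together with both earlier turns.
        System.out.println(history.payloadForNextRequest().size()); // prints 3
    }
}
```

So the payload, and the token cost, grows with every turn, which is why a single-shot image-to-JSON call like the one in the question has no conversation to preserve in the first place.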