CloudWatch Logs AWS


I am attempting to host a model on AWS SageMaker, and when I deploy my endpoint I want to see any errors it raises. For the past few weeks I have been checking the CloudWatch logs for them, but now no logs are appearing.

I have checked and rechecked the IAM role: which one the endpoint is using and the permissions it has. I am assuming the endpoint uses the role that was assigned to the model when the model was created. I also tried creating the endpoint through the CLI, but that did not change anything.

Tried creating new models (same artifacts and inference code, just an official model) and using a fresh model for the endpoint. That did not work. Tried waiting between attempts to make sure the maximum session duration had expired. That changed nothing. Tried a different region. That did not work. Tried a different model (different artifacts and inference code). That did not work. I am not sure what other options I have at this point.

These are the two logging methods I used in my inference code. First method, using logging.basicConfig:

import torch
import logging
from accelerate import init_empty_weights
from transformers import pipeline, AutoTokenizer, LlamaForCausalLM, AutoConfig, AutoModelForCausalLM

logging.basicConfig(level=logging.INFO, format='%(asctime)s: %(levelname)s: %(message)s')

def model_fn(model_dir):
    try:
        model_name = "johaanm/grader-public"  # Replace with your actual model name
        save_folder = model_dir  # This should be the path to your quantized model directory

        # Initialize an empty model  with the architecture
        with init_empty_weights():
            empty_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
        empty_model.tie_weights()

        # Load the quantized model weights from save_folder
        quantized_model = AutoModelForCausalLM.from_pretrained(save_folder, torch_dtype=torch.float16, device_map="auto")

        tokenizer = AutoTokenizer.from_pretrained(save_folder)

        return quantized_model, tokenizer
    except Exception as e:
        logging.error(f"Error in model_fn: {e}")
        raise

def strip_prompt_from_output(prompt, output):
    ...  # similar logging logic

def input_fn(request_body, request_content_type):
    ...  # similar logging logic

def predict_fn(input_data, model_and_tokenizer):
    ...  # similar logging logic

def output_fn(prediction_output, response_content_type):
    ...  # similar logging logic

Second method, using the root logger directly:

import json
import torch
import boto3
import logging
from accelerate import init_empty_weights
from transformers import pipeline, AutoTokenizer, LlamaForCausalLM, AutoConfig, AutoModelForCausalLM

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def model_fn(model_dir):
    try:
        model_name = "johaanm/grader-public"  # Replace with your actual model name
        save_folder = model_dir  # This should be the path to your quantized model directory

        # Initialize an empty model  with the architecture
        with init_empty_weights():
            empty_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
        empty_model.tie_weights()

        # Load the quantized model weights from save_folder
        quantized_model = AutoModelForCausalLM.from_pretrained(save_folder, torch_dtype=torch.float16,
                                                               device_map="auto")

        tokenizer = AutoTokenizer.from_pretrained(save_folder)

        return quantized_model, tokenizer
    except Exception as e:
        logger.info(f"Error in model_fn: {e}")
        raise

def strip_prompt_from_output(prompt, output):
    ...  # similar logging logic

def input_fn(request_body, request_content_type):
    ...  # similar logging logic

def predict_fn(input_data, model_and_tokenizer):
    ...  # similar logging logic

def output_fn(prediction_output, response_content_type):
    ...  # similar logging logic

 
Answered by Akbari

If you are not seeing CloudWatch logs for your SageMaker endpoint, there are several things you can check:

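IAM Role Permissions: Ensure that the execution role assigned to the model behind your SageMaker endpoint is allowed to write logs to CloudWatch. At a minimum it needs logs:CreateLogGroup, logs:CreateLogStream, logs:DescribeLogStreams, and logs:PutLogEvents. The CloudWatchLogsFullAccess managed policy covers these, though it grants more than is strictly needed.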
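
As a rough illustration of the permissions point above, the sketch below builds a minimal policy document for CloudWatch Logs. The helper name build_logs_policy and the default resource scope "*" are assumptions made for this example; you may want to restrict the resource to your own log groups.

```python
import json

# CloudWatch Logs actions a SageMaker execution role typically needs
# in order to write endpoint container logs.
REQUIRED_LOG_ACTIONS = [
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:DescribeLogStreams",
    "logs:PutLogEvents",
]


def build_logs_policy(resource="*"):
    """Return an IAM policy document (as a dict) allowing the actions above.

    `resource` defaults to "*" purely for illustration (assumption);
    narrow it to specific log group ARNs where possible.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(REQUIRED_LOG_ACTIONS),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM console as an inline policy.
    print(json.dumps(build_logs_policy(), indent=2))
```

The printed JSON can be compared against the policies actually attached to the role to spot a missing action.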

Check CloudWatch Log Group: Verify that the log group for your endpoint exists in CloudWatch. The log group is usually named /aws/sagemaker/Endpoints/YourEndpointName. If it doesn't exist, it could be an indicator of an issue during endpoint creation.
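
The log group check can also be scripted. The sketch below is only an outline: find_log_streams and endpoint_log_group are hypothetical helper names, and the function expects a CloudWatch Logs client such as the one returned by boto3.client("logs"). It uses the describe_log_groups and describe_log_streams calls of that client.

```python
def endpoint_log_group(endpoint_name):
    """Log group name SageMaker uses for an endpoint's container logs."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def find_log_streams(logs_client, endpoint_name, limit=5):
    """Return recent log stream names, or None if the log group is missing.

    `logs_client` is expected to behave like boto3.client("logs").
    """
    group = endpoint_log_group(endpoint_name)

    # The prefix search can match other groups too, so compare names exactly.
    response = logs_client.describe_log_groups(logGroupNamePrefix=group)
    groups = response.get("logGroups", [])
    if not any(g["logGroupName"] == group for g in groups):
        return None

    # Most recently active streams first.
    response = logs_client.describe_log_streams(
        logGroupName=group,
        orderBy="LastEventTime",
        descending=True,
        limit=limit,
    )
    return [s["logStreamName"] for s in response.get("logStreams", [])]


# Usage (requires AWS credentials; "YourEndpointName" is a placeholder):
#   import boto3
#   print(find_log_streams(boto3.client("logs"), "YourEndpointName"))
```

A result of None means the group was never created, which points at permissions or endpoint creation. An empty list means the group exists but the container has not written anything yet.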

Logging Setup in Your Code: Inside your inference code, make sure your log messages are written to stdout or stderr. SageMaker forwards the container's standard output and standard error to CloudWatch, so the standard logging module is enough. boto3 is the AWS SDK, not a logging library, and is not needed for this.

Example using the logging module:

import logging
import sys

logger = logging.getLogger()
logger.setLevel(logging.INFO)
# Send records to stdout, which SageMaker forwards to CloudWatch
logger.addHandler(logging.StreamHandler(sys.stdout))

def model_fn(model_dir):
    logger.info("Your log message here.")
    # rest of your code
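
One more thing worth ruling out: logging.basicConfig does nothing if the root logger already has handlers, so a basicConfig call in inference code can be silently ignored when something else configured logging first. Whether your serving container does that is an assumption here, not something confirmed by the question. The sketch below, with a hypothetical helper name configure_logging, uses force=True (Python 3.8+) to replace any existing handlers and write to stdout.

```python
import logging
import sys


def configure_logging(stream=None):
    """Reset root logging so records go to `stream` (stdout by default)."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s: %(levelname)s: %(message)s",
        stream=stream if stream is not None else sys.stdout,
        force=True,  # remove handlers installed earlier by other code
    )
    # "inference" is an arbitrary logger name chosen for this example.
    return logging.getLogger("inference")


if __name__ == "__main__":
    logger = configure_logging()
    logger.info("Logging configured; this line goes to stdout.")
```

Calling configure_logging() at the top of the inference script, before model_fn is defined, keeps the setup in one place.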