Logstash - Issue with sql_last_value and last_run_metadata_path


I'm new to ELK and I'm currently struggling with my setup; maybe I missed a point. I have set up Logstash to read from my database (MySQL), and I have two cases to handle:

  1. "Unforeseen maintenance" -> In this case, I would need to stop/start Logstash manually without purging my DB, so when it starts, I need it to track the records that have not been ingested yet.

  2. "Planned maintenance" -> In this case, I make a TRUNCATE on my DB, thus getting back to 0 with my Ids. I thought that by altering the file set in "last_run_metadata_path", Logstash would start using this new value that I give it. Though, it looks like it keeps the old Id in memory. Hence, my new entries are lost and never ingested.

Here is my setup:

input {
  jdbc {
    codec => plain { charset => "UTF-8" }
    # keep state between runs instead of starting from scratch
    clean_run => false
    jdbc_driver_library => "C:\com.mysql.jdbc_5.1.5.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/talend_logs?characterEncoding=utf8"
    jdbc_user => "root"
    jdbc_password => "************"
    # run the query every minute
    schedule => "* * * * *"
    statement => "SELECT id, pid, ...... FROM flowlogs WHERE id > :sql_last_value ORDER BY id"
    # track the numeric id column and persist its last value to the file below
    use_column_value => true
    tracking_column => "id"
    last_run_metadata_path => "C:\logstash-8.6.1\config\lastValueOfMySqlForLogs.yml"
  }
}

output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    # Elasticsearch index names must be lowercase
    index => "mylogs"
    user => "elastic"
    password => "*********"
    cacert => "C:\elasticsearch-8.6.1\config\certs\http_ca.crt"
  }
  stdout {}
}

Could you please let me know if I misunderstood something in the behaviour of Logstash with the tracking values? Is it even possible to achieve what I'm trying to do with IDs? (I couldn't work with dates, unfortunately.)

Thank you very much.

I have tried the clean_run parameter. When set to true, Logstash restarts at 0 on every restart, which is the documented behaviour. When I set it to false together with last_run_metadata_path, I expected it to resume from the last value stored in that file. It does, but the value only ever increases; I'm not able to reset it (which I would need after truncating my table). The reset I attempted is sketched below.
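For completeness, this is roughly the reset procedure I attempted (Windows commands, with the paths from my setup above). My understanding is that the plugin only reads this file at startup and rewrites it after each scheduled run, so the edit can only stick if Logstash is fully stopped first:

REM stop Logstash first, otherwise the in-memory value overwrites the file
REM reset the persisted tracking value to 0 (plain YAML)
echo --- 0 > C:\logstash-8.6.1\config\lastValueOfMySqlForLogs.yml
REM then start Logstash again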


There is 1 answer below.

Answered by Cedric MATHIS

Never mind, I think I solved my issue by changing the setup: I'm no longer tracking the id column but a timestamp instead.

I'm using a date column from my DB (moment), which holds the date at which the record is inserted into the DB. Then I let Logstash set sql_last_value to the date of the last query execution.
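For reference, the moment column just needs to default to the insertion time. My real table already had it, but a minimal MySQL sketch of such a column (a hypothetical definition, not my actual schema) would be:

-- hypothetical: add an insertion timestamp to track new rows
ALTER TABLE flowlogs
  ADD COLUMN moment DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP;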

I just had to handle the timezones in this configuration (see the illustration after the config below). So far it looks like it has solved both my issues, so I'll close this topic.

Here is the new setup in case it helps anyone.

input {
  jdbc {
    codec => plain { charset => "UTF-8" }
    clean_run => false
    jdbc_driver_library => "C:\com.mysql.jdbc_5.1.5.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/talend_logs?characterEncoding=utf8"
    jdbc_user => "root"
    jdbc_password => "**********"
    # interpret DB timestamps in this zone; the suffix controls DST overlap handling
    jdbc_default_timezone => "Europe/Luxembourg[dst_enabled_on_overlap:true]"
    schedule => "* * * * *"
    # no use_column_value here: sql_last_value is now the last execution time
    statement => "SELECT id, pid, ... FROM flowlogs WHERE moment > :sql_last_value ORDER BY moment ASC"
    record_last_run => true
    plugin_timezone => "local"
  }
}

output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "talend-logs-etl"
    user => "elastic"
    password => "***********"
    cacert => "C:\elasticsearch-8.6.1\config\certs\http_ca.crt"
  }
  stdout {}
}
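To illustrate why the timezone settings matter: the jdbc input substitutes :sql_last_value into the statement as a literal timestamp, so if the persisted value and the moment column are interpreted in different zones, rows can be skipped or re-read around every run. The executed query ends up looking roughly like this (illustrative timestamp, not an actual log):

SELECT id, pid, ... FROM flowlogs
WHERE moment > '2023-02-15 09:30:00'   -- substituted :sql_last_value
ORDER BY moment ASC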