I have two services, a backend and a DB. Backend is written in go, where i use go-pg v9 to manage DB connection; DB is a postgres DB. Those two services runs in two containers in a dedicated docker-network.
The services acts normally usually, but at some point, very not-deterministic and randomly, the backend returns an error error="pg: connection pool timeout", after trying for 30 seconds (the default time) to entabilish a new connection from the pool of DB.
After this error appears, then the backend will always return this error on every action on DB, and the only way I found to get rid of this is to restart it.
I don't have so many connections (about 1 at second, bu parallel no more than 10), so I don't understand why, even if the connection is not available in a certain moment, the connection is no more available after this.
Here my backend configurations and connection to DB:
func ConnectToDB(opts DBConfig) (*DBModel, error) {
pgopts := pg.Options{
User: opts.User,
Addr: opts.Host + ":" + opts.Port,
Password: opts.Password,
Database: opts.Name,
IdleCheckFrequency: 5 * time.Second,
IdleTimeout: 30 * time.Second,
PoolSize: 200,
}
creds := fmt.Sprintf("postgres://%s:%s@%s:%s/%s?sslmode=disable", opts.User, opts.Password, opts.Host, opts.Port, opts.Name)
fmt.Println("DB Connect to:", creds)
for i := 0; i < opts.RetryConnect; i++ {
gdb := pg.Connect(&pgopts)
rdb := &DBModel{gdb, creds}
if err := rdb.Ping(); err != nil {
fmt.Println("Cannot ping db: ", err.Error())
time.Sleep(time.Duration(opts.RetryDelay) * time.Second)
} else {
fmt.Println("Connected to db succesfully")
return rdb, nil
}
}
return nil, errors.MakeDatabaseError("Can't connect to database")
}
All my functions have this sort of implementation in which they use DB connection:
func (s Service) PostMarpossData(_ context.Context, req *models.PostMarpossDataRequest) error {
return s.DbRepository.DB().RunInTransaction(func(tx *pg.Tx) error {
for i := 0; i < len(req.Data); i++ {
err := s.DbRepository.PostMarpossData(tx, req.Data[i])
if err != nil {
return err
}
}
return nil
})
}
func (d DBModel) PostMarpossData(tx *pg.Tx, md models.MarpossData) error {
q := `INSERT INTO marposs_data (id, dim, sid, date, result, batch_name, prop_name, prop_desc, code_desc, code_name, operator,
range, value, check_date, anomaly, created_at, ucl_rng, lcl_rng, ucl_avg, lcl_avg) VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (id, sid) DO UPDATE SET dim = ?, date = ?, result = ?, batch_name = ?, range = ?, value = ?, prop_desc = ?,
code_desc = ?, prop_name = ?, code_name = ?, operator = ?, check_date = ?, anomaly = ?, created_at = ?, ucl_rng = ?, lcl_rng = ?, ucl_avg = ?, lcl_avg = ?`
res, err := tx.Query(&md, q, md.Id, md.Dim, md.Sid, md.Date, md.Result, md.BatchName, md.PropName, md.PropDesc, md.CodeDesc, // insert values
md.CodeName, md.Operator, md.Range, md.Value, md.CheckDate, md.Anomaly, time.Now().UTC(), md.UclRng, md.LclRng, md.UclAvg, md.LclAvg, // insert values
md.Dim, md.Date, md.Result, md.BatchName, md.Range, md.Value, md.PropDesc, md.CodeDesc, // on conflict values
md.PropName, md.CodeName, md.Operator, md.CheckDate, md.Anomaly, time.Now().UTC(), md.UclRng, md.LclRng, md.UclAvg, md.LclAvg) // on conflict values
if err != nil {
return errors.MakeErrBadQuery("error inserting marposs_data: " + err.Error())
}
if res.RowsAffected() != 1 {
return errors.MakeErrBadQuery("error inserting marposs_data")
}
return nil
}
As I said, I don't get why the backend service stucks on the pool timeout and also further and future queries or connections to DB don't work.
Moreover, i check also from pgadmin and DB logs that, while the error was still there and new function calls return that error, in the DB there were only 3 active connections.
Some ideas?
I would expect that after some times, the pool return to be reachable. But this never happens.