I have encountered a very strange bug:
Here is my Node.js function that stream-unzips a zip file stored in an AWS S3 bucket. It extracts the contents of the zip to the local file system without first downloading the entire zip file.
const streamUnzipFromS3ToLocal = async (
  objectBucket: string,
  objectKey: string,
  extractDir: string,
) => {
  try {
    const data = await S3.getFile(objectBucket, objectKey); // data.Body is a readable stream
    try {
      await pipeline(data.Body, unzipper.Extract({ path: extractDir }));
    } catch (error) {
      logger.error(error);
    }
    logger.info(`${objectKey} is decompressed to ${extractDir}`);
  } catch (error) {
    logger.error(`Failed to stream unzip file ${objectKey}: ${error}`);
    throw new Error('StreamUnzipFromS3Error');
  }
};
Here is the S3.getFile():
import { s3Client } from 'utils/AWS/s3Client';
import { GetObjectCommand } from '@aws-sdk/client-s3';

export const getFile = async (bucket_name, object_key) => {
  let res;
  const params = {
    Bucket: bucket_name,
    Key: object_key,
  };
  try {
    const data = await s3Client.send(new GetObjectCommand(params));
    res = data;
  } catch (err) {
    logger.error(`Failed when s3Controller getFile: ${err}`);
  }
  return res;
};
After it completes, I compute the hash of the extracted file by running: md5sum extracted_files/00001.dcm
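The same check could also be done from inside Node right after extraction. The following is only a minimal sketch: `md5OfFile` is a hypothetical helper name that is not part of the app above, and it uses the plain `crypto` and `fs` module names so it would also load on older Node versions.

```typescript
import { createHash } from 'crypto';
import { createReadStream } from 'fs';

// Hypothetical helper (not part of the original app): streams a file through
// MD5 and resolves with the hex digest, the same value `md5sum` prints.
const md5OfFile = (filePath: string): Promise<string> =>
  new Promise((resolve, reject) => {
    const hash = createHash('md5');
    createReadStream(filePath)
      .on('error', reject)
      .on('data', (chunk) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')));
  });
```

Logging something like `await md5OfFile('extracted_files/00001.dcm')` at the end of the extraction would make it easier to compare the three launch methods without a separate shell step.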
Now, I can start the Node.js app in 3 ways:
1. yarn start: using this method, the extracted file ends up having a different md5sum hash value; I tried 3 times and got 3 different hash values. As expected, when trying to parse the file with .dcm tools, it throws an error saying that the file extracted_files/00001.dcm is corrupted.
2. sudo yarn start
3. sudo systemctl start myapp.service, where myapp.service is:
[Unit]
Description=myapp backend api
After=network.target
[Service]
Type=simple
WorkingDirectory=/home/ubuntu/myapp/code/api
Restart=always
RestartSec=1
User=ubuntu
ExecStart=yarn start
[Install]
WantedBy=multi-user.target
which is essentially running yarn start, with User=ubuntu, but with sudo.
With methods 2 and 3, the extracted file always has the same hash value, and parsing with .dcm tools succeeds.
yarn start is running this command:
nodemon --exec ts-node --files src/index.ts
Two questions:
1. What differentiates starting a Node.js application with the above 3 methods, such that method 1 fails but methods 2 and 3 work just fine?
2. What is the difference between sudo yarn start and sudo systemctl start myapp.service? The latter is essentially running yarn start with User=ubuntu, but with sudo.
ubuntu@ip-123-123-123-123:~/myapp/api$ sudo which node
/usr/bin/node
ubuntu@ip-123-123-123-123:~/myapp/api$ sudo node -v
v10.19.0
ubuntu@ip-123-123-123-123:~/myapp/api$ which node
/home/ubuntu/.nvm/versions/node/v21.2.0/bin/node
ubuntu@ip-123-123-123-123:~/myapp/api$ node -v
v21.2.0
ubuntu@ip-123-123-123-123:~/myapp/api$ yarn -v
1.22.19
ubuntu@ip-123-123-123-123:~/myapp/api$ which yarn
/usr/bin/yarn
ubuntu@ip-123-123-123-123:~/myapp/api$
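Since the output above shows two different Node.js installations (v10.19.0 at /usr/bin/node and v21.2.0 under nvm), one way to see which one each launch method actually uses would be to log the runtime at startup. This is a minimal sketch: `describeRuntime` is a hypothetical name, not something in the app, and it relies only on standard `process` properties.

```typescript
// Hypothetical diagnostic (not in the original app): report which Node.js
// binary, user, and PATH the current process is actually running with.
const describeRuntime = () => ({
  nodeVersion: process.version, // version of the node binary executing this code
  execPath: process.execPath,   // absolute path of that node binary
  uid: typeof process.getuid === 'function' ? process.getuid() : undefined,
  pathEnv: process.env.PATH,    // PATH as seen by this process
});

console.log(JSON.stringify(describeRuntime(), null, 2));
```

Placing this near the top of src/index.ts and starting the app with each of the 3 methods should show whether the runtime differs between them.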