How to get a list of numbers out of an awk output in bash

82 Views Asked by At

ok so I am currently in the process of making a very basic bash script to search a jsonl file of mario maker levels with their id's, and then converting the found id's into the actual in game id's using a python script given to me (the database and explanation is in this reddit thread, im searching the courses.jsonl file https://www.reddit.com/r/MarioMaker/comments/wlkwp9/easily_searchable_database_of_super_mario_maker_1/)

with my current code, im able to search, seperate just the id's, then convert 1 id to the correct in game id. however, i am not sure how i would be able to convert all the id's found in the search separately. would it just be a for loop, or would i have to do something more complex? here is the code, any help would be appreciated, as most of my bash knowledge is very basic

#!/bin/sh
id=$(grep $1 /file/path/courses.jsonl | awk -F ',' '{print $1}' | awk -F ':' '{print $2}')


python3 /file/path/mmid.py $id

edit: here is the current mmid.py file, its basically the initial one i was given with the small change of just adding the id to an argument on runtime instead of asking for input after running

import struct
import hashlib
import hmac
import sys
idno = int(sys.argv[1])
key = hashlib.md5(b"9f2b4678").digest()
data = struct.pack("<Q", idno)
checksum = hmac.HMAC(key, data, 'md5').digest()
checksum = checksum[3:1:-1].hex().upper()
idstring = str.upper(hex(idno))[2:]
for y in range(8 - len(idstring)):
    idstring = '0' + str(idstring)
code = str(checksum) + '0000' + str(idstring)
print(code)

edit 2: i was asked for an example of the courses.jsonl file, so here is one entry. they all look like this, the important thing im grabbing here is the id, which is the first part

{"id":69558108,"retrieval_date":"2021-06-05T08:04:59.154776Z","url":"https://d2sno3mhmk1ekx.cloudfront.net/10.WUP_AMAJ_datastore/ds/1/data/00069558108-00001?Expires=1625472299&Signature=BqxLU18d~4rV4FeHNN-Sm2dbKqKyLhS5qX2liHPqr3MN9Y0Fgt5GCYCXrseFt-tinj4LVjrtMIWZW4G1ijj4FSHQyXN9OoW3PfyZbSuatl9~aNHLPtCCBfMDm7HmF~Njry1pvFjvmeMTJf4ZlUTrKIJF00gfA~2Mu2K1qjIR4yqSQCyxsxG61J~KG3CQMVJUKnbdGupeHN5-nigDqZ3EbT3-odEmjJsq04wb8K98iNMVHKINg8MZwuWHtEtECGWWqvhAvkdRDj6ZZQxAPW1fwxkLJBY4PSRW28A20T-lcq0CDDeU4yFUzeeE6GqFV0xvwj2IOFrlWVOo3McmE21HqA__&Key-Pair-Id=APKAJUYKVK3BE6ZPNZBQ","stars":1,"course_name":"hello kitty saves xmas by Lario\u2122","creator":{"pid":1745383521,"nnid":"realdealLario","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAQMHCaSxE5iEQ2lRGSOgTQXuGIgAAiANMAGEAcgBpAG8AIiEAAAAAAAAAACNiKABCAQPHRBgLM0cQDg8RsCoAACIAUkhQAABlAAAAcgBlAGEAawAAAAAAAAAAAO1hAAAAAAAAAAAAAAAAAAAAAAAAAAQ="},"upload_time":"2021-03-30T22:59:39Z","user_plays":2,"clears":1,"total_attempts":5,"failures":4,"world_record":{"best_time_player":{"pid":1747253384,"nnid":"ilovesnowandcold","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAEGpkmkWgRBCggeC0lA/Q2aSlewAAghVLAGUAaQB0AGgAAAAAAAAAAAAAAAsVAAAsApJuRBgnNEcYhhIOaA8AACnQUUhQSwBlAGkAdABoAAAAAAAAAAAAAAAAAJLUAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"first_complete_player":{"pid":1747253384,"nnid":"ilovesnowandcold","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAEGpkmkWgRBCggeC0lA/Q2aSlewAAghVLAGUAaQB0AGgAAAAAAAAAAAAAAAsVAAAsApJuRBgnNEcYhhIOaA8AACnQUUhQSwBlAGkAdABoAAAAAAAAAAAAAAAAAJLUAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"best_time_ms":13245,"created_time":"2021-03-30T23:52:40Z","updated_time":"2021-03-30T23:52:40Z"}}

I should clarify, this jsonl file is over 16gb of raw text

3

There are 3 best solutions below

0
Léa Gris On BEST ANSWER

Do it entirely within python:

EDIT: Now with matching course_name

import hashlib
import hmac
import jsonlines
import re
import struct
import sys

course_name = str(sys.argv[1])
idno = -1
reader = jsonlines.Reader(sys.stdin)
for obj in reader:
    if re.search(course_name, obj["course_name"]):
        idno = int(obj["id"])
        break
if idno > -1:
    key = hashlib.md5(b"9f2b4678").digest()
    data = struct.pack("<Q", idno)
    checksum = hmac.HMAC(key, data, "md5").digest()
    checksum = checksum[3:1:-1].hex().upper()
    idstring = str.upper(hex(idno))[2:]
    for y in range(8 - len(idstring)):
        idstring = "0" + str(idstring)
    code = str(checksum) + "0000" + str(idstring)
    print(code)
else:
    print("Cannot match course_name: %s\n" % course_name)

Usage example:

$ ./bin/python3 ./mmid.py kitty < ./courses.jsonl
7AC9000004255F5C
5
Vladyslav On

I think that you can modify the code, like:

#!/bin/bash


ids=$(grep $1 /file/path/courses.jsonl | awk -F ',' '{print $1}' | awk -F ':' '{print $2}')


for id in $ids; do
    
    python3 /file/path/mmid.py $id
done
0
Daweo On

Firstly, I must note that working with JSON objects like so

grep $1 /file/path/courses.jsonl | awk -F ',' '{print $1}' | awk -F ':' '{print $2}'

is extremely fragile and happens to work due to order inside JSON object, meanwhile according to RFC 8259

An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.

therefore following two

{"id":123,"name":"somename"}
{"name":"somename","id":123}

are equivalent but would give different result if you would shove it into

| awk -F ',' '{print $1}' | awk -F ':' '{print $2}'

You might use tools for dealing with JSON like jq to lessen fragility, consider that

echo -e '{"id":123,"name":"somename"}\n{"name":"somename","id":123}' | jq '.id'

gives output

123
123

Now you have one id per line, your mmid.py might be altered to take ids from standard input following way

import struct
import hashlib
import hmac
import sys
for line in sys.stdin:
    idno = int(line)
    key = hashlib.md5(b"9f2b4678").digest()
    data = struct.pack("<Q", idno)
    checksum = hmac.HMAC(key, data, 'md5').digest()
    checksum = checksum[3:1:-1].hex().upper()
    idstring = str.upper(hex(idno))[2:]
    for y in range(8 - len(idstring)):
        idstring = '0' + str(idstring)
    code = str(checksum) + '0000' + str(idstring)
    print(code)

and then you might assemble whole thing as follows, let courses.jsonl content be

{"id":69558108,"retrieval_date":"2021-06-05T08:04:59.154776Z","url":"https://d2sno3mhmk1ekx.cloudfront.net/10.WUP_AMAJ_datastore/ds/1/data/00069558108-00001?Expires=1625472299&Signature=BqxLU18d~4rV4FeHNN-Sm2dbKqKyLhS5qX2liHPqr3MN9Y0Fgt5GCYCXrseFt-tinj4LVjrtMIWZW4G1ijj4FSHQyXN9OoW3PfyZbSuatl9~aNHLPtCCBfMDm7HmF~Njry1pvFjvmeMTJf4ZlUTrKIJF00gfA~2Mu2K1qjIR4yqSQCyxsxG61J~KG3CQMVJUKnbdGupeHN5-nigDqZ3EbT3-odEmjJsq04wb8K98iNMVHKINg8MZwuWHtEtECGWWqvhAvkdRDj6ZZQxAPW1fwxkLJBY4PSRW28A20T-lcq0CDDeU4yFUzeeE6GqFV0xvwj2IOFrlWVOo3McmE21HqA__&Key-Pair-Id=APKAJUYKVK3BE6ZPNZBQ","stars":1,"course_name":"hello kitty saves xmas by Lario\u2122","creator":{"pid":1745383521,"nnid":"realdealLario","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAQMHCaSxE5iEQ2lRGSOgTQXuGIgAAiANMAGEAcgBpAG8AIiEAAAAAAAAAACNiKABCAQPHRBgLM0cQDg8RsCoAACIAUkhQAABlAAAAcgBlAGEAawAAAAAAAAAAAO1hAAAAAAAAAAAAAAAAAAAAAAAAAAQ="},"upload_time":"2021-03-30T22:59:39Z","user_plays":2,"clears":1,"total_attempts":5,"failures":4,"world_record":{"best_time_player":{"pid":1747253384,"nnid":"ilovesnowandcold","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAEGpkmkWgRBCggeC0lA/Q2aSlewAAghVLAGUAaQB0AGgAAAAAAAAAAAAAAAsVAAAsApJuRBgnNEcYhhIOaA8AACnQUUhQSwBlAGkAdABoAAAAAAAAAAAAAAAAAJLUAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"first_complete_player":{"pid":1747253384,"nnid":"ilovesnowandcold","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAEGpkmkWgRBCggeC0lA/Q2aSlewAAghVLAGUAaQB0AGgAAAAAAAAAAAAAAAsVAAAsApJuRBgnNEcYhhIOaA8AACnQUUhQSwBlAGkAdABoAAAAAAAAAAAAAAAAAJLUAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"best_time_ms":13245,"created_time":"2021-03-30T23:52:40Z","updated_time":"2021-03-30T23:52:40Z"}}
{"id":69558108,"retrieval_date":"2021-06-05T08:04:59.154776Z","url":"https://d2sno3mhmk1ekx.cloudfront.net/10.WUP_AMAJ_datastore/ds/1/data/00069558108-00001?Expires=1625472299&Signature=BqxLU18d~4rV4FeHNN-Sm2dbKqKyLhS5qX2liHPqr3MN9Y0Fgt5GCYCXrseFt-tinj4LVjrtMIWZW4G1ijj4FSHQyXN9OoW3PfyZbSuatl9~aNHLPtCCBfMDm7HmF~Njry1pvFjvmeMTJf4ZlUTrKIJF00gfA~2Mu2K1qjIR4yqSQCyxsxG61J~KG3CQMVJUKnbdGupeHN5-nigDqZ3EbT3-odEmjJsq04wb8K98iNMVHKINg8MZwuWHtEtECGWWqvhAvkdRDj6ZZQxAPW1fwxkLJBY4PSRW28A20T-lcq0CDDeU4yFUzeeE6GqFV0xvwj2IOFrlWVOo3McmE21HqA__&Key-Pair-Id=APKAJUYKVK3BE6ZPNZBQ","stars":1,"course_name":"hello kitty saves xmas by Lario\u2122","creator":{"pid":1745383521,"nnid":"realdealLario","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAQMHCaSxE5iEQ2lRGSOgTQXuGIgAAiANMAGEAcgBpAG8AIiEAAAAAAAAAACNiKABCAQPHRBgLM0cQDg8RsCoAACIAUkhQAABlAAAAcgBlAGEAawAAAAAAAAAAAO1hAAAAAAAAAAAAAAAAAAAAAAAAAAQ="},"upload_time":"2021-03-30T22:59:39Z","user_plays":2,"clears":1,"total_attempts":5,"failures":4,"world_record":{"best_time_player":{"pid":1747253384,"nnid":"ilovesnowandcold","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAEGpkmkWgRBCggeC0lA/Q2aSlewAAghVLAGUAaQB0AGgAAAAAAAAAAAAAAAsVAAAsApJuRBgnNEcYhhIOaA8AACnQUUhQSwBlAGkAdABoAAAAAAAAAAAAAAAAAJLUAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"first_complete_player":{"pid":1747253384,"nnid":"ilovesnowandcold","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAEGpkmkWgRBCggeC0lA/Q2aSlewAAghVLAGUAaQB0AGgAAAAAAAAAAAAAAAsVAAAsApJuRBgnNEcYhhIOaA8AACnQUUhQSwBlAGkAdABoAAAAAAAAAAAAAAAAAJLUAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"best_time_ms":13245,"created_time":"2021-03-30T23:52:40Z","updated_time":"2021-03-30T23:52:40Z"}}
{"id":69558108,"retrieval_date":"2021-06-05T08:04:59.154776Z","url":"https://d2sno3mhmk1ekx.cloudfront.net/10.WUP_AMAJ_datastore/ds/1/data/00069558108-00001?Expires=1625472299&Signature=BqxLU18d~4rV4FeHNN-Sm2dbKqKyLhS5qX2liHPqr3MN9Y0Fgt5GCYCXrseFt-tinj4LVjrtMIWZW4G1ijj4FSHQyXN9OoW3PfyZbSuatl9~aNHLPtCCBfMDm7HmF~Njry1pvFjvmeMTJf4ZlUTrKIJF00gfA~2Mu2K1qjIR4yqSQCyxsxG61J~KG3CQMVJUKnbdGupeHN5-nigDqZ3EbT3-odEmjJsq04wb8K98iNMVHKINg8MZwuWHtEtECGWWqvhAvkdRDj6ZZQxAPW1fwxkLJBY4PSRW28A20T-lcq0CDDeU4yFUzeeE6GqFV0xvwj2IOFrlWVOo3McmE21HqA__&Key-Pair-Id=APKAJUYKVK3BE6ZPNZBQ","stars":1,"course_name":"hello kitty saves xmas by Lario\u2122","creator":{"pid":1745383521,"nnid":"realdealLario","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAQMHCaSxE5iEQ2lRGSOgTQXuGIgAAiANMAGEAcgBpAG8AIiEAAAAAAAAAACNiKABCAQPHRBgLM0cQDg8RsCoAACIAUkhQAABlAAAAcgBlAGEAawAAAAAAAAAAAO1hAAAAAAAAAAAAAAAAAAAAAAAAAAQ="},"upload_time":"2021-03-30T22:59:39Z","user_plays":2,"clears":1,"total_attempts":5,"failures":4,"world_record":{"best_time_player":{"pid":1747253384,"nnid":"ilovesnowandcold","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAEGpkmkWgRBCggeC0lA/Q2aSlewAAghVLAGUAaQB0AGgAAAAAAAAAAAAAAAsVAAAsApJuRBgnNEcYhhIOaA8AACnQUUhQSwBlAGkAdABoAAAAAAAAAAAAAAAAAJLUAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"first_complete_player":{"pid":1747253384,"nnid":"ilovesnowandcold","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAEGpkmkWgRBCggeC0lA/Q2aSlewAAghVLAGUAaQB0AGgAAAAAAAAAAAAAAAsVAAAsApJuRBgnNEcYhhIOaA8AACnQUUhQSwBlAGkAdABoAAAAAAAAAAAAAAAAAJLUAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"best_time_ms":13245,"created_time":"2021-03-30T23:52:40Z","updated_time":"2021-03-30T23:52:40Z"}}

and mmid.py content be as shown above, then

jq '.id' courses.jsonl | python3 mmid.py

gives output

7AC9000004255F5C
7AC9000004255F5C
7AC9000004255F5C

Explanation: I added for loop, which body encompasses everything from idno = ... to end and changed idno line so it is compute from line of standard input (sys.stdin) rather than 1st command line argument (sys.argv[1]). Note: I repeated your sample input row to present that it works with many rows.

(tested in jq-1.6 and Python 3.10.12)