How to efficiently parse a string into string variables of an object


I have a string I will present in a text box that will vary somewhat but has some format to it. The string is coming from a .pdf file. It will be formatted as below:

1
EA
 2.00 2814-212-D003 0.00 0.00
LONG JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-213-703 Seq No: 9002
2
EA
 2.00 2814-212-D003 0.00 0.00
LONG JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-214-703 Seq No: 9002
3
EA
 2.00 2814-212-D004 0.00 0.00
SHORT JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-213-703 Seq No: 9003
4
EA
 2.00 2814-212-D004 0.00 0.00
SHORT JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-214-703 Seq No: 9003

I want to pull several items from this text in a loop. To put it in perspective, this text is a purchase order from a customer.

I want to pull the line item, the qty, the part number, description, who drew it, and the due date for the line.

My problem is that the method I am using to get the info doesn't seem like the best option out there. How would one go about this in a more efficient manner?

I plan to loop through each line item (I figure there's a way to loop every 7 lines), place each piece of data into a variable in an object I will create, and place those objects in a list.

So far I am getting some of the information with little effort, but I feel I am doing it in a rather messy way. Here is what I came up with this morning:

string startString = richTextBox1.Text;
string qty = startString.Substring(6, startString.IndexOf(' '));
int index = startString.IndexOf(' ');
index = startString.IndexOf(' ', index + 1);
string partNumber = startString.Substring(index, 14);
string description = startString.Substring(index+ 25, startString.IndexOf(":"));

Once I build this list I will place the data into an Excel sheet, which I will figure out later (one problem at a time!).

Answer by jdweng (marked as best answer)

Use code like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;


namespace ConsoleApplication52
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = @"
1
EA
 2.00 2814-212-D003 0.00 0.00
LONG JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-213-703 Seq No: 9002
2
EA
 2.00 2814-212-D003 0.00 0.00
LONG JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-214-703 Seq No: 9002
3
EA
 2.00 2814-212-D004 0.00 0.00
SHORT JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-213-703 Seq No: 9003
4
EA
 2.00 2814-212-D004 0.00 0.00
SHORT JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-214-703 Seq No: 9003
";
            Purchas_Order po = new Purchas_Order(input);
            po.Print(@"c:\temp\test.txt");
        }
    }
    public class Purchas_Order
    {
        public static List<Purchas_Order> purchase_orders { get; set; }

        public int lineItem { get; set; }
        public decimal quantity { get; set; }
        public string size { get; set; }
        public string partNumber { get; set; }
        public string description { get; set; }
        public string drawing { get; set; }
        public DateTime due { get; set; }
        public string requester { get; set; }
        public string order { get; set; }
        public int seq_num { get; set; }

        public Purchas_Order() { }
        public Purchas_Order(string orders)
        {
            StringReader reader = new StringReader(orders);
            string line = "";
            int index = 0;

            purchase_orders = new List<Purchas_Order>();
            Purchas_Order newOrder = null;
            string[] splitLine;
            string pattern = @"\s*(?'key'[^:]+):\s*(?'value'[^\s]+)";
            MatchCollection matches;
            int lineNumber = 0;
            while ((line = reader.ReadLine()) != null)
            {
                lineNumber++;
                if (line.Length > 0)
                {
                    try
                    {
                        // Each record spans 7 non-empty lines; index % 7 selects the field to fill.
                        switch (index % 7)
                        {
                            case 0:
                                // First line of a record is the line item number.
                                newOrder = new Purchas_Order();
                                purchase_orders.Add(newOrder);
                                newOrder.lineItem = int.Parse(line);
                                break;

                            case 1:
                                // Unit of measure, e.g. "EA".
                                newOrder.size = line.Trim();
                                break;

                            case 2:
                                // e.g. " 2.00 2814-212-D003 0.00 0.00": quantity first, then part number.
                                splitLine = line.Trim().Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                                newOrder.quantity = decimal.Parse(splitLine[0], System.Globalization.CultureInfo.InvariantCulture);
                                newOrder.partNumber = splitLine[1];
                                break;

                            case 3:
                                newOrder.description = line.Trim();
                                break;

                            case 4:
                                splitLine = line.Split(new char[] { ':' });
                                newOrder.drawing = splitLine[1].Trim();
                                break;

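                            case 5:
                                matches = Regex.Matches(line, pattern);
                                // Parse with an explicit format so the result does not depend on the machine's culture.
                                newOrder.due = DateTime.ParseExact(matches[0].Groups["value"].Value, "MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture);
                                newOrder.requester = matches[1].Groups["value"].Value.Trim();
                                break;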

                            case 6:
                                matches = Regex.Matches(line, pattern);
                                newOrder.order = matches[0].Groups["value"].Value.Trim();
                                newOrder.seq_num = int.Parse(matches[1].Groups["value"].Value);
                                break;


                        }
                    }
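                    catch (Exception ex)
                    {
                        // Report where parsing failed and why, then continue with the next line.
                        Console.WriteLine("Line Number = {0}, Index = {1}, Case = {2}, Line = {3}, Error = {4}", lineNumber, index, index % 7, line, ex.Message);
                    }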
                    index++;
                }
            }
        }
        public void Print(string filename)
        {
            // "using" flushes and closes the writer even if a write throws.
            using (StreamWriter writer = new StreamWriter(filename))
            {
                foreach (Purchas_Order order in Purchas_Order.purchase_orders)
                {
                    writer.WriteLine("Quantity : {0}", order.quantity);
                    writer.WriteLine("Size : {0}", order.size);
                    writer.WriteLine("Part Number : {0}", order.partNumber);
                    writer.WriteLine("Description : {0}", order.description);
                    writer.WriteLine("Drawing : {0}", order.drawing);
                    writer.WriteLine("Due : {0}", order.due);
                    writer.WriteLine("Requester : {0}", order.requester);
                    writer.WriteLine("Order : {0}", order.order);
                    writer.WriteLine("Sequence Number : {0}", order.seq_num);

                    writer.WriteLine("");
                }
            }
        }
    }
 
}

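Here is how the Regex pattern breaks down:

\s*(?'key'[^:]+):\s*(?'value'[^\s]+)

\s* - zero or more whitespace characters

(?'key'[^:]+) - defines the group "key": one or more characters that are NOT a colon (the ^ inside the brackets means NOT), i.e. everything up to the colon

: - matches a colon

\s* - zero or more whitespace characters

(?'value'[^\s]+) - defines the group "value": one or more characters that are NOT whitespace (again, ^ means NOT), i.e. everything up to the next space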
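
To see what the pattern captures, here is a minimal sketch that applies it to one of the "Key: value" lines and collects the pairs in a dictionary. The `KeyValueDemo` class and the `ParsePairs` helper are hypothetical names introduced for illustration; they are not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueDemo
{
    // Same pattern as in the answer: key up to the colon, value up to the next whitespace.
    private const string Pattern = @"\s*(?'key'[^:]+):\s*(?'value'[^\s]+)";

    // Hypothetical helper: returns every "Key: value" pair found on a single line.
    public static Dictionary<string, string> ParsePairs(string line)
    {
        var pairs = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(line, Pattern))
        {
            pairs[m.Groups["key"].Value.Trim()] = m.Groups["value"].Value;
        }
        return pairs;
    }

    public static void Main()
    {
        foreach (var pair in ParsePairs("Order: 2843HR-213-703 Seq No: 9002"))
        {
            Console.WriteLine("{0} => {1}", pair.Key, pair.Value);
        }
    }
}
```

For the sample line this yields the pairs Order / 2843HR-213-703 and Seq No / 9002. Because the value group stops at the first whitespace, a value containing spaces would be cut short, so the pattern only suits single-token values like the ones in this purchase order.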

Answer by X3R0

You could try using regular expressions:

^(?<index>\d+)\nEA\n\s(?<quantity>[0-9\.]+)\s(?<partNumber>[0-9A-Z-]+)\s(?<unknown1>[0-9\.]+)\s(?<unknown2>[0-9\.]+)\n(?<partName>[A-Z\s]+)\nDrawing\:\s(?<drawing>[A-Z]+)\nDue\:\s(?<due>[0-9]+\/[0-9]+\/[0-9]+)\sRequester\:\s(?<requester>[A-Z]+)\nOrder\:\s(?<order>[0-9A-Z-]+)\sSeq\sNo\:\s(?<sequence>[0-9]+)$

This matches per record:

  • Index
  • Quantity
  • PartNumber
  • Unknown1
  • Unknown2
  • PartName
  • Drawing
  • Due
  • Requester
  • Order
  • Sequence
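
Because the pattern starts with `^` and ends with `$`, in .NET it only matches record by record when `RegexOptions.Multiline` is set; without that option the anchors refer to the start and end of the whole input. A small sketch of the difference, using a deliberately shortened pattern (the `AnchorDemo` class and `CountMatches` helper are hypothetical names, and the input is assumed to use `\n` line endings):

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Shortened, illustrative pattern: only the line item number and the "EA" line.
    private const string Pattern = @"^(?<index>\d+)\nEA$";

    // Hypothetical helper: counts the matches found under the given options.
    public static int CountMatches(string text, RegexOptions options)
    {
        return Regex.Matches(text, Pattern, options).Count;
    }

    public static void Main()
    {
        string text = "1\nEA\n2\nEA";
        // Without Multiline, ^ and $ only anchor to the whole string.
        Console.WriteLine(CountMatches(text, RegexOptions.None));      // prints 0
        // With Multiline, ^ and $ anchor to every line.
        Console.WriteLine(CountMatches(text, RegexOptions.Multiline)); // prints 2
    }
}
```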

You can also wrap the whole record in an outer group, such as:

^(?<entry>(?<index>\d+)\nEA\n\s(?<quantity>[0-9\.]+)\s(?<partNumber>[0-9A-Z-]+)\s(?<unknown1>[0-9\.]+)\s(?<unknown2>[0-9\.]+)\n(?<partName>[A-Z\s]+)\nDrawing\:\s(?<drawing>[A-Z]+)\nDue\:\s(?<due>[0-9]+\/[0-9]+\/[0-9]+)\sRequester\:\s(?<requester>[A-Z]+)\nOrder\:\s(?<order>[0-9A-Z-]+)\sSeq\sNo\:\s(?<sequence>[0-9]+))$

This does the same as the pattern above, but also captures the full record text in an outer group with the individual fields as sub-matches.

This matches per record:

  • Entry
    • Index
    • Quantity
    • PartNumber
    • Unknown1
    • Unknown2
    • PartName
    • Drawing
    • Due
    • Requester
    • Order
    • Sequence

You can load it like this in C#:

using System;
using System.Text.RegularExpressions;
                    
public class Program
{
    public static void Main()
    {
        String data = @"1
EA
 2.00 2814-212-D003 0.00 0.00
LONG JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-213-703 Seq No: 9002
2
EA
 2.00 2814-212-D003 0.00 0.00
LONG JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-214-703 Seq No: 9002
3
EA
 2.00 2814-212-D004 0.00 0.00
SHORT JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-213-703 Seq No: 9003
4
EA
 2.00 2814-212-D004 0.00 0.00
SHORT JACK PAD
Drawing: OPT
Due: 05/19/2023 Requester: NMB
Order: 2843HR-214-703 Seq No: 9003";
        // \d+ allows line items above 9; \r?\n accepts both Windows and Unix line endings.
        String exprGroup = @"(?<entry>(?<index>\d+)\r?\nEA\r?\n\s(?<quantity>[0-9\.]+)\s(?<partNumber>[0-9A-Z-]+)\s(?<unknown1>[0-9\.]+)\s(?<unknown2>[0-9\.]+)\r?\n(?<partName>[A-Z ]+)\r?\nDrawing\:\s(?<drawing>[A-Z]+)\r?\nDue\:\s(?<date>[0-9]+\/[0-9]+\/[0-9]+)\sRequester\:\s(?<requester>[A-Z]+)\r?\nOrder\:\s(?<order>[0-9A-Z-]+)\sSeq\sNo\:\s(?<sequence>[0-9]+))";
        MatchCollection mc = Regex.Matches(data, exprGroup);

        foreach (Match m in mc) {
            // The named groups are available on each match, so no second regex pass is needed.
            Console.WriteLine(m.Groups["entry"].Value);
            Console.WriteLine("  part {0}, qty {1}, due {2}", m.Groups["partNumber"].Value, m.Groups["quantity"].Value, m.Groups["date"].Value);
        }
    }
}
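
Since the question asks for objects in a list, the named groups can be copied straight into a small class. The following is only a sketch under stated assumptions: the `LineItem` class, its property names, and the `LineItemParser.Parse` helper are hypothetical, the pattern is a trimmed variant that stops after the due date, and dates are assumed to always be in MM/dd/yyyy form.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical container for one purchase-order line; the property names are assumptions.
public class LineItem
{
    public int Line { get; set; }
    public decimal Quantity { get; set; }
    public string PartNumber { get; set; }
    public string Description { get; set; }
    public string Drawing { get; set; }
    public DateTime Due { get; set; }
}

public static class LineItemParser
{
    // \r?\n accepts both Windows and Unix line endings; \d+ allows line numbers above 9.
    private static readonly Regex Record = new Regex(
        @"(?<index>\d+)\r?\nEA\r?\n\s*(?<quantity>[0-9.]+)\s(?<partNumber>[0-9A-Z-]+)\s[0-9.]+\s[0-9.]+\r?\n" +
        @"(?<partName>[^\r\n]+)\r?\nDrawing:\s(?<drawing>\S+)\r?\nDue:\s(?<due>\d{2}/\d{2}/\d{4})");

    // Builds one LineItem per matched record.
    public static List<LineItem> Parse(string text)
    {
        var items = new List<LineItem>();
        foreach (Match m in Record.Matches(text))
        {
            items.Add(new LineItem
            {
                Line = int.Parse(m.Groups["index"].Value, CultureInfo.InvariantCulture),
                Quantity = decimal.Parse(m.Groups["quantity"].Value, CultureInfo.InvariantCulture),
                PartNumber = m.Groups["partNumber"].Value,
                Description = m.Groups["partName"].Value.Trim(),
                Drawing = m.Groups["drawing"].Value,
                Due = DateTime.ParseExact(m.Groups["due"].Value, "MM/dd/yyyy", CultureInfo.InvariantCulture)
            });
        }
        return items;
    }

    public static void Main()
    {
        string sample = "1\nEA\n 2.00 2814-212-D003 0.00 0.00\nLONG JACK PAD\nDrawing: OPT\n" +
                        "Due: 05/19/2023 Requester: NMB\nOrder: 2843HR-213-703 Seq No: 9002\n";
        foreach (LineItem item in Parse(sample))
        {
            Console.WriteLine("{0}: {1} x {2} ({3})", item.Line, item.Quantity, item.PartNumber, item.Description);
        }
    }
}
```

Parsing numbers and dates with `CultureInfo.InvariantCulture` keeps the result independent of the machine's regional settings, which matters when the text always comes from the same PDF layout. The resulting list can then be written to a spreadsheet row by row.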