Need Excel formula to determine the most common 3 digits in position 3-5 of a numeric string?

193 Views Asked by At

I have a column of numeric data and need to determine the most common three digits that appear in positions 3-5 in any order. For example:

99123456
98321889
99456777
98231666
99221457
98997000

Most common would be 2 1 3

Struggling with count (if and sum(if(mid

3

There are 3 best solutions below

4
MGonet On

My proposal of formula is for newer versions of Excel (2021 or 365):

    =LET(data,$B$2:$B$7, s_10,SEQUENCE(10,,0), s_9,SEQUENCE(9,,0),
         fr,FREQUENCY(--MID(data,{3,4,5},1),s_9)-s_10/100,
         MATCH(LARGE(fr,{1;2;3}),fr,0)-1)  

Here data represents the range of data, {3,4,5} - positions of digits taken into account, {1;2;3} - list of ranks.
For a given numbers the most frequent digits are 2, 1, and 3.
If the digits occur as often, the smaller digits will be displayed first.

4
Mayukh Bhattacharya On

Assuming there is no Excel Constraints as per the tags posted, then using the following formula this can be accomplished as well using the incredible MMULT() function.

enter image description here


• Formula used in cell B1

=LET(
     α,TOCOL(MID(A1:A6,{3,4,5},1)+1),
     TAKE(SORT(UNIQUE(HSTACK(α,MMULT(N(α=TOROW(α)),α^0))),2,-1),3,1)-1)

  • Extract digits at positions 3,4,5 using MID() and convert the same into a single column array using TOCOL() this defines the variable α.
  • Comparing each α with its equivalent horizontal array i.e. TOROW(α).
  • Using N() function converting each TRUE & FALSEs returned by the comparison to 1 & 0s so as it can be used to perform a Matrix Multiplication of both the arrays using MMULT() to return SUM of the values.
  • Using HSTACK() to horizontally stack both the arrays α & the one returned using MMULT() matrix.
  • Wrapped within UNIQUE() function to exclude duplicates and SORT() the array by the 2nd index in descending order (-1).
  • Lastly, using TAKE() to get the first 3 sorted ones.

6
David Leal On

This is another try, in cell C1 put the following formula:

=LET(A,A1:A6, nums,TOCOL(MID(A,SEQUENCE(,3,3),1)), uxNums,UNIQUE(nums),
 freq,MMULT(N(uxNums=TOROW(nums)),SEQUENCE(ROWS(nums),,,0)),
 TAKE(SORTBY(uxNums,freq,-1),3))

Here is the output: output

The formula MID(A,SEQUENCE(,3,3),1) (or just MID(A,{3,4,5},1)) splits the digits among columns for each row for 3-5 digit positions. The name nums represents a column array with all selected digits and uxNums the corresponding unique digits. The name freq counts the frequency for each unique digit. We want to sort them by the frequency (freq) so we can use SORTBY for that in descending order and finally take the first three rows via TAKE function.

Note: Instead of using MMULT to calculate freq, you can use: BYROW(N(uxNums=TOROW(nums)),LAMBDA(r,SUM(N(r=1))))

In situations like that, where you have an array (nums) and you want to count values. You cannot use any of the RACON functions since all of them expect a range as input. There is a trick for that combining XMATCH with FREQUENCY that we can use:

=LET(A,A1:A6, nums,TOCOL(MID(A,SEQUENCE(,3,3),1)), uxNums,UNIQUE(nums),
 idx,XMATCH(nums,uxNums), freq,DROP(FREQUENCY(idx,UNIQUE(idx)),-1),
 TAKE(SORTBY(uxNums,freq,-1),3))

For more details, you can check my answer to this question: How to search for the most common sequence of 6 letters in Excel file?. DROP is used to remove the last open bins, it is not necessary in this case, I just keep it for a general case.

From a performance perspective, the MMULT approach is about 50% faster than BYROW and XMATCH/FREQUENCY solutions.