Messy.Cats Introduction

messy.cats contains functions that use string distance tools to make data management easier for users working with categorical data. Categorical data, especially when entered by users, is often plagued by typos and inconsistent formatting, which makes it difficult to work with. messy.cats aims to provide functions that make cleaning categorical data simple.

This introduction will lead you through examples of the functions in use, explain the arguments, and show how to get the most out of these functions.

First, let's create some example vectors:

cars_bad = c("teal Mazda RX4", "black Mazda RX4 Wag",
             "green Datsun 710", "Hornet 4 Drive",
             "green Hornet Sportabout", "Valiant",
             "Duster 360", "orange Merc 240D",
             "Merc 230", "teal Merc 280",
             "Merc 280C", "green Merc 450SE",
             "Merc 450SL", "blue Merc 450SLC",
             "green Cadillac Fleetwood", "Lincoln Continental",
             "Chrysler Imperial", "Fiat 128",
             "red Honda Civic", "Toyota Corolla",
             "Toyota Corona", "Dodge Challenger",
             "red AMC Javelin", "Camaro Z28",
             "Pontiac Firebird", "black Fiat X1-9",
             "blue Porsche 914-2", "Lotus Europa",
             "Ford Pantera L", "black Ferrari Dino",
             "black Maserati Bora", "black Volvo 142E")

cars_good = c("Mazda RX4", "Mazda RX4 Wag",
              "Datsun 710", "Hornet 4 Drive",
              "Hornet Sportabout", "Valiant",
              "Duster 360", "Merc 240D",
              "Merc 230", "Merc 280",
              "Merc 280C", "Merc 450SE",
              "Merc 450SL", "Merc 450SLC",
              "Cadillac Fleetwood", "Lincoln Continental",
              "Chrysler Imperial", "Fiat 128",
              "Honda Civic", "Toyota Corolla",
              "Toyota Corona", "Dodge Challenger",
              "AMC Javelin", "Camaro Z28",
              "Pontiac Firebird", "Fiat X1-9",
              "Porsche 914-2", "Lotus Europa",
              "Ford Pantera L", "Ferrari Dino",
              "Maserati Bora", "Volvo 142E")

Suppose you have two lists of car descriptions, one containing the make of each car and the other containing both make and color. Instead of deleting the color descriptors through string processing, which can be a finicky and time-consuming process, you can use cat_match() to match the contents of the two lists.

library(messy.cats) # provides cat_match(), cat_replace(), cat_join(), and fuzzy_rbind()
cat_match(cars_bad, cars_good, method = "jw")
#>                         bad               match  dists
#> 1            teal Mazda RX4           Mazda RX4 0.2302
#> 2       black Mazda RX4 Wag       Mazda RX4 Wag 0.2591
#> 3          green Datsun 710          Datsun 710 0.2417
#> 4            Hornet 4 Drive      Hornet 4 Drive 0.0000
#> 5   green Hornet Sportabout   Hornet Sportabout 0.1948
#> 6                   Valiant             Valiant 0.0000
#> 7                Duster 360          Duster 360 0.0000
#> 8          orange Merc 240D           Merc 240D 0.2199
#> 9                  Merc 230            Merc 230 0.0000
#> 10            teal Merc 280            Merc 280 0.2324
#> 11                Merc 280C           Merc 280C 0.0000
#> 12         green Merc 450SE          Merc 450SL 0.2532
#> 13               Merc 450SL          Merc 450SL 0.0000
#> 14         blue Merc 450SLC         Merc 450SLC 0.1799
#> 15 green Cadillac Fleetwood  Cadillac Fleetwood 0.2037
#> 16      Lincoln Continental Lincoln Continental 0.0000
#> 17        Chrysler Imperial   Chrysler Imperial 0.0000
#> 18                 Fiat 128            Fiat 128 0.0000
#> 19          red Honda Civic         Honda Civic 0.1798
#> 20           Toyota Corolla      Toyota Corolla 0.0000
#> 21            Toyota Corona       Toyota Corona 0.0000
#> 22         Dodge Challenger    Dodge Challenger 0.0000
#> 23          red AMC Javelin         AMC Javelin 0.2101
#> 24               Camaro Z28          Camaro Z28 0.0000
#> 25         Pontiac Firebird    Pontiac Firebird 0.0000
#> 26          black Fiat X1-9           Fiat X1-9 0.2259
#> 27       blue Porsche 914-2       Porsche 914-2 0.1952
#> 28             Lotus Europa        Lotus Europa 0.0000
#> 29           Ford Pantera L      Ford Pantera L 0.0000
#> 30       black Ferrari Dino        Ferrari Dino 0.2083
#> 31      black Maserati Bora       Maserati Bora 0.2719
#> 32         black Volvo 142E          Volvo 142E 0.2250
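
To make the idea behind this table concrete, here is a minimal base R sketch of a closest-match lookup. It is not the messy.cats implementation: nearest_match() is a hypothetical helper, and it uses utils::adist() (Levenshtein edit distance) in place of the Jaro-Winkler method used above, so the distances are edit counts rather than values between 0 and 1.

```r
# Hypothetical helper (not part of messy.cats): for each messy string,
# find the clean string with the smallest Levenshtein edit distance.
nearest_match <- function(messy, clean) {
  d <- utils::adist(messy, clean)    # matrix: rows = messy, columns = clean
  idx <- apply(d, 1, which.min)      # column index of the closest clean string
  data.frame(bad = messy,
             match = clean[idx],
             dists = d[cbind(seq_along(messy), idx)],
             stringsAsFactors = FALSE)
}

res <- nearest_match(c("teal Mazda RX4", "red Honda Civic", "Valiant"),
                     c("Mazda RX4", "Honda Civic", "Valiant"))
print(res)
```

Each color prefix costs one edit per character to delete, so the two prefixed names sit a few edits away from their clean counterparts while "Valiant" matches at distance zero.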

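After using cat_match() to check that the string distance calculation is not producing errors, a user can call cat_replace() to swap the contents of one list for their closest matches in the other. The check matters: in the output above, row 12 pairs “green Merc 450SE” with “Merc 450SL” rather than “Merc 450SE”.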

cat_replace(cars_bad, cars_good, method = "jw")

Alternatively, a user could join together two dataframes that use these lists as id variables with the function cat_join().

bad_cars_df = data.frame(car = cars_bad, state_registration = "CA")
good_cars_df = data.frame(car = cars_good, insur_comp = "All State")

cat_join(bad_cars_df, good_cars_df, by = "car", method = "jw", join = "left")
#>                    car state_registration insur_comp
#> 1            Mazda RX4                 CA  All State
#> 2        Mazda RX4 Wag                 CA  All State
#> 3           Datsun 710                 CA  All State
#> 4       Hornet 4 Drive                 CA  All State
#> 5    Hornet Sportabout                 CA  All State
#> 6              Valiant                 CA  All State
#> 7           Duster 360                 CA  All State
#> 8            Merc 240D                 CA  All State
#> 9             Merc 230                 CA  All State
#> 10            Merc 280                 CA  All State
#> 11           Merc 280C                 CA  All State
#> 12          Merc 450SL                 CA  All State
#> 13          Merc 450SL                 CA  All State
#> 14         Merc 450SLC                 CA  All State
#> 15  Cadillac Fleetwood                 CA  All State
#> 16 Lincoln Continental                 CA  All State
#> 17   Chrysler Imperial                 CA  All State
#> 18            Fiat 128                 CA  All State
#> 19         Honda Civic                 CA  All State
#> 20      Toyota Corolla                 CA  All State
#> 21       Toyota Corona                 CA  All State
#> 22    Dodge Challenger                 CA  All State
#> 23         AMC Javelin                 CA  All State
#> 24          Camaro Z28                 CA  All State
#> 25    Pontiac Firebird                 CA  All State
#> 26           Fiat X1-9                 CA  All State
#> 27       Porsche 914-2                 CA  All State
#> 28        Lotus Europa                 CA  All State
#> 29      Ford Pantera L                 CA  All State
#> 30        Ferrari Dino                 CA  All State
#> 31       Maserati Bora                 CA  All State
#> 32          Volvo 142E                 CA  All State
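
Conceptually, a fuzzy join like this one can be thought of as two steps: rewrite the messy key to its closest clean key, then perform an ordinary join. The sketch below illustrates that idea in base R under stated assumptions. fuzzy_left_join() is a hypothetical helper, it uses Levenshtein distance from utils::adist() rather than Jaro-Winkler, and it makes no claim about how cat_join() works internally.

```r
# Hypothetical helper (not part of messy.cats): a two-step fuzzy left join.
fuzzy_left_join <- function(messy_df, clean_df, by) {
  # Step 1: replace each messy key with its closest clean key.
  d <- utils::adist(messy_df[[by]], clean_df[[by]])
  messy_df[[by]] <- clean_df[[by]][apply(d, 1, which.min)]
  # Step 2: ordinary left join on the now-clean key.
  merge(messy_df, clean_df, by = by, all.x = TRUE)
}

left <- data.frame(car = c("teal Mazda RX4", "red Honda Civic"),
                   state_registration = "CA",
                   stringsAsFactors = FALSE)
right <- data.frame(car = c("Honda Civic", "Mazda RX4"),
                    insur_comp = "All State",
                    stringsAsFactors = FALSE)

joined <- fuzzy_left_join(left, right, by = "car")
print(joined)
```

Because this sketch always accepts the closest key, however distant, it has no protection against bad matches. That is the gap a distance threshold is meant to close.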

These are some of the most basic uses of the core functions in the messy.cats package. Each function has additional arguments for fine-tuning the string distance calculation or making the function easier to use.

In this more extensive example we have two datasets of caterpillar data collected over the summers of 2019-2021 by Wesleyan University researchers. The first, messy_caterpillars, contains information about the average weight and length of caterpillars and, as the name suggests, has very messy caterpillar names. The second, clean_caterpillars, contains the species and number of caterpillars found by researchers, along with clean caterpillar names.

If you want to calculate how much caterpillar you actually saw, in both centimeters and milligrams, you’ll have to fix those messy names somehow.

# load in messy_caterpillars and clean_caterpillars
data("clean_caterpillars")
data("messy_caterpillars")

head(messy_caterpillars)
str(messy_caterpillars)

head(clean_caterpillars)
str(clean_caterpillars)

To fix these names, we can either use cat_replace() to correct the caterpillar name variable and then merge with a function such as the dplyr join functions, or use cat_join() to do both at once.

But first, in order to properly configure our string distance arguments, we will use cat_match() to explore how the messy and clean caterpillar names match up.

We input the messy and clean vectors (in this case, columns of caterpillar names) and specify no arguments other than return_dists, which returns the distance between each string pair.

cat_match(messy_caterpillars$CaterpillarSpecies,
          clean_caterpillars$species,
          return_dists = TRUE)

The output shows the clean string with the lowest string distance from each messy string, and the distance between the pair is returned as a third column.

Arranging by distance in descending order surfaces the items of the messy vector with the worst matches. The worst match, at a distance of .5, is between “Papilio_glaucus” and “Orgyia leucostigma”, and it is the only incorrect one. Setting a threshold below .5 therefore makes cat_match() return no incorrect matches: the unmatched string gets NA instead.

library(dplyr) # provides %>%, arrange(), and left_join()

cat_match(messy_caterpillars$CaterpillarSpecies, clean_caterpillars$species, return_dists = TRUE, method = "jaccard") %>% arrange(desc(dists))

cat_match(messy_caterpillars$CaterpillarSpecies, clean_caterpillars$species, return_dists = TRUE, method = "jaccard", threshold = .49) %>% arrange(desc(dists))
#>                       bad                  match  dists
#> 1         Papilio_glaucus                   <NA> 0.5000
#> 2           Zale_lunefera          Zale lunifera 0.2727
#> 3   Parallelia_bistriarus  Parallelia bistriaris 0.2500
#> 4     Alsophila_pomataria    Alsophila pometaria 0.2308
#> 5        Itame_postularia       Itame pustularia 0.2308
#> 6   Crocidographa_normani    Crocigrapha normani 0.2308
#> 7             Zale_lunata            Zale lunata 0.2222
#> 8       Orthosia_rubesens     Orthosia rubescens 0.2143
#> 9    Pyrefera_hesperidage  Pyreferra hesperidago 0.2143
#> 10      Achatia_distincta      Achatia distincta 0.2000
#> 11        Phigaliae_titea         Phigalea titea 0.2000
#> 12   Lithophane_antennata   Lithophane antennata 0.1818
#> 13       Himella_intracta     Himella intractata 0.1667
#> 14    Iridopsis_ephyraria    Iridopsis ephyraria 0.1667
#> 15    Malacasoma_disstria    Malacasoma disstria 0.1667
#> 16     Morrisonia_confusa     Morrisonia confusa 0.1667
#> 17       Nola_triquetrana       Nola triquetrana 0.1667
#> 18 Amphipyra_pyramadoides Amphipyra pyramidoides 0.1538
#> 19       Lymantria_dispar       Lymantria dispar 0.1538
#> 20       Morrisonia_latex       Morrisonia latex 0.1538
#> 21 Nematocampa_resistaria Nematocampa resistaria 0.1538
#> 22 Hypagyrtis_unipunctata Hypagyrtis unipunctata 0.1429
#> 23 Melanolophia_canadaria Melanolophia canadaria 0.1429
#> 24    Prochoerodes_linola   Prochoerodes lineola 0.1429
#> 25     Orgyia_leucostigma     Orgyia leucostigma 0.1333
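
The distances in this table can be reproduced by hand, which helps when choosing a threshold. The sketch below computes a Jaccard distance over the sets of single characters in two strings. Lower-casing the strings first is an assumption made here because it makes the numbers agree with the table; jaccard_dist() and match_within() are hypothetical helpers, and treating the threshold as inclusive is also an assumption.

```r
# Jaccard distance between the sets of single characters in two strings.
# Assumption: case is ignored (strings are lower-cased before comparison).
jaccard_dist <- function(a, b) {
  sa <- unique(strsplit(tolower(a), "")[[1]])
  sb <- unique(strsplit(tolower(b), "")[[1]])
  1 - length(intersect(sa, sb)) / length(union(sa, sb))
}

# Hypothetical helper: return the closest clean string, or NA when even the
# closest one is farther away than the threshold.
match_within <- function(messy, clean, threshold) {
  dists <- vapply(clean, function(cl) jaccard_dist(messy, cl),
                  numeric(1), USE.NAMES = FALSE)
  best <- which.min(dists)
  if (dists[best] <= threshold) clean[best] else NA_character_
}

# 7 shared characters out of 9 distinct ones: 1 - 7/9, about 0.2222
jaccard_dist("Zale_lunata", "Zale lunata")

# 8 shared characters out of 16 distinct ones: exactly 0.5, above the threshold
match_within("Papilio_glaucus", "Orgyia leucostigma", threshold = 0.49)
```

Most of the distance in the correct pairs comes from the underscore and the space each counting as a character the other string lacks, which is why even identical species names sit well above zero.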

messy_caterpillars$CaterpillarSpecies = cat_replace(messy_caterpillars$CaterpillarSpecies, clean_caterpillars$species, method = "jaccard", threshold = .49)

left_join(clean_caterpillars, messy_caterpillars, by = c("species" = "CaterpillarSpecies"))

Alternatively, a user could accomplish this task in one step using cat_join().

data("messy_caterpillars")
clean_caterpillars$species
#>  [1] "Achatia distincta"      "Alsophila pometaria"    "Amphipyra pyramidoides"
#>  [4] "Crocigrapha normani"    "Ennomos subsignaria"    "Eutrapela clemataria"  
#>  [7] "Himella intractata"     "Hypagyrtis unipunctata" "Iridopsis ephyraria"   
#> [10] "Iridopsis larvaria"     "Itame pustularia"       "Lithophane antennata"  
#> [13] "Lithophane bethunei"    "Lymantria dispar"       "Malacasoma disstria"   
#> [16] "Melanolophia canadaria" "Morrisonia confusa"     "Morrisonia latex"      
#> [19] "Nematocampa resistaria" "Nola triquetrana"       "Orgyia leucostigma"    
#> [22] "Orthosia rubescens"     "Parallelia bistriaris"  "Phigalea titea"        
#> [25] "Prochoerodes lineola"   "Pyreferra hesperidago"  "Zale phoeocapne"       
#> [28] "Zale lunata"            "Zale lunifera"          "Achatia distincta"     
#> [31] "Alsophila pometaria"    "Amphipyra pyramidoides" "Crocigrapha normani"   
#> [34] "Ennomos subsignaria"    "Eutrapela clemataria"   "Himella intractata"    
#> [37] "Hypagyrtis unipunctata" "Iridopsis ephyraria"    "Iridopsis larvaria"    
#> [40] "Itame pustularia"       "Lithophane antennata"   "Lithophane bethunei"   
#> [43] "Lymantria dispar"       "Malacasoma disstria"    "Melanolophia canadaria"
#> [46] "Morrisonia confusa"     "Morrisonia latex"       "Nematocampa resistaria"
#> [49] "Nola triquetrana"       "Orgyia leucostigma"     "Orthosia rubescens"    
#> [52] "Parallelia bistriaris"  "Phigalea titea"         "Prochoerodes lineola"  
#> [55] "Achatia distincta"      "Alsophila pometaria"    "Amphipyra pyramidoides"
#> [58] "Crocigrapha normani"    "Ennomos subsignaria"    "Eutrapela clemataria"  
#> [61] "Himella intractata"     "Hypagyrtis unipunctata" "Iridopsis ephyraria"   
#> [64] "Iridopsis larvaria"     "Itame pustularia"       "Lithophane antennata"  
#> [67] "Lithophane bethunei"    "Lymantria dispar"       "Malacasoma disstria"   
#> [70] "Melanolophia canadaria" "Morrisonia confusa"     "Morrisonia latex"      
#> [73] "Nematocampa resistaria" "Nola triquetrana"
cat_join(messy_df = messy_caterpillars, clean_df = clean_caterpillars, by = c("CaterpillarSpecies", "species"), method = "jaccard", threshold = .49, join = "left")
#> # A tibble: 62 x 5
#>    species                `Avg Weight (mg)` `Avg Length (cm)` count  year
#>    <chr>                              <dbl>             <dbl> <int> <dbl>
#>  1 Achatia distincta                  0.809              2.64    24  2021
#>  2 Achatia distincta                  0.809              2.64    14  2020
#>  3 Achatia distincta                  0.809              2.64    16  2019
#>  4 Alsophila pometaria                2.03               1.73     8  2021
#>  5 Alsophila pometaria                2.03               1.73    18  2020
#>  6 Alsophila pometaria                2.03               1.73    11  2019
#>  7 Amphipyra pyramidoides             0.914              1.76    26  2021
#>  8 Amphipyra pyramidoides             0.914              1.76    26  2020
#>  9 Amphipyra pyramidoides             0.914              1.76     9  2019
#> 10 Himella intractata                 1.53               2.54     3  2021
#> # ... with 52 more rows
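
With the names fixed and the tables joined, the totals that motivated this example are a matter of multiplying counts by averages. The sketch below does this in base R for the three “Achatia distincta” rows printed above. The data frame is rebuilt by hand so the example is self-contained, and the column names avg_weight_mg and avg_length_cm are simplified stand-ins for `Avg Weight (mg)` and `Avg Length (cm)`.

```r
# Rows 1-3 of the joined output above, with simplified column names (assumption).
joined <- data.frame(
  species = "Achatia distincta",
  avg_weight_mg = 0.809,
  avg_length_cm = 2.64,
  count = c(24L, 14L, 16L),
  year = c(2021, 2020, 2019)
)

# Total caterpillar observed per row: count times the species average.
joined$total_weight_mg <- joined$count * joined$avg_weight_mg
joined$total_length_cm <- joined$count * joined$avg_length_cm

# Sum across years for each species.
totals <- aggregate(cbind(total_weight_mg, total_length_cm) ~ species,
                    data = joined, FUN = sum)
print(totals)
```

For these rows the 54 caterpillars add up to roughly 43.7 mg and 142.6 cm.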

data("mtcars")
mtcars_colnames_messy = mtcars
colnames(mtcars_colnames_messy)[1:5] = paste0(colnames(mtcars)[1:5], "_17")
colnames(mtcars_colnames_messy)[6:11] = paste0(colnames(mtcars)[6:11], "_2017")

Another messy dataset problem that our package hopes to help solve is row binding two datasets with different column names. fuzzy_rbind() binds columns across dataframes using string distance matching: any two columns with similar enough names are bound together. It takes arguments similar to those of the other messy.cats functions, so the user can fine-tune the string distance matching.

fuzzy_rbind(df1 = mtcars, df2 = mtcars_colnames_messy, threshold = .5, 
            method = "jw")
#>      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1   21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 2   21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 3   22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 4   21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 5   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> 6   18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> 7   14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> 8   24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> 9   22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 10  19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> 11  17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> 12  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#> 13  17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#> 14  15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#> 15  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> 16  10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> 17  14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> 18  32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> 19  30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> 20  33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> 21  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> 22  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#> 23  15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#> 24  13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> 25  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#> 26  27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> 27  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> 28  30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> 29  15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> 30  19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> 31  15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> 32  21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
#> 33 110.0   6 160.0 110 3.90 2.620 16.46  0  4    4    4
#> 34 110.0   6 160.0 110 3.90 2.875 17.02  0  4    4    4
#> 35  93.0   4 108.0  93 3.85 2.320 18.61  1  1    1    1
#> 36 110.0   6 258.0 110 3.08 3.215 19.44  1  1    1    1
#> 37 175.0   8 360.0 175 3.15 3.440 17.02  0  2    2    2
#> 38 105.0   6 225.0 105 2.76 3.460 20.22  1  1    1    1
#> 39 245.0   8 360.0 245 3.21 3.570 15.84  0  4    4    4
#> 40  62.0   4 146.7  62 3.69 3.190 20.00  1  2    2    2
#> 41  95.0   4 140.8  95 3.92 3.150 22.90  1  2    2    2
#> 42 123.0   6 167.6 123 3.92 3.440 18.30  1  4    4    4
#> 43 123.0   6 167.6 123 3.92 3.440 18.90  1  4    4    4
#> 44 180.0   8 275.8 180 3.07 4.070 17.40  0  3    3    3
#> 45 180.0   8 275.8 180 3.07 3.730 17.60  0  3    3    3
#> 46 180.0   8 275.8 180 3.07 3.780 18.00  0  3    3    3
#> 47 205.0   8 472.0 205 2.93 5.250 17.98  0  4    4    4
#> 48 215.0   8 460.0 215 3.00 5.424 17.82  0  4    4    4
#> 49 230.0   8 440.0 230 3.23 5.345 17.42  0  4    4    4
#> 50  66.0   4  78.7  66 4.08 2.200 19.47  1  1    1    1
#> 51  52.0   4  75.7  52 4.93 1.615 18.52  1  2    2    2
#> 52  65.0   4  71.1  65 4.22 1.835 19.90  1  1    1    1
#> 53  97.0   4 120.1  97 3.70 2.465 20.01  1  1    1    1
#> 54 150.0   8 318.0 150 2.76 3.520 16.87  0  2    2    2
#> 55 150.0   8 304.0 150 3.15 3.435 17.30  0  2    2    2
#> 56 245.0   8 350.0 245 3.73 3.840 15.41  0  4    4    4
#> 57 175.0   8 400.0 175 3.08 3.845 17.05  0  2    2    2
#> 58  66.0   4  79.0  66 4.08 1.935 18.90  1  1    1    1
#> 59  91.0   4 120.3  91 4.43 2.140 16.70  0  2    2    2
#> 60 113.0   4  95.1 113 3.77 1.513 16.90  1  2    2    2
#> 61 264.0   8 351.0 264 4.22 3.170 14.50  0  4    4    4
#> 62 175.0   6 145.0 175 3.62 2.770 15.50  0  6    6    6
#> 63 335.0   8 301.0 335 3.54 3.570 14.60  0  8    8    8
#> 64 109.0   4 121.0 109 4.11 2.780 18.60  1  2    2    2

fuzzy_rbind(df1 = mtcars, df2 = mtcars_colnames_messy, threshold = .2,
            method = "jw")
#>     mpg cyl  disp drat  qsec gear carb
#> 1  21.0   6 160.0 3.90 16.46    4    4
#> 2  21.0   6 160.0 3.90 17.02    4    4
#> 3  22.8   4 108.0 3.85 18.61    4    1
#> 4  21.4   6 258.0 3.08 19.44    3    1
#> 5  18.7   8 360.0 3.15 17.02    3    2
#> 6  18.1   6 225.0 2.76 20.22    3    1
#> 7  14.3   8 360.0 3.21 15.84    3    4
#> 8  24.4   4 146.7 3.69 20.00    4    2
#> 9  22.8   4 140.8 3.92 22.90    4    2
#> 10 19.2   6 167.6 3.92 18.30    4    4
#> 11 17.8   6 167.6 3.92 18.90    4    4
#> 12 16.4   8 275.8 3.07 17.40    3    3
#> 13 17.3   8 275.8 3.07 17.60    3    3
#> 14 15.2   8 275.8 3.07 18.00    3    3
#> 15 10.4   8 472.0 2.93 17.98    3    4
#> 16 10.4   8 460.0 3.00 17.82    3    4
#> 17 14.7   8 440.0 3.23 17.42    3    4
#> 18 32.4   4  78.7 4.08 19.47    4    1
#> 19 30.4   4  75.7 4.93 18.52    4    2
#> 20 33.9   4  71.1 4.22 19.90    4    1
#> 21 21.5   4 120.1 3.70 20.01    3    1
#> 22 15.5   8 318.0 2.76 16.87    3    2
#> 23 15.2   8 304.0 3.15 17.30    3    2
#> 24 13.3   8 350.0 3.73 15.41    3    4
#> 25 19.2   8 400.0 3.08 17.05    3    2
#> 26 27.3   4  79.0 4.08 18.90    4    1
#> 27 26.0   4 120.3 4.43 16.70    5    2
#> 28 30.4   4  95.1 3.77 16.90    5    2
#> 29 15.8   8 351.0 4.22 14.50    5    4
#> 30 19.7   6 145.0 3.62 15.50    5    6
#> 31 15.0   8 301.0 3.54 14.60    5    8
#> 32 21.4   4 121.0 4.11 18.60    4    2
#> 33 21.0   6 160.0 3.90 16.46    4    4
#> 34 21.0   6 160.0 3.90 17.02    4    4
#> 35 22.8   4 108.0 3.85 18.61    4    1
#> 36 21.4   6 258.0 3.08 19.44    3    1
#> 37 18.7   8 360.0 3.15 17.02    3    2
#> 38 18.1   6 225.0 2.76 20.22    3    1
#> 39 14.3   8 360.0 3.21 15.84    3    4
#> 40 24.4   4 146.7 3.69 20.00    4    2
#> 41 22.8   4 140.8 3.92 22.90    4    2
#> 42 19.2   6 167.6 3.92 18.30    4    4
#> 43 17.8   6 167.6 3.92 18.90    4    4
#> 44 16.4   8 275.8 3.07 17.40    3    3
#> 45 17.3   8 275.8 3.07 17.60    3    3
#> 46 15.2   8 275.8 3.07 18.00    3    3
#> 47 10.4   8 472.0 2.93 17.98    3    4
#> 48 10.4   8 460.0 3.00 17.82    3    4
#> 49 14.7   8 440.0 3.23 17.42    3    4
#> 50 32.4   4  78.7 4.08 19.47    4    1
#> 51 30.4   4  75.7 4.93 18.52    4    2
#> 52 33.9   4  71.1 4.22 19.90    4    1
#> 53 21.5   4 120.1 3.70 20.01    3    1
#> 54 15.5   8 318.0 2.76 16.87    3    2
#> 55 15.2   8 304.0 3.15 17.30    3    2
#> 56 13.3   8 350.0 3.73 15.41    3    4
#> 57 19.2   8 400.0 3.08 17.05    3    2
#> 58 27.3   4  79.0 4.08 18.90    4    1
#> 59 26.0   4 120.3 4.43 16.70    5    2
#> 60 30.4   4  95.1 3.77 16.90    5    2
#> 61 15.8   8 351.0 4.22 14.50    5    4
#> 62 19.7   6 145.0 3.62 15.50    5    6
#> 63 15.0   8 301.0 3.54 14.60    5    8
#> 64 21.4   4 121.0 4.11 18.60    4    2

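The second fuzzy_rbind() call binds fewer columns because its lower threshold (.2 instead of .5) requires column names to be closer matches; only mpg, cyl, disp, drat, qsec, gear, and carb are kept. The looser threshold in the first call keeps all eleven columns, but some appear to be bound to the wrong partner: in rows 33-64, mpg repeats the hp values, and am and gear repeat the carb values.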
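
The effect of the threshold on which columns survive can be illustrated with a small base R sketch. This is not how fuzzy_rbind() is implemented: fuzzy_rbind_sketch() is a hypothetical helper that uses Levenshtein distance from utils::adist(), normalized by the longer name, in place of Jaro-Winkler, so its distances will not match those of the calls above.

```r
# Hypothetical helper (not part of messy.cats): rename df2's columns to the
# closest df1 column name, drop pairs beyond the threshold, then rbind.
fuzzy_rbind_sketch <- function(df1, df2, threshold) {
  d <- utils::adist(names(df2), names(df1))
  # Normalize by the longer of the two names so distances fall in [0, 1].
  d <- d / outer(nchar(names(df2)), nchar(names(df1)), pmax)
  best <- apply(d, 1, which.min)                         # closest df1 column
  keep <- d[cbind(seq_along(best), best)] <= threshold   # close enough?
  df2 <- df2[, keep, drop = FALSE]
  names(df2) <- names(df1)[best[keep]]
  shared <- intersect(names(df1), names(df2))
  rbind(df1[, shared, drop = FALSE], df2[, shared, drop = FALSE])
}

a <- data.frame(mpg = 21, cyl = 6)
b <- data.frame(mpg_17 = 22.8, cyl_17 = 4)

# "mpg_17" is 3 edits from "mpg" and 6 characters long: distance 0.5
bound <- fuzzy_rbind_sketch(a, b, threshold = 0.6)
print(bound)
```

Nothing in this sketch stops two columns of df2 from choosing the same column of df1, which is one way a loose threshold can bind columns to the wrong partner.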