messy.cats contains various functions that employ string distance tools in order to make data management easier for users working with categorical data. Categorical data, especially user-inputted categorical data, which tends to be plagued by typos and inconsistent formatting, can be difficult to work with. messy.cats aims to provide functions that make cleaning categorical data simple and easy.
This introduction will lead you through examples of the functions in use, explain the arguments, and show how to get the most out of these functions.
First, let's create some example vectors:
cars_bad = c("teal Mazda RX4", "black Mazda RX4 Wag",
"green Datsun 710", "Hornet 4 Drive",
"green Hornet Sportabout", "Valiant",
"Duster 360", "orange Merc 240D",
"Merc 230", "teal Merc 280",
"Merc 280C", "green Merc 450SE",
"Merc 450SL", "blue Merc 450SLC",
"green Cadillac Fleetwood", "Lincoln Continental",
"Chrysler Imperial", "Fiat 128",
"red Honda Civic", "Toyota Corolla",
"Toyota Corona", "Dodge Challenger",
"red AMC Javelin", "Camaro Z28",
"Pontiac Firebird", "black Fiat X1-9",
"blue Porsche 914-2", "Lotus Europa",
"Ford Pantera L", "black Ferrari Dino",
"black Maserati Bora", "black Volvo 142E")
cars_good = c("Mazda RX4", "Mazda RX4 Wag",
"Datsun 710", "Hornet 4 Drive",
"Hornet Sportabout", "Valiant",
"Duster 360", "Merc 240D",
"Merc 230", "Merc 280",
"Merc 280C", "Merc 450SE",
"Merc 450SL", "Merc 450SLC",
"Cadillac Fleetwood", "Lincoln Continental",
"Chrysler Imperial", "Fiat 128",
"Honda Civic", "Toyota Corolla",
"Toyota Corona", "Dodge Challenger",
"AMC Javelin", "Camaro Z28",
"Pontiac Firebird", "Fiat X1-9",
"Porsche 914-2", "Lotus Europa",
"Ford Pantera L", "Ferrari Dino",
"Maserati Bora", "Volvo 142E")
Suppose you have two lists of car descriptions, one containing only the make of each car and the other containing both make and color. Instead of processing the strings and deleting the color descriptors by hand, which can be a finicky and time-consuming process, cat_match() can match the contents of the two lists.
cat_match(cars_bad, cars_good, method = "jw")
#> bad match dists
#> 1 teal Mazda RX4 Mazda RX4 0.2302
#> 2 black Mazda RX4 Wag Mazda RX4 Wag 0.2591
#> 3 green Datsun 710 Datsun 710 0.2417
#> 4 Hornet 4 Drive Hornet 4 Drive 0.0000
#> 5 green Hornet Sportabout Hornet Sportabout 0.1948
#> 6 Valiant Valiant 0.0000
#> 7 Duster 360 Duster 360 0.0000
#> 8 orange Merc 240D Merc 240D 0.2199
#> 9 Merc 230 Merc 230 0.0000
#> 10 teal Merc 280 Merc 280 0.2324
#> 11 Merc 280C Merc 280C 0.0000
#> 12 green Merc 450SE Merc 450SL 0.2532
#> 13 Merc 450SL Merc 450SL 0.0000
#> 14 blue Merc 450SLC Merc 450SLC 0.1799
#> 15 green Cadillac Fleetwood Cadillac Fleetwood 0.2037
#> 16 Lincoln Continental Lincoln Continental 0.0000
#> 17 Chrysler Imperial Chrysler Imperial 0.0000
#> 18 Fiat 128 Fiat 128 0.0000
#> 19 red Honda Civic Honda Civic 0.1798
#> 20 Toyota Corolla Toyota Corolla 0.0000
#> 21 Toyota Corona Toyota Corona 0.0000
#> 22 Dodge Challenger Dodge Challenger 0.0000
#> 23 red AMC Javelin AMC Javelin 0.2101
#> 24 Camaro Z28 Camaro Z28 0.0000
#> 25 Pontiac Firebird Pontiac Firebird 0.0000
#> 26 black Fiat X1-9 Fiat X1-9 0.2259
#> 27 blue Porsche 914-2 Porsche 914-2 0.1952
#> 28 Lotus Europa Lotus Europa 0.0000
#> 29 Ford Pantera L Ford Pantera L 0.0000
#> 30 black Ferrari Dino Ferrari Dino 0.2083
#> 31 black Maserati Bora Maserati Bora 0.2719
#> 32 black Volvo 142E Volvo 142E 0.2250
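To make the idea of a "closest match" concrete, here is a minimal sketch in base R. It is not the package's implementation: closest_match() is a hypothetical helper, and it swaps in base R's adist() (generalized Levenshtein distance) for the Jaro-Winkler method used above, so the distances are edit counts rather than values between 0 and 1.

```r
# Hypothetical helper: pair each messy string with its nearest clean string.
# Uses base R's adist() (generalized Levenshtein), not Jaro-Winkler.
closest_match <- function(messy, clean) {
  d <- adist(messy, clean)            # rows: messy strings, columns: clean strings
  best <- apply(d, 1, which.min)      # column index of the nearest clean string
  data.frame(
    bad   = messy,
    match = clean[best],
    dists = d[cbind(seq_along(messy), best)],
    stringsAsFactors = FALSE
  )
}

res <- closest_match(c("teal Mazda RX4", "red Honda Civic"),
                     c("Mazda RX4", "Honda Civic", "Fiat 128"))
print(res)
```

Each row of the distance matrix is scanned for its minimum, which is the same shape of output as the bad/match/dists table above.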
After checking the cat_match() output for incorrect matches (here, row 12 pairs "green Merc 450SE" with "Merc 450SL" rather than "Merc 450SE"), a user can use cat_replace() to swap the contents of one list for their closest match in another.
cat_replace(cars_bad, cars_good, method = "jw")
Alternatively, a user could join together two dataframes that use these lists as id variables with the function cat_join().
bad_cars_df = data.frame(car = cars_bad, state_registration = "CA")
good_cars_df = data.frame(car = cars_good, insur_comp = "All State")
cat_join(bad_cars_df, good_cars_df, by = "car", method = "jw", join = "left")
#> car state_registration insur_comp
#> 1 Mazda RX4 CA All State
#> 2 Mazda RX4 Wag CA All State
#> 3 Datsun 710 CA All State
#> 4 Hornet 4 Drive CA All State
#> 5 Hornet Sportabout CA All State
#> 6 Valiant CA All State
#> 7 Duster 360 CA All State
#> 8 Merc 240D CA All State
#> 9 Merc 230 CA All State
#> 10 Merc 280 CA All State
#> 11 Merc 280C CA All State
#> 12 Merc 450SL CA All State
#> 13 Merc 450SL CA All State
#> 14 Merc 450SLC CA All State
#> 15 Cadillac Fleetwood CA All State
#> 16 Lincoln Continental CA All State
#> 17 Chrysler Imperial CA All State
#> 18 Fiat 128 CA All State
#> 19 Honda Civic CA All State
#> 20 Toyota Corolla CA All State
#> 21 Toyota Corona CA All State
#> 22 Dodge Challenger CA All State
#> 23 AMC Javelin CA All State
#> 24 Camaro Z28 CA All State
#> 25 Pontiac Firebird CA All State
#> 26 Fiat X1-9 CA All State
#> 27 Porsche 914-2 CA All State
#> 28 Lotus Europa CA All State
#> 29 Ford Pantera L CA All State
#> 30 Ferrari Dino CA All State
#> 31 Maserati Bora CA All State
#> 32 Volvo 142E CA All State
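Conceptually, a fuzzy join can be thought of as two steps: recode the messy key to its nearest clean value, then join as usual. The sketch below illustrates that idea in base R under stated assumptions. fuzzy_left_join() is a hypothetical helper, not part of messy.cats, and it uses adist() (Levenshtein distance) in place of Jaro-Winkler.

```r
# Hypothetical helper: recode the key column of `messy_df` to its nearest value
# in `clean_df`, then perform an ordinary left join with base R's merge().
fuzzy_left_join <- function(messy_df, clean_df, by) {
  d <- adist(messy_df[[by]], clean_df[[by]])    # Levenshtein, not Jaro-Winkler
  messy_df[[by]] <- clean_df[[by]][apply(d, 1, which.min)]
  merge(messy_df, clean_df, by = by, all.x = TRUE)
}

bad  <- data.frame(car = c("teal Mazda RX4", "red Honda Civic"),
                   state_registration = "CA", stringsAsFactors = FALSE)
good <- data.frame(car = c("Mazda RX4", "Honda Civic", "Fiat 128"),
                   insur_comp = "All State", stringsAsFactors = FALSE)

joined <- fuzzy_left_join(bad, good, by = "car")
print(joined)
```

Because the key is recoded before the join, any incorrect nearest match carries straight through to the joined rows, which is why inspecting the matches first is worthwhile.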
These are some of the most basic uses of the core functions in the messy.cats package. Each function mentioned has additional arguments that users can employ to fine-tune their string distance calculations or to make the functions easier to use.
In this more extensive example we have two datasets of caterpillar data collected over the summers of 2019-2021 by Wesleyan University researchers. The first, messy_caterpillars, contains information about the average weight and length of caterpillars and, as the name suggests, has very messy caterpillar names. The second, clean_caterpillars, is a dataset containing the species and number of caterpillars found by researchers, with clean caterpillar names.
If you want to calculate how much caterpillar you actually saw in both centimeters and milligrams, you’ll have to fix those messy names somehow.
# load in messy_caterpillars and clean_caterpillars
data("clean_caterpillars")
data("messy_caterpillars")
head(messy_caterpillars)
str(messy_caterpillars)
head(clean_caterpillars)
str(clean_caterpillars)
To fix these names we can either use cat_replace() to change the caterpillar name variable and then merge with a function such as the dplyr joins, or use cat_join().
But first, in order to properly configure our string distance arguments, we will use cat_match() to explore how the messy and clean caterpillar names match up.
We input the messy and clean vectors (in this case columns of caterpillar names) and specify no other arguments except return_dists, which returns the distance between each string pair.
cat_match(messy_caterpillars$CaterpillarSpecies,
          clean_caterpillars$species,
          return_dists = TRUE)
The output shows the clean string with the lowest string distance from each messy string, and the distance between the pair is returned as a third column.
If we arrange by the distance in descending order, we can see the items of the messy vector with the worst matches. With the Jaccard method, the worst match is between “Papilio_glaucus” and “Orgyia leucostigma”, at a distance of .5. This is also the only incorrect match. This means that if we set a threshold lower than .5, cat_match() will return no incorrect matches: “Papilio_glaucus” is left unmatched (NA) instead.
library(dplyr) # provides %>% and arrange()
cat_match(messy_caterpillars$CaterpillarSpecies, clean_caterpillars$species,
          return_dists = TRUE, method = "jaccard") %>%
  arrange(desc(dists))
cat_match(messy_caterpillars$CaterpillarSpecies, clean_caterpillars$species,
          return_dists = TRUE, method = "jaccard", threshold = .49) %>%
  arrange(desc(dists))
#> bad match dists
#> 1 Papilio_glaucus <NA> 0.5000
#> 2 Zale_lunefera Zale lunifera 0.2727
#> 3 Parallelia_bistriarus Parallelia bistriaris 0.2500
#> 4 Alsophila_pomataria Alsophila pometaria 0.2308
#> 5 Itame_postularia Itame pustularia 0.2308
#> 6 Crocidographa_normani Crocigrapha normani 0.2308
#> 7 Zale_lunata Zale lunata 0.2222
#> 8 Orthosia_rubesens Orthosia rubescens 0.2143
#> 9 Pyrefera_hesperidage Pyreferra hesperidago 0.2143
#> 10 Achatia_distincta Achatia distincta 0.2000
#> 11 Phigaliae_titea Phigalea titea 0.2000
#> 12 Lithophane_antennata Lithophane antennata 0.1818
#> 13 Himella_intracta Himella intractata 0.1667
#> 14 Iridopsis_ephyraria Iridopsis ephyraria 0.1667
#> 15 Malacasoma_disstria Malacasoma disstria 0.1667
#> 16 Morrisonia_confusa Morrisonia confusa 0.1667
#> 17 Nola_triquetrana Nola triquetrana 0.1667
#> 18 Amphipyra_pyramadoides Amphipyra pyramidoides 0.1538
#> 19 Lymantria_dispar Lymantria dispar 0.1538
#> 20 Morrisonia_latex Morrisonia latex 0.1538
#> 21 Nematocampa_resistaria Nematocampa resistaria 0.1538
#> 22 Hypagyrtis_unipunctata Hypagyrtis unipunctata 0.1429
#> 23 Melanolophia_canadaria Melanolophia canadaria 0.1429
#> 24 Prochoerodes_linola Prochoerodes lineola 0.1429
#> 25 Orgyia_leucostigma Orgyia leucostigma 0.1333
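The distances in the table above can be checked by hand. The sketch below assumes that the Jaccard distance is taken over the sets of single characters in each string after lowercasing. That assumption reproduces the reported values for the pairs shown here, but it is an inference from the output, not a statement about how messy.cats computes it. jaccard_dist() is a hypothetical helper, and the comparison used for the threshold (a match is kept when its distance is at or below the threshold) is likewise an assumption.

```r
# Hypothetical helper: Jaccard distance between the sets of characters in two
# strings. Lowercasing and single-character tokens are assumptions.
jaccard_dist <- function(a, b) {
  set_a <- unique(strsplit(tolower(a), "")[[1]])
  set_b <- unique(strsplit(tolower(b), "")[[1]])
  1 - length(intersect(set_a, set_b)) / length(union(set_a, set_b))
}

# 7 shared characters out of 9 distinct ones: 1 - 7/9
d_good <- jaccard_dist("Zale_lunata", "Zale lunata")
round(d_good, 4)   # 0.2222

# 8 shared characters out of 16 distinct ones: 1 - 8/16
d_bad <- jaccard_dist("Papilio_glaucus", "Orgyia leucostigma")
d_bad              # 0.5

# Applying a threshold: a distance above it is treated as "no match".
threshold <- 0.49
match_good <- if (d_good <= threshold) "Zale lunata" else NA_character_
match_bad  <- if (d_bad  <= threshold) "Orgyia leucostigma" else NA_character_
```

With the threshold at .49, the correct pair is kept while the “Papilio_glaucus” pairing at .5 falls out, which mirrors the NA in the first row of the table.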
messy_caterpillars$CaterpillarSpecies = cat_replace(messy_caterpillars$CaterpillarSpecies,
                                                    clean_caterpillars$species,
                                                    method = "jaccard", threshold = .49)
left_join(clean_caterpillars, messy_caterpillars, by = c("species" = "CaterpillarSpecies"))
Alternatively, a user could accomplish this task in one step using cat_join().
data("messy_caterpillars") # reload the original messy names, which were overwritten by cat_replace() above
clean_caterpillars$species
#> [1] "Achatia distincta" "Alsophila pometaria" "Amphipyra pyramidoides"
#> [4] "Crocigrapha normani" "Ennomos subsignaria" "Eutrapela clemataria"
#> [7] "Himella intractata" "Hypagyrtis unipunctata" "Iridopsis ephyraria"
#> [10] "Iridopsis larvaria" "Itame pustularia" "Lithophane antennata"
#> [13] "Lithophane bethunei" "Lymantria dispar" "Malacasoma disstria"
#> [16] "Melanolophia canadaria" "Morrisonia confusa" "Morrisonia latex"
#> [19] "Nematocampa resistaria" "Nola triquetrana" "Orgyia leucostigma"
#> [22] "Orthosia rubescens" "Parallelia bistriaris" "Phigalea titea"
#> [25] "Prochoerodes lineola" "Pyreferra hesperidago" "Zale phoeocapne"
#> [28] "Zale lunata" "Zale lunifera" "Achatia distincta"
#> [31] "Alsophila pometaria" "Amphipyra pyramidoides" "Crocigrapha normani"
#> [34] "Ennomos subsignaria" "Eutrapela clemataria" "Himella intractata"
#> [37] "Hypagyrtis unipunctata" "Iridopsis ephyraria" "Iridopsis larvaria"
#> [40] "Itame pustularia" "Lithophane antennata" "Lithophane bethunei"
#> [43] "Lymantria dispar" "Malacasoma disstria" "Melanolophia canadaria"
#> [46] "Morrisonia confusa" "Morrisonia latex" "Nematocampa resistaria"
#> [49] "Nola triquetrana" "Orgyia leucostigma" "Orthosia rubescens"
#> [52] "Parallelia bistriaris" "Phigalea titea" "Prochoerodes lineola"
#> [55] "Achatia distincta" "Alsophila pometaria" "Amphipyra pyramidoides"
#> [58] "Crocigrapha normani" "Ennomos subsignaria" "Eutrapela clemataria"
#> [61] "Himella intractata" "Hypagyrtis unipunctata" "Iridopsis ephyraria"
#> [64] "Iridopsis larvaria" "Itame pustularia" "Lithophane antennata"
#> [67] "Lithophane bethunei" "Lymantria dispar" "Malacasoma disstria"
#> [70] "Melanolophia canadaria" "Morrisonia confusa" "Morrisonia latex"
#> [73] "Nematocampa resistaria" "Nola triquetrana"
cat_join(messy_df = messy_caterpillars, clean_df = clean_caterpillars,
         by = c("CaterpillarSpecies", "species"),
         method = "jaccard", threshold = .49, join = "left")
#> # A tibble: 62 x 5
#> species `Avg Weight (mg)` `Avg Length (cm)` count year
#> <chr> <dbl> <dbl> <int> <dbl>
#> 1 Achatia distincta 0.809 2.64 24 2021
#> 2 Achatia distincta 0.809 2.64 14 2020
#> 3 Achatia distincta 0.809 2.64 16 2019
#> 4 Alsophila pometaria 2.03 1.73 8 2021
#> 5 Alsophila pometaria 2.03 1.73 18 2020
#> 6 Alsophila pometaria 2.03 1.73 11 2019
#> 7 Amphipyra pyramidoides 0.914 1.76 26 2021
#> 8 Amphipyra pyramidoides 0.914 1.76 26 2020
#> 9 Amphipyra pyramidoides 0.914 1.76 9 2019
#> 10 Himella intractata 1.53 2.54 3 2021
#> # ... with 52 more rows
data("mtcars")
mtcars_colnames_messy = mtcars
colnames(mtcars_colnames_messy)[1:5] = paste0(colnames(mtcars)[1:5], "_17")
colnames(mtcars_colnames_messy)[6:11] = paste0(colnames(mtcars)[6:11], "_2017")
Another messy dataset problem that our package hopes to help solve is row binding two datasets with different column names. The code above creates a copy of mtcars whose column names carry _17 and _2017 suffixes. fuzzy_rbind() allows a user to bind columns in dataframes using string distance matching. Any two columns with similar enough names will be bound together, and fuzzy_rbind() takes arguments similar to those of the other functions in messy.cats, allowing the user to fine-tune their string distance matching.
fuzzy_rbind(df1 = mtcars, df2 = mtcars_colnames_messy, threshold = .5,
method = "jw")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> 21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> 26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> 27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> 28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#> 33 110.0 6 160.0 110 3.90 2.620 16.46 0 4 4 4
#> 34 110.0 6 160.0 110 3.90 2.875 17.02 0 4 4 4
#> 35 93.0 4 108.0 93 3.85 2.320 18.61 1 1 1 1
#> 36 110.0 6 258.0 110 3.08 3.215 19.44 1 1 1 1
#> 37 175.0 8 360.0 175 3.15 3.440 17.02 0 2 2 2
#> 38 105.0 6 225.0 105 2.76 3.460 20.22 1 1 1 1
#> 39 245.0 8 360.0 245 3.21 3.570 15.84 0 4 4 4
#> 40 62.0 4 146.7 62 3.69 3.190 20.00 1 2 2 2
#> 41 95.0 4 140.8 95 3.92 3.150 22.90 1 2 2 2
#> 42 123.0 6 167.6 123 3.92 3.440 18.30 1 4 4 4
#> 43 123.0 6 167.6 123 3.92 3.440 18.90 1 4 4 4
#> 44 180.0 8 275.8 180 3.07 4.070 17.40 0 3 3 3
#> 45 180.0 8 275.8 180 3.07 3.730 17.60 0 3 3 3
#> 46 180.0 8 275.8 180 3.07 3.780 18.00 0 3 3 3
#> 47 205.0 8 472.0 205 2.93 5.250 17.98 0 4 4 4
#> 48 215.0 8 460.0 215 3.00 5.424 17.82 0 4 4 4
#> 49 230.0 8 440.0 230 3.23 5.345 17.42 0 4 4 4
#> 50 66.0 4 78.7 66 4.08 2.200 19.47 1 1 1 1
#> 51 52.0 4 75.7 52 4.93 1.615 18.52 1 2 2 2
#> 52 65.0 4 71.1 65 4.22 1.835 19.90 1 1 1 1
#> 53 97.0 4 120.1 97 3.70 2.465 20.01 1 1 1 1
#> 54 150.0 8 318.0 150 2.76 3.520 16.87 0 2 2 2
#> 55 150.0 8 304.0 150 3.15 3.435 17.30 0 2 2 2
#> 56 245.0 8 350.0 245 3.73 3.840 15.41 0 4 4 4
#> 57 175.0 8 400.0 175 3.08 3.845 17.05 0 2 2 2
#> 58 66.0 4 79.0 66 4.08 1.935 18.90 1 1 1 1
#> 59 91.0 4 120.3 91 4.43 2.140 16.70 0 2 2 2
#> 60 113.0 4 95.1 113 3.77 1.513 16.90 1 2 2 2
#> 61 264.0 8 351.0 264 4.22 3.170 14.50 0 4 4 4
#> 62 175.0 6 145.0 175 3.62 2.770 15.50 0 6 6 6
#> 63 335.0 8 301.0 335 3.54 3.570 14.60 0 8 8 8
#> 64 109.0 4 121.0 109 4.11 2.780 18.60 1 2 2 2
fuzzy_rbind(df1 = mtcars, df2 = mtcars_colnames_messy, threshold = .2,
method = "jw")
#> mpg cyl disp drat qsec gear carb
#> 1 21.0 6 160.0 3.90 16.46 4 4
#> 2 21.0 6 160.0 3.90 17.02 4 4
#> 3 22.8 4 108.0 3.85 18.61 4 1
#> 4 21.4 6 258.0 3.08 19.44 3 1
#> 5 18.7 8 360.0 3.15 17.02 3 2
#> 6 18.1 6 225.0 2.76 20.22 3 1
#> 7 14.3 8 360.0 3.21 15.84 3 4
#> 8 24.4 4 146.7 3.69 20.00 4 2
#> 9 22.8 4 140.8 3.92 22.90 4 2
#> 10 19.2 6 167.6 3.92 18.30 4 4
#> 11 17.8 6 167.6 3.92 18.90 4 4
#> 12 16.4 8 275.8 3.07 17.40 3 3
#> 13 17.3 8 275.8 3.07 17.60 3 3
#> 14 15.2 8 275.8 3.07 18.00 3 3
#> 15 10.4 8 472.0 2.93 17.98 3 4
#> 16 10.4 8 460.0 3.00 17.82 3 4
#> 17 14.7 8 440.0 3.23 17.42 3 4
#> 18 32.4 4 78.7 4.08 19.47 4 1
#> 19 30.4 4 75.7 4.93 18.52 4 2
#> 20 33.9 4 71.1 4.22 19.90 4 1
#> 21 21.5 4 120.1 3.70 20.01 3 1
#> 22 15.5 8 318.0 2.76 16.87 3 2
#> 23 15.2 8 304.0 3.15 17.30 3 2
#> 24 13.3 8 350.0 3.73 15.41 3 4
#> 25 19.2 8 400.0 3.08 17.05 3 2
#> 26 27.3 4 79.0 4.08 18.90 4 1
#> 27 26.0 4 120.3 4.43 16.70 5 2
#> 28 30.4 4 95.1 3.77 16.90 5 2
#> 29 15.8 8 351.0 4.22 14.50 5 4
#> 30 19.7 6 145.0 3.62 15.50 5 6
#> 31 15.0 8 301.0 3.54 14.60 5 8
#> 32 21.4 4 121.0 4.11 18.60 4 2
#> 33 21.0 6 160.0 3.90 16.46 4 4
#> 34 21.0 6 160.0 3.90 17.02 4 4
#> 35 22.8 4 108.0 3.85 18.61 4 1
#> 36 21.4 6 258.0 3.08 19.44 3 1
#> 37 18.7 8 360.0 3.15 17.02 3 2
#> 38 18.1 6 225.0 2.76 20.22 3 1
#> 39 14.3 8 360.0 3.21 15.84 3 4
#> 40 24.4 4 146.7 3.69 20.00 4 2
#> 41 22.8 4 140.8 3.92 22.90 4 2
#> 42 19.2 6 167.6 3.92 18.30 4 4
#> 43 17.8 6 167.6 3.92 18.90 4 4
#> 44 16.4 8 275.8 3.07 17.40 3 3
#> 45 17.3 8 275.8 3.07 17.60 3 3
#> 46 15.2 8 275.8 3.07 18.00 3 3
#> 47 10.4 8 472.0 2.93 17.98 3 4
#> 48 10.4 8 460.0 3.00 17.82 3 4
#> 49 14.7 8 440.0 3.23 17.42 3 4
#> 50 32.4 4 78.7 4.08 19.47 4 1
#> 51 30.4 4 75.7 4.93 18.52 4 2
#> 52 33.9 4 71.1 4.22 19.90 4 1
#> 53 21.5 4 120.1 3.70 20.01 3 1
#> 54 15.5 8 318.0 2.76 16.87 3 2
#> 55 15.2 8 304.0 3.15 17.30 3 2
#> 56 13.3 8 350.0 3.73 15.41 3 4
#> 57 19.2 8 400.0 3.08 17.05 3 2
#> 58 27.3 4 79.0 4.08 18.90 4 1
#> 59 26.0 4 120.3 4.43 16.70 5 2
#> 60 30.4 4 95.1 3.77 16.90 5 2
#> 61 15.8 8 351.0 4.22 14.50 5 4
#> 62 19.7 6 145.0 3.62 15.50 5 6
#> 63 15.0 8 301.0 3.54 14.60 5 8
#> 64 21.4 4 121.0 4.11 18.60 4 2
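The second fuzzy_rbind() call results in fewer bound columns because the user asked for a lower threshold: with threshold = .2 only seven columns (mpg, cyl, disp, drat, qsec, gear, carb) are matched closely enough to be bound, while hp, wt, vs, and am are dropped. A lower threshold also guards against bad pairings. In the first output, rows 33 to 64 show hp values in the mpg column and identical values in am, gear, and carb, which suggests that threshold = .5 paired some columns incorrectly.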
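The effect of the threshold on column matching can be sketched in base R. This is an illustration under stated assumptions, not fuzzy_rbind()'s implementation: match_columns() is a hypothetical helper, and it uses Levenshtein distance from adist(), scaled by the longer name's length so that it falls between 0 and 1, instead of Jaro-Winkler. The threshold values below are therefore not comparable to the .5 and .2 used above.

```r
# Hypothetical helper: pair each column name in names1 with its nearest name in
# names2 and keep only the pairs whose normalized distance is within threshold.
match_columns <- function(names1, names2, threshold) {
  # Levenshtein distance divided by the length of the longer name (0 to 1)
  d <- adist(names1, names2) / outer(nchar(names1), nchar(names2), pmax)
  best <- apply(d, 1, which.min)
  best_d <- d[cbind(seq_along(names1), best)]
  keep <- best_d <= threshold
  data.frame(df1 = names1[keep], df2 = names2[best[keep]], dist = best_d[keep],
             stringsAsFactors = FALSE)
}

clean_names <- c("mpg", "cyl", "wt")
messy_names <- c("mpg_17", "cyl_17", "wt_2017")

loose  <- match_columns(clean_names, messy_names, threshold = 0.8)  # all three pairs
strict <- match_columns(clean_names, messy_names, threshold = 0.6)  # wt is dropped
print(loose)
print(strict)
```

Short names with long suffixes, such as wt and wt_2017, sit far apart under a normalized distance, so they are the first to drop out as the threshold tightens.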