Edge retention slicing cardboard (15 dps, x-coarse DMT)

Cliff Stamp · #1

Knives, steels :

-$1 kitchen knife, 3Cr13/420, (very soft)
-Lum Chinese, VG-10, 59/60 HRC (Spyderco)
-k390, OTK 63/64 HRC (Peter's)
-s45, Kyle Bettleyon, M4 63 HRC

All knives have edge bevels of 6-8 dps, with 15 dps micro-bevels applied with an x-coarse DMT. They were used to cut cardboard at a given speed/force, and the sharpness was measured by cutting light thread under a given load. I picked the x-coarse DMT because I had an idea that if the finish was very coarse then the apex would be thicker (you can see this in Verhoeven's work), and this might stabilize the carbides in the high carbide steels and increase their relative performance. However, at the same time the coarse finish might lower the chip resistance so much that they chip out even though the apex is thicker. Hence an experiment to find out which factor is larger and what the resulting relative performance is.

At the same time I wanted to check an idea I had for some time that the influence of cardboard itself is far greater than the knife steel. I normally random sample to prevent any bias from one knife seeing a different sample of cardboard than another. This time I didn't do that and so each knife on a particular run cut very different cardboard than another. Now to be clear it was all 1/8" ridged stock, all cut across the ridges. It was just different boxes used for each blade. I had seen from past results that cardboard can be extremely variable and easily 10:1 differences can be seen but I had never actually done a full run to measure its effect.

I tried a few charts to see the best way to look at the data, I think the stacked column shows two things clearly :

-the total edge retention of all runs added up

-each individual run (is color coded)

If you look at the results of one run with each (the blue one), it is almost the opposite of what you would expect from the steel, as the 420J2 has the highest performance. However, on the second run the high carbide blades catch up significantly, but on the third run some of them do but others fall back again.

In short, it shows clearly that even if you constrain everything very closely, the difference in cardboard -even of almost the same type visually- is so large that even looking at steels like 420 vs M4 you are not guaranteed to see consistent performance in the steel. All you do see, in regards to a difference, is which steel had the easiest cardboard to cut, as the cardboard is making a much larger difference than the steel.

Now if you know a little statistics you might wonder whether, since this bias is just random, it should normalize out. However, I did a few calculations, and since the cardboard variation is so high it looked like I would need at least 10 runs to even be able to say with confidence that there was a difference in the steels, and even with ten runs I would not be guaranteed to see the actual correct steel influence.

I then did some Monte Carlo simulations to verify it and that was indeed the case. A Monte Carlo simulation is when you actually generate multiple data sets and look at how they compare with each other. For those curious, the baselines are (1, 2, 2.5, 3 - this comes from other work on hemp/cardboard) and each run is then modified by a random number from 1 to 10 representing the random nature of the cardboard. Here are three such simulated experiments :

As predicted, even with 10 runs the most you would be able to conclude with confidence is that it isn't likely the steels have the same performance and that you might see a significant difference between something like 420 vs M2. However 10 runs isn't enough to determine the difference between something like VG-10 vs k390 vs M4.
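For anyone who wants to try this at home, here is a minimal sketch of that kind of simulation. It is not the spreadsheet used for the charts. It assumes the baselines (1, 2, 2.5, 3) map to 420, VG-10, k390 and M4 in that order, and that the "random number from 1 to 10" is uniform and multiplies the baseline. The function names are made up for the example.

```python
import random

# Baseline edge retention from the post; the steel order is an assumption.
BASELINES = {"420": 1.0, "VG-10": 2.0, "k390": 2.5, "M4": 3.0}


def simulate_experiment(n_runs, rng):
    """Return the summed score per steel over n_runs simulated runs."""
    totals = {}
    for steel, baseline in BASELINES.items():
        # Each run: baseline times a cardboard factor drawn from 1..10
        # (assumed uniform and multiplicative).
        totals[steel] = sum(baseline * rng.uniform(1.0, 10.0) for _ in range(n_runs))
    return totals


def correct_order_rate(n_runs, n_experiments=2000, seed=0):
    """Fraction of simulated experiments whose totals rank the steels in baseline order."""
    rng = random.Random(seed)
    expected = list(BASELINES)  # insertion order, lowest to highest baseline
    hits = 0
    for _ in range(n_experiments):
        totals = simulate_experiment(n_runs, rng)
        if sorted(totals, key=totals.get) == expected:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    for n in (1, 3, 10, 50):
        print(n, "runs:", correct_order_rate(n))
```

Counting how often the full ranking comes out right is just one way to summarize the simulated experiments; plotting the totals as stacked columns works as well.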

Now if you are a little curious about statistics you might ask how big a set you need, and you can estimate it with a little calculation (a sum increases ~N, the error only increases ~root(N), thus the larger the sample the smaller the percent error). It looked like 50 samples would in general show the difference and again a MC simulation shows it to be likely :

In short, if you are trying to do edge retention comparisons, and you don't heavily control the material cut, then it will take a LOT of work for the biases to randomize out and reveal the nature of the steel. If you are really curious, if I wanted to actually generate that chart on the bottom with physical data I would have to cut ~500 km of cardboard. I am not likely to do that, but I will at least do 5 runs and maybe 10.
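The "sum grows ~N, error grows ~root(N)" estimate can be put into numbers with a short sketch. It assumes, as an illustration only, that the cardboard factor is uniform between 1 and 10 and multiplies the baseline; the helper names are hypothetical. Under that assumption the percent error of a summed score is about 15% at 10 runs and about 7% at 50 runs.

```python
import math

# Assumption: the cardboard factor is uniform on [1, 10].
LOW, HIGH = 1.0, 10.0
MEAN = (LOW + HIGH) / 2                 # 5.5
SD = (HIGH - LOW) / math.sqrt(12)       # about 2.6


def relative_error(n_runs):
    """Standard deviation of a sum of n_runs factors divided by its mean."""
    return (SD * math.sqrt(n_runs)) / (MEAN * n_runs)


def separation(baseline_a, baseline_b, n_runs):
    """Gap between two summed scores in units of the gap's standard deviation."""
    gap = abs(baseline_b - baseline_a) * MEAN * n_runs
    spread = SD * math.sqrt(n_runs) * math.hypot(baseline_a, baseline_b)
    return gap / spread


if __name__ == "__main__":
    for n in (10, 50):
        print(n, "runs: relative error", round(relative_error(n), 3),
              "| separation of baselines 2 vs 2.5:", round(separation(2.0, 2.5, n), 2))
```

With baselines of 2 and 2.5, the gap is only about one standard deviation at 10 runs and a bit over two at 50, which is why 10 runs cannot separate the close steels while 50 usually can.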

If you think about this a little it should be obvious why we can also see huge differences in how different people experience steels. It is very likely that since people don't control materials in normal cutting, often what is seen as conclusions are just which steels tended to have the best luck in getting easy to cut cardboard. The unfortunate thing is that once that conclusion is formed it becomes self-reinforcing due to things like cognitive dissonance which causes conclusion bias. Hence the importance of things like at least partial blinding.

As a side note, if I had to pick a knife to use for this work it would be the OTK because the handle is the most comfortable in a hammer grip which is what is used for this work. Followed by a close second with the s45, a distant third with the Lum and I would never use the kitchen knife as it is too long and awkward and floppy on the stiffer cardboards.

Bodog · #2

So what you're saying is that everything is relative, one or two tests on a given steel are unlikely to prove anything unless the cut material is so similar as to be unrealistic, that people will believe whatever new information arises as long as it reinforces their own currently held beliefs, and that once something is published and it reinforces someone's beliefs, it's extremely hard to suppress that information as inaccurate or otherwise faulty. Sounds about right.

paladin · #3

Cliff Stamp wrote:Knives, steels :

-$1 kitchen knife, 3Cr13/420, (very soft)
-Lum Chinese, VG-10, 59/60 HRC (Spyderco)
-k390, OTK 63/64 HRC (Peter's)
-s45, Kyle Bettleyon, M4 63 HRC

Wow thanks for all your hard work...

So, I see you proved VG10 is the best since its bar is the tallest...

Sorry I didn't have time to read through all the rest of the writing... but they're getting ready to introduce "Best Musical Score for a Documentary About Previously Undiscovered Amazonian Tribes" on the Golden Globes and I don't want to miss who wins it.

Cliff Stamp · #4

Bodog wrote:... one or two tests on a given steel are unlikely to prove anything unless the cut material is so similar as to be unrealistic

Yes. I was discussing this with a friend of mine who was asking about some steels and I noted that the performance was about 2:1 on cardboard or hemp with the caveat that you had to very carefully make sure the materials were the same. He then asked well what if you didn't? And that was one of the things which prompted me to change the way I did this comparison to put some real numbers into a hypothetical.

... that people will believe whatever new information arises as long as it reinforces their own currently held beliefs, and that once something is published and it reinforces someone's beliefs, it's extremely hard to suppress that information as inaccurate or otherwise faulty.

Yes, it is called conclusion bias.

This shows up very strongly in one common example. It is pretty easy to find a post like this :

"Hey, I just got a new knife in Y45M75, however after I sharpened it and did very little work the knife went blunt very quickly and the edge had visible damage. What's up with that?"

If Y45M75 is thought to be a "good" steel the responses are typically :

-you didn't sharpen it right
-was the material dirty
-did you cut on a plate or similar

These are trying to explain away the negative performance to support the conclusion. However if it is thought to be a "bad" steel then the response is likely to be :

-its a cheap steel newb

Note that the exact same experiment is interpreted in two different ways based on the preconceived conclusion bias. This is why even single blinds are very useful.

Cliff Stamp · #5

paladin wrote:...

... but they're getting ready to introduce "Best Musical Score for a Documentary About Previously Undiscovered Amazonian Tribes" on the Golden Globes and I don't want to miss who wins it.

Justin Bieber.

tvenuto · #6

Awesome post, the stacked graphs really drive home the point.

senorsquare · #7

Some of our discussions about which steels are awesomer remind me of this scene from "This Is Spinal Tap" with guitarist Nigel Tufnel discussing his amplifier setup with interviewer Marty DiBergi:

Nigel Tufnel: The numbers all go to eleven. Look, right across the board, eleven, eleven, eleven and...
Marty DiBergi: Oh, I see. And most amps go up to ten?
Nigel Tufnel: Exactly.
Marty DiBergi: Does that mean it's louder? Is it any louder?
Nigel Tufnel: Well, it's one louder, isn't it? It's not ten. You see, most blokes, you know, will be playing at ten. You're on ten here, all the way up, all the way up, all the way up, you're on ten on your guitar. Where can you go from there? Where?
Marty DiBergi: I don't know.
Nigel Tufnel: Nowhere. Exactly. What we do is, if we need that extra push over the cliff, you know what we do?
Marty DiBergi: Put it up to eleven.
Nigel Tufnel: Eleven. Exactly. One louder.
Marty DiBergi: Why don't you just make ten louder and make ten be the top number and make that a little louder?
Nigel Tufnel: [pause] These go to eleven.

Why do I like steel X? Because it's one louder!

Cliff Stamp · #8

tvenuto wrote:Awesome post, the stacked graphs really drive home the point.

Yeah, I was trying a few things and that really made the visuals obvious. Unfortunately I can't put error bars on stacked charts so I am going to have to switch to using R or another program.

Cliff Stamp · #9

Here is another round with all steels :

Here is a bit of statistics :

ANOVA - Two Factor
Alpha   0.05

Groups          Count   Sum     Mean    Variance
Column 1        4.00    30.06   7.52    33.51
Column 2        4.00    71.74   17.94   34.36
Column 3        4.00    31.57   7.89    49.81     P-value   F critical
Column 4        4.00    25.14   6.29    3.91      0.14      8.63
Within Groups   33.00   3.00    11.00
Row 1           4.00    35.63   8.91    38.32     P-value   F critical
Row 2           4.00    57.19   14.30   76.13     0.00      1.72
Row 3           4.00    32.98   8.24    14.97
Row 4           4.00    32.72   8.18    73.92

Source of Variation   SS       df      MS       F      P-value   F critical
Rows                  104.11   3.00    34.70    1.20   0.36      3.86
Columns               349.42   3.00    116.47   4.02   0.05      3.86
Error                 260.64   9.00    28.96
Total                 714.18   15.00

There are many ways of looking at data; this is just a simple check to see if there is any difference among steels or among trials in each steel. Based on what I noted above, you would expect to see significance among the rounds within a steel (that is the cardboard bias) but not from one steel to the next, as there isn't enough data. This is what comes out of the statistics.
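As a quick check on how the F values in the table arise, the sketch below recomputes them from the SS and df entries in the "Source of Variation" rows. Only numbers printed in the table are used; the dictionary and function names are just for illustration.

```python
# Values copied from the "Source of Variation" rows of the table above.
SS = {"Rows": 104.11, "Columns": 349.42, "Error": 260.64}
DF = {"Rows": 3, "Columns": 3, "Error": 9}
F_CRITICAL = 3.86  # as listed in the table for alpha = 0.05


def f_ratio(source):
    """Mean square of a source divided by the mean square of the error term."""
    mean_square = SS[source] / DF[source]
    mean_square_error = SS["Error"] / DF["Error"]
    return mean_square / mean_square_error


for source in ("Rows", "Columns"):
    f = f_ratio(source)
    verdict = "exceeds" if f > F_CRITICAL else "is below"
    print(f"{source}: F = {f:.2f} {verdict} F critical {F_CRITICAL}")
```

This reproduces the 1.20 and 4.02 in the table: only the Columns factor clears the critical value, and only barely.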

I was not planning on going past five runs, but a friend is renovating a condo and I might go up to 10 runs just to see how the numbers play out.

timlara · #10

Very, very cool analysis, Cliff! I actually have some project management software that uses monte carlo simulations to help you estimate how long projects are going to take based on your performance on previous projects. It works pretty well and gets "smarter" over time.

Donut · #11

I think from the testing in this thread that I can conclude that M4 is awesome and we still need a Para in M4. :)

On Edge · #12

Cliff Stamp wrote:" ... In short, if you are trying to do edge retention comparisons, and you don't heavily control the material cut, then it will take a LOT of work for the biases to randomize out and reveal the nature of the steel."

And that is precisely what I took away from this. And little else.

I do not pretend to have anywhere near the level of knowledge on this subject as the OP and/or others in this community and while I do appreciate all the work put in to this study ... with regard to steel performance comparisons, there seems to be little about it that is conclusive other than the fact that there is little about it that is conclusive.

I am curious as to why more effort was not put in to closely controlling the material cut by each steel and potentially harvesting a more defined glimpse into steel performance ... ?

~ edge

nirvanero · #13

Donut wrote:I think from the testing in this thread that I can conclude that M4 is awesome and we still need a Para in M4. :)

You're good at summarising... :D

Cliff Stamp · #14

timlara wrote:Very, very cool analysis, Cliff! I actually have some project management software that uses monte carlo simulations to help you estimate how long projects are going to take based on your performance on previous projects.

It is unfortunate they don't teach it in basic statistics, as it is very easy to do even with spreadsheets and allows some very complicated questions to be answered. For example, often here on the forums people will complain "there are too many variables!" when people talk about edge retention; however, these types of issues are handled with the most basic statistical knowledge. For example, let's say you did an edge retention comparison and you didn't control :

-the type of cardboard
-the edge angle or grit finish
-how you cut the cardboard

etc.

A lot of people think it is impossible to get information from such work, but that is completely false and a little MC simulation shows it to be so. Let's assume for example :

-cardboard makes a 10:1 difference
-edge angle/grit makes a 5:1 difference
-there are three other 2:1 differences

Yet if you had 50 people do such comparisons and compiled them you can easily get very accurate information because of how random errors normalize out very efficiently in large samples :

The problem is getting people to be unbiased/honest. I tried to do this a long time ago, but what happens is that people filter their results. If they use the 420 knife and the edge retention was "high", they conclude it was a fluke and discard it. Similarly, if the VG10 edge retention was "low", they also think it was a fluke and discard it. Thus if you compile the results, what you get isn't a random distribution around the actual performance of the steel; you just get what people think should happen, which is often what they have read or seen promoted.
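Both points, the averaging-out and the damage done by filtering, can be sketched with a small simulation. This is an illustration under stated assumptions, not the original spreadsheet: each uncontrolled factor is taken as uniform over its stated range and multiplicative, the true 2:1 ratio between the two steels is a placeholder, and the "fluke" filter (drop cheap-steel results above 1.5x typical, drop premium results below 0.5x typical) is a hypothetical rule made up for the example.

```python
import random
from statistics import mean

# Mean of the product of the assumed nuisance factors (5.5 * 3 * 1.5^3).
TYPICAL = 5.5 * 3.0 * 1.5 ** 3


def one_result(true_value, rng):
    """One person's measured edge retention under uncontrolled conditions."""
    cardboard = rng.uniform(1, 10)      # 10:1 spread
    angle_grit = rng.uniform(1, 5)      # 5:1 spread
    others = rng.uniform(1, 2) * rng.uniform(1, 2) * rng.uniform(1, 2)  # three 2:1 spreads
    return true_value * cardboard * angle_grit * others


def compiled_ratio(n_people, true_ratio, rng, filtered=False):
    """Ratio of compiled means, 'premium' steel over 'cheap' steel."""
    cheap = [one_result(1.0, rng) for _ in range(n_people)]
    premium = [one_result(true_ratio, rng) for _ in range(n_people)]
    if filtered:
        # Hypothetical filter: "too good" cheap results and "too poor"
        # premium results are written off as flukes and discarded.
        cheap = [x for x in cheap if x <= 1.5 * TYPICAL]
        premium = [x for x in premium if x >= 0.5 * TYPICAL]
    return mean(premium) / mean(cheap)


if __name__ == "__main__":
    rng = random.Random(0)
    print("honest, true 2:1   ->", round(compiled_ratio(50, 2.0, rng), 2))
    print("filtered, true 1:1 ->", round(compiled_ratio(50, 1.0, rng, filtered=True), 2))
```

With honest reporting the compiled ratio lands near the true one even though every individual result is all over the place. With the filter switched on, two identical steels come out looking clearly different, which is the self-reinforcing effect described above.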

Cliff Stamp · #15

Donut wrote:I think from the testing in this thread that I can conclude that M4 is awesome ...

M4 actually had the lowest performance to date in total, but as noted the random error is so high that it would take at least 10 runs to even hint that the group above 420 was different individually.

On Edge wrote:
[...]

I am curious as to why more effort was not put in to closely controlling the material cut by each steel and potentially harvesting a more defined glimpse into steel performance ... ?

The variability of the cardboard was the variable I was actually measuring, if I constrained it then the measurement would be lost. If you want to look at comparisons where I have constrained the material, then I have done many of them, such as :

Ref : http://www.cliffstamp.com/knives/forum/read.php?3,34787

On Edge · #16

Cliff Stamp wrote:" ... The variability of the cardboard was the variable I was actually measuring, if I constrained it then the measurement would be lost. If you want to look at comparisons where I have constrained the material, then I have done many of them, such as :

Ref : http://www.cliffstamp.com/knives/forum/read.php?3,34787"

Thank you.

~ edge

KevinOubre · #17

So, at least according to this data, it seems like anything less than an astronomical amount of cardboard is not really of much value in a test of edge retention unless you somehow got extremely similar groups of cardboard, which is unrealistic as pointed out above. Do you think we could then make the conclusion that cardboard is not a very suitable material to test with, and if so, what do you think could be a replacement? As a side note, I am trying to get my brother to grab me some fire blanket that he uses at his plant job and see how that stuff does on an edge. It seems like an interesting material that he has to cut on a daily basis.

Cliff Stamp · #18

KevinOubre wrote:So, at least according to this data, it seems like anything less than an astronomical amount of cardboard is not really of much value in a test of edge retention unless you somehow got extremely similar groups of cardboard, which is unrealistic as pointed out above.

Statistics to the rescue.

In the above what I am doing is taking a knife, taking a particular pile of cardboard and cutting it up. Then another knife gets another pile and these piles can be very different. This difference is systematic, meaning the entire pile can be more/less damaging to the edge to cut, and thus the edge retention seen is more about the cardboard than the steel.

The problem is that there is a systematic error from one type of cardboard to another, what you need to do is make this a random error because if you do then again it will normalize out. The solution is really simple, it is called using random sampling. What you do is take all the cardboard you have and make a large pile and mix it up. Now when you go to do some cutting you take some pieces at random.

This may seem like you made the problem worse, but you have not, because that sample, even though it will be made out of a bunch of random cardboard, will be very consistent in how it affects an edge from one run to the next. When you get new cardboard you just keep adding it to the main pile. Ideally you make the main pile so large that it is basically infinite and very consistent; 10X the size of one run is good, 100X is great.
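A rough sketch of why the mixed pile helps, for anyone who wants to see it in numbers. It assumes every piece from a box shares that box's "difficulty" and that difficulty varies 10:1 between boxes (uniform from 1 to 10). The pile here is 100X the size of a run, and the function names are hypothetical.

```python
import random
from statistics import fmean, pstdev


def make_pile(n_boxes, pieces_per_box, rng):
    """Build a pile of pieces; every piece inherits its box's difficulty factor."""
    pile = []
    for _ in range(n_boxes):
        difficulty = rng.uniform(1, 10)  # assumed 10:1 spread between boxes
        pile.extend([difficulty] * pieces_per_box)
    return pile


def run_spread(pile, pieces_per_run, n_runs, rng, pooled):
    """Standard deviation of the average difficulty seen across runs."""
    run_means = []
    for _ in range(n_runs):
        if pooled:
            # Random sampling: draw pieces from the whole mixed pile.
            pieces = rng.sample(pile, pieces_per_run)
        else:
            # One box per run: every piece shares a single difficulty.
            pieces = [rng.choice(pile)] * pieces_per_run
        run_means.append(fmean(pieces))
    return pstdev(run_means)


if __name__ == "__main__":
    rng = random.Random(0)
    pile = make_pile(n_boxes=100, pieces_per_box=100, rng=rng)
    print("one box per run :", round(run_spread(pile, 100, 300, rng, pooled=False), 2))
    print("mixed pile      :", round(run_spread(pile, 100, 300, rng, pooled=True), 2))
```

The cardboard in the mixed pile is just as variable piece to piece, but the average difficulty a knife sees in a run barely moves, which is what matters for comparing knives.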

From time to time you can take the same knife and do runs with it and check the long term drift of your cardboard pile. In this way you can calibrate and correct for any drifting - but to be frank this is starting to get to the point you have to be pretty cereal about cardboard cutting.

In short, it isn't a hard problem - science deals with such all the time and much more complicated. It only takes a little methodology to sort it out. If anyone likes doing the cutting but not dealing with the numbers just send them to me, I will happily run the statistics on them and even correct for sampling drift and similar.

KevinOubre · #19

Cliff Stamp wrote:
KevinOubre wrote:So, at least according to this data, it seems like anything less than an astronomical amount of cardboard is not really of much value in a test of edge retention unless you somehow got extremely similar groups of cardboard, which is unrealistic as pointed out above.
Statistics to the rescue.

In the above what I am doing is taking a knife, taking a particular pile of cardboard and cutting it up. Then another knife gets another pile and these piles can be very different. This difference is systematic, meaning the entire pile can be more/less damaging to the edge to cut, and thus the edge retention seen is more about the cardboard than the steel.

The problem is that there is a systematic error from one type of cardboard to another, what you need to do is make this a random error because if you do then again it will normalize out. The solution is really simple, it is called using random sampling. What you do is take all the cardboard you have and make a large pile and mix it up. Now when you go to do some cutting you take some pieces at random.

This may seem like you made the problem worse, but you have not, because that sample, even though it will be made out of a bunch of random cardboard, will be very consistent in how it affects an edge from one run to the next. When you get new cardboard you just keep adding it to the main pile. Ideally you make the main pile so large that it is basically infinite and very consistent; 10X the size of one run is good, 100X is great.

From time to time you can take the same knife and do runs with it and check the long term drift of your cardboard pile. In this way you can calibrate and correct for any drifting - but to be frank this is starting to get to the point you have to be pretty cereal about cardboard cutting.

In short, it isn't a hard problem - science deals with such all the time and much more complicated. It only takes a little methodology to sort it out. If anyone likes doing the cutting but not dealing with the numbers just send them to me, I will happily run the statistics on them and even correct for sampling drift and similar.

I may do that next time I do a run. I know absolutely nothing about the data compilation and statistical analysis. Guess I need to add more stuff to the research list.

Cliff Stamp · #20

The interesting thing is that there is no real requirement for precision aside from the fact that it decreases the time to get an answer. Here is a perfectly valid way to compare the edge retention in use of two knives :

-record how often you sharpen each knife, days between sharpening
-after a year calculate the average (and if desired CI)

Look at the averages and see if they are far enough apart to make it a practical difference. Now while what you cut is random from day to day, if you also change the knives on a random basis then it will balance itself out and provide a perfectly valid comparison.
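The average and confidence interval for such a log take only a few lines. This is a sketch using a normal approximation for the interval (a t-based interval would be somewhat wider for short logs), and the two logs below are made-up numbers for illustration only, not measurements.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_with_ci(days_between_sharpening, confidence=0.95):
    """Return (mean, low, high) using a normal approximation for the interval."""
    n = len(days_between_sharpening)
    avg = mean(days_between_sharpening)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z * stdev(days_between_sharpening) / math.sqrt(n)
    return avg, avg - half_width, avg + half_width


# Made-up logs for illustration only, not measurements from this thread.
knife_a = [9, 14, 7, 21, 11, 16, 8, 13]
knife_b = [18, 25, 12, 30, 22, 17, 27, 20]

for name, log in (("A", knife_a), ("B", knife_b)):
    avg, low, high = mean_with_ci(log)
    print(f"knife {name}: {avg:.1f} days (95% CI {low:.1f} to {high:.1f})")
```

If the two intervals overlap heavily, the log does not yet separate the knives and you just keep recording.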

The hard part is being honest and unbiased and that is a lot harder than you might expect. What you will tend to do is use what you think you know to filter the results and thus you often end up just producing data which matches what you think would happen. This is why, unless you want to introduce non-biased means of filtering, your best method is no filter at all: just use all the raw data.

It is in fact trivial to extend this comparison to look at two steels. Just take all the knives you have in one steel and all the knives you have in another and do the same thing. Yes there will be random variations among the geometries and of course the HT, but again statistics to the rescue, it will all tend to balance out. After a year, if you are honest and unbiased, you are likely to produce a very interesting set of data.

Now if you want an answer in a few hours, well you have to try to constrain all the sources of error so that you can infer the behavior of one variable (the edge retention) from one other variable (the amount of material cut) and then realize that all you have is the edge retention with one particular geometry with one particular finish, cutting one type of material in one particular way.

Spyderco Forums
