Comments on Paul's Pontifications: "When Haskell is faster than C" (Paul Johnson)

UNIX administrator (2016-06-07 23:56):
Guys, just a reminder that, as fast as C is, if you really need speed you take the latest and greatest Intel Fortran, Sun Studio, or PGI Group's Fortran compiler and code away. Yes, Fortran. And yes, there are ultramodern Fortran compilers still being produced, and then some. They are used because, short of coding in assembler, the Fortran compilers produce code closest to hand-coded assembler.

And yes, I know this is just anecdotal evidence from someone on the Internet. But if I didn't think it important enough to mention, I wouldn't have bothered. I mention it so that everyone gets the full picture: there are languages and compilers even faster than C.

In finite element analysis, for the most intensive computing work (the solvers), we used Fortran: matrix computations on the order of 100 million by 100 million, with Intel AVX optimizations to the hilt. Why wasn't C used? The assembler code it generated wasn't fast enough. The last thing we were looking at before I left that field was the PGI Fortran CUDA compiler, which let us compile the code to Nvidia GPU assembler. Again, not C: Fortran. Why? Speed.

David Feuer (2014-02-10 16:08):
My last post had a couple of errors. I see now why the filter is required, and also that GHC has a rewrite rule to replace break with breakByte when appropriate. However, the double scanning is still unfortunate (it might even be good to use readLn after setting the handle buffer size explicitly; I'm not sure), and I think I found a bigger problem: as written, I believe the table will be rebuilt for every block unless full laziness is enabled (generally a bad idea). Making the table a top-level variable seems the most sensible approach.

Anonymous (2014-02-05 15:19):
The Haskell code here could be made more efficient. The most obvious improvement is that break should be replaced with breakByte. The second is that you scan the header text twice: first for the greater-than symbol and then for the newline. The third (I think) is that the "filter out nulls" operation doesn't seem to serve any obvious purpose. If you need it, you should probably indicate why.

Anonymous (2013-01-21 14:34):
In short, you're wrong. C is just as close to the machine now as it was before; the only difference is that "the machine" is now actually rather complex. The machine does not offer some kind of "faster_dereference" which is faster than C's dereference but with slightly different semantics. C's dereference is all the machine offers. The fact that loading from main memory can now be very expensive is irrelevant.

Also, a linked list of characters? The buffer algorithm will be much faster; no wonder your performance was rubbish.

In other words, your entire article amounts to "I don't understand CPUs, I'm going to write some terrible code that uses the CPU very badly, and this proves my point."

Readability *does* matter, and being more readable *is* a big advantage. Performance *isn't* everything.
But you have definitely failed to demonstrate that Haskell can be as performant as C, and secondly, it's a pity that Haskell is, in general, *less* readable than C.

Andy (2013-01-19 14:29):
Using the extra options to ghc, I still can't reproduce the results.

Haskell (with -O2 -fllvm and -funbox-strict-fields):

$ time ./main < in.txt > /dev/null

real    0m15.509s
user    0m14.065s
sys     0m1.316s

C (with -O2):

$ time ./a.out < in.txt > /dev/null

real    0m2.895s
user    0m2.864s
sys     0m0.024s

Timmy (2013-01-19 06:42):
@Anonymous:

> Does it make PHP more suitable?

Maybe not "suitable", but it certainly makes PHP more usable than C, and that seems to be what this post is mostly concerned with.

By the way, fun fact: the PHP version of this benchmark in the benchmarks game manages to beat this blogger's C version by 1.6 seconds, and it's only 30 lines long.

Anonymous (2013-01-18 20:20):
It's nice to see that you stacked this test in Haskell's favour by choosing the benchmark it is best at. I think you should face the fact that Haskell will almost never be anywhere near as fast or efficient as well-written C.

Sphires (2013-01-18 20:18):
For a literal < (less than), write &lt; or &#60; with no space after the &; for a literal > (greater than), write &gt; or &#62;.

Sphires (2013-01-18 20:17):
Quotation marks would have been a good alternative to the < and > symbols.

Anonymous (2013-01-18 19:40):
> 2. out of a lineup of n programmers picked at random a sizable chunk wouldn't even be capable of writing I/O code in Haskell, while almost certainly all of them would still manage an "everyday" C solution that beats yours.

I don't think that argument holds. The fact that few programmers can comprehend and use Haskell only means they don't yet have that skill; it is not something that can never change. Forty years ago, people capable of coding in C weren't legion either. Out of the same random group of programmers, how many more would be able to code up that programme in PHP rather than C? Does that make PHP more suitable?

Timmy (2013-01-18 18:42):
I find it hard to take this post seriously, since failing to realize the impact I/O would have on the benchmark results seems like a pretty big methodological flaw. Microbenchmarks, pretty much by definition, try to avoid interacting with any system that is not part of the core concern of the benchmark, as far as possible.

In fact, the benchmark explicitly says "read line-by-line", so your program probably doesn't even qualify as a solution.

The motivation for the original benchmark may have been measuring the performance of data structures, but that's certainly not what you are measuring.
Which brings me to the question of what exactly is being measured here: the I/O throughput of the standard libraries of the language implementations? The ease with which those implementations can be understood and utilized? I think we can both agree that on both counts Haskell doesn't have a realistic chance of beating C, regardless of the amount of magic the compiler may be capable of.

This brings me to my final point, which is that I fail to see how you can claim in your conclusion that "conventional wisdom is wrong", given that:
1. everyday C code will almost certainly not be doing character-based I/O;
2. out of a lineup of n programmers picked at random, a sizable chunk wouldn't even be capable of writing I/O code in Haskell, while almost certainly all of them would still manage an "everyday" C solution that beats yours.

k0t4 (2013-01-18 17:52):
I cannot reproduce your results either; perhaps you made a mistake when reading them. Also, jumping to the conclusion that Haskell will always be close to C in efficiency, based on a single test with a single program, is a big issue. To say that with any level of confidence, you should really test several programs in both languages, across several different types of problem.

Andy (2013-01-18 17:36):
As a follow-up to my previous comment: since everyone is saying that it is getc/putc which is causing C to be slow, I thought I should also try redirecting to a file.

C version:

time ./a.out < in.txt > cresult

real    0m4.250s
user    0m2.888s
sys     0m1.352s

Haskell version:

time ./main < in.txt > hsresult

real    0m16.507s
user    0m14.581s
sys     0m1.888s

As you would expect, the sys time goes up, but the time spent in userspace doesn't really change.

Jon (2013-01-18 17:24):
That you can get great performance from C doesn't mean you can't get bad performance if you work for it. Code using getc(), putc() and so on is never going to be fast; it's just poor C.

High-level languages will end up eating C's cake, but the above is not a good example of why.

Andy (2013-01-18 17:18):
I can't reproduce your results.
I took the input file from the shootout page:

http://benchmarksgame.alioth.debian.org/u32/iofile.php?test=revcomp&file=input

and fixed the C code so it compiled by adding the brackets for the includes. I then compiled it with "gcc -O3" and ran it on the input. It was nearly instantaneous, so I made a new input file by repeating the original lots of times (it is now 127 MB instead of 11 KB).

Now when I run:

time ./a.out < in.txt > /dev/null

I get:

real    0m2.903s
user    0m2.864s
sys     0m0.032s

For the Haskell side, because I don't really know what I'm doing, I commented out the first line of the file and compiled with "ghc -O" (without commenting out that line, it wouldn't link). I then ran:

time ./main < in4.txt > /dev/null

and got:

real    0m14.554s
user    0m14.433s
sys     0m0.088s

Even if I don't turn on the optimizer for the C code, I still get much faster times (a bit over 4 seconds).

Compiler versions:
gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2
The Glorious Glasgow Haskell Compilation System, version 7.4.2

Isaac Gouy (2013-01-18 15:53):
"(aka the Shootout) ... Shootout ... Shootout ... Shootout ... Shootout ... Shootout"

No, no, no: http://en.wikipedia.org/wiki/Shootout. "There was no wish to be associated with or to trivialise the slaughter behind the phrase shootout, so the project was renamed back on 20th April 2007." Five years on, please use the correct name: the benchmarks game.

Anonymous (2013-01-18 15:51):
Just a quick nitpick (because I'm always bugged by overuse of $): if (not $ B.null t2) then ... can just be if not (B.null t2) then ...

Anonymous (2013-01-18 15:09):
An excellent way of looking at it. As we tackle problems of increasing complexity, it is good to focus on solutions that give more correctness and maintainability without losing much performance.

It looks like Haskell compilers are going to make C go the way of assembly.

Mos Fring (2013-01-18 14:08):
Instead of "TL;DR" where you mean "Summary" or "Abstract", why not consider "Summary" or "Abstract" in future? It will enhance the impression of competence and make your readers feel less like you're talking down to them.

Carl (2013-01-18 12:58):
You could probably make the C version about twice as fast just by using getc_unlocked and putc_unlocked (which is of course not quite portable).

It seems impossible to post comments without cookies; that's annoying. Maybe it'll work this time?