Tim Bray's Erlang Exercise "WideFinder" - After Find Wide
===>>> Updated Nov 7:
A new version tbray9a.erl which uses ets instead of dict, with 116 LoC, took about 3.8 sec on 5 million lines log file, 0.93 sec on 1 million lines file on my 4-core linux box.
* Results on T5120, an 8-core 1.4 GHz machine with 2 integer instruction threads per core and support for 8 thread contexts per core. Solaris thinks it sees 64 CPUs:
| Schedulers# | Elapsed(s) | User(s) | System(s) | (User+System)/Elapsed |
|---|---|---|---|---|
| 1 | 37.58 | 35.51 | 7.82 | 1.15 |
| 2 | 20.14 | 35.31 | 8.28 | 2.16 |
| 4 | 11.81 | 35.37 | 8.25 | 3.69 |
| 8 | 7.63 | 35.28 | 8.33 | 5.72 |
| 16 | 5.60 | 36.08 | 8.27 | 7.92 |
| 32 | 5.29 | 36.64 | 8.11 | 8.46 |
| 64 | 5.45 | 36.79 | 8.23 | 8.26 |
| 128 | 5.26 | 36.75 | 8.39 | 8.58 |
When schedulers was 16, (User+System)/Elapsed was 7.92, and the elapsed time reached 5.60 sec (near the best), so, the 8-core computing ability and 2 x 8 =16 integer instruction threads ability almost reached the maxima. The 8 x 8 thread contexts seemed to do less help on gaining more performance improvement.
It seems that T5120 is a box with 8-core parallel computing ability and 16-thread for scheduler?
On my 4-core linux box, the slowest time (1 scheduler) vs the fastest time (128 schedulers) was 6.627/3.763 = 1.76. On this T5120, was 37.58/5.26 = 7.14. So, with the 8-core and 16-integer-instruction-thread combination, the T5120 is pretty good on parallel computing.
An interesting point is, when schedulers increased, Elpased time dropped along with User time and System time keeping almost constancy. This may because I separated the parallelized / sequential part completely in my code.
* Results on 2.80Ghz 4-core Intel xeon linux box (5 million lines log file):
| Schedulers# | Elapsed(s) | User(s) | System(s) | (User+System)/Elapsed |
|---|---|---|---|---|
| 1 | 6.627 | 5.356 | 4.248 | 1.45 |
| 2 | 4.486 | 6.176 | 3.936 | 2.25 |
| 4 | 4.299 | 8.989 | 4.156 | 3.06 |
| 8 | 3.960 | 9.629 | 3.644 | 3.35 |
| 16 | 3.826 | 9.101 | 3.696 | 3.34 |
| 32 | 3.858 | 9.029 | 3.840 | 3.34 |
| 64 | 3.763 | 8.801 | 3.820 | 3.35 |
| 128 | 3.920 | 9.137 | 3.980 | 3.35 |
========
I'm a widefinder these days, and after weeks found wide, I worte another concise and fast widefinder tbray9.erl, which is based on Steve, Anders and my previous code, with Boyer-Moore searching (It seems Python's findall uses this algorithm) and parallelized file reading. It's in 120 LoC (a bit shorter than Fredrik Lundh's wf-6.py), took about 1 sec for 1 million lines log file, and 5.2 sec for 5 million lines on my 4-core linux box. Got 5.29 sec on T5120 per Tim's testing.
To evaluate:
erlc -smp tbray9.erl erl -smp +A 1024 +h 10240 -noshell -run tbray9 start o1000k.ap
BTW since I use parallelized io heavily, by adding flag +A 1024, the code can get 4.3 sec for 5 million lines log file on my 4-core linux box.
Binary efficiency in Erlang is an interesting topic, except some tips I talked about in previouse blog, it seems also depending on binary size, memory size etc., The best buffer size for my code seems to be around 20000k to 80000k, which is the best range on my 4-core linux box and T5120, but it may vary for different code.
Note: There is a maximum element size limit of 2^27 - 1 (about 131072k) for binary pattern matching in current Erlang, this would be consistent with using a 32-bit word to store the size value (with 4 of those bits used for a type identifier and 1 bit for a sign indicator) (For this topic, please see Philip Robinson's blog). So, the buffer size can not be great than 131072k.
I. Boyer-Moore searching
Thanks to Steve and Anders, they've given out a concise Boyer-Moore searching algorithm in Erlang, I can modify it a bit to get a general BM searching module for ASCII encoded binary:
%% Boyer-Moore searching on ASCII encoded binary
-module(bmsearch).
-export([compile/1, match/3]).
-record(bmCtx, {pat, len, tab}).
compile(Str) ->
Len = length(Str),
Default = dict:from_list([{C, Len} || C <- lists:seq(1, 255)]),
Dict = set_shifts(Str, Len, 1, Default),
Tab = list_to_tuple([Pos || {_, Pos} <- lists:sort(dict:to_list(Dict))]),
#bmCtx{pat = lists:reverse(Str), len = Len, tab = Tab}.
set_shifts([], _, _, Dict) -> Dict;
set_shifts([C|T], StrLen, Pos, Dict) ->
set_shifts(T, StrLen, Pos + 1, dict:store(C, StrLen - Pos, Dict)).
%% @spec match(Bin, Start, #bmCtx) -> {true, Len} | {false, SkipLen}
match(Bin, S, #bmCtx{pat=Pat, len=Len, tab=Tab}) ->
match_1(Bin, S + Len - 1, Pat, Len, Tab, 0).
match_1(Bin, S, [C|T], Len, Tab, Count) ->
<<_:S/binary, C1, _/binary>> = Bin,
case C1 of
C ->
match_1(Bin, S - 1, T, Len, Tab, Count + 1);
_ ->
case element(C1, Tab) of
Len -> {false, Len};
Shift when Shift =< Count -> {false, 1};
Shift -> {false, Shift - Count}
end
end;
match_1(_, _, [], Len, _, _) -> {true, Len}.
Usage:
> Pattern = bmsearch:compile("is a").
{bmCtx,"a si",4,{4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,...}}
> Bin = <<"this is a test">>.
<<"this is a test">>
> bmsearch:match(Bin, 1, Pattern).
{false,1}
> bmsearch:match(Bin, 5, Pattern).
{true, 4}
> bmsearch:match(Bin, 7, Pattern).
{false, 4}
II. Reading file in parallel and scan
To read file in parallel, we should open a new file handle for each process. To resolve the line break bound, we just split each chunk to first line (Head), line-bounded Data, and last line (Tail), and mark Head, Tail with the serial number as {I * 10, Head}, {I * 10 + 1 Tail}, so we can join all pending segments (Head and Tail of each chunk) later in proper order.
scan_file({FileName, Size, I, BmCtx}) ->
{ok, File} = file:open(FileName, [raw, binary]),
{ok, Bin} = file:pread(File, Size * I, Size),
file:close(File),
HeadL = split_on_first_newline(Bin),
TailS = split_on_last_newline(Bin),
DataL = TailS - HeadL,
<<Head:HeadL/binary, Data:DataL/binary, Tail/binary>> = Bin,
{scan_chunk({Data, BmCtx}), {I * 10, Head}, {I * 10 + 1, Tail}}.
III. Spawn workers
Luke's spawn_worker are two small functions, they are very useful, stable and good abstract on processing workers:
spawn_worker(Parent, F, A) ->
erlang:spawn_monitor(fun() -> Parent ! {self(), F(A)} end).
wait_result({Pid, Ref}) ->
receive
{'DOWN', Ref, _, _, normal} -> receive {Pid, Result} -> Result end;
{'DOWN', Ref, _, _, Reason} -> exit(Reason)
end.
So, I can start a series of workers simply by:
read_file(FileName, Size, ProcN, BmCtx) ->
[spawn_worker(self(), fun scan_file/1, {FileName, Size, I, BmCtx})
|| I <- lists:seq(0, ProcN - 1)].
And then collect the results by:
Results = [wait_result(Worker) || Worker <- read_file(FileName, ?BUFFER_SIZE, ProcN1, BmCtx)],
In this case, returning Results is a list of {Dict, {Seq1, Head}, {Seq2, Tail}}
IV. Concat segments to a binary and scan it
Each chunk is slipt to Head (first line), line-bounded Data, and Tail (last line). The Head and Tail segments are pending for further processing. After all workers finished scanning Data (got a Dict), we can finally sort these pending segments by SeqNum, concat and scan them in main process.
Unzip the results to Dicts and Segs, sort Segs by SeqNum:
{Dicts, Segs} = lists:foldl(fun ({Dict, Head, Tail}, {Dicts, Segs}) ->
{[Dict | Dicts], [Head, Tail | Segs]}
end, {[], []}, Results),
Segs1 = [Seg || {_, Seg} <- lists:keysort(1, Segs)],
The sorted Segments is a list of binary, list_to_binary/1 can concat them to one binary efficently, you do not need to care about if it's a deep list, they will be flatten automatically:
Dict = scan_chunk({list_to_binary(Segs1), BmCtx}),
Conclution
Thinking in parallel is fairly simple in Erlang, right? Most of the code can run in one process, and if you want to spawn them for parallel, you do not need to modify the code too much. In this case, scan_file/4 is sequential, which just return all you want, you can spawn a lot of workers which do scan_file/4 work, then collect the results later. That's all.
![(please configure the [header_logo] section in trac.ini)](/chrome/!97aa87b5/site/your_project_logo.png)
rss
Comments
Hi, Caoyuan, Since AIOTrader topic has closed I write here, sorry. And sorry for my English. I've just download and tried AIOTrader and impressed by it. I can do a little Smalltalk and want to use AIOTrader as a user defined indicator data representing tool. Is it possible for AIOTrader to open an API that would support UDI data exchange/importing and maybe quote data? I can provide with end-of-day quotes and related indicator data (daily, weekly, even quarterly, separated). In this way, customer needs no Java skills to add any indicators with no limitation. I've looking for such kind of free software for a long time but not found.
emptist,
But I don't know what is UDI data.
hi, I'm sorry but it's User Defined Indicator.
Say, I don't know any Java, but I can do some other language or I just have the data from a friend then may it be possible and easy for you to open some API(for programmer of other languages and even worse, perhaps not that skillful) and some Importing approach to let them watch the indicator data in your wonderful AIOTrader? The user may or may not want to import his own quote data, too.
For example, we have stock 000001.sz, shenzhen dev. bank in other words, and user want to display a home made indicator, and user can provide it as a file or as through API call to AIOTrader.
Say the data knows these: a indicator name, a wanted display color, a display object type(line or stick line or signal, etc,), and of course the data table, maybe separated daily table, weekly table, etc. such as: daily data: date .............value 20050101.....20.05 20050102.....20.15
monthly data: date ............value
quarterly data: etc. and he might want to supply with his own quote data, maybe also as separated daily, monthly, etc tables.
In short use AIOTrader to display but not to develop. Will you think this none sense?
I'm asking since AIOTrader is open-sourced, and you are so open-minded and generous in many ways :)
Sorry for bothering.
hi, User Defined Data, sir. Sorry.
In short, APIs(for programmer) and Importing tools(for programmer and non-programmer) for using AIOTrader more as a ready data presenting platform but not a indicator developing platform.
This is very useful for those who can't program in Java and has already his own developing language.
I've been looking for this kind of platform for a long time but failed to find any.
Sorry again. It should have been Use Defined Indicator, UDI.
In fact I typed correctly in my previous response but the post is too long for your spam remover.....
Hi, I'm back to see if there's new response. In your another post I learned that now you're still focusing on features but not APIs. Well, I'm not an expert but in my situation I can export User defined Indicator data files or exchange data through web services (SOAP). Both will be platform independent. Thanks :)
Hi emptist,
AIOTrade currently supports import data from csv file. But for other options your mentioned, will not be supported in near future.
Not only the API is not stable yet, but also I'm thinking new architecture for AIOTrade.