I have a course assignment on the MapReduce algorithm. I built an ETS table in Erlang from a big data file, and I would like to work on it concurrently. The table turned out to be very big. Is there a way to split the one big table into a few smaller tables, so that I can search it concurrently using MapReduce? Thanks.
Split ETS in Erlang
Asked by Alon Rolnik
I have worked on an intranet app in which I had to keep things in RAM most of the time. I created a stable caching library which helped me abstract the ETS mechanisms. In this library, I create two worker gen_servers whose job is to create, own and expose methods for ETS tables. I named them cache1 and cache2. These two transfer table ownership to each other in a redundant fashion, so that if one of them fails the other takes over its tables. Get the application here: http://www.4shared.com/zip/z_VgKLpa/cache-10.html
Just unzip it, use the Emakefile to recompile it, and then put it into your Erlang lib directory.
To see how it works, here is a shell interaction.
F:\programming work\cache-1.0>erl -pa ebin
Eshell V5.9 (abort with ^G)
1> application:start(cache).
ok
2> rd(student,{name,age,sex}).
student
3> cache_server:new(student,set,2).
ok
4> cache_server:write(#student{name = "Muzaaya Joshua",
sex = "Male",age = (2012 - 1987) }).
ok
5> cache_server:write(student,[#student{name = "Joe",sex = "Male"},
#student{name = "Mike",sex = "Male"}]).
ok
6> cache_server:read({student,"Muzaaya Joshua"}).
[#student{name = "Muzaaya Joshua",age = 25,sex = "Male"}]
7> cache_server:read({student,"Joe"}).
[#student{name = "Joe",age = undefined,sex = "Male"}]
8> cache_server:get_tables().
[{cache1,[student]},{cache2,[]}]
9> rd(class,{class,no_of_students}).
class
10> cache_server:get_tables().
[{cache1,[student]},{cache2,[]}]
11> cache_server:new(class,set,2).
ok
12> cache_server:get_tables().
[{cache1,[student]},{cache2,[class]}]
13> cache_server:write(class,[
#class{class = "Primary " ++ integer_to_list(N),
no_of_students = random:uniform(50)} || N <- lists:seq(1,7)])
.
ok
14> cache_server:read({class,"Primary 6"}).
[#class{class = "Primary 6",no_of_students = 30}]
15> cache_server:delete({class,"Primary 2"}).
ok
16> cache_server:get_cache_state().
[{server_state,cache1,1,[student]},
{server_state,cache2,1,[class]}]
17> rd(food,{name,type,value}).
food
18> cache_server:new(food,set,2).
ok
19> cache_server:write(food,[#food{name = "Orange",
type = "fruit",value = "Vitamin C"}]).
ok
20> cache_server:get_cache_state().
[{server_state,cache1,2,[food,student]},
{server_state,cache2,1,[class]}]
21>
Now, to understand the importance of ets:give_away/3, let's see what happens when either cache1 or cache2 crashes. Remember that the current server state (which shows the current owner of each table) is:
21> cache_server:get_cache_state().
[{server_state,cache1,2,[food,student]},
{server_state,cache2,1,[class]}]
22>
Let me crash cache1 and see what happens.
22> gen_server:cast(cache1,stop).
ok
Cache Server: cache2 has taken over table: food from server: cache1
23>
Cache Server: cache2 has taken over table: student from server: cache1
23> cache_server:get_cache_state().
[{server_state,cache1,0,[]},
{server_state,cache2,3,[student,food,class]}]
24>
And likewise the other one:
24> gen_server:cast(cache2,stop).
ok
Cache Server: cache1 has taken over table: student from server: cache2
25>
Cache Server: cache1 has taken over table: food from server: cache2
25>
Cache Server: cache1 has taken over table: class from server: cache2
25> cache_server:get_cache_state().
[{server_state,cache1,3,[class,food,student]},
{server_state,cache2,0,[]}]
26>
That's it! You could use the concepts in the source code to create something of your own. The ETS tables created by that library are public and named, so you can access them directly using ETS functions.
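The library's source is not shown here, so the following is only a minimal sketch of the general ets:give_away/3 handoff idea, not the library's actual code. The module name ets_handoff_sketch, the handoff atom and the {stop, NewOwner} message are hypothetical names chosen for illustration. One process creates a table and, when asked to stop, gives it to another process, which receives an 'ETS-TRANSFER' message and becomes the new owner.

```erlang
-module(ets_handoff_sketch).
-export([demo/0]).

%% Hypothetical demo: a short-lived owner hands its table to the caller
%% with ets:give_away/3 before exiting, so the data outlives the owner.
demo() ->
    Parent = self(),
    Owner = spawn(fun() ->
        Tab = ets:new(student, [set, public]),
        true = ets:insert(Tab, {"Joe", "Male"}),
        receive
            {stop, NewOwner} ->
                %% Transfer ownership; NewOwner gets an 'ETS-TRANSFER' message.
                true = ets:give_away(Tab, NewOwner, handoff)
        end
    end),
    Owner ! {stop, Parent},
    receive
        {'ETS-TRANSFER', Tab, Owner, handoff} ->
            %% The caller now owns the table and the row is still there.
            Result = {ets:info(Tab, owner) =:= self(),
                      ets:lookup(Tab, "Joe")},
            true = ets:delete(Tab),
            Result
    after 1000 ->
        timeout
    end.
```

Calling ets_handoff_sketch:demo() should return {true, [{"Joe", "Male"}]}. For the crash case, where the owner has no chance to call give_away/3 itself, the {heir, Pid, HeirData} option of ets:new/2 delivers the same 'ETS-TRANSFER' message to the heir when the owner dies.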
You can already search an ETS table concurrently without splitting it. See the read_concurrency option:
http://www.erlang.org/doc/man/ets.html#new_2_read_concurrency
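As a rough illustration of concurrent reads on a single table, the sketch below creates one table with {read_concurrency, true}, splits a list of keys between a few worker processes (the "map" step), and sums their partial results (the "reduce" step). The module name ets_parallel_scan, the helper names and the sample data are all assumptions made for this example.

```erlang
-module(ets_parallel_scan).
-export([demo/0, parallel_sum/3]).

%% Hypothetical demo: store {N, N*N} for N in 1..1000 and sum the values
%% using 4 worker processes that all read the same table.
demo() ->
    Tab = ets:new(numbers, [set, protected, {read_concurrency, true}]),
    true = ets:insert(Tab, [{N, N * N} || N <- lists:seq(1, 1000)]),
    Sum = parallel_sum(Tab, lists:seq(1, 1000), 4),
    true = ets:delete(Tab),
    Sum.

%% Map: each worker looks up its share of the keys and sums the values.
%% Reduce: the caller adds up the partial sums.
parallel_sum(Tab, Keys, NWorkers) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() ->
                    Partial = lists:sum([V || K <- Chunk,
                                              {_, V} <- ets:lookup(Tab, K)]),
                    Parent ! {Ref, Partial}
                end),
                Ref
            end || Chunk <- split(Keys, NWorkers)],
    lists:sum([receive {Ref, Partial} -> Partial end || Ref <- Refs]).

%% Deal the keys round-robin into N chunks.
split(Keys, N) ->
    Indexed = lists:zip(lists:seq(0, length(Keys) - 1), Keys),
    [[K || {I, K} <- Indexed, I rem N =:= W] || W <- lists:seq(0, N - 1)].
```

A protected table (the default) can be read by any process, so the workers need no special access. ets_parallel_scan:demo() should return 333833500, the sum of the squares from 1 to 1000.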
If the table is large, I would recommend using a selective match specification with ets:select/2 to reduce the amount of data returned by the search: http://www.erlang.org/doc/man/ets.html#select-2
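For example, a match specification can filter rows inside ETS and return only the fields you need. This is a small sketch with made-up data; the module name ets_select_sketch and the {Name, Age, Sex} row layout are assumptions for illustration.

```erlang
-module(ets_select_sketch).
-export([demo/0]).

%% Hypothetical demo: return the names of all students older than 20.
demo() ->
    Tab = ets:new(student, [set]),
    true = ets:insert(Tab, [{"Joe", 19, "Male"},
                            {"Mike", 25, "Male"},
                            {"Ann", 31, "Female"}]),
    %% Head binds name to '$1' and age to '$2'; the guard keeps rows with
    %% age > 20; the body returns only the name.
    MatchSpec = [{{'$1', '$2', '_'}, [{'>', '$2', 20}], ['$1']}],
    %% A set table has no defined order, so sort for a stable result.
    Names = lists:sort(ets:select(Tab, MatchSpec)),
    true = ets:delete(Tab),
    Names.
```

ets_select_sketch:demo() should return ["Ann", "Mike"]. The filtering happens inside ETS, so only the matching names are copied into the calling process.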