A Closer Look at Android RunTime (ART) in Android L # MobileDevelopment - Mobile Development
c*z
Post #1
This is something I am working on, and I would like to hear whether anyone
has ideas.
Say we have millions of product names, such as "Xbox 360", "PlayStation 4",
etc.
We want to extract (tokenize) meaningful information from billions of URLs
(click history), and to distinguish the 360 in "Xbox 360" (useful) from the
360 in session IDs (garbage).
For example, given
www.amazon.com/nike/running-shoes%09mens/buy?q=abc&x=123&ref=hello%09there
The first 09 is a size (keep) and the second 09 is garbage (drop).
We want to keep: amazon nike running shoes 09 mens buy hello there; we want
to drop: abc 123, as well as the second 09.
Due to the size of the data, checking the names manually is impossible. Does
anyone have a suggestion?
I am thinking about using a hash table, but my concern is that the parsing
time then rises from O(1) to O(N), and N is in the millions!
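To make the hash table idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions, not a full solution: the function names build_phrase_set and filter_numbers, the max_len parameter, and the token "sid" are all made up for this example. It only decides whether a purely numeric token should be kept, by checking whether it sits inside a known multi-word product name. It does not handle the "abc" case or single-token product names. Note that a lookup in a Python set takes constant time on average regardless of how many names are stored, so the per-URL cost here depends on the number of tokens and max_len, not on N.

```python
import re


def build_phrase_set(names):
    """Store each product name as a tuple of lowercase tokens in a hash set."""
    phrases = set()
    for name in names:
        tokens = tuple(re.findall(r"[a-z0-9]+", name.lower()))
        if tokens:
            phrases.add(tokens)
    return phrases


def filter_numbers(tokens, phrases, max_len=3):
    """Drop numeric tokens unless they are part of a known product phrase.

    max_len is an assumed upper bound on the number of tokens in a name.
    """
    # Non-numeric tokens are kept by default; numeric ones must be "rescued".
    keep = [not tok.isdigit() for tok in tokens]
    for start in range(len(tokens)):
        for length in range(2, max_len + 1):
            end = start + length
            if end > len(tokens):
                break
            # One average-case O(1) set lookup per candidate window.
            if tuple(tokens[start:end]) in phrases:
                for i in range(start, end):
                    keep[i] = True
    return [tok for tok, kept in zip(tokens, keep) if kept]


phrases = build_phrase_set(["Xbox 360", "PlayStation 4"])
tokens = ["xbox", "360", "buy", "sid", "360"]
print(filter_numbers(tokens, phrases))  # ['xbox', '360', 'buy', 'sid']
```

The first 360 survives because ("xbox", "360") is in the set; the second one is dropped because no known phrase covers it. Memory for millions of names is a separate question that this sketch does not address.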
Thanks!