1. You have a class that supports to input sample records and to compute the
average of the samples. The class has two members: total and count. How
would you make the class thread-safe? If 99% of the time average() is called
, how to optimize for that?
2. Talk about your recent interesting project/bug.
3. You have 100 files, each containing 10G sorted integers. How to merge all
integers into one sorted file?
4. Write a function to reverse digits of an integer. E.g. 123 --> 321, -890
--> -98.
5. Discuss design challenges of a distributed web crawler running on
commercial PCs. How to utilize network bandwidth of those PCs efficiently?
6. How to test if an API is thread-safe or not?
7. How to convert a single threaded application to multiple threaded?
8. Given a non-negative integer array representing an elevation map whereas
the width of each bar is 1, compute how much water it can contain after
raining. For example, given [0,1,0,2,1,0,1,3,2,1,2,1], returns 6. What is
the complexity of your solution?
9. You have an API and you want to wrap it up such that it cannot be called
more than N times a second. How to do this in a multiple-threaded
environment?
10. Design a system to provide suggestions to the search box. How would you
update such a system with new search data when it is running?
11. Introduce your research.
12. Design a BigInt class to represent unlimited size of integers. Implement
a member function add(int).
13. You have two movie tickets and want to invite a friend to watch movie
together. You can invite several friends simultaneously. Each one of them
independently has probability p to accept your invitation. How many friends
should you invite to maximize the probability that exactly one friend
accepts your invitation?
谨以此感谢版上提供经验的各位前辈并顺求祝福!
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
自战解说:
1. For the 1st part of the question, I answered that read-write mutex is a possible solution and to design a good solution we'd like to know the usage pattern. Thus the 2nd part of the question and I answered that the member to input sample records can update a new member average, so that average() just needs to read and no lock is needed.
2. ...
3. I proposed two solutions: n-way merge and counting sort. Then I was asked to decide which one is better and I said counting sort, because it is easier to paralellize and uses less memory. Then I was asked to implement it but I ran out of time for the details. As a hindsight, I should make sure of these in implementation: what's the range of possible input? Any known distribution? Will my program be 32-bit or 64-bit? Is the count of some number possible to overflow? Can INT_MIN be counted correctly?
4. I came up with a recursion solution at first and changed it to an iterative version per requested. Hindsight: not all input integer can be reversed correctly and the valid input is not a simple continuous range.
5. I have little knowledge about web crawler so I asked a lot of questions. I suggested to utilize the geography information of target websites to dispatch the work load and to pool connection/job at each PC. This did not seem to be the interviewer's expectation as he told me hints about how to resolve DNS to IP addresses efficiently, how to deal with slow web sites, etc.
6. I told him thread-safety should be part of the API's interface and I cannot tell without source code. I just googled and it seems that there's no solution to this problem?
7. I told him that I would first decide if multi-thread is appropriate for the application. And then discussed the strategy of converting thread-unsafe APIs to services. Lots of detailed discussion on this problem about design choices.
8. My idea is to find the global maximum element in the array first, and then look from beginning to the global maximum, identify a "lake" after each current maximum and accumulate the water volume, and then look from the end to the global maximum in symmetry. Time complexity is O(n). Space complexity is O(1).
9. I started with a naive solution using sleep() and figured out the real requirement at that time. Then I talked about using a semaphore and to set up timer/callback to update semaphore. That's not the best solution either and the interviewer hint me to use a queue to record time stamps of the N most recent calls. I should be able to find this solution without hint.
10. A very complicated problem. We discussed browser-side/server-side work, trie (prefix tree) at letter level or word level, client side pre-fetching, how to store/build server-side data structure using lots of commercial PCs, possible data compression schemes, an estimate about how many PCs will I use according to my design, and more stuff I do not remember. For the question about how to update the system with new search trend data, I suggested to process the data offline using another set of PCs and switch live/offline set periodically. I felt quite satisfied about myself on this question.
11. ...
12. I first said that I can use a string to store each digit of the BigInt and realized that it is a dumb design when he asked me to implement it. Then I changed to use vector and a sign bit to store the BigInt. There were continuous questions on my implementation of BigInt::add(int). I am not very satisfied about my solution to this problem as I did not support adding a negative int in the time given.
13. I told him immediately that the answer is floor(1/p). But then he asked me why? I said that I remember this is binomial distribution and wrote down the formula f=n*p*(1-p)^(n-1). The interviewer still asked why and showed me that df/dn=0 does not result in n=1/p. At that time I was puzzled but still insisted that my memory should be correct. I was not able to get the correct deduction in time. So I emailed HR the solution that night and hope he would forward my solution to the interviewer per my request.
心得:本以为会被问到比较难的算法题,比如DP, string matching, suffix tree, graph algo, operations on binary tree啥的。或者OO概念,design pattern之类的设计问题。不过实战都是非常质朴而实际的问题,并没有单纯为了增加难度而出的题。白板编程一定要好好练习,听到问题后第一件事是澄清问题、确认几个测试用例、设法限制输入(assert)、明确边界条件,同时最快时间考虑出一个基本解决方案,然后再考虑优化。即使运气好碰到做过的题,也务必按照顺序给出分析思考解决优化的全过程。白板编程要注意程序篇幅的控制,根据内容多少选择恰当的字体大小;如有可能,程序中可以使用辅助函数并多半不用实现之;遇到程序分支先实现错误处理、特殊情况等短分支,再实现主体算法分支,以便控制篇幅及避免遗忘。
总体看来,G家面官准备的技术问题都颇靠谱。完全没有behavior question这一点也甚合我意。唯一不爽的是好几个面官都时间耗尽,没有机会问他们问题;倒也有几个面官比较体贴,技术问题前让我先问问题,不过他们似乎对我的问题都无准备,回答并不能使我满意。