Quant Interview "Real Questions" Series: Issue 3



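The Quantitative Investment and Machine Learning (QIML) WeChat official account is a leading independent media account focused on quantitative investment, hedge funds, Fintech, artificial intelligence, and big data. It has more than 300,000 followers from mutual funds, private funds, brokerages, futures firms, banks, insurers, and universities, and has been named "Author of the Year" by the Tencent Cloud+ Community for two consecutive years.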


In 2022, the Quantitative Investment and Machine Learning official account has once again launched a brand-new series:



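QIML has gathered real interview questions from top global hedge funds and major internet companies. We hope they give readers a different kind of job-hunting and learning experience!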


Previous issues: Issue 1, Issue 2



Issue 3


▌Difficulty: Medium


Question

If your Time-Series Dataset is very long, what architecture would you use?


Answer

If the time series is very long, LSTMs are a good fit because they process entire sequences of data rather than only single data points, and a time series is exactly such a sequence.

For even stronger representational capacity, stack several LSTM layers.

Another option for a long time series is to use CNNs (for example, 1-D convolutions) to extract features.
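
To make the CNN option concrete, here is a minimal sketch of a strided 1-D convolution in plain Python. The function name `conv1d` and its parameters are illustrative assumptions, and the kernel is fixed by hand here, whereas a real CNN would learn its kernels during training. The sketch only shows how a stride greater than 1 turns a long series into a shorter feature sequence.

```python
def conv1d(series, kernel, stride=1):
    """Slide `kernel` over `series` and return one weighted sum per window.

    Hypothetical helper for illustration: no padding is applied, so only
    windows that fit entirely inside the series are used.
    """
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, series[i:i + k]))
        for i in range(0, len(series) - k + 1, stride)
    ]


# A stride of 2 with an averaging kernel halves the length of the series.
features = conv1d([1, 2, 3, 4, 5, 6], kernel=[0.5, 0.5], stride=2)
```

With a stride of `s`, the output is roughly `1/s` the length of the input, which is one reason convolutional layers are attractive for very long series.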

---

▌Difficulty: Medium


Question

How do you Normalise Time-Series Data?


Answer

Two commonly used normalization methods are:

Range-based Normalization: In range-based normalization, the minimum and maximum values of the time series are determined. Let these values be denoted by $\min$ and $\max$, respectively. Then, the time series value $y_i$ is mapped to the new value $y_i'$ in the range $[0, 1]$ as follows:

$$y_i' = \frac{y_i - \min}{\max - \min}$$

Standardization: In standardization, the mean and standard deviation of the series are used for normalization. This is essentially the Z-value of the time series. Let $\mu$ and $\sigma$ represent the mean and standard deviation of the values in the time series. Then, the time series value $y_i$ is mapped to a new value $z_i$ as follows:

$$z_i = \frac{y_i - \mu}{\sigma}$$
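
Both mappings described above can be sketched in a few lines of plain Python. The function names are illustrative, and the use of the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption. The text does not say which one is intended.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Range-based normalization: map the series onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series has no range to normalize by")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("a constant series has zero standard deviation")
    return [(v - mu) / sigma for v in values]


scaled = min_max_normalize([2, 4, 6])
z_scores = standardize([2, 4, 6])
```

In a forecasting setting, the minimum, maximum, mean, and standard deviation would normally be computed on the training portion only and then reused on later data, so that no information leaks from the future.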

---

▌Difficulty: Hard


Question

Can Hidden Markov Models be used to model Time-Series data?


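Answer

Yes, an HMM can be fit to a time series, subject to some constraints:

The hidden state sequence should follow the Markov property: the next state depends only on the current state.

There is some variance that other models are not able to capture (in other words, the system is partially observable, and a hidden state drives part of what is observed).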
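
As an illustration of how the Markov property is used in practice, the sketch below implements the standard forward algorithm, which computes the likelihood of an observed sequence under a discrete HMM. The function name, the argument layout, and the toy parameters are all assumptions made for this example.

```python
def forward_likelihood(obs, start_prob, trans_prob, emit_prob):
    """Return P(obs) under a discrete HMM using the forward algorithm.

    obs        : list of observation indices
    start_prob : start_prob[s] = P(first hidden state is s)
    trans_prob : trans_prob[p][s] = P(next state is s | current state is p)
    emit_prob  : emit_prob[s][o] = P(observing o | hidden state is s)
    """
    n_states = len(start_prob)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [start_prob[s] * emit_prob[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        # The Markov property lets us update using only the previous alpha.
        alpha = [
            sum(alpha[p] * trans_prob[p][s] for p in range(n_states))
            * emit_prob[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)


# Toy two-state model with two possible observation symbols (made-up numbers).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood([0, 1], start, trans, emit)
```

For long sequences the products underflow, so a practical implementation would work in log space or rescale `alpha` at every step.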

---

▌Difficulty: Hard

Question

Explain briefly the different methods of Noise-Removal for Time-Series Data


Answer

Time-series data is often collected with noise-prone hardware such as sensors. Most noise-removal methods work by removing short-term fluctuations.

It should be pointed out that the distinction between noise and interesting outliers is often difficult to make. Two methods, binning and smoothing, are often used for noise removal.

Binning

The method of binning divides the data into time intervals of size $k$, denoted by $[t_1, t_k]$, $[t_{k+1}, t_{2k}]$, etc. It is assumed that the timestamps are equally spaced. Therefore, each bin is of the same size and contains an equal number of points.

The average value of the data points in each interval is reported as the smoothed value.

Let $y_{i \cdot k + 1}, \ldots, y_{i \cdot k + k}$ be the values at timestamps $t_{i \cdot k + 1}, \ldots, t_{i \cdot k + k}$. Then, the new binned value will be $y'_{i+1}$, where

$$y'_{i+1} = \frac{1}{k} \sum_{r=1}^{k} y_{i \cdot k + r}$$

Therefore, this approach uses the mean of the values in the bins. It is also possible to use the median of the behavioral attribute values.
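
The binning procedure can be sketched as follows. The function name `bin_series` and the choice to drop a trailing incomplete bin are assumptions made for this example. Passing `median` instead of `mean` gives the median variant mentioned above.

```python
from statistics import mean, median


def bin_series(values, k, reducer=mean):
    """Split `values` into consecutive non-overlapping bins of size k
    and reduce each bin to one number (its mean by default).

    Assumption: a trailing bin with fewer than k points is dropped.
    """
    if k <= 0:
        raise ValueError("bin size k must be positive")
    n_bins = len(values) // k
    return [reducer(values[i * k:(i + 1) * k]) for i in range(n_bins)]


binned_mean = bin_series([1, 2, 3, 4, 5, 6], k=3)
binned_median = bin_series([1, 2, 9, 4, 5, 6], k=3, reducer=median)
```

Note that the output has only about `n / k` points, which is the loss of resolution that moving averages are designed to reduce.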

Moving-Average Smoothing

Moving-average methods reduce the loss in binning by using overlapping bins, over which the averages are computed.

As in the case of binning, averages are computed over windows of the time series. The main difference is that a bin is constructed starting at each timestamp in the series rather than only the timestamps at the boundaries of the bins.

Therefore, for a window of $k$ timestamps, the bin intervals are chosen to be $[t_1, t_k]$, $[t_2, t_{k+1}]$, etc. This results in a set of overlapping intervals. The time series values are averaged over each of these intervals. Moving averages are also referred to as rolling averages and they reduce the noise in the time series because of the smoothing effect of averages.

In a real-time application, the moving average becomes available only after the last timestamp of the window. Therefore, moving averages introduce lags into the analysis and also lose some points at the beginning of the series because of boundary effects. Furthermore, short-term trends are sometimes lost because of smoothing. Larger bin sizes result in greater smoothing and lag. Because of the impact of lag, it is possible for the moving average to contain troughs (or downtrends) where there are peaks (or uptrends) in the original series and vice versa. This can sometimes lead to a misleading understanding of recent trends.
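
A minimal sketch of a trailing moving average is shown below. The function name `moving_average` is illustrative. The output is shorter than the input by `k - 1` points, which is the boundary effect described above, and each output value only becomes available once the last point of its window has been seen, which is the source of the lag.

```python
def moving_average(values, k):
    """Return the mean of every window of k consecutive values.

    Output element j is the average of values[j : j + k], so the result
    has len(values) - k + 1 elements.
    """
    if k <= 0 or k > len(values):
        raise ValueError("window size k must be between 1 and len(values)")
    window_sum = sum(values[:k])
    averages = [window_sum / k]
    for i in range(k, len(values)):
        # Slide the window: add the new point, drop the oldest one.
        window_sum += values[i] - values[i - k]
        averages.append(window_sum / k)
    return averages


smoothed = moving_average([1, 2, 3, 4, 5], k=3)
```

The running sum keeps the cost at one addition and one subtraction per point regardless of the window size. For very long floating-point series, recomputing the sum from time to time limits accumulated rounding error.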

---

▌Difficulty: Hard

Question

What are some different ways of Trajectory Patterns Mining?


Answer

There are many different ways in which the problem of trajectory pattern mining may be formulated. This is because of the natural complexity of trajectory data that allows for multiple ways of defining patterns.

Frequent Trajectory Paths

A key problem is that of determining frequent sequential paths in trajectory data. To determine the frequent sequential paths from a set of trajectories, the first step is to convert the multidimensional trajectory (with numeric coordinates) to a 1-dimensional discrete sequence. Once this conversion has been performed, any sequential pattern mining algorithm can be applied to the transformed data.
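
One possible way to perform this conversion is to overlay a uniform grid on the plane and replace each point by the ID of the cell it falls in. This is an assumption made for illustration, since the text does not prescribe a discretization scheme. The function name and parameters are hypothetical, and coordinates are assumed to be non-negative and to lie inside the grid.

```python
def trajectory_to_cells(points, cell_size, grid_width):
    """Convert a list of (x, y) points into a 1-D sequence of grid cell IDs.

    Cell ID = row * grid_width + column. Consecutive repeats of the same
    cell are collapsed, so the sequence records cell-to-cell moves only.
    """
    sequence = []
    for x, y in points:
        column = int(x // cell_size)
        row = int(y // cell_size)
        cell = row * grid_width + column
        if not sequence or sequence[-1] != cell:
            sequence.append(cell)
    return sequence


path = [(0.5, 0.5), (0.7, 0.6), (1.2, 0.4), (1.5, 1.5)]
cells = trajectory_to_cells(path, cell_size=1.0, grid_width=10)
```

The resulting lists of cell IDs can be passed to a sequential pattern mining algorithm as ordinary discrete sequences. The cell size controls the trade-off between spatial precision and how often different trajectories share the same symbols.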

Colocation Patterns

Colocation patterns are designed to discover social connections between the trajectories of different individuals. The basic idea of colocation patterns is that individuals who frequently appear at the same point at the same time are likely to be related to one another.

Colocation pattern mining attempts to discover patterns of individuals, rather than patterns of spatial trajectory paths. Because of the complementary nature of this analysis, a vertical representation of the sequence database is particularly convenient.
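
The sketch below shows one simple reading of this idea. It builds a vertical index from each (time slot, location) pair to the set of individuals seen there, then counts how often each pair of individuals shares an entry. The input layout, the function name, and the `min_count` threshold are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def colocated_pairs(visits, min_count):
    """Find pairs of individuals seen at the same place and time
    at least `min_count` times.

    visits: iterable of (individual, time_slot, cell) tuples.
    """
    # Vertical representation: (time_slot, cell) -> individuals present.
    vertical = defaultdict(set)
    for person, time_slot, cell in visits:
        vertical[(time_slot, cell)].add(person)

    counts = defaultdict(int)
    for people in vertical.values():
        for pair in combinations(sorted(people), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


visits = [
    ("a", 1, 5), ("b", 1, 5),
    ("a", 2, 6), ("b", 2, 6), ("c", 2, 6),
    ("c", 1, 7),
]
frequent = colocated_pairs(visits, min_count=2)
```

Here only the pair `("a", "b")` meets the threshold, because the two appear together in both time slots while `"c"` joins them only once.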

---

▌Difficulty: Hard


Question

Compare State-Space Models and ARIMA models


Answer

ARIMA is a universal approximator: you do not need to know the true model behind your data, and you use standard ARIMA diagnostic and fitting tools to approximate it. This is like polynomial curve fitting: whatever the true function is, you can always approximate it with a polynomial of some degree.

Compared to ARIMA, state-space models allow you to model more complex processes, have an interpretable structure, and easily handle data irregularities. The price is a more complex model, harder calibration, and less community knowledge.

Because there is such a great variety of state-space model formulations (much richer than the class of ARIMA models), the behavior of all these potential models is not well studied, and if the model you formulated is complicated, it is hard to say how it will behave under different circumstances. Of course, if your state-space model is simple or composed of interpretable components, there is no such problem.

ARIMA is always the same well-studied ARIMA, so it should be easier to anticipate its behavior even if you use it to approximate some complex process.

Because the state-space framework lets you model complex or nonlinear processes directly and exactly, such models may suffer from unstable filtering and prediction (EKF/UKF divergence, particle filter degradation). You may also have problems calibrating a complicated model's parameters, since this is a computationally hard optimization problem.

ARIMA is simple, has fewer parameters (1 noise source instead of 2 noise sources, no hidden variables) so its calibration is simpler.
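
To make the "two noise sources and a hidden variable" point concrete, here is a minimal sketch of a Kalman filter for the local level model (a random-walk hidden level observed with noise), which is among the simplest state-space models and whose reduced form is an ARIMA(0,1,1) process. The choice of this model, the function name, and the parameters `q` (state noise variance) and `r` (observation noise variance) are assumptions made for illustration. Missing observations are passed as `None`, which shows how a state-space filter handles irregular data.

```python
def local_level_filter(ys, q, r, m0=0.0, p0=1e6):
    """Kalman filter for the local level model.

    Hidden state:  level[t] = level[t-1] + state noise (variance q)
    Observation:   y[t]     = level[t]   + observation noise (variance r)

    ys may contain None for missing observations. Returns the filtered
    estimate of the hidden level at each time step.
    """
    m, p = m0, p0  # current mean and variance of the level estimate
    estimates = []
    for y in ys:
        # Predict: the level is a random walk, so only uncertainty grows.
        p = p + q
        if y is not None:
            # Update: blend prediction and observation via the Kalman gain.
            gain = p / (p + r)
            m = m + gain * (y - m)
            p = (1.0 - gain) * p
        estimates.append(m)
    return estimates


filtered = local_level_filter([5.0, None, 5.2, 4.9], q=0.1, r=1.0)
```

The large default `p0` is a rough stand-in for a diffuse prior, so the first observation almost entirely determines the initial level. In practice `q` and `r` would be estimated, for example by maximum likelihood, which is the calibration step the answer describes as harder than fitting an ARIMA model.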

---


