Posting a few course descriptions for reference
statistics and ml
With exponential increases in the amount of data becoming available in
fields such as finance and biology, and on the web, there is an ever-greater
need for methods to detect interesting patterns in that data, and classify
novel data points based on curated data sets. Statistical machine learning
and evolutionary computation provide the means to perform this analysis
automatically, and in doing so to enhance understanding of general processes
or to predict future events.
Topics covered will include: association rules, clustering, instance-based
learning, statistical learning, evolutionary algorithms, swarm intelligence,
neural networks, numeric prediction, weakly supervised classification,
discretisation, feature selection and classifier combination.
This subject is intended to introduce graduate students to machine learning
through a mixture of theoretical methods and hands-on practical experience in
applying those methods to real-world problems.
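As a small taste of the instance-based learning topic listed above, here is a
minimal sketch of a k-nearest-neighbours classifier. The toy data set and the
choice of k are invented purely for illustration and are not taken from the
subject materials.

from collections import Counter
import math

def knn_classify(query, examples, k=3):
    """Classify `query` by majority vote among its k nearest labelled examples."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbours = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy "curated data set": 2-D points with two classes (hypothetical values).
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.8, 3.0), "B")]

print(knn_classify((1.1, 1.0), train))  # -> "A"
print(knn_classify((3.0, 3.0), train))  # -> "B"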
dp
Declarative programming languages provide elegant and powerful programming
paradigms which every programmer should know. This subject presents
declarative programming languages and techniques.
nosql
Many applications require access to very large amounts of data. These
applications often require reliability (data must not be lost even in the
presence of hardware failures), and the ability to retrieve and process the
data very efficiently.
The subject will cover the technologies used in advanced database systems.
Topics covered will include: transactions, including concurrency,
reliability (the ACID properties) and performance; and indexing of both
structured and unstructured data. The subject will also cover additional
topics such as: uncertain data; XQuery; the Semantic Web and the Resource
Description Framework; dataspaces and data provenance; datacentres; and data
archiving.
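To give a concrete flavour of the transactions and ACID topic above, here is a
minimal sketch of atomicity using Python's built-in sqlite3 module. The
accounts table, the transfer scenario, and the simulated failure are
hypothetical, standing in for the kinds of situations an advanced database
system must handle.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: either both updates apply or neither."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # Simulate a failure mid-transaction (e.g. a crash or constraint error).
        if amount > 50:
            raise RuntimeError("simulated failure")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()  # the partial debit is undone

transfer(conn, "alice", "bob", 80)  # fails and is rolled back
print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('alice', 100), ('bob', 0)]  -- balances unchanged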
cloud
The growing popularity of the Internet, along with the availability of
powerful computers and high-speed networks as low-cost commodity components,
is changing the way we do parallel and distributed computing (PDC). PDC on
local-area networks is called "cluster computing", and PDC on wide-area
networks is called "grid computing". Clusters employ cost-effective
commodity components to build powerful computers within local-area networks,
while grids allow geographically distributed resources to be shared and
aggregated. Recently, "cloud computing" emerged as a new paradigm for
delivering computing as services in a pay-as-you-go model via the Internet.
This revolutionary new paradigm has its roots in grids and therefore shares
many of their characteristics.
Some examples of scientific and industrial applications that use these
computing platforms are: system simulations, weather forecasting, climate
prediction, automobile modelling and design, high-energy physics, movie
rendering, business intelligence, big data computing, and delivering various
business and consumer applications on a pay-as-you-go basis.
This subject will enable students to understand these technologies, their
goals, characteristics, and limitations, and to develop both middleware
supporting them and scalable applications supported by these platforms.
This subject is an elective subject in the Master of Information Technology
and is mandatory for the Distributed Computing Specialisation. It can also be
taken as an Advanced Elective subject in the Master of Engineering (Software).
streaming
AIMS
With exponential growth in data generated from sensor data streams, search
engines, spam filters, medical services, online analysis of financial data
streams, and so forth, there is demand for fast monitoring and storage of
huge amounts of data in real time. Traditional technologies were not designed
for such fast streams of data; they usually required data to be stored and
indexed before it could be processed.
Stream computing was created to tackle problems that require processing and
classification of continuous, high-volume data streams. It is widely used in
applications such as Twitter, Facebook, high-frequency trading, and so forth.
The stream computing course will interest students who want to learn more
about real-time processing and its applications. It will be taught from both
a theoretical and a practical point of view. The course will cover the
underlying fundamentals of stream processing systems, particularly
architectural issues and algorithms for stream processing, mining and
analysis. It will also include tutorials on how to develop and deploy
applications on platforms such as IBM InfoSphere Streams®.
INDICATIVE CONTENT
Why stream processing is important
Data streams model
Data streams algorithms: Sampling, sketching, distinct items, frequent items,
etc. (see the frequent-items sketch after this list)
Data streams synopses: Histograms, sketches, wavelets, etc.
Stream processing platforms: InfoSphere Streams, Storm, Spark Streaming, etc.
Data streams mining: Classification, clustering, etc.
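As a concrete taste of the frequent-items topic referenced in the list above,
here is a minimal sketch of the Misra-Gries algorithm, a classic one-pass
data-stream method. The toy stream and the counter budget k are invented for
illustration; the course may well cover different algorithms or platforms.

def misra_gries(stream, k):
    """Return candidate heavy hitters using at most k-1 counters.

    Any item occurring more than len(stream)/k times is guaranteed to
    appear among the returned candidates.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement every counter; drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Hypothetical stream: "a" occurs 5 times out of 10, well above 10/3.
stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "e"]
print(misra_gries(stream, k=3))  # "a" is guaranteed to be among the candidates

A single pass with a fixed number of counters is the point: unlike the
store-then-index approach mentioned above, the stream is never materialised.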