Training programmes

We have a series of training courses on Big Data Analytics delivered by Dr Simon Fong from the University of Macau.

ICBDBI 2017: International Conference on Big Data Analytics and Business Intelligence

Big Data Analytics training TR-003-004-005

The medium of instruction is English. The speaker is Dr Simon Fong from the University of Macau. The fee for the whole set of training courses TR-003-004-005 is RMB 3600 (RMB 1200 per training course).

To apply, please email xiaoxiao.wang@xjtlu.edu.cn. We will confirm your registration within two working days and send information about making payment. We reserve the right to cancel the training if enrolment is insufficient; in that case, we will inform all registered applicants three days in advance and give a full refund.

Data Mining I – Predictive Analytics TR-003

About the course

Predictive analytics is the most prevalent form of data mining, centred on the prediction of future outcomes and trends.

The core element is the predictor, a variable that can be quantitatively measured and used to predict the future behaviour of a subject. For instance, a marketing company considers various predictors such as age, gender, lifestyle and past purchase records to decide which products are likely to interest its clients. Multiple predictors are used to induce a predictive model, which learns the relations between predictors and outcomes. The model can then be used to predict future results when new or additional data becomes available. Predictive analytics has successful applications in many areas, ranging from business forecasting, meteorology and bioinformatics to virtual and physical security.

In this course you will be introduced to the fundamentals of predictive analytics, often known as predictive modelling, a very popular category of data mining. The course covers the theory and applications of the two main streams of predictive analytics: classification and prediction. In classification, the target to be predicted is a label or category to which a sample of testing data should belong. In prediction, the target variable is numeric and is estimated from the predictors that contribute to the outcome.
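The difference between the two streams can be illustrated with a minimal sketch. It is not taken from the course material: the data, the names `customers`, `classify` and `fit_line`, and the choice of a nearest-neighbour classifier and a least-squares line are all assumptions made for illustration.

```python
# Toy data: all values are invented for illustration only.
# Each customer is (age, past_purchases); the label is a product category.
customers = [
    ((22, 1), "games"),
    ((25, 2), "games"),
    ((47, 8), "garden"),
    ((52, 9), "garden"),
]

def classify(sample):
    """Classification: return the label of the nearest training example."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(customers, key=lambda row: sq_dist(row[0], sample))[1]

def fit_line(xs, ys):
    """Numeric prediction: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Hypothetical predictor (age) and numeric target (monthly spend).
ages = [20, 30, 40, 50]
spend = [100.0, 150.0, 200.0, 250.0]
slope, intercept = fit_line(ages, spend)

predicted_label = classify((24, 2))       # a category
predicted_spend = slope * 35 + intercept  # a number
```

Here `predicted_label` is a category while `predicted_spend` is a numeric estimate; the same predictors can feed either kind of model.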

Participants will learn how to explore the data visually first, followed by statistical assessment and data processing. Popular predictive algorithms will be introduced, such as decision trees, neural networks, Bayesian reasoning networks, support vector machines and incremental learners. Advanced topics will also be shown, including ensembles, optimised data mining models, and ways of combining different models to produce better results. All course content will be supported by demonstrations with examples and hands-on exercises.

Who should take this course

Data analysts, data scientists and industrial professionals, or anybody with an interest in big data and data mining who wants to tackle the challenges of analysing large data sets and explore the data mining power of predictive analytics. This course is suitable for those who need to analyse an archive of data and make sense of it. It is also suitable for anybody working with applications designed for forecasting and prediction. Research students are encouraged to take this course too, as some novel data mining techniques published in SCI/E journals will be covered.

Industrial or commercial professionals in areas ranging from marketing, finance and accounting to academia, with an interest in data mining, are especially welcome to take this course. This course provides you with insights and practical skills in understanding what predictive analytics can do for your organisation, and how to do it effectively. Course participants will have access to open source data mining tools with a no-cost licence.

Time

Three-day course, six hours per day

Data Mining II – Clustering, Association Rules and Anomaly Detection TR-004

About the course

Data mining is the art and science of discovering insights from massive data. In addition to predictive modelling, the other most popular techniques are clustering, association rule mining and anomaly detection. Each of these unsupervised learning techniques has proven its efficacy in a wide range of applications, from medicine and security to business. Clustering groups similar data together. Association rule mining finds the pairings of data instances that most frequently appear together. Anomaly detection detects and discovers outliers in the data. These techniques generally present their results as visual patterns from which users can spot anything interesting. Interesting patterns include, but are not limited to: data densities, market segments and hotspots, from clustering; relation links, associations and causality, from association rule mining; and suspicious activities, fraud and irregularities, from anomaly detection.
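Two of these ideas can be shown in a minimal sketch: counting the item pairs that most often appear together (the first step of association rule mining) and flagging outliers by their distance from the mean. It is not taken from the course material: the baskets, the amounts and the two-standard-deviation threshold are assumptions made for illustration, and clustering is left out to keep the example short.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "tea"},
]

# Association: count how often each pair of items shares a basket.
pair_counts = Counter()
for basket in baskets:
    pair_counts.update(combinations(sorted(basket), 2))
top_pair, top_count = pair_counts.most_common(1)[0]
support = top_count / len(baskets)  # fraction of baskets containing the pair

# Anomaly detection: flag values more than two standard deviations
# from the mean (the threshold of 2 is an assumption, not a standard).
amounts = [20, 22, 19, 21, 20, 23, 18, 21, 95]
mu, sigma = mean(amounts), pstdev(amounts)
outliers = [x for x in amounts if abs(x - mu) > 2 * sigma]
```

With this toy data, `top_pair` is the pair that occurs in the most baskets and `outliers` holds the single unusually large amount. Real tools add further measures, such as confidence for rules, and more robust outlier tests.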

In this course, participants will learn the concepts and unlock the power of unsupervised learning techniques. The course will cover some examples where supervised and unsupervised data mining techniques are integrated, complementing one another to enhance performance. Participants will have access to a collection of open source data mining tools and will be guided through in-class demonstrations and hands-on exercises. Most importantly, you will also learn how to interpret the results derived from these techniques for decision support.

Who should take this course

Data analysts, data scientists and industrial professionals, or anybody with an interest in big data and data mining who wants to tackle the challenges of analysing large data sets and explore the power of data mining to find patterns, relations and unusual events. This course is suitable for those who need to analyse an archive of data and make sense of it. It is also suitable for anybody working with applications designed for discovering interesting patterns in data. Research students are encouraged to take this course too, as some novel data mining techniques published in SCI/E journals will be covered.

Industrial or commercial professionals in areas ranging from marketing, finance and accounting to academia, with an interest in data mining, are especially welcome to take this course. This course provides you with insights and practical skills in understanding what clustering, association rule mining and anomaly detection can do for your organisation, and how to do them effectively. Course participants will have access to open source data mining tools with a no-cost licence.

Time

Three-day course, six hours per day

Big Data Analytics TR-005

About the course

Big Data is a buzzword nowadays, referring to both the opportunities and the challenges facing analytics professionals who want to extract valuable insight from massive amounts of data. With a new buffet of exciting modern applications and machine learning techniques having recently emerged from ubiquitous computing, big data analytics plays a central role in enabling the analysis of datasets over a distributed file system called the Hadoop Distributed File System (HDFS). HDFS is designed to support high-throughput access to application data and is therefore suitable for applications that generate very large data sets or continuous data streams. The volume, speed and variety of such big data are far greater than those that could be easily analysed in the past. New analytical algorithms are hence needed in HDFS and similar environments.

In this course, the methodology and workflows for using Hadoop will be explained. Subsequent analytics, including the latest machine learning algorithms and data stream mining methods, will be introduced. Big data analytics will be shown via hands-on demonstrations and a series of exercises conducted in class. In particular, the course provides hands-on training for answering the following questions:

  • How big data analytics is done hands-on in Hadoop or a similar big data environment
  • How Hadoop or another big data platform is set up and interfaced with different software components
  • How data stream mining is done on very large-scale datasets in a Hadoop environment, or in the Scalable Advanced Massive Online Analysis (SAMOA), Massive Online Analysis (MOA) and other environments

Who should take this course

Data analysts, data scientists and industrial professionals, or anybody with an interest in big data and data mining who wants to tackle the challenges of analysing large data sets and explore the distributed computing power of the Hadoop and MOA environments. This course is suitable for those who need to deal with very large amounts of high-speed, continuous data that arrive in unstructured or semi-structured formats. Research students are encouraged to take this course too, as some novel data mining techniques published in SCI/E journals will be covered. Course participants will have access to open source data mining tools with a no-cost licence.

Time

Three-day course, six hours per day

About the speaker

Simon Fong graduated from La Trobe University, Australia, with a first-class honours BEng degree in Computer Systems and a PhD degree in Computer Science, in 1993 and 1998 respectively. Simon is now working as an Associate Professor at the Computer and Information Science Department of the University of Macau. He is a co-founder of the Data Analytics and Collaborative Computing Research Group in the Faculty of Science and Technology. Prior to his academic career, Simon took up various managerial and technical posts, such as systems engineer, IT consultant and e-commerce director, in Australia and Asia. Dr Fong has published over 365 international conference and peer-reviewed journal papers, mostly in the areas of data mining, big data analytics, meta-heuristic optimisation algorithms, and their applications. He serves on the editorial boards of the Journal of Network and Computer Applications of Elsevier, IEEE IT Professional magazine, and various special issues of SCIE-indexed journals.