New Software Development Kit offers high-level object-oriented API for full suite of forecasting, data mining and text mining algorithms, easily used in C++, C#, Java, Python and R applications, with built-in access to Apache Spark Big data clusters.
INCLINE VILLAGE, NV – February 16, 2016 – Frontline Systems is releasing XLMiner SDK, a new Software Development Kit for data mining, text mining, forecasting, and predictive analytics. XLMiner SDK offers application developers working in C++, C#, Java, Python or R a powerful, high-level API for quickly creating applications that use predictive analytics.
“If you’re a developer, you can find lots of code on the Web, but it’s been hard to find a complete, fully supported toolkit for data mining and machine learning – especially if you’re working in C++, C# or Java,” said Daniel Fylstra, Frontline’s President and CEO. “XLMiner SDK is exactly that – a toolkit that developers can count on to build commercial-grade applications.”
From Exploration to High-Performance Applications
XLMiner SDK supports data analysis for exploration and education, as well as for high-performance production applications. Applications can range from just a few high-level lines of code, to a customized workflow where the developer can control all aspects of data analysis, machine learning, and scoring.
XLMiner SDK's easy-to-use API provides everything a developer needs for a complete application. For example, an “app” could draw a sample from Big Data stored across the cluster, use Feature Selection to determine the best inputs to a supervised algorithm, partition the data, fit a model, and score new data.
High-Level Objects Cover Full Range of Machine Learning Tasks
XLMiner SDK uses three core components: DataFrames, Models, and Estimators.
- A DataFrame represents a dataset composed of named columns of numeric, categorical, or textual data. XLMiner SDK algorithms accept DataFrames as inputs, and produce DataFrames as results. Operations on DataFrames may be easily combined to create a “pipeline” of high-level operations.
- An Estimator represents a machine learning algorithm. It accepts a DataFrame as input, and returns a Model object “trained” on the data in the DataFrame. This Model may be used to score/transform new data, compute evaluation metrics, and extract other information.
- Each Model contains core methods that accept a DataFrame as input, perform a transformation or prediction, and return a new DataFrame. For example, a Model fitted by a classification or prediction algorithm transforms a DataFrame with features into scores/predictions to predict/classify new data.
Support for Big Data and the Full Cycle of Data Mining Projects
XLMiner SDK is designed for use on ‘ordinary’ Windows PCs and servers, with conventional data files and SQL databases, but it includes an unusual capability to draw a statistically representative sample from an Apache Spark Big Data cluster, running a Frontline-supplied component on one of the cluster nodes. As Frontline has previously shown, training machine learning algorithms on an appropriate sample can be just as effective as training such algorithms on the cluster itself – and far simpler for the developer.
XLMiner SDK provides more than just predictive algorithms – it also provides powerful methods to explore, transform and “clean” data, partition data into “training” and “validation” sets, train one or more machine learning models, evaluate and compare the performance of those models, and for the “final” model, create a self-contained version in code suitable for deployment – all with the full power of the user’s preferred programming language available to control each step.
Full Support for Popular Programming Languages
XLMiner SDK provides full API support for five popular programming languages: C++ 11 or later, C# 4.0 or later, Java 8, Python 2.7 or 3.4 (both are supported), and R 3.2 with R6 classes. In most interactive development environments such as Microsoft Visual Studio and R Studio, developers will benefit from automatic recognition and “command completion” for XLMiner’s objects, properties and methods.
XLMiner SDK’s R support uses R-native types, including R’s own DataFrame type; hence it can be used easily with a wide range of R packages from CRAN. XLMiner SDK provides its own “R package” that can be loaded with one command from R itself, or an IDE such as R Studio.
For C++, C# and Java developers, XLMiner SDK should be especially welcome, since quality data mining tools have been hard to find for these popular languages. But even R and Python developers will find that XLMiner SDK offers a far better integrated, comprehensive data mining and text mining toolkit.
Powerful Text Mining Included
XLMiner SDK includes powerful text mining methods that are often found in separately priced (and expensive) text mining software, including:
- Stemming, term normalization, and vocabulary reduction
- Creation of a term-document matrix with a full range of weightings
- Concept extraction with latent semantic indexing
The results of text mining analysis are DataFrames, which may be used in any further data exploration, unsupervised, or supervised machine learning algorithm.
Data Exploration, Feature Selection, and ‘Unsupervised’ Learning
A data mining project often begins with efforts to find patterns in the data, reduce its dimensionality, and identify the variables or “features” that contribute the most to a successful classification or prediction. XLMiner SDK provides a full set of tools for:
- k-Means Clustering and Hierarchical Clustering for “unsupervised” learning
- Principal Components Analysis for dimensionality reduction
- Feature Selection with filter, wrapper and embedded methods to identify explanatory variables
Multiple Classification and Prediction Algorithms
For applications that require classification or prediction, such as customer churn prediction or fraud detection, XLMiner SDK provides powerful, high-performance implementations of the most popular machine learning algorithms, including:
- Logistic Regression, Discriminant Analysis, k-Nearest Neighbors, Naïve Bayes, Classification Trees, and Neural Networks for classification
- Multiple Linear Regression, k-Nearest Neighbors, Regression Trees, and Neural Networks for prediction
- Ensembles of “weak learners” from the above methods, combined via boosting, bagging and “random forests” to produce a better predictive model
Time Series Forecasting Methods
For time series forecasting, used in sales and inventory planning, or forecasting of financial series such as stock prices and exchange rates, XLMiner SDK provides high-equality algorithms for:
- Exponential smoothing methods, with or without seasonality
- ARIMA (Auto-Regressive Integrated Moving-Average) methods, with or without seasonality
Association Rules for Affinity Analysis
For applications such as “market basket” analysis and recommendation systems, XLMiner SDK provides powerful tools to automatically construct a model consisting of association rules, and use that model to identify opportunities or make new recommendations.
Free Trial Version Available for Download
XLMiner SDK is available now for both 32-bit and 64-bit Windows. Developers can register for a free account at https://www.solver.com, and download and install a fully-functional version of XLMiner SDK with a free 15-day trial license. Paid licenses are available at two levels of power and capacity – XLMiner SDK Pro and XLMiner SDK Platform – with several options for license terms, and for single or multiple developers.
Frontline Systems Inc. (https://www.solver.com) is the leader in analytics for spreadsheets and the web, helping managers gain insights and make better decisions for an uncertain future. Its products integrate forecasting and data mining for “predictive analytics,” Monte Carlo simulation and risk analysis, and conventional and stochastic optimization for “prescriptive analytics.” Founded in 1987, Frontline is based in Incline Village, Nevada (775-831-0300).
XLMiner® is a registered trademark of Frontline Systems, Inc.