SQL Server Data File Autogrowth Management

A SQL Server database consists of at least two types of file on disk: a transaction log file, and at least one data file where the data and indexes persist. The best practice is to place the transaction log on a disk (or disks) dedicated to database transaction logs. For data files, it is strongly recommended to leave the primary filegroup for database metadata operations and to create at least one secondary filegroup. This secondary filegroup should be set as the default so that new data objects are created there, and four data files within that filegroup is a good starting point. These four data files should be identically sized, perhaps placed on several disks depending on the requirements of the application. All of these files should have autogrowth enabled so that SQL Server can extend them as the database grows; the autogrowth setting should be the same across all the data files, and proportionate to the size of the database.
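As a minimal sketch of that layout (the names, paths, and sizes here are illustrative assumptions, not a prescription):

-- Primary filegroup kept small, for metadata only
CREATE DATABASE TEST
ON PRIMARY
(NAME = TEST_Primary, FILENAME = 'D:\Data\TEST_Primary.mdf', SIZE = 256MB, FILEGROWTH = 64MB),
-- Secondary filegroup: four identically sized files with identical autogrowth
FILEGROUP SECONDARY
(NAME = TEST_Data1, FILENAME = 'D:\Data\TEST_Data1.ndf', SIZE = 1024MB, FILEGROWTH = 256MB),
(NAME = TEST_Data2, FILENAME = 'E:\Data\TEST_Data2.ndf', SIZE = 1024MB, FILEGROWTH = 256MB),
(NAME = TEST_Data3, FILENAME = 'F:\Data\TEST_Data3.ndf', SIZE = 1024MB, FILEGROWTH = 256MB),
(NAME = TEST_Data4, FILENAME = 'G:\Data\TEST_Data4.ndf', SIZE = 1024MB, FILEGROWTH = 256MB)
LOG ON
(NAME = TEST_Log, FILENAME = 'L:\Log\TEST_Log.ldf', SIZE = 1024MB, FILEGROWTH = 256MB);

-- Make the secondary filegroup the default for new objects
ALTER DATABASE TEST MODIFY FILEGROUP SECONDARY DEFAULT;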

As data is inserted into this database, proportional fill will spread it relatively evenly across the evenly sized data files. A traditional challenge for a DBA is that when autogrowth kicks in, one data file extends and the other three do not, leading to imbalance. And because the extended data file now has more free space, proportional fill will place more data in that file, so the imbalance grows.
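A quick way to spot this imbalance is to compare size and space used across the data files; a sketch using sys.database_files, run in the database in question (size and SpaceUsed are reported in 8 KB pages, hence the division by 128 to get MB):

SELECT name,
size / 128 AS size_mb,
FILEPROPERTY(name, 'SpaceUsed') / 128 AS used_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';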

In SQL Server 2016 or later, you can configure all the data files in a filegroup to grow together; this is set at the filegroup level (not the individual data file) within a specified database. Unfortunately, the setting is not exposed within Management Studio in either SQL Server 2016 or SQL Server 2017.

ALTER DATABASE TEST MODIFY FILEGROUP SECONDARY AUTOGROW_ALL_FILES

The filegroup property 'AUTOGROW_ALL_FILES' has been set.
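To confirm the change took effect, SQL Server 2016 and later expose an is_autogrow_all_files flag on sys.filegroups:

SELECT name, is_autogrow_all_files
FROM sys.filegroups;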

Can I truncate a table that is participating in transactional replication?

If you think about it conceptually, the table exists within the publication database, and the table is expected to exist in any databases subscribing to the publication. Transactional replication works by monitoring the database transaction log for inserts, updates, and deletes on replicated objects, and by transmitting those changes to subscribers. TRUNCATE TABLE, however, is a minimally logged operation: it deallocates the table's pages rather than logging individual row deletions, so it empties the table while leaving the schema intact, and there are no row-level changes for replication to pick up.

So, if I try to truncate a table that is in a published database by running TRUNCATE TABLE TRADES, I get an error:

Msg 4711, Level 16, State 1, Line 1
Cannot truncate table 'trades' because it is published for replication or enabled for Change Data Capture.

If I convert the truncate into a DELETE FROM TRADES statement, this works perfectly and is replicated to any subscribers, because each row deletion is fully logged (which also means it can be considerably slower than a truncate on a large table).

If I try the same TRUNCATE TABLE TRADES statement on a subscriber database, this also works perfectly, since the subscriber's copy of the table is not itself published; of course, doing so leaves the subscriber out of step with the publisher.
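As a quick pre-flight check, the publication flags on a table are visible in sys.tables, so you can see whether a truncate will be refused before trying it (a sketch, run in the publication database):

SELECT name, is_published, is_replicated
FROM sys.tables
WHERE name = 'trades';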

MongoDB Failed To Connect – Error 10061

I recently built a single MongoDB server on an Ubuntu 16.04 LTS server, and I was trying to connect to it with mongoimport from a Windows machine. The mongo shell from the same machine showed the underlying failure:

C:\Program Files\MongoDB\Server\3.2\bin>mongo --host ubuntu-mongodb
MongoDB shell version: 3.2.8
connecting to: ubuntu-mongodb:27017/test
2017-11-26T17:22:21.224+0000 W NETWORK [thread1] Failed to connect to 192.168.1.150:27017, reason: errno:10061 No connection could be made because the target machine actively refused it.
2017-11-26T17:22:21.224+0000 E QUERY [thread1] Error: couldn't connect to server ubuntu-mongodb:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:231:14
@(connect):1:6

A local connection using the mongo shell on the server itself was working, but not a remote one, so I started puzzling about firewalls on either Windows or Ubuntu. As a further test, I tried a connection from another Windows machine, a Windows 10 laptop, with the same result. The remote Ubuntu server did, however, respond to pings.

Drawing a blank with firewalls, I did start to consider whether MongoDB disables remote connections by default, something that has become increasingly common with database platforms for obvious reasons. I had a quick look at the mongodb.conf file, and the local loopback IP address stood out for the bind_ip parameter:

# Listen to local interface only. Comment out to listen on all interfaces.
bind_ip=127.0.0.1

I opened the mongodb.conf file in vi, commented out the bind_ip line, and then spent far too many minutes wondering why it wouldn't let me save the file: I'd forgotten to run vi with sudo. After saving the change and restarting the mongod service, remote connections worked. Bear in mind that commenting out bind_ip means MongoDB listens on all interfaces, so make sure the server is properly firewalled before doing this.

SQL Server Parameter Sniffing

MVP Gail Shaw has produced an excellent video tutorial on SQL Server parameter sniffing, available on the Pluralsight network. The video was released in August 2016 and covers SQL Server 2016. At 90 minutes long it's nicely concise, and it covers what parameter sniffing is, how to identify and capture it via techniques such as DMVs, Extended Events, and the Query Store, and naturally some of the options available to work around it.
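As a flavour of those options, one classic mitigation is to trade CPU for plan quality with OPTION (RECOMPILE), so each execution gets a plan built for its actual parameter value. A minimal sketch, assuming a hypothetical dbo.Trades table:

CREATE PROCEDURE dbo.GetTradesBySymbol
@Symbol varchar(10)
AS
SELECT TradeId, Symbol, Quantity
FROM dbo.Trades
WHERE Symbol = @Symbol
OPTION (RECOMPILE); -- fresh plan per execution, avoiding a sniffed plan that only suits some values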

Hadoop – Getting Started

Big Data has been one of the buzzwords of the last few years, and has become a "thing" because of the enormous amounts of data now being generated, primarily by the internet. Hadoop is one of the core technologies in the Big Data space, and is certainly the starting point in any Big Data conversation. Hadoop's core components are the HDFS file system, MapReduce, and YARN.

The Hadoop ecosystem includes Hive, HBase, Pig, Flume, Sqoop, Spark, and Oozie. Hive provides a SQL-like interface to Hadoop, and provides a bridge for developers who don't have Java experience. HBase is a NoSQL database platform. Pig is a data manipulation language, helping to transform unstructured data into a structured format, which we can then query using something like Hive. Spark is a distributed computing engine used alongside Hadoop; it's fast, intuitive, and has a large set of libraries. Oozie is a tool to schedule workflows across the Hadoop ecosystem components. Flume and Sqoop are tools that transfer data between other systems and Hadoop.
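To give a flavour of Hive's SQL-like interface, here is a minimal HiveQL sketch, assuming a hypothetical trades table has already been defined over files in HDFS; Hive compiles this into distributed jobs rather than executing it like a traditional database engine:

-- Count trades per symbol across the whole data set
SELECT symbol, COUNT(*) AS trade_count
FROM trades
GROUP BY symbol;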

There are three install modes for Hadoop. Standalone, the default, runs on a single node using a single JVM process, with the local file system used for storage; HDFS and YARN do not run, as they are not required. Standalone is used to test MapReduce programs before deploying them to a cluster.

Pseudo-distributed is the second install mode, and also runs on a single node, but the Hadoop daemons run in separate JVM processes to simulate a cluster. HDFS is used for storage, and YARN is used to manage tasks. It is used as a fully fledged test environment.

Fully distributed mode runs on a cluster of machines. Manual configuration of a cluster is complicated, so enterprise distributions are often used to simplify the job.

Python – Error: list index out of range

I've been developing in Python since 2014, just over two years, so I'm still a newbie, and I hit this error recently: "IndexError: list index out of range".

It took a little bit longer than perhaps it should have done to figure out what I’d done wrong.

The code was something like this pseudo-function, which is meant to count the number of even numbers passed in a list:

def count_evens(nums):
    count = 0
    for i in nums:              # i is each value in the list, not an index
        if nums[i] % 2 == 0:    # BUG: uses the value itself as an index
            count += 1
    return count

The mistake: for i in nums iterates over the values in the list, not over index positions, so nums[i] treats each value as an index and raises an IndexError as soon as a value exceeds len(nums) - 1. The fix is simply to test the value itself (if i % 2 == 0), or to loop over range(len(nums)) if an index is genuinely needed.

Microsoft Dynamics CRM – Filtered Views and SQL Server 2014

Microsoft Dynamics CRM makes extensive use of filtered views: these are standard views in the database, but they are coded in a way that prevents unauthorised users (including DBAs) from seeing sensitive data. During several migrations of Microsoft Dynamics CRM from both the 2011 and 2013 versions to 2015, my project teams have hit performance issues which center on filtered views and the new Cardinality Estimator in SQL Server 2014. The Cardinality Estimator is the part of the database engine that estimates how many rows each part of a query will return, and those estimates drive the optimizer's choice of execution plan. Before SQL Server 2014 it was largely based on a code base written for SQL Server 7.0; SQL Server 2014 introduced a substantially rewritten estimator, and the new estimates can change plans, for better or worse.
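A common way to test whether the new estimator is the culprit is to run the affected query under the legacy estimator. Two sketches, with the view and database names here being illustrative (note that QUERYTRACEON requires elevated permissions, and trace flag 9481 forces the pre-2014 estimator for a single query):

SELECT name
FROM dbo.FilteredAccount -- a typical CRM filtered view
OPTION (QUERYTRACEON 9481); -- run this query under the legacy cardinality estimator

Alternatively, lowering the database compatibility level keeps the whole database on the legacy estimator:

ALTER DATABASE MSCRM SET COMPATIBILITY_LEVEL = 110;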

SQL Server Management Studio Standard Reports

This week I found, by chance, that SQL Server Management Studio (SSMS) would crash on a couple of servers when I tried to run a Standard Report such as Disk Usage. Both SQL Servers were SQL Server 2014, running on top of Windows Server 2012 R2. The actual application crash was that the Visual Basic Command Line Compiler had stopped working.

I had to pause for thought – whilst I’m an experienced database specialist, I don’t normally have to troubleshoot application crashes.

I initially suspected it might be a permissions issue, but it turned out to be a bug in Windows Server 2012 R2 (and also Windows 8).

Kathi Kellenberger’s Pluralsight course : Configuring Kerberos For SSRS

It's a relatively new course, from February 2016, and Kathi explains and demonstrates the double-hop problem on SQL Server 2014 in an Azure lab. It's certainly true that SSRS is where database specialists most commonly hit the double-hop problem, but it's not exclusively an SSRS issue. Kathi also outlines some possible security workarounds (and makes the caveats clear), and then runs through how NTLM and Kerberos work. My overall conclusion is that this course is a good, yet relatively brief, tutorial about Kerberos for SQL Server professionals.
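Incidentally, a quick way to check whether your current connection actually authenticated with Kerberos rather than falling back to NTLM is to query sys.dm_exec_connections:

SELECT auth_scheme -- returns KERBEROS, NTLM, or SQL
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;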

Another reason to sign-up for Visual Studio Dev Essentials

The March update for Visual Studio Dev Essentials has announced that SQL Server 2014 Developer Edition is now free for Dev Essentials members! Developer Edition was always great value for non-production development and testing, but making it free shows how hard Microsoft SQL Server is gunning for Oracle's database customers.

Exciting news! Starting today, SQL Server 2014 Developer Edition is available as a free download for Dev Essentials members. Build almost any kind of database application and unlock insights faster with a complete business intelligence solution. SQL Server 2014 was designed to work in hybrid environments, making it easier for you to build back-up and disaster recovery solutions with Azure, as well as delivering a consistent experience across on-premises and cloud.