Minified source code is especially useful for interpreted languages deployed and transmitted on the Internet (such as JavaScript), because it reduces the amount of data that needs to be transferred. Minified source code may also be used as a kind of obfuscation, though the term obfuscation may be distinguished as a form of false cryptography while a minified code instance may be reversed using a pretty-printer. In programmer culture, aiming at extremely minified source code is the purpose of recreational code golf competitions.

Minification can be distinguished from the more general concept of data compression in that the minified source can be interpreted immediately without the need for an uncompression step: the same interpreter can work with both the original as well as with the minified source.
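To illustrate the point that one interpreter runs both forms, here is a minimal sketch. The function, its names, and the minified text are hypothetical and were shortened by hand, not produced by any particular minifier.

```javascript
// Readable source: sum the integers from 1 to limit.
const originalSource = `
// Return the sum of the integers from 1 to limit.
function sumTo(limit) {
  var total = 0;
  for (var index = 1; index <= limit; index++) {
    total += index;
  }
  return total;
}
return sumTo(10);
`;

// Hand-minified equivalent (assumption: written by hand for this example).
const minifiedSource =
  'function s(n){for(var t=0,i=1;i<=n;i++)t+=i;return t}return s(10);';

// The same JavaScript engine evaluates both texts directly;
// no decompression step happens in between.
const originalResult = new Function(originalSource)();
const minifiedResult = new Function(minifiedSource)();

console.log(originalResult, minifiedResult);
console.log(originalSource.length, minifiedSource.length);
```

Both calls return the same value, while the second text is considerably shorter.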

In 2003, Douglas Crockford introduced the tool JSMin, which only removed comments and whitespace. It was followed by YUI Compressor shortly thereafter. In 2009, Google opened up its Closure toolkit, including Closure Compiler, which contained a source mapping feature together with a Firefox extension called Closure Inspector. In 2010, Mihai Bazon introduced UglifyJS, which was superseded by UglifyJS2 in 2012; the rewrite was to allow for source map support.

Source maps allow tools to display unminified code from minified code with an optimized "mapping" between them. The original format was created by Joseph Schorr as part of the Closure Inspector minification project. Updates as versions 2 and 3 reduced the size of the map files.
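Part of what keeps version 3 map files small is that position data is stored as Base64 VLQ segments in a `mappings` string. The decoder below is a minimal sketch of that encoding only; the function name `decodeVlq` and the sample segment are assumptions made for illustration, not part of any tool mentioned here.

```javascript
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode one Base64 VLQ segment (e.g. "AAgBC") into an array of integers.
function decodeVlq(segment) {
  const values = [];
  let accumulated = 0;
  let shift = 0;
  for (const character of segment) {
    const digit = BASE64_ALPHABET.indexOf(character);
    if (digit === -1) {
      throw new Error('Invalid Base64 character: ' + character);
    }
    // The low 5 bits carry data; bit 6 (value 32) marks a continuation.
    accumulated += (digit & 31) << shift;
    if (digit & 32) {
      shift += 5;
    } else {
      // The lowest bit of the finished value is the sign.
      const isNegative = (accumulated & 1) === 1;
      const magnitude = accumulated >>> 1;
      values.push(isNegative && magnitude !== 0 ? -magnitude : magnitude);
      accumulated = 0;
      shift = 0;
    }
  }
  return values;
}

console.log(decodeVlq('AAgBC'));
```

In a real map each decoded number is a delta relative to the previous segment, which this sketch does not resolve.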

Components and libraries for Web applications and websites have been developed to optimize file requests and quicken page load times by reducing the size of various files. JavaScript and CSS resources may be minified, preserving their behavior while considerably reducing their file size. The Closure Tools project is an effort by Google engineers to open source the tools used in many of Google's sites and web applications for use by the wider Web development community. Closure Compiler compiles JavaScript into compact, high-performance code, and can perform aggressive global transformations in order to achieve high compression and advanced optimization. Other libraries available online are also capable of minification and optimization to varying degrees.

Some libraries also merge multiple script files into a single file for client download. This fosters a modular approach to development. A novel approach to server-side minification is taken by Ziproxy, a forwarding, non-caching, compressing HTTP proxy targeted for traffic optimization. It minifies and optimizes HTML, CSS, and JavaScript resources and, in addition, re-compresses pictures. Content encoding is an approach taken by compatible web servers and modern web browsers to compress HTML and related textual content, often in the gzip format. An alternative to content encoding in the server-client layer is given by the off-line CrunchMe tool, which can create self-extracting JavaScript programs using the DEFLATE compression algorithm. JavaScript source maps can make code readable and, more importantly, debuggable even after it has been combined and minified.
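For contrast with minification, the following sketch shows what gzip content encoding does to a script, using Node's built-in `zlib` module. The sample source text is made up for the example; a real server would apply this per response rather than on an in-memory string.

```javascript
const zlib = require('zlib');

// Hypothetical script body; repetition makes the compression effect obvious.
const scriptSource =
  'function add(first, second) {\n  return first + second;\n}\n'.repeat(50);

// What a gzip-capable server would send over the wire.
const compressed = zlib.gzipSync(Buffer.from(scriptSource, 'utf8'));

// What the browser has to do before the interpreter can see the code.
const restored = zlib.gunzipSync(compressed).toString('utf8');

console.log(Buffer.byteLength(scriptSource), compressed.length);
console.log(restored === scriptSource);
```

Unlike a minified file, the compressed bytes cannot be interpreted directly; the decompression step is mandatory.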

How do you parse and process HTML/XML in PHP?

I prefer using one of the native XML extensions since they come bundled with PHP, are usually faster than all the 3rd party libs and give me all the control I need over the markup.


The DOM extension allows you to operate on XML documents through the DOM API with PHP 5. It is an implementation of the W3C's Document Object Model Core Level 3, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents.

DOM is capable of parsing and modifying real world (broken) HTML and it can do XPath queries. It is based on libxml.

It takes some time to get productive with DOM, but that time is well worth it IMO. Since DOM is a language-agnostic interface, you'll find implementations in many languages, so if you need to change your programming language, chances are you will already know how to use that language's DOM API then.

A basic usage example can be found in Grabbing the href attribute of an A element and a general conceptual overview can be found at DOMDocument in php

How to use the DOM extension has been covered extensively on Stack Overflow, so if you choose to use it, you can be sure most of the issues you run into can be solved by searching/browsing Stack Overflow.


The XMLReader extension is an XML pull parser. The reader acts as a cursor going forward on the document stream and stopping at each node on the way.

XMLReader, like DOM, is based on libxml. I am not aware of how to trigger the HTML Parser Module, so chances are using XMLReader for parsing broken HTML might be less robust than using DOM where you can explicitly tell it to use libxml's HTML Parser Module.

A basic usage example can be found at getting all values from h1 tags using php

XML Parser

This extension lets you create XML parsers and then define handlers for different XML events. Each XML parser also has a few parameters you can adjust.

The XML Parser library is also based on libxml, and implements a SAX style XML push parser. It may be a better choice for memory management than DOM or SimpleXML, but will be more difficult to work with than the pull parser implemented by XMLReader.


The SimpleXML extension provides a very simple and easily usable toolset to convert XML to an object that can be processed with normal property selectors and array iterators.

SimpleXML is an option when you know the HTML is valid XHTML. If you need to parse broken HTML, don't even consider SimpleXml because it will choke.

A basic usage example can be found at A simple program to CRUD node and node values of xml file, and there are lots of additional examples in the PHP Manual.

3rd Party Libraries (libxml based)

If you prefer to use a 3rd-party lib, I'd suggest using a lib that actually uses DOM/libxml underneath instead of string parsing.


FluentDOM provides a jQuery-like fluent XML interface for the DOMDocument in PHP. Selectors are written in XPath or CSS (using a CSS to XPath converter). Current versions extend the DOM implementing standard interfaces and add features from the DOM Living Standard. FluentDOM can load formats like JSON, CSV, JsonML, RabbitFish and others. Can be installed via Composer.


Wa72\HtmlPageDom is a PHP library for easy manipulation of HTML documents. It requires DomCrawler from Symfony2 components for traversing the DOM tree and extends it by adding methods for manipulating the DOM tree of HTML documents.

phpQuery (not updated for years)

phpQuery is a server-side, chainable, CSS3 selector driven Document Object Model (DOM) API based on jQuery JavaScript Library written in PHP5 and provides additional Command Line Interface (CLI).

Also see:


Zend_Dom provides tools for working with DOM documents and structures. Currently, we offer Zend_Dom_Query, which provides a unified interface for querying DOM documents utilizing both XPath and CSS selectors.


QueryPath is a PHP library for manipulating XML and HTML. It is designed to work not only with local files, but also with web services and database resources. It implements much of the jQuery interface (including CSS-style selectors), but it is heavily tuned for server-side use. Can be installed via Composer.


fDOMDocument extends the standard DOM to use exceptions at all occasions of errors instead of PHP warnings or notices. They also add various custom methods and shortcuts for convenience and to simplify the usage of DOM.


sabre/xml is a library that wraps and extends the XMLReader and XMLWriter classes to create a simple "xml to object/array" mapping system and design pattern. Writing and reading XML is single-pass and can therefore be fast and require low memory on large xml files.


FluidXML is a PHP library for manipulating XML with a concise and fluent API. It leverages XPath and the fluent programming pattern to be fun and effective.

3rd-Party (not libxml-based)

The benefit of building upon DOM/libxml is that you get good performance out of the box because you are based on a native extension. However, not all 3rd-party libs go down this route. Some of them are listed below.

PHP Simple HTML DOM Parser

I generally do not recommend this parser. The codebase is horrible and the parser itself is rather slow and memory hungry. Not all jQuery Selectors (such as child selectors) are possible. Any of the libxml based libraries should outperform this easily.

PHP Html Parser

PHPHtmlParser is a simple, flexible HTML parser which allows you to select tags using any CSS selector, like jQuery. The goal is to assist in the development of tools which require a quick, easy way to scrape HTML, whether it's valid or not! This project was originally supported by sunra/php-simple-html-dom-parser, but the support seems to have stopped, so this project is my adaptation of his previous work.

Again, I would not recommend this parser. It is rather slow with high CPU usage. There is also no function to clear memory of created DOM objects. These problems scale particularly with nested loops. The documentation itself is inaccurate and misspelled, with no responses to fixes since 14 Apr 16.


Never used it. Can't tell if it's any good.


You can use the above for parsing HTML5, but there can be quirks due to the markup HTML5 allows. So for HTML5 you want to consider using a dedicated parser, like


Python and PHP implementations of an HTML parser based on the WHATWG HTML5 specification for maximum compatibility with major desktop web browsers.

We might see more dedicated parsers once HTML5 is finalized. There is also a W3C blog post titled How-To for html 5 parsing that is worth checking out.


If you don't feel like programming PHP, you can also use web services. In general, I found very little utility for these, but that's just me and my use cases.


The YQL Web Service enables applications to query, filter, and combine data from different sources across the Internet. YQL statements have a SQL-like syntax, familiar to any developer with database experience.


ScraperWiki's external interface allows you to extract data in the form you want for use on the web or in your own applications. You can also extract information about the state of any scraper.

Regular Expressions

Last and least recommended, you can extract data from HTML with regular expressions. In general using Regular Expressions on HTML is discouraged.

Most of the snippets you will find on the web to match markup are brittle. In most cases they only work for a very particular piece of HTML. Tiny markup changes, like adding whitespace somewhere, or adding or changing attributes in a tag, can make the regex fail when it's not properly written. You should know what you are doing before using regex on HTML.
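The brittleness is easy to reproduce. The sketch below is written in JavaScript rather than PHP, since the point is about regular expressions in general; the pattern and the sample markup are made up for illustration.

```javascript
// A typical "good enough" pattern for pulling the href out of a link.
const hrefPattern = /<a href="([^"]*)">/;

function extractHref(html) {
  const match = hrefPattern.exec(html);
  return match ? match[1] : null;
}

// Four ways of writing what a browser treats as the same link.
const samples = {
  expected: '<a href="/home">Home</a>',
  extraSpace: '<a  href="/home">Home</a>',
  extraAttribute: '<a class="nav" href="/home">Home</a>',
  singleQuotes: "<a href='/home'>Home</a>",
};

for (const [label, html] of Object.entries(samples)) {
  console.log(label, extractHref(html));
}
```

Only the first variant matches; an HTML parser would return the same href for all four.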

HTML parsers already know the syntactical rules of HTML. Regular expressions have to be taught for each new regex you write. Regex are fine in some cases, but it really depends on your use-case.

You can write more reliable parsers, but writing a complete and reliable custom parser with regular expressions is a waste of time when the aforementioned libraries already exist and do a much better job on this.

Also see Parsing Html The Cthulhu Way

Managing CSS Explosion

This is a very good question. Everywhere I look, CSS files tend to get out of control after a while - especially, but not only, when working in a team.

The following are the rules I am myself trying to adhere to (not that I always manage to.)

Building sensible classes

This is how I like to build sensible classes.

I apply global settings first:

body { font-family: .... font-size ... color ... }
a { text-decoration: none; }

Then, I identify the main sections of the page's layout - e.g. the top area, the menu, the content, and the footer. If I wrote good markup, these areas will be identical with the HTML structure.

Then, I start building CSS classes, specifying as much ancestry as possible and sensible, and grouping related classes as closely as possible.

div.content ul.table_of_contents
div.content ul.table_of_contents li
div.content ul.table_of_contents li h1
div.content ul.table_of_contents li h2
div.content ul.table_of_contents li span.pagenumber

Think of the whole CSS structure as a tree with increasingly specific definitions the further away from the root you are. You want to keep the number of classes as low as possible, and you want to repeat yourself as seldom as possible.

For example, let's say you have three levels of navigational menus. These three menus look different, but they also share certain characteristics. For example, they are all <ul>, they all have the same font size, and the items are all next to each other (as opposed to the default rendering of an ul). Also, none of the menus has any bullet points (list-style-type).

First, define the common characteristics into a class named menu:

div.navi ul.menu { display: ...; list-style-type: none; list-style-image: none; }
div.navi ul.menu li { float: left }

then, define the specific characteristics of each of the three menus. Level 1 is 40 pixels tall; levels 2 and 3 20 pixels.

Note: you could also use multiple classes for this but Internet Explorer 6 has problems with multiple classes, so this example uses ids.

div.navi ul.menu#level1 { height: 40px; }
div.navi ul.menu#level2 { height: 20px; }
div.navi ul.menu#level3 { height: 16px; }

The markup for the menu will look like this:

<ul id="level1" class="menu"><li> ...... </li></ul>
<ul id="level2" class="menu"><li> ...... </li></ul>
<ul id="level3" class="menu"><li> ...... </li></ul>

If you have semantically similar elements on the page - like these three menus - try to work out the commonalities first and put them into a class; then, work out the specific properties and apply them to classes or, if you have to support Internet Explorer 6, IDs.

Miscellaneous HTML tips

If you add these semantics into your HTML output, designers can later customize the look of web sites and/or apps using pure CSS, which is a great advantage and time-saver.

Note that assigning multiple classes as outlined in the example above does not work properly in IE6. There is a workaround, from Dean Edwards, that makes IE6 able to deal with multiple classes; I haven't tried it yet, but it looks very promising. Until then, you will have to set the class that is most important to you (item number, active or first/last) or resort to using IDs. (booo IE6!)

Online code beautifier and formatter

CSS: code beautifier

HTML: HTML Tidy, CleanUp HTML or the general purpose Pretty Diff




Online SQL Formatter: Online SQL Formatter


Colour all:

What is the PastryKit Framework?

I'm trying to find any information I can on the PastryKit Javascript Framework. It appears to be in use on the iPhone User Guide that is displayed on the iPhone itself in Mobile Safari, but I cannot find any documentation or API. If you want to see it in action, open Safari 4, set your user agent to iPhone 3 (In the Develop menu) and check out the guide.

Overall, it seems to be a way to write an HTML/CSS/Javascript application that acts like a native iPhone app.

When it comes to Javascript, I used the JS Beautifier on (what I assume to be) the framework file and it was over 3,400 lines! Beautified, (again what I assume to be) their implementation of it was over 1,200 lines.

On the CSS side, I used Clean CSS on (again what I assume to be) the framework CSS, and it came out to over 700 lines. Their implementation was shy of 500.

Does anybody have, or know where to find, any information, documentation, or APIs on PastryKit? Or, can anybody figure out how to implement it?

Concat and minify JS files in Node

I recommend using UglifyJS which is a JavaScript parser / mangler / compressor / beautifier library for NodeJS.
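The concatenation half of the task needs nothing beyond Node's built-in `fs` module. The sketch below covers only that half; the helper name `concatFiles` and the file names are hypothetical, and the combined text would then be handed to a minifier such as UglifyJS, which is deliberately not called here so the example stays dependency-free.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Read each input file in order and join them into one script.
// The ";" separator guards against a file that ends without a semicolon.
function concatFiles(inputPaths, outputPath) {
  const combined = inputPaths
    .map((inputPath) => fs.readFileSync(inputPath, 'utf8'))
    .join('\n;\n');
  fs.writeFileSync(outputPath, combined);
  return combined;
}

// Usage with two throwaway files in a temporary directory.
const workDir = fs.mkdtempSync(path.join(os.tmpdir(), 'concat-'));
const firstFile = path.join(workDir, 'a.js');
const secondFile = path.join(workDir, 'b.js');
fs.writeFileSync(firstFile, 'var a = 1');
fs.writeFileSync(secondFile, 'var b = a + 1');

const bundlePath = path.join(workDir, 'bundle.js');
const bundle = concatFiles([firstFile, secondFile], bundlePath);
console.log(bundle);
```

File order matters: the second file above depends on a variable defined in the first.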

If you are interested in automation tools that do more than simply concatenate and minify files, there are the following solutions:

Besides these tasks, there are a lot of plugins available.

How to format/tidy/beautify in JavaScript

How can I format/tidy/beautify HTML in JavaScript? I have tried doing a search/replace for angle brackets (<>) and indenting accordingly. But of course it does not take into account when there is JS or CSS etc. inside the HTML.

The reason I want to do this is that I have made a content editor (CMS) which has both WYSIWYG and source code views. The problem is that the code written by the WYSIWYG editor is normally a single line. So I would like a JavaScript that could format this into a more readable form on demand.
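For reference, the angle-bracket approach described in the question can be sketched roughly as follows. This is a deliberately naive illustration with a hypothetical function name; it has exactly the limitation mentioned above (embedded JS and CSS are not handled), and it also discards whitespace-only text and puts every text run on its own line.

```javascript
// Elements that never have a closing tag, so they must not increase depth.
const VOID_TAGS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

function naiveIndentHtml(html, indentUnit = '  ') {
  // Split the input into tags and the text runs between them.
  const tokens = html.match(/<[^>]+>|[^<]+/g) || [];
  const lines = [];
  let depth = 0;

  for (const token of tokens) {
    const text = token.trim();
    if (text === '') continue; // lossy: whitespace-only text is dropped

    const isTag = text.startsWith('<');
    const isClosing = text.startsWith('</');
    const nameMatch = text.match(/^<\/?([a-zA-Z0-9-]+)/);
    const tagName = nameMatch ? nameMatch[1].toLowerCase() : null;
    // Comments and doctypes have no tag name and never open a level.
    const opensLevel = isTag && !isClosing && tagName !== null &&
      !VOID_TAGS.has(tagName) && !text.endsWith('/>');

    if (isClosing) depth = Math.max(depth - 1, 0);
    lines.push(indentUnit.repeat(depth) + text);
    if (opensLevel) depth += 1;
  }
  return lines.join('\n');
}

console.log(naiveIndentHtml('<div><p>Hi</p><br></div>'));
```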

Grunt: Watch multiple files, Compile only Changed

npm install grunt-newer --save-dev

Then in your Gruntfile (after loading the task in grunt):

 files: 'assets/javascript/**/*.coffee'
 tasks: ["newer:coffee"]

And that's it! The Awesome grunt-newer is awesome!
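For a Gruntfile written in plain JavaScript, the same setup might look like the sketch below. The `watch` entry mirrors the two lines above; the `coffee` target name `compile` and the `dest` directory are assumptions made for illustration and need to match your own project.

```javascript
// Gruntfile.js (sketch)
function configureGrunt(grunt) {
  grunt.initConfig({
    coffee: {
      // Hypothetical target: compile every .coffee file to a matching .js file.
      compile: {
        expand: true,
        cwd: 'assets/javascript',
        src: ['**/*.coffee'],
        dest: 'build/javascript', // assumed output directory
        ext: '.js',
      },
    },
    watch: {
      coffee: {
        files: 'assets/javascript/**/*.coffee',
        // The "newer:" prefix makes grunt-newer skip unchanged sources.
        tasks: ['newer:coffee'],
      },
    },
  });

  grunt.loadNpmTasks('grunt-contrib-coffee');
  grunt.loadNpmTasks('grunt-contrib-watch');
  grunt.loadNpmTasks('grunt-newer');
}

module.exports = configureGrunt;
```

The `expand: true` form gives each source file its own destination, which is what lets unchanged files be skipped individually.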

Beautify Javascript and CSS in Firebug?

CSS is already beautified in Firebug, as clearly seen by comparing the CSS tab or CSS pane with the raw source.

JavaScript, alas, is not. The best you can do, for now, is to paste the code into something like 

However, if you write a Firebug extension that does this, you will have all of our gratitude. ;-)

Are there any Sass code formatters?

Paste your CSS/SCSS/LESS into and hit Clean

Via Command Line

Via command line you can re-format CSS/SCSS/Sass using the sass-convert script:

$ sass-convert messy.scss clean.scss

or CSS to SCSS:

$ sass-convert messy.css clean.scss

or SCSS to Sass:

$ sass-convert messy.scss clean.sass

The sass-convert script is installed when you install Sass. It can convert any direction between: css, scss, and sass.

Learn more about sass-convert:

Advanced user tip: Because the sass syntax is much stricter (only allowing one property: value pair per line) you're less likely to run into an issue with messy code.

What is the best IDE for PHP?

I'm a PHP developer and now I use Notepad++ for code editing, but lately I've been searching for an IDE to ease my work.

I've looked into Eclipse, Aptana Studio and several others, but I'm not really decided, they all look nice enough but a bit complicated. I'm sure it'll all get easy once I get used to it, but I don't want to waste my time.

This is what I'm looking for:

Are you sure you're looking for an IDE? The features you're describing, along with the impression of being too complicated that you got from e.g. Aptana, suggest that perhaps all you really want is a good editor with syntax highlighting and integration with some common workflow tools. For this, there are tons of options.

I've used jEdit on several platforms successfully, and that alone puts it above most of the rest (many of the IDEs are cross-platform too, but Aptana and anything Eclipse-based is going to be pretty heavy-weight, if full-featured). jEdit has ready-made plugins for everything on your list, and syntax highlighting for a wide range of languages. You can also bring up a shell in the bottom of your window, invoke scripts from within the editor, and so forth. It's not perfect (the UI is better than most Java UIs, but not perfect yet I don't think), but I've had good luck with it, and it'll be a hell of a lot simpler than Aptana/Eclipse.

That said, I do like Aptana quite a bit for web development, it does a lot of the grunt work for you once you're over the learning curve.

Are there any command line validation tools for HTML and CSS?

There is tidy for HTML. It's more than a validator: it doesn't only check if your HTML is valid, but also tries to fix it. But you can just look at the errors and warnings and ignore the fix if you want.

I'm not sure how well it works with HTML5, but take a look at Wanted: Command line HTML5 beautifier, there are some parameter suggestions.

For CSS there is CSSTidy (I have never used it though.)

Regarding the W3C validator: if you happen to use Debian/Ubuntu, the package w3c-markup-validator is in the repositories and very easy to install via package management. Packages for other distros are also available.

Simple HTML Pretty Print

Don't be so sure you have gotten all there is to pretty-printing HTML in so few lines. It took me a little more than a year and 2000 lines to really nail this topic. You can just use my code directly or refactor it to fit your needs: (and Github project)

You can demo it at

The reason why it takes so much code is that people really don't seem to understand or value the importance of text nodes. If you are adding new and empty text nodes during beautification then you are doing it wrong and are likely corrupting your content. Additionally, it is also really easy to screw it up the other way and remove white space from inside your content. You have to be careful about these or you will completely destroy the integrity of your document.
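A small sketch of the text-node problem, assuming a crude stand-in for rendering (strip the tags, collapse whitespace runs to one space). The helper name and the sample markup are hypothetical and unrelated to the code linked above.

```javascript
// Very rough approximation of the text a browser would show.
function approximateRenderedText(html) {
  return html
    .replace(/<[^>]+>/g, '') // drop the tags
    .replace(/\s+/g, ' ')    // collapse whitespace like inline layout does
    .trim();
}

// Inline markup in the middle of a word.
const originalMarkup = '<p>un<b>believ</b>able</p>';

// What a careless beautifier produces: one token per line.
const beautifiedMarkup =
  '<p>\n  un\n  <b>\n    believ\n  </b>\n  able\n</p>';

console.log(approximateRenderedText(originalMarkup));
console.log(approximateRenderedText(beautifiedMarkup));
```

The first call yields one word and the second yields three, so the "pretty" version has changed the content.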

Also, what if your document contains CSS or JavaScript. Those should be pretty printed as well, but have very different requirements from HTML. Even HTML and XML have different requirements. Please take my word for it that this is not a simple thing to figure out. HTML Tidy has been at this for more than a decade and still screws up a lot of edge cases.

As far as I know, my markup_beauty.js application is the most complete pretty-printer ever written for HTML/XML. I know that is a very bold statement, and perhaps arrogant, but so far it's never been challenged. Look at my code, and if there is something you need that it is not doing, please let me know and I will get around to adding it.

Tools to optimize (remove redundancy and merge) CSS?

I don't particularly understand what you mean by "clean unused CSS", but in any case, I'll throw two tools at you, and maybe one will work (the good ol' shotgun approach).

CSS Lint seems to point out "duplicate properties". There are a range of articles covering some of what it does. But a test with the two definitions you had,

a { color: #fff; }
a { color: #000; }

it didn't do much of anything. While ...

Code Beautifier did combine the two selectors, opting for the latter of the two (i.e. the style that's actually applied). Resulting in:

a {

Dreamweaver extension to beautify PHP/JavaScript/jQuery code

Detect if source is CSS/HTML/JavaScript

Short answer: Almost impossible.

- Thanks to Katana's input

The reason: A valid HTML can contain JS and CSS (and it usually does). JS can contain both css and html (i.e.: var myContent = '< div >< style >CSS-Rules< script >JS Commands';). And even CSS can contain both in comments.

So writing a parser for this is close to impossible. You just cannot separate them easily.

The languages have rules for how to write them; what you want to do is reverse engineer something and check whether those rules apply. That's probably not worth the effort.

Approach 1

If the requirement is worth the effort, you could try to run different parsers on the source and see if they throw errors. For example, Java source is unlikely to be valid HTML/JS/CSS but will be valid Java code (if written properly).
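For the JavaScript part of that idea, the engine's own parser can be borrowed: `new Function(source)` compiles the text without running it and throws a `SyntaxError` if it does not parse. The sketch below is only an approximation (the helper name is hypothetical, and a function body accepts a few things a script would not, such as a top-level `return`); HTML and CSS would need their own parsers.

```javascript
// Returns true if the text compiles as a JavaScript function body.
function parsesAsJavaScript(source) {
  try {
    new Function(source); // compiled only, never called
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error; // anything else is unexpected
  }
}

console.log(parsesAsJavaScript('var i = 1; i += 1;'));
console.log(parsesAsJavaScript('<div>This div is HTML</div>'));
console.log(parsesAsJavaScript('#thisiscss { margin: 0; padding: 0; }'));
```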

Approach 2 - Thanks to Bram's input

However if you know the source very well and have the assumption that these things don't occur in your code, you could try the following with Regular Expressions.


<code><div>This div is HTML var i=32;</div></code>
<code>#thisiscss { margin: 0; padding: 0; }</code>
<code>.thisismorecss { border: 1px solid; background-color: #0044FF;}</code>
<code>function jsfunc(){ { var i = 1; i+=1;<br>}</code>


$("code").each(function() {
 code = $(this).text();
 if (code.match(/<(br|basefont|hr|input|source|frame|param|area|meta|!--|col|link|option|base|img|wbr|!DOCTYPE).*?>|<(a|abbr|acronym|address|applet|article|aside|audio|b|bdi|bdo|big|blockquote|body|button|canvas|caption|center|cite|code|colgroup|command|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frameset|head|header|hgroup|h1|h2|h3|h4|h5|h6|html|i|iframe|ins|kbd|keygen|label|legend|li|map|mark|menu|meter|nav|noframes|noscript|object|ol|optgroup|output|p|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video).*?<\/\2/)) {
  $(this).after("<span>This is HTML</span>");
 } else if (code.match(/(([ \t\r\n]*)([a-zA-Z-]*)([.#]{1,1})([a-zA-Z-]*)([ \t\r\n]*)+)([{]{1,1})((([ \t\r\n]*)([a-zA-Z-]*)([:]{1,1})((([ \t\r\n]*)([a-zA-Z-0-9#]*))+)[;]{1})*)([ \t\r\n]*)([}]{1,1})([ \t\r\n]*)/)) {
  $(this).after("<span>This is CSS</span>");
 } else {
  $(this).after("<span>This is JS</span>");
 }
});

What does it do: Parse the text.


If it contains characters like '<' followed by br (or any of the other tags above) and then '>' then it's html. (Include a check as well since you could compare numbers in js as well).


If it is made out of the pattern name(optional) followed by . or # followed by id or class followed by { you should get it from here... In the pattern above I also included possible spaces and tabs.


Else it is JS.

You could also do Regex like: If it contains '= {' or 'function...' or ' then JS. Also check further for Regular Expressions to check more clearly and/or provide white- and blacklists (like 'var' but no < or > around it, 'function(asdsd,asdsad){assads}' ..)

Bram's starting point, which I built on, was:

$("code").each(function() {
 var code = $(this).text();
 if (code.match(/^<[^>]+>/)) {
  $(this).after("<span>This is HTML</span>");
 } else if (code.match(/^(#|\.)?[^{]+{/)) {
  $(this).after("<span>This is CSS</span>");
 }
});
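To try the two simple patterns without jQuery or a page, they can be wrapped in a plain function, as in the sketch below (the function name and the final "js" fallback are assumptions added for illustration). It also shows why the short answer is "almost impossible": an ordinary function declaration satisfies the CSS pattern.

```javascript
// Classify a snippet using only the two simple patterns.
function classifySource(source) {
  const text = source.trim();
  if (/^<[^>]+>/.test(text)) return 'html';        // starts with a tag
  if (/^(#|\.)?[^{]+\{/.test(text)) return 'css';  // "selector {"
  return 'js';                                     // everything else
}

console.log(classifySource('<div>This div is HTML var i=32;</div>'));
console.log(classifySource('#thisiscss { margin: 0; padding: 0; }'));
console.log(classifySource('var i = 1; i += 1;'));

// False positive: "function jsfunc() {" looks like "selector {".
console.log(classifySource('function jsfunc() { var i = 1; }'));
```

The last call reports CSS for what is plainly JavaScript, so any real use needs extra checks or a whitelist of known inputs.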

For more Information: is a good reference. Also check for inspiration.

Angular 2 production bundle file is too large

I was trying out angular 2 the other day and I faced the same issue as you do, my vendor.js was 6M and this was a simple "Hello World" app...

I've found the following post that helped a lot in understanding how we should act on this issue (for now):

He uses several optimization and compression techniques (precompile, treeshake, minify, bundle and gzip) on his 1.5M app to reduce its size to just 50kb.

Check it out, hope it helps! :)

EDIT: I've had a few runs with Angular since, and for me the best working approach was to use the angular-cli, which is at v1.0 as I write this. When you run the build with --prod, it does everything I wrote in my original post, plus a typical web server gzips your files. My complete site is under 1MB with this, and it has a lot of functionality and plenty of third-party stuff as well.

Why is there a default margin on the <body> element?

Languages were originally built to work independently, so that you could use each one only for what it is intended for. HTML is only supposed to let you display something in a browser. CSS, on the other hand (as you surely know), is intended to handle all the beautification. With that in mind, anyone should be able to write an HTML document without any CSS at all, and browsers should display it in the most legible form. For this to happen as consistently as possible, browsers have something called "sane defaults". These defaults cover the margin and padding on the body, some fonts, the most legible font size, etc., and they leave it up to you to override them as needed with CSS.

Without the margin and padding on the body, everything would sit completely flush against the edge of the browser window. That is not ideal if you are reading a document.


The links below show Firefox and Webkit CSS defaults. This will help you troubleshoot defaults when you have no idea where they came from or why they exist.

Sublime Text 2 Code Formatting

First let me say I come from a Microsoft background and Visual Studio is my bread and butter. It has a command (keybind is arbitrary) that auto-formats any code syntax. The same command works in HTML, CSS, Javascript, C#, etc.

I have tried plugins for ST2 and so far I've found most don't work on a Windows box and if they do, it's for a very specific purpose like just Javascript.

I have tried (and opened Issues where appropriate): (this one actually works)

Have any Windows users of ST2 found anything that works to format CSS/HTML/Javascript, preferably in one shot?

Edit: Since this question is getting lots of views with no activity, I'll say that I am still looking for a plugin that can format various script types within the same command.

October 2013: I still haven't found anything that covers JS+CSS+HTML well; however, I have settled on JsFormat as by far the most effective and bug-free option, with the least amount of configuration, for just JavaScript.

Beautifying a Windows Form application

You can't use CSS to style a WinForms application, but I don't think that's what you mean anyway.

As far as "beautifying" your application, there are a number of 3rd-party tools available. The most popular ones are (in no particular order):

People tend to get religious about their 3rd-party design tooling, and a lot of ink has been spilled on SO going over the benefits of each 3rd-party design tool.